Strings
How a programming language models text is often more complicated than it first seems, and Swift’s String type is no exception. While strings are among the most common pieces of data that any program deals with, and something that we as people are highly familiar with, representing text in code comes with some very real challenges. Understanding those challenges, and how Swift aims to overcome them, makes working with strings much easier in different contexts.
Swift strings are modeled as collections of characters that are stored using the very common UTF-8 text encoding, which makes it possible for them to represent all sorts of different characters and emoji. Strings can be initialized either using a literal (a string defined inline in code) or, since they’re collections, using another collection containing Character elements, such as an array:
let stringA = "Hello!"
let stringB = String(["H", "e", "l", "l", "o", "!"])
print(stringA == stringB) // true
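As a small sketch of the same idea, any sequence of Character values can be turned back into a string, not just array literals. The variable names below are made up for illustration:

```swift
// reversed() yields a collection of Character values, which String can be built from.
let reversedWord = String("stressed".reversed())
print(reversedWord) // desserts

// sorted() returns a [Character], which works the same way.
let sortedLetters = String("swift".sorted())
print(sortedLetters) // fistw
```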
Strings behave like other collections (such as arrays or dictionaries) in many other respects too. For example, we can iterate over all of the characters within a string, just as we’d iterate over the elements of an array:
func printCharacters(in string: String) {
    for character in string {
        print(character)
    }
}
We can even use most common collection operations, such as map, flatMap and compactMap, directly on strings. For example, here’s how we could use map on a string to turn it into an array of characters:
func characters(in string: String) -> [Character] {
    return string.map { $0 }
}
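To illustrate a couple of the other operations, here’s a minimal sketch (the input strings and variable names are just examples) that uses compactMap to pull the digits out of a string, and map to produce a different element type:

```swift
// compactMap drops every nil result, so only characters that are digits survive.
let digits = "a1b2c3".compactMap { $0.wholeNumberValue }
print(digits) // [1, 2, 3]

// map can produce any element type, such as one uppercased String per Character.
let shouted = "abc".map { $0.uppercased() }
print(shouted) // ["A", "B", "C"]
```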
Because strings and arrays support so many of the same APIs, we might also expect to be able to retrieve any character from a string using an Int-based subscript, like this:
let string = "Hello, world!"
let character = string[1]
However, trying to compile the above code gives us an error, since (unlike arrays) strings don’t offer random access to their characters through Int indexes. To understand why that is, we have to dive one level deeper, beyond the surface-level API of String, and take a look at how strings are actually represented under the hood.
Let’s start by taking a string that includes a non-ASCII character (“Café” in this case) and comparing its number of characters with the number of UTF-8 code units that are used to represent it:
"Café".count // 4
"Café".utf8.count // 5
As we can see above, the number of perceived characters and the number of UTF-8 code units within a string aren’t always equal. That difference gets even larger when we start adding emoji into the mix, especially ones that are composed of several Unicode scalars joined together, such as the different variants of the family emoji:
"👨👩👧👦".count // 1
"👨👩👧👦".utf8.count // 25
Because of this gap between perceived characters and the code units that store them, Swift doesn’t offer a way to access a string’s characters using raw Int indexes. Instead, it opts for a more locked-down and safe approach, using a dedicated String.Index type along with specific APIs for manipulating such indexes. Each string comes with a startIndex and an endIndex (the latter being the position just past the last character), and using those we can derive any other index that we wish to retrieve a character for, like this:
let string = "Hello, world!"
let secondIndex = string.index(after: string.startIndex)
let thirdIndex = string.index(string.startIndex, offsetBy: 2)
let lastIndex = string.index(before: string.endIndex)
print(string[secondIndex]) // e
print(string[thirdIndex]) // l
print(string[lastIndex]) // !
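To make the reasoning behind String.Index a bit more concrete, here’s a sketch that first spells out a family emoji using Unicode escapes (so that its invisible joiners are explicit), and then defines an integer-offset lookup on top of the index APIs. Note that character(atOffset:in:) is a hypothetical helper, not part of the standard library, and that it has to walk the string from its start, which is the cost that a plain Int subscript would otherwise hide:

```swift
// Built from escapes so that the zero-width joiners (U+200D) are visible in code.
let family = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F466}"
print(family.count)                // 1 (a single perceived character)
print(family.unicodeScalars.count) // 7 (four emoji plus three joiners)
print(family.utf8.count)           // 25 (bytes of UTF-8 storage)

/// Hypothetical helper: returns the character at an integer offset,
/// or nil when the offset is out of bounds. Runs in O(n) time.
func character(atOffset offset: Int, in string: String) -> Character? {
    guard offset >= 0,
          let index = string.index(string.startIndex,
                                   offsetBy: offset,
                                   limitedBy: string.endIndex),
          index < string.endIndex // endIndex itself can't be subscripted
    else { return nil }

    return string[index]
}

let second = character(atOffset: 1, in: "Hello, world!")  // "e"
let missing = character(atOffset: 50, in: "Hello, world!") // nil
```

Using the limitedBy: variant means that an out-of-range offset results in nil, rather than a crash at runtime.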
We can also form ranges out of string indexes, and use those ranges to extract a part of a string — what’s commonly referred to as a substring:
let range = secondIndex..<lastIndex
let substring = string[range]
print(substring) // ello, world
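In practice, indexes often come from searching a string rather than from counting offsets. As a sketch of that, the following hypothetical splitOnce function (the name and signature are assumptions made for this example) looks up an index with firstIndex(of:) and then uses one-sided ranges to extract the text on either side:

```swift
/// Hypothetical helper: splits a string around the first occurrence of a separator.
/// Returns nil when the separator isn't found.
func splitOnce(_ string: String, at separator: Character) -> (before: Substring, after: Substring)? {
    guard let separatorIndex = string.firstIndex(of: separator) else {
        return nil
    }

    let before = string[..<separatorIndex]                     // everything up to the separator
    let after = string[string.index(after: separatorIndex)...] // everything past the separator
    return (before, after)
}

if let parts = splitOnce("Hello, world!", at: ",") {
    print(parts.before) // Hello
    print(parts.after)  //  world! (including the leading space)
}
```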
What’s interesting about substrings in Swift is that they’re not actually String values. Instead, they’re represented using the Substring type, which shares its storage with the string that it was extracted from. That enables us to retrieve and pass substrings around without constantly copying the underlying string, which is great for performance when we’re dealing with large bodies of text. However, the fact that strings and substrings aren’t instances of the same type can lead to some tricky situations, such as when we try to assign a substring as the text of a UILabel:
let label = UILabel()
label.text = substring // Compiler error
The above code gives us a compiler error saying that a Substring can’t be assigned to a String? property since, as far as the compiler is concerned, they’re completely separate types. Thankfully, converting a substring into a proper string is as easy as doing this:
label.text = String(substring)
One thing to keep in mind when performing the above kind of conversion is that it copies the substring’s contents into a new string. That’s most often what we want, since it also enables the original string to get released from memory (as long as nothing else is still referring to it).
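Here’s a small sketch of that relationship between a string and its substrings, using made-up example values. It assumes that the slice is only needed briefly, and that the copied String is what we’d store long-term:

```swift
let document = "Title: Strings in Swift"
let titleStart = document.index(document.startIndex, offsetBy: 7)

let titleSlice = document[titleStart...] // Substring that shares storage with `document`
let title = String(titleSlice)           // independent copy with its own storage

// A substring reuses the indexes of the string that it was sliced from.
print(titleSlice.startIndex == titleStart) // true
print(title) // Strings in Swift
```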
While String and Substring are different types, they have a lot of APIs in common, since they both conform to the same protocol: StringProtocol. That comes in very handy when we want to write generic code that can be used with either strings or substrings, such as this function that extracts all the letters out of a string:
func letters<S: StringProtocol>(in string: S) -> [Character] {
    return string.filter { $0.isLetter }
}
Besides isLetter, Swift’s Character type comes with a whole suite of other properties that make it easy to check what kind of character we’re dealing with.
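As a sketch of how a few of those properties could be combined with StringProtocol, here’s a hypothetical characterKinds function (its name and return type are made up for this example) that tallies letters, digits, whitespace and punctuation within any string or substring:

```swift
/// Hypothetical helper that counts a few kinds of characters
/// within either a String or a Substring.
func characterKinds<S: StringProtocol>(
    in text: S
) -> (letters: Int, digits: Int, whitespace: Int, punctuation: Int) {
    var kinds = (letters: 0, digits: 0, whitespace: 0, punctuation: 0)

    for character in text {
        if character.isLetter {
            kinds.letters += 1
        } else if character.isNumber {
            kinds.digits += 1
        } else if character.isWhitespace {
            kinds.whitespace += 1
        } else if character.isPunctuation {
            kinds.punctuation += 1
        }
    }

    return kinds
}

let kinds = characterKinds(in: "Swift 5, released in 2019!")
print(kinds.letters)     // 15
print(kinds.digits)      // 5
print(kinds.whitespace)  // 4
print(kinds.punctuation) // 2
```

Since the function is generic over StringProtocol, it could also be called with a Substring, such as the result of calling prefix on a string.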
Compared to other programming languages, Swift’s way of modeling strings may at first seem quite complicated, but there are good reasons why Swift’s String API is designed the way it is, especially when looking at how strings are actually stored, and at some of the challenges of accurately representing text within a modern, international app. Overall, Swift strings give us great performance and safety, even though that might cost us a bit of convenience here and there.
Thanks for reading! 🚀