Weekly Swift articles, podcasts and tips by John Sundell.

Strings

Published on 31 Jul 2019

How a programming language models text is often much more complicated than it may first seem, and Swift’s String type is no different. While strings are one of the most common pieces of data that any program will deal with, and something that we as people are highly familiar with, there are some very real challenges when it comes to how text is represented within code. Understanding some of those challenges, and how Swift aims to overcome them, can often be key to making strings much easier to work with in different contexts.

Swift strings are modeled as collections of Unicode characters, which (as of Swift 5) are stored using the very common UTF-8 text encoding, making it possible for strings to represent all sorts of different characters and emoji. Since strings are collections, they can be initialized either using a literal (a string defined inline in code) or using another collection containing Character elements, such as an array:

let stringA = "Hello!"
let stringB = String(["H", "e", "l", "l", "o", "!"])
print(stringA == stringB) // true
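
Any sequence of Character values will do, not just an array. As a small sketch (the variable names are just for illustration), here’s how we could turn the reversed view of a string back into a brand new String:

```swift
// reversed() on a String gives us a lightweight ReversedCollection view,
// whose elements are Characters, so it can be passed to String's initializer.
let original = "Hello!"
let reversedView: ReversedCollection<String> = original.reversed()
let reversedString = String(reversedView) // materializes a new String

print(reversedString) // !olleH
```
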

Strings also behave like other collections (such as arrays or dictionaries) in many other ways. For example, we can easily iterate over all of the characters within a string, just like how we’d iterate over the elements within an array:

func printCharacters(in string: String) {
    for character in string {
        print(character)
    }
}
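
If we also need each character’s position, we can use enumerated(), which every Sequence offers. Here’s a sketch using a hypothetical numberedCharacters function. Note that the offsets it produces count Characters, not bytes:

```swift
// Hypothetical helper: pairs every Character with its zero-based offset.
func numberedCharacters(in string: String) -> [String] {
    var lines: [String] = []

    for (offset, character) in string.enumerated() {
        lines.append("\(offset): \(character)")
    }

    return lines
}

print(numberedCharacters(in: "Café")) // ["0: C", "1: a", "2: f", "3: é"]
```
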

We can even use most common collection operations, such as map, flatMap and compactMap, directly on strings. For example, here’s how we can use map to turn a string into an array of characters:

func characters(in string: String) -> [Character] {
    return string.map { $0 }
}
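
compactMap and filter work in much the same way. As a sketch (digits(in:) is a hypothetical helper), here’s how we could combine compactMap with Character’s wholeNumberValue property to pull all the digits out of a string, and use filter to keep only its vowels. Note that calling filter directly on a String gives us back a new String rather than an array:

```swift
// compactMap drops the nil results, so only characters that represent
// a whole number (such as "2" or "3") survive, converted into Ints.
func digits(in string: String) -> [Int] {
    return string.compactMap { $0.wholeNumberValue }
}

// String's own filter returns a String containing the matching characters.
let vowels: String = "Hello, world!".filter { "aeiou".contains($0) }

print(digits(in: "R2-D2 and C-3PO")) // [2, 2, 3]
print(vowels) // eoo
```
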

Because strings and arrays are so similar in terms of the APIs they support, we might also expect to be able to retrieve any character from a string using an Int-based subscript, like this:

let string = "Hello, world!"
let character = string[1] // Compiler error

However, running the above code gives us a compiler error, since (unlike arrays) strings don’t offer random access to their characters using an Int index. To understand why that is, we have to dive one level deeper, beyond the surface-level API of String, and take a look at how strings are actually represented under the hood.

Let’s start by taking a string that includes a special (non-ASCII) character, café in this case, and compare its number of characters with the number of UTF-8 code units that are used to represent it:

"Café".count // 4
"Café".utf8.count // 5

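The é in that example takes up two UTF-8 code units on its own. What makes things even trickier is that é can be written in more than one way: either as a single precomposed Unicode scalar (U+00E9), or as a plain e followed by a combining acute accent (U+0301). The sketch below assumes that the literal above uses the precomposed form (which is what its UTF-8 count of 5 suggests), and spells out both forms using Unicode scalar escapes so that the encoding is explicit:

```swift
// Two spellings of the same word.
let precomposed = "Caf\u{E9}"   // "é" as one scalar
let decomposed = "Cafe\u{301}"  // "e" + combining acute accent

print(precomposed.count, decomposed.count)                               // 4 4
print(precomposed.unicodeScalars.count, decomposed.unicodeScalars.count) // 4 5
print(precomposed.utf8.count, decomposed.utf8.count)                     // 5 6

// String comparison is based on Unicode canonical equivalence.
print(precomposed == decomposed) // true
```

Both forms count as 4 characters and compare as equal, even though their underlying representations differ, which is one more reason why the number of characters can’t simply be derived from the raw storage.
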
As we can see above, the number of perceived characters and the actual number of UTF-8 code units within a string aren’t always equal. That difference gets even larger when we start adding emoji into the mix, especially ones that are composed of several Unicode scalars (and therefore many code units), such as the different variants of the family emoji:

"👨‍👩‍👧‍👦".count // 1
"👨‍👩‍👧‍👦".utf8.count // 25

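We can take a closer look at where those 25 code units come from by inspecting the string’s other views. This sketch spells out the family emoji using scalar escapes (man, woman, girl and boy, joined by zero-width joiners), which should correspond to the emoji used above:

```swift
let family = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F466}"

print(family.count)                // 1: a single grapheme cluster, so one Character
print(family.unicodeScalars.count) // 7: four people plus three joiners
print(family.utf16.count)          // 11: each person needs a surrogate pair
print(family.utf8.count)           // 25: 4 bytes per person, 3 per joiner

// The individual scalars, as hexadecimal code points.
let scalarValues = family.unicodeScalars.map { String($0.value, radix: 16, uppercase: true) }
print(scalarValues) // ["1F468", "200D", "1F469", "200D", "1F467", "200D", "1F466"]
```
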
Because a single character can be made up of a varying number of code units, finding the nth character of a string requires walking through it from the start. So rather than offering a way to access a string’s characters using raw Int indexes, Swift opts for a more locked-down and safe approach, using a dedicated String.Index type along with specific APIs for manipulating such indexes. Each string comes with a startIndex and an endIndex, and using those we can derive any other index that we wish to retrieve a character for, like this:

let string = "Hello, world!"

let secondIndex = string.index(after: string.startIndex)
let thirdIndex = string.index(string.startIndex, offsetBy: 2)
let lastIndex = string.index(before: string.endIndex)

print(string[secondIndex]) // e
print(string[thirdIndex]) // l
print(string[lastIndex]) // !
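
If we do want something resembling Int-based access, we can build it on top of those APIs, as long as we keep in mind that it has to walk the string from the start every time (an O(n) operation, rather than O(1) like with arrays). Here’s a sketch of a hypothetical character(at:) helper, not a standard library API, which uses index(_:offsetBy:limitedBy:) to return nil instead of crashing when the offset is out of bounds:

```swift
extension StringProtocol {
    /// Hypothetical helper: the Character at a given offset, or nil.
    func character(at offset: Int) -> Character? {
        // Negative offsets would walk backwards past startIndex, so reject them.
        guard offset >= 0 else { return nil }

        // Steps forward one Character at a time, stopping at endIndex.
        guard let characterIndex = self.index(startIndex, offsetBy: offset, limitedBy: endIndex),
              characterIndex < endIndex else {
            return nil
        }

        return self[characterIndex]
    }
}

let greeting = "Hello, world!"
print(greeting.character(at: 1) ?? "-")  // e
print(greeting.character(at: 12) ?? "-") // !
print(greeting.character(at: 13) ?? "-") // -
```

Returning an optional keeps the helper safe, but since each call is O(n), looping over offsets with it would be quadratic. Iterating over the string directly is still the better choice in that case.
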

We can also form ranges out of string indexes, and use those ranges to extract a part of a string — what’s commonly referred to as a substring:

let range = secondIndex..<lastIndex
let substring = string[range]
print(substring) // ello, world
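
In practice, we’ll often compute those indexes by searching the string, rather than by offsetting from its start. Here’s a sketch of a hypothetical splitAtFirstComma function that uses firstIndex(of:) together with one-sided ranges, returning two Substring values:

```swift
// Hypothetical helper: splits a string around its first comma. Both parts
// are Substrings, so they refer to the original string instead of copying it.
func splitAtFirstComma(_ string: String) -> (before: Substring, after: Substring)? {
    guard let commaIndex = string.firstIndex(of: ",") else {
        return nil
    }

    // From the start up to (but not including) the comma, and from the
    // character right after the comma to the end of the string.
    let before = string[..<commaIndex]
    let after = string[string.index(after: commaIndex)...]
    return (before, after)
}

if let parts = splitAtFirstComma("Hello, world!") {
    print(parts.before) // Hello
    print(parts.after)  // prints " world!", including the leading space
}
```
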

What’s interesting about substrings in Swift is that they’re actually not String values. Instead, they’re represented using the Substring type, which enables us to retrieve and pass substrings around without constantly copying the underlying string — which is great for performance in situations when we’re dealing with large bodies of text. However, the fact that strings and substrings aren’t instances of the same type can lead to some tricky situations, such as if we try to assign a substring as the text of a UILabel:

import UIKit

let label = UILabel()
label.text = substring // Compiler error

The above code gives us a compiler error saying that a Substring can’t be assigned to a String? property since — as far as the compiler is concerned — they’re completely separate types. Thankfully, converting a substring into a proper string is as easy as doing this:

label.text = String(substring)

One thing to keep in mind when performing the above kind of conversion is that it does copy the substring into a new string, which is most often what we want, since it also enables the underlying string to get released from memory (if no other substrings are still referring to it).
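
One way to observe that a substring is a view into its original string, rather than a copy, is that it shares the same indexes. Here’s a small sketch (using suffix, which also returns a Substring) that reads from the original string using an index taken from the substring, and then converts the substring into a standalone String:

```swift
let article = "Hello, world!"
let word = article.suffix(6) // Substring "world!", a view into `article`

// The substring's indexes are valid in the string it came from.
print(article[word.startIndex]) // w
print(word.startIndex == article.index(article.startIndex, offsetBy: 7)) // true

// String(_:) copies just the substring's characters into new storage.
let standalone = String(word)
print(standalone)           // world!
print(type(of: standalone)) // String
```
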

While String and Substring are different types, they do have a lot of APIs in common, since they both conform to StringProtocol. That comes in very handy when we want to write generic code that can be used with either strings or substrings, such as this function that extracts all the letters out of a string:

func letters<S: StringProtocol>(in string: S) -> [Character] {
    return string.filter { $0.isLetter }
}
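
Since both String and Substring conform to StringProtocol, that same function can be called with either type. Here’s a sketch that repeats the function so that the example is self-contained, and then passes it both a string and a substring created using dropFirst:

```swift
func letters<S: StringProtocol>(in string: S) -> [Character] {
    // Sequence's filter returns an array of the matching Characters.
    return string.filter { $0.isLetter }
}

let full = "R2-D2, C-3PO!"
let tail = full.dropFirst(7) // Substring "C-3PO!"

print(letters(in: full)) // ["R", "D", "C", "P", "O"]
print(letters(in: tail)) // ["C", "P", "O"]
```
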

Besides isLetter, Swift’s Character type comes with a whole suite of other properties that make it easy to check what kind of character we’re dealing with.
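
As a sketch of a few of those properties in action, here’s a hypothetical summarize function (along with a hypothetical CharacterSummary type) that counts how many characters in a string are letters, numbers, whitespace, punctuation or uppercase:

```swift
// Hypothetical type holding per-category counts.
struct CharacterSummary {
    var letters = 0
    var numbers = 0
    var whitespace = 0
    var punctuation = 0
    var uppercase = 0
}

// A single character can fall into several categories (for example, "H"
// is both a letter and uppercase), so each property is checked separately.
func summarize(_ string: String) -> CharacterSummary {
    var summary = CharacterSummary()

    for character in string {
        if character.isLetter { summary.letters += 1 }
        if character.isNumber { summary.numbers += 1 }
        if character.isWhitespace { summary.whitespace += 1 }
        if character.isPunctuation { summary.punctuation += 1 }
        if character.isUppercase { summary.uppercase += 1 }
    }

    return summary
}

let summary = summarize("Hello, World 42!")
print(summary.letters, summary.numbers, summary.whitespace) // 10 2 2
print(summary.punctuation, summary.uppercase)               // 2 2
```
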

Compared to other programming languages, Swift’s way of modeling strings may at first seem quite complicated, but there are good reasons why Swift’s String API is designed the way it is, especially when looking at how strings are actually stored, and at some of the challenges of accurately representing text within a modern, international app. Overall, Swift strings give us great performance and safety, even if that costs us a bit of convenience here and there.

Thanks for reading! 🚀