You answered about it being mostly ASCII while I was in the middle of writing. That means this post is probably moot with regard to your specific test, but I'll leave it here for others who come by wondering something similar.
P.S. You can check for sure how many non-ASCII scalars are in your 4,095,547-character string by doing the following:
print(theBigString.unicodeScalars.filter({ $0.value >= 0x80 }).count)
(You said you created the list from your home directory, which could easily include things like the Music library, where an application is itself happily creating files with all sorts of Unicode, named after things you never consciously typed in and never touched in Finder.)
[Original post:]
If you happen to be as French as your name, and have accented characters all throughout your file system...
...then for your particular test, Swift is severely handicapped because of how smart it is. Swift will never be as fast as the other languages in the list, but it will also never be as incorrect.
Swift respects Unicode canonical equivalence, whereas the languages you are comparing against do not. That means Swift considers "a" + "\u{300}" (an a followed by a combining grave accent) and "\u{E0}" (the precomposed à) to be equal.
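For example, this little check (the variable names are just for illustration) shows the equality holding at the String level even though the scalar sequences differ:

let precomposedForm = "\u{E0}"    // LATIN SMALL LETTER A WITH GRAVE
let combiningForm = "a\u{300}"    // LATIN SMALL LETTER A + COMBINING GRAVE ACCENT
print(precomposedForm == combiningForm)    // true: String equality is canonical
print(precomposedForm.unicodeScalars.elementsEqual(combiningForm.unicodeScalars))    // false: the scalar sequences differ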
That has implications for its conformance to Comparable as well, since canonically equivalent but distinct scalar sequences need to compare the same. As such, String sorts lexicographically after normalization. I couldn't find it stated in the documentation, so it might just be an implementation detail, but at the moment it happens to use NFC (the composed form). You can verify that with this code:
let composed = "\u{E0}"
let decomposed = "a\u{300}"
let strings = [composed, decomposed, composed, decomposed, "a", "b", "c"]
let sorted = strings.sorted()
let descriptions: [String] = sorted.map { string in
    let scalars = string.unicodeScalars
        .map({ String($0.value, radix: 16, uppercase: true) })
        .joined(separator: ", ")
    return "\(string): \(scalars)"
}
for entry in descriptions {
    print(entry)
}
a: 61
b: 62
c: 63
à: E0
à: 61, 300
à: E0
à: 61, 300
So to do things right, Swift first has to normalize the string before it can lexicographically compare the bytes. The other languages are being negligent and just feeding the raw bytes directly into a lexicographical comparison. You can see the mess they make by sorting the same strings I listed above: the four instances of à will be split into two groups far away from each other in the list.
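You can even reproduce that mess in Swift by bypassing String's comparison and sorting the raw UTF-8 bytes. Reusing the strings array from the snippet above, something like this should show the split:

// Mimic a byte-wise sort by comparing UTF-8 bytes directly instead of using <.
let byteOrdered = strings.sorted { $0.utf8.lexicographicallyPrecedes($1.utf8) }
for string in byteOrdered {
    let scalars = string.unicodeScalars
        .map({ String($0.value, radix: 16, uppercase: true) })
        .joined(separator: ", ")
    print("\(string): \(scalars)")
}
// The two "61, 300" entries land right after "a", while the two "E0" entries
// end up after "c", so the four à's split into two separate groups.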
To make matters worse for your particular test, the macOS file system also respects canonical equivalence, but it does so by forcing all file names into NFD (the decomposed form). Swift does fast-path NFC strings by marking them as already normal so that normalization can be skipped. But by sourcing your list from the file system, you have ensured that none of the strings hit that fast path, because they will all be in the opposite form.
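If you want to confirm that on your own data, here is a rough diagnostic (assuming fileNames holds the strings you sorted) that counts how many names are literally stored in each form. Note that the scalar views have to be compared, because == itself applies canonical equivalence and would report everything as equal:

import Foundation

// `fileNames` is a placeholder for whatever array of names you are sorting.
// Count how many names are stored scalar-for-scalar in NFC versus NFD, using
// Foundation's canonical-mapping properties to produce each form for comparison.
let nfcCount = fileNames.filter {
    $0.unicodeScalars.elementsEqual($0.precomposedStringWithCanonicalMapping.unicodeScalars)
}.count
let nfdCount = fileNames.filter {
    $0.unicodeScalars.elementsEqual($0.decomposedStringWithCanonicalMapping.unicodeScalars)
}.count
print("already NFC: \(nfcCount), already NFD: \(nfdCount)")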
So since Swift is doing a lot more work (for good reason), it will quite expectedly finish your sort more slowly than the other languages. To make a fairer comparison, you could do one of two things:
- Make the other platforms do all the work Swift is doing, by swapping out their standard implementation of < with something that compares the strings' normalized forms instead.
- Abandon Unicode correctness in Swift, by circumventing the String type and turning each one into a plain array of scalars (Array($0.unicodeScalars)) before you start the sort; a rough sketch follows this list.
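As a sketch of that second option (with fileNames again standing in for your actual array of strings), it could look something like this:

// `fileNames` is a placeholder for whatever array you are sorting.
// Sort plain arrays of scalars by scalar value, trading Unicode correctness for speed.
let scalarArrays = fileNames.map { Array($0.unicodeScalars) }
let sortedArrays = scalarArrays.sorted { lhs, rhs in
    lhs.lexicographicallyPrecedes(rhs) { $0.value < $1.value }
}
// Convert back to String afterwards if you still need the text.
let sortedNames = sortedArrays.map { String(String.UnicodeScalarView($0)) }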
(None of this negates the observations made by the others. It stacks with the sorts of things they have already mentioned.)
[End of original post.]
String(contentsOf:encoding:) comes from Objective-C. It is really an NSString under the hood, and NSString uses UTF-16 internally. So what you have loads UTF-8 from the file system, converts it to UTF-16 for the sake of NSString, and then bridges it to String. The bridging leaves it in UTF-16 on the assumption that if it came from Objective-C it is likely to go back again. Each time you compare such bridged strings, Swift first has to convert them to UTF-8 (since their order in Swift is defined as bytewise according to UTF-8). That is a lot of repetitive, unnecessary work, all triggered because the strings were fetched from Objective-C. String instances that begin their existence in native Swift do not suffer from this.
Adding makeContiguousUTF8() at the beginning forced each string into UTFâ8 before starting the sort. Since it now only happens once instead of repeatedly, the sort goes much faster.
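For anyone else hitting this, a minimal sketch of that fix might look like the following (listURL and the one-name-per-line format are assumptions about how the list is stored):

import Foundation

// `listURL` is a placeholder for wherever the list file actually lives.
// Load the list via Foundation, then force each resulting string into native,
// contiguous UTF-8 storage exactly once before sorting. makeContiguousUTF8()
// is a no-op for strings that are already contiguous UTF-8.
var names = try String(contentsOf: listURL, encoding: .utf8)
    .split(separator: "\n")
    .map { String($0) }
for index in names.indices {
    names[index].makeContiguousUTF8()
}
let sortedList = names.sorted()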