Performance issues on Linux compared to Rust

herrhippopotamus · December 15, 2024, 1:21pm

I'm new to Swift and wanted to start off by implementing Advent of Code 2024 Day 1. I was a little bit concerned by the resulting runtime performance of my Swift implementation. As a reference point, I also implemented the first day in Rust.
My Swift implementation can be found here: swifty_advent_2024

Running the Rust implementation on my laptop 1,000 times takes on average 100 µs. Swift on the other hand takes between 5 and 6 ms → more than 50 times slower.
UPDATE:
Thank's to the suggestions from @taylorswift and @bjhomer, the runtime improved from 5 to 1.7 ms on average.

I know Rust is notoriously famous for being "blazingly" fast, but I don't believe Rust is that much faster. Does my Swift implementation contain some severe performance bottlenecks?

I compile the code with:

$ swift build --configuration release -Xswiftc -Ounchecked

OS: Linux Mint 20.3.

The Swift code currently looks like this (just the day1-related code):

import Foundation

// parsing the puzzle with regex was 20 % faster than the first approach of parsing the numbers without regex. However, the variant utilizing "lazy" is now significantly faster than the regex approach.
// This function is not used anymore - this is dead code!
func parseNumbersRegex(_ input: String) -> ([Int],[Int]) {
    let xsRegex = try! Regex(#"\d+"#)

    var xs: [Int] = []
    var ys: [Int] = []

    xs.reserveCapacity(1000)
    ys.reserveCapacity(1000)

    for (i, d) in  input.matches(of: xsRegex)
        .compactMap({ match in
            Int(input[match.range]) 
        }).enumerated() {
        if i % 2 == 0 {
            xs.append(d)
        } else {
            ys.append(d)
        }
    }
    return (xs, ys)
}

// parsing the puzzle: after lazy-optimizations now faster than the regex variant
func parseNumbers(_ input: String) -> ([Int],[Int]) {
    var xs = [Int]();
    var ys = [Int]();

    
    // let lines = input.components(separatedBy: "\n");
    // optimized integer parsing, thanks to taylorswift and bjhomer:
    let lines = input.lazy.split(separator: "\n")
    for line in lines {
        // let nums = line.components(separatedBy: .whitespaces).compactMap{ Int($0) }
        // optimized integer parsing, thanks to taylorswift and bjhomer:
        let nums = line.split(separator: " ").compactMap{ Int($0)}
        xs.append(nums[0])
        ys.append(nums[1])
    }
    return (xs, ys)
}
func countOccurences(_ xs: [Int]) -> [Int:Int] {
    return xs.reduce(into: [:]) { acc, x in
        acc[x, default: 0] += 1
    }
}

func day1() -> (Int, Int){
    var (xs, ys) = parseNumbersRegex(puzzle_1)
    xs.sort()
    ys.sort()

    let dist0 = zip(xs, ys).reduce(0) { acc, pair in
        let (x, y) = pair
        return acc + abs(x - y)
    }
    let yOcc = countOccurences(ys)
    let dist1 = xs.reduce(0) {acc, x in 
        return acc + x * yOcc[x, default: 0]
    }
    return (dist0, dist1)
}

let puzzle_1 = """
38665   13337
84587   21418
93374   50722
68298   57474
54771   18244
...
"""

taylorswift · December 15, 2024, 3:08pm

herrhippopotamus:

    let lines = input.components(separatedBy: "\n");
    for line in lines {
        let nums = line.components(separatedBy: .whitespaces).compactMap{ Int($0) }
        xs.append(nums[0])
        ys.append(nums[1])
    }

this makes two intermediate Array allocations, which could be part of the reason for the slowdown, if Rust is able to do this parsing without allocating intermediates. what you probably want is a “lazified” components(separatedBy:), but i’m not aware of anything like that in the standard library.

bjhomer · December 15, 2024, 3:45pm

If you import the swift-algorithms package, there's an implementation of split(separator:) on lazy sequences, so it could be:

import Algorithms
// ...
let nums = line.lazy.split(separator: " ").compactMap { Int($0) }

herrhippopotamus · December 15, 2024, 5:35pm

Thank's for pointing that out. I didn't expect allocating this small array to have such a big impact but as it turned out, it added a significant overhead.

herrhippopotamus · December 15, 2024, 5:41pm

Thank's for the hint! Adding lazy to this line reduced the runtime from 5 to 3 ms. Adding lazy to the line splitting reduced it even further from 3 to 1.7 ms (the code snippet is updated, incorporating lazy evaluation)

oscbyspro · December 15, 2024, 5:49pm

If it is slow, then it is at least not a terminal illness.

I haven't done the advent stuff, but I wanted to try out my own numeric library:

Machine: MacBook Pro, 13-inch, M1, 2020

Elapsed on average: 5.787 ms
Elapsed on average: 0.130 ms // (45x) parsing 
Elapsed on average: 0.115 ms // (50x) parsing + dictionary

Most of the time is spent parsing the input so I rewrote that part. Click for details:

Ultimathnum: 29b944f87ec8a02b84a012b167191469655024be (main)...

dependencies: [
//  .package(url: "https://github.com/oscbyspro/Ultimathnum.git", branch: "main"), // required run in terminal..?
    .package(url: "https://github.com/oscbyspro/Ultimathnum.git", exact: "29b944f87ec8a02b84a012b167191469655024be"),
],

Unsafe rewrite of parsing function because UTF-8 spans are not a thing yet...

import Ultimathnum

// The input is known so I did not write any error handling.
func parseNumbersGoesBrrr(_ input: String) -> ([Int], [Int]) {
    input.utf8.withContiguousStorageIfAvailable { data in
        var xs = [Int]()
        var ys = [Int]()
        
        xs.reserveCapacity(1000)
        ys.reserveCapacity(1000)
            
        let coder = TextInt.decimal
        var index = data.startIndex
        
        decoding: while index < data.endIndex {
            trim: while index < data.endIndex {
                switch data[index] {
                case 32: fallthrough
                case 10: index &+= 1
                default: break trim
                }
            }
            
            let start = index
            find: while index < data.endIndex {
                switch data[index] {
                case 32: fallthrough
                case 10: break find
                default: index &+= 1
                }
            }
            
            let content = UnsafeBufferPointer(rebasing: data[start..<index])
            let integer = Int(coder.decode(content, as: IX.self)! .unwrap())
            
            switch xs.count == ys.count {
            case true:  xs.append(integer)
            case false: ys.append(integer)
            }
        }
        
        return (xs, ys)
    }!
}

Trivial rewrite of dictionary stuff because it makes a noticeable difference....

// This rewrite just reserves some capacity first.
private func countOccurencesGoesBrrr(_ xs: [Int]) -> [Int: Int] {
    var acc: [Int: Int] = [:]
    acc.reserveCapacity(xs.count)
    
    for x in xs {
        acc[x, default: 0] += 1
    }
    
    return acc
}

It's not a general solution, but you get the idea. I'm sure you can write something even faster in Swift proper given that my integer parsing algorithm is arbitrary precision and variable radix. Perhaps somebody else has more ideas; this was the time I had to spare on it. Note that I have not looked at the actual problem; only the code and the results.

sspringer · December 15, 2024, 8:45pm

Some more thoughts:

Regex operations are still really slow in Swift, but help is coming
Using literal regex expressions like /\d+/ should be faster than parsing them during runtime.
From what I could find, Rust is only aware of grapheme clusters (= characters) when writing "my text".graphemes(true), whereas Swift by default respects grapheme clusters, which is more expensive (e.g. when looking for a character it has to make sure no combining characters follow).
components(separatedBy:) gives you an array of Strings, split(separator:) gives you Substrings, generally it should be cheaper to work with Substrings as long as possible. So you would actually like to have a lazy split…

Josh_Wisenbaker · December 18, 2024, 4:53pm

The Apple AoC template includes the swift-algorithms and swift-collections packages for stuff like this. On day two I ended up using the adjacentPairs() and it made some things much simpler.

Josh_Wisenbaker · December 18, 2024, 5:39pm

I've been messing generally with line reading and splitting performance lately, and have found enumerateLines(invoking:) to be a fast and simple way to iterate through a multi-line string without creating more Arrays.

func parseNumbers(_ input: String) -> ([Int], [Int]) {
    var xs = [Int]()
    var ys = [Int]()
    // let lines = input.components(separatedBy: "\n");
    // optimized integer parsing, thanks to taylorswift and bjhomer:
    input.enumerateLines { line, stop in
        let nums = line.components(separatedBy: .whitespaces).compactMap { Int($0) }
        xs.append(nums[0])
        ys.append(nums[1])
    }

    return (xs, ys)
}

Just swapping out the split(separator:) for enumerateLines(invoking:) will save you a bunch of instructions when using -Ounchecked and even more when using a more sane -O. (Not that we are looking for safe advent computing here.)

One thing to note is that it is going to give you the strings without line breaks on the ends as it is using them already to determine the lines to feed you.

herrhippopotamus · December 18, 2024, 7:24pm

The Hypthesis doesn't match my obersvations:

input.enumerateLines & line.components: 7.8 ms

func parseNumbers(_ input: String) -> ([Int],[Int]) {
    var xs = [Int]();
    var ys = [Int]();

    xs.reserveCapacity(1000)
    ys.reserveCapacity(1000)

    input.enumerateLines{line, stop in 
        let nums = line.components(separatedBy: .whitespaces).compactMap { Int($0) }
        xs.append(nums[0])
        ys.append(nums[1])
    }

    return (xs, ys)
}

input.enumerateLines & line.split: 3.9 ms

func parseNumbers(_ input: String) -> ([Int],[Int]) {
    var xs = [Int]();
    var ys = [Int]();

    xs.reserveCapacity(1000)
    ys.reserveCapacity(1000)

    input.enumerateLines{line, stop in 
        let nums = line.split(separator: " ").compactMap{Int($0)}
        xs.append(nums[0])
        ys.append(nums[1])
    }

    return (xs, ys)
}

input.split & line.split: 1.6 ms

func parseNumbers(_ input: String) -> ([Int],[Int]) {
    var xs = [Int]();
    var ys = [Int]();

    xs.reserveCapacity(1000)
    ys.reserveCapacity(1000)

    let lines = input.split(separator: "\n")
    for line in lines {
        let nums = line.split(separator: " ").compactMap{ Int($0) }
        xs.append(nums[0])
        ys.append(nums[1])
    }
    return (xs, ys)
}

What really surprises me is that taking advantage of the fact that there are always 3 spaces between the numbers significantly degrades performance. Specifically, adjusting the third approach to use line.split(separator: " ").map{ Int($0)! } results in a runtime of 12.1 ms — more than 10 times slower.

It’s puzzling that such a seemingly minor adjustment could cause such a significant drop in performance.

While oscbyspro’s solution is interesting, it doesn’t quite address my goals here. If I were pursuing maximum performance, I wouldn’t use Swift in the first place — I’d use a language designed for speed, like Rust. However, this post isn’t about squeezing every last bit of performance from the snippet. Instead, I’m focused on gaining a deeper understanding of Swift’s internal mechanics and how they influence runtime performance.

Thanks to everyone’s input, the code is now running 5 times faster than my initial approach, which aligns well with what I’d expect when comparing Swift to Rust. I really appreciate all your help in achieving these improvements!

taylorswift · December 18, 2024, 7:53pm

at the risk of being that person, Swift is designed for speed, you just have not taken any of the suggestions in this thread as you are still allocating twice per inner loop.

you can surely argue that the most readily accessible tools (split, Array.compactMap) are not that efficient, but those are just standard library functions and you can still write far more performant variants in Swift. it’s not the language holding you back, it’s the eager allocation pattern you’re using.

Josh_Wisenbaker · December 19, 2024, 2:49pm

Ahh, I didn't mean to use components(separatedBy:_) there. I blame code completion.

Most of the time I err on the side of readability, but there are still some sharp edges in Swift that can get in the way. For example, enumerateLines and enumerateSubstrings are escaping and thus you can't use them to modify self from a struct.

I actually have just gone through the real-world process of optimizing some string processing in swift-format. There the things that helped were:

Stop unnecessary allocations
Only scan the string once
For the last scraps of speed, use the .utf8 view instead of Character based APIs.

All of this has got me thinking that I will go back and try to optimize my Day 1 solution as much as possible. :D

Josh_Wisenbaker · December 19, 2024, 10:38pm

Replying to myself here... I took a look on both macOS and Windows with my Day 1 solutions, plugged into @herrhippopotamus benchmark harness. I tested the different permutations as I went and put the results in the commit messages. Even on my crusty, old, CPU boost disabled, Windows laptop the average results were always under 1 ms when using release builds.

So I set the code to run 10 million loops and did some code profiling. The results are interesting:

With the dataset included in code the init of the array and parsing of the strings gets inlined by the compiler. It essentially doesn't even take any time. On a run of 30 seconds the Day1.init() is only 2 ms of that time. Instruments says that it's got a weight of 0.0% on the stack trace.
The part one of the solution got inlined as well and it's only got a weight of 2.0% and 589 ms out of 30 seconds.
It's part 2 of my solution that takes all the time, specifically reduce(_:_:) and reduce(into:_:). I tried swapping reduce(into:_:) out with just a loop and it didn't make much of a difference.

I may mess with all this later, but I was wondering if maybe this was a Linux specific issue you are running into? What sort of machine are you running it on?

herrhippopotamus · December 20, 2024, 9:51am

Your code defines a Day1 class that parses the puzzle during initialization, prior to entering the benchmark loop. As a result, the most time-consuming portion of the code is excluded from the benchmark runs, which defeats the purpose of the "benchmark harness."

I created a pull request with a "fix". Running that code takes the 1.7 ms that I observed on my machine:

herrhippopotamus · December 20, 2024, 10:15am

By adopting @oscbyspro's solution without relying on external packages, I managed to improved 1.7 ms to 0.2 ms by avoiding array allocations:

func parseNumbersWithoutAllocations(_ input: String) -> ([Int],[Int]) {
    var xs = [Int]()
    var ys = [Int]()

    xs.reserveCapacity(1000)
    ys.reserveCapacity(1000)

    func swapArrays(_ xs: inout [Int], _ ys: inout [Int]) {
        let bu = xs
        xs = ys
        ys = bu
    }

    let whitespace: [UInt8] = [10, 32]
    
    input.utf8.withContiguousStorageIfAvailable { data in
        var start = 0
        for i in 0..<data.endIndex {
            let char = data[i]
            if whitespace.contains(char) {
                if start < i {
                    let num = Int(
                        String(decoding: UnsafeBufferPointer(rebasing: data[start..<i]), as: UTF8.self)
                    )!
                    xs.append(num)
                    swapArrays(&xs, &ys)
                }
                start = i + 1
            }
        }
    }
    return (xs, ys)
}

While this approach yields significant performance improvements, it has a notable drawback: manually handling string operations at the byte level. This makes parsing non-ASCII characters unreliable and prone to errors. Am I overlooking any built-in string functions that could work seamlessly with withContiguousStorageIfAvailable?

Using the swift-algorithms package for lazily splitting strings didn’t offer any performance gains compared to non-lazy solutions from the standard library ("".split vs. "".lazy.split).

Josh_Wisenbaker · December 20, 2024, 1:59pm

Thanks for seeing the obvious thing I was missing there!

I see from your later post that you also got some better performance out of it.

Josh_Wisenbaker · December 20, 2024, 7:41pm

The discussion of String performance comes up around Advent of Code time every year it seems. An older, but still good, view of how Swift Strings work is Piercing the String Veil.

By default, the Swift String APIs are made to be flexible and safe to use, but they aren't seeking out total performance. As you've found you can drop down to the bytes if you want to, but it's a lot uglier down there.

That said, there are some things that make dealing with the bytes a bit easier such as the UInt8(ascii:) or UTF8.CodeUnit(ascii:) so you can get a utf8 element without needing to look up all the byte codes. You can also use the cString tools on String to make things a bit less cumbersome to deal with. It's still not a ton of fun though.

If I were to make your code into a bit more idiomatic Swift I would use the enumerated() method so that you automatically get an index counter and to simplify it a bit for the reader.

let whitespace = [UInt8(ascii: "\n"), UTF8.CodeUnit(ascii: " ")]

        input.utf8.withContiguousStorageIfAvailable { data in
            var start = 0
            for (idx, char) in data.enumerated() {
                if whitespace.contains(char) {
                    if start < idx {
                        let num = Int(
                            String(decoding: data[start..<idx], as: UTF8.self)
                        )!
                        xs.append(num)
                        swapArrays(&xs, &ys)
                    }
                    start = idx + 1
                }
            }
        }
        return (xs, ys)
    }

oscbyspro · December 21, 2024, 9:42am

So—I only shared my approach because I had incidentally performance-profiled the original code by writing something much faster. I spent about 5 minutes on it, but I did not want you guys to waste time on things that do not matter. I took the unsafe route because my library operates on that level. Tangentially, the first thing Int.init?(_:radix:) does with its input String is asking for an UnsafeBufferPointer<UInt8> so these buffer-to-string conversions super silly. But if you want to compare idiomatic and out-of-the-box Swift (± Algorithms), the following is straightforward and runs in about 0.19 ms (vs 0.12 ms) on my machine.

input.utf8.withContiguousStorageIfAvailable { data in
    for line in data.lazy.split(separator: UInt8(ascii: "\n")) {
        var integers = line.lazy.split(separator: UInt8(ascii: " ")).makeIterator()
        xs.append(Int(String(decoding: integers.next()!, as: UTF8.self))!)
        ys.append(Int(String(decoding: integers.next()!, as: UTF8.self))!)
    }
}!