A roadmap for improving Swift performance predictability: ARC improvements and ownership control

I don’t think anyone is ever going to be using move regularly. In fact, I think it should be moved inside a caseless enumeration with a name like Binding, à la MemoryLayout, so it is easier to understand when seen for the first time.

1 Like

Ah I thought that was the point of the consuming keyword. But I probably misunderstood.

I wonder how things will turn out. I had to deal with a C++ library (not written by me), and move() calls were all over the place; it was a bit overwhelming.

One of the fundamental goals of Swift is making the default the best option whenever possible. It’s a bit declarative in that respect.

Tools like these fall into two categories: overruling the default in edge cases where it isn’t desired, and adopting stricter limitations to make APIs clearer and/or enable even more aggressive optimizations. If there is an obvious optimization to be made that has absolutely no tradeoff, I expect the compiler to do it for me.

1 Like

That's quite the claim. I saw the recent proposal and immediately thought of ways I might start using it. :sweat_smile:

It seems a number of concerns are being brought up about approachability of these constructs. With respect, I don't think it's all entirely warranted. Someone can correct me if I'm wrong, but my impression of this proposal is that it encapsulates things that would be used primarily in lower-level-ish code that is very concerned about performance, where Swift currently doesn't cut it. I would imagine that most (ie >50% of) developers and apps wouldn't need any of this. But it's important for library developers and those who end up seeing performance issues that Swift can't reasonably get around.

3 Likes

A couple of things I'd like to add, on further reflection:

One of the goals I had when making WebURL (which I don't think I've mentioned before) was that it should be 100% Swift. No C shims at all. I also want it to be the fastest implementation that exists anywhere, including being faster than Chromium/WebKit (which are written in C++). Lofty goals, for sure, but I wanted to see if it was even possible in Swift, and where the language might need to improve to make it possible.

And to that end, I have encountered a couple of areas where it does feel like I'm fighting the language/compiler, and I think they are also worth considering as part of this effort. Personally, ARC is very low on the list of things I struggle with, compared to any of these issues:

1. Implicit initialization of optionals.

This has been discussed a couple of times on the forums (e.g. What is a ‘variable initialization expression’?). Basically, for stored properties, writing var x: Foo? versus var x: Optional<Foo> can have a huge performance impact.
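To make the difference concrete, here's a minimal illustration (Record is a made-up type): only the sugared spelling receives an implicit '= nil' variable initialization expression, so the sugared property is conceptually written twice during init.

```swift
struct Record {
  var a: String?           // sugared spelling: gets an implicit '= nil'
                           // variable initialization expression
  var b: Optional<String>  // explicit spelling: no implicit initializer

  init(a: String, b: String) {
    self.a = a  // 'a' was already implicitly set to nil, then overwritten here
    self.b = b  // 'b' is initialized exactly once
  }
}
```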

Since these efforts are aimed at improving performance predictability, I think this should also be in scope.

2. Excessive calls to "outlined init with take".

Sometimes, loading a stored property which is an optional can be... just incredibly expensive. See this forum post for more information and Godbolt examples: Expensive calls to “outlined init with take”. I included some benchmark results at the time which showed a 17% (!) performance improvement on an already-optimised library by writing my own getters using MemoryLayout APIs.

I think this is more of a bug than a problem with the language, but again - performance predictability. It would be nice if I could use getters again.

3. Bounds-checking.

One of the other goals is that the library should be memory-safe. We've been a bit fuzzy about this in the past, just claiming that things should be 'safe by default', but I consider it a key feature for any library written in Swift. One of the reasons memory safety is such a big deal these days is that programmers are not very good at manually proving that memory accesses are safe - really, every access should be checked, and we should rely on static verification by the compiler, rather than manual inspection, to eliminate those checks.

There are a number of techniques that can help with that. One is to use buffers with unsigned integers as indexes (implementation and detailed notes here). I would love to see something like that in the standard library.
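The core of the idea looks something like this (a minimal sketch, not the linked implementation): a collection view over a buffer whose Index is UInt. Because an unsigned index can never be negative, the usual two-sided bounds check collapses to a single comparison.

```swift
struct UnsignedIndexBuffer<Element>: Collection {
  private let base: UnsafeBufferPointer<Element>
  init(_ base: UnsafeBufferPointer<Element>) { self.base = base }

  var startIndex: UInt { 0 }
  var endIndex: UInt { UInt(base.count) }
  func index(after i: UInt) -> UInt { i &+ 1 }

  subscript(position: UInt) -> Element {
    // 'position >= startIndex' is true by construction (startIndex is 0 and
    // 'position' is unsigned), so this one comparison is the whole check.
    precondition(position < endIndex, "Index out of bounds!")
    return base[Int(bitPattern: position)]
  }
}

// Usage:
let array = [10, 20, 30]
let sum = array.withUnsafeBufferPointer { buf in
  UnsignedIndexBuffer(buf).reduce(0, +)
}
// sum == 60
```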

Another technique is to change how you write collection algorithms. One of the things I see quite a lot in Swift code (including the standard library) is this:

var i = startIndex
while i != endIndex {
  useElement(self[i])
  formIndex(after: &i)
}

This is really bad for eliminating bounds-checking. To the compiler, startIndex and endIndex are just some integers (or opaque values, almost always wrapping integers). It has no idea how they relate. But the checks in your subscript do have very specific expectations, such as:

subscript(position: Int) -> Element {
  precondition(position >= startIndex && position < endIndex, "Index out of bounds!")
  // Ok, 'position' is safe.
}

How does the compiler know that i, which starts at some value "startIndex", can be incremented to eventually arrive at some other value, "endIndex", without overflow? It doesn't. But you can help it a lot if you write your code like this instead:

var i = startIndex
while i < endIndex { // '<', not '!='
  useElement(self[i])
  formIndex(after: &i)
}

Here's a simple example showing how this one change allows bounds-checks to be eliminated where they otherwise wouldn't be: Godbolt. Note that iterateOne (using a for-loop) has 4 traps, as does iterateTwo using the != endIndex approach. iterateThree brings this down to one (!) by using < endIndex. You can eliminate even that trap by using unsigned integers.

(iterateFour is just to show that Array is full of compiler magic. The idea that it's written in Swift is a half-truth at best; you can't write something like Array in a 3rd-party library as the language is today)

This is important, because traps prohibit all kinds of optimisations and are hugely detrimental to performance. The compiler is very, very cautious about eliminating them, and won't even hoist traps out of loops (!). IMO, part of achieving predictable, high performance without sacrificing safety is making sure that bounds-checks can be predictably eliminated.

In order to achieve the best performance, I've had to rewrite a lot of algorithms from the standard library (firstIndex, lastIndex, contains, etc.) to use < or > rather than !=. That's bad. I think this should be investigated, and the standard library could do a lot better at helping you write safe, performant code.
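For example, a '<'-based rewrite of firstIndex looks like this (a sketch; fastFirstIndex is my naming, not a standard library API):

```swift
extension Collection {
  // Identical to the standard 'firstIndex(where:)', except the loop condition
  // uses '<' instead of '!=' so the compiler can relate 'i' to 'endIndex'
  // and eliminate the subscript's bounds check.
  func fastFirstIndex(where predicate: (Element) throws -> Bool) rethrows -> Index? {
    var i = startIndex
    while i < endIndex {
      if try predicate(self[i]) { return i }
      formIndex(after: &i)
    }
    return nil
  }
}
```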

4. Scalar evolution is not good enough

I don't know that much about how the optimiser works, but I believe the part which tracks integer values is known as "scalar evolution", and IIUC, most of what we have in Swift comes from LLVM.

I've noticed a lot of cases where this just isn't good enough, and even code which is carefully written to avoid bounds-checking still ends up generating traps and not getting optimised as well as I expect. Here's one of many examples, in a function I rewrote from the standard library for better performance:

internal func fastLastIndex(where predicate: (Element) -> Bool) -> Index? {
  var i = endIndex
  while i > startIndex {
    formIndex(before: &i)
  if i < endIndex, predicate(self[i]) { return i }
    // ^^^^^^^^^^^^ - should not be necessary!
  }
  return nil
}

When run on a type which uses unsigned integers for indexes, that i < endIndex check should not be necessary. But it is - otherwise I get a trap, and my loop becomes a lot more expensive.
Here's another one (SR-15209):

func test(_ front: inout Int8, _ n: Int8) {
  if front > n, n >= 0 {
    front -= n  // cannot overflow - why doesn't the compiler know?
  }
}

In Swift, we rely on bounds-checking for safety, and we rely on those checks being eliminated at compile-time for performance. Low-level code should not be omitting the checks - rather, it should be written in such a way that you can reason about its safety, with good assurances that the compiler will agree with you and thus be able to statically eliminate the checks. If we want to improve how predictably code will perform, I think we also need to improve this aspect of the optimiser so that it can handle more cases that a human can reason about.

Additionally, it might be worth adding a special kind of precondition for bounds-checking, so that (just like move), the compiler can emit special diagnostics if it can't statically prove an access is safe.

21 Likes

My problem with these constructs is that functions like move and copy are top-level, unconstrained, and ambiguously named outside this context. That’s a bad combination, especially in the standard library. It is precisely because they are very situational that their meanings need to be extremely clear.

3 Likes

I avoid raw loops wherever possible in favor of lazy composition. That way, optimizations can be made as generic as possible, thereby allowing them to be used wherever applicable. The more optimizations you do at the point of use, the harder it is to read said point of use.
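A trivial example of the style I mean (my own illustration):

```swift
let values = [3, -1, 4, -1, 5]

// Lazy composition: no intermediate arrays are allocated; the filter and
// map stages are fused into the single reduce pass.
let sumOfSquaresOfPositives = values.lazy
  .filter { $0 > 0 }
  .map { $0 * $0 }
  .reduce(0, +)
// 9 + 16 + 25 == 50
```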

I think you hit the nail on the head: the focus of performant code should be recognizing optimization opportunities that aren’t obvious. The most common example, in my experience, is skipping checks that have already been performed.

Could you submit a pull request to the standard library with those implementations? I’ll do it if you aren’t planning to.

1 Like

I'm not. I've found that contributing to the standard library is incredibly painful, so I don't bother any more.

For example: eliminating a trap in UMBP.swapAt. Pretty trivial, with a detailed example showing how it prevents effective loop unrolling. IMO it should have been accepted just on principle.

But it was met with quite a lot of resistance from one maintainer, who demanded I submit large portions of my library to the benchmark suite, and it dragged on for months. I'd like all Swift users to benefit from what I found (that's why I made the PR), but if that's going to be the process for even a little fix like this, I'd rather just fix it in my library and move on.

6 Likes

Well, I’ve got time, so I’m going to take this as permission to do so.

5 Likes

While compile-time constant values may help for rendering more complex data types into static data, it seems like there's still a lot of progress that could be made in this area without that feature. After all, this is an ability that the C language has had for decades, before compile-time evaluated expressions were a thing in C++.

The inability of Swift to express simple arrays of numeric or text literals as data without incurring a runtime initialization penalty to allocate and populate the memory or relying on the ObjectOutliner pass is a frustrating point of contention for performance-sensitive code. (In swift-protobuf, we'd like to explore table-ifying some of our code generation, but these issues make us worry that performance would be wholly unpredictable.) Unless compile-time constants are in the very near future, I would hope we can make some additional progress in that space without them as well, and then let compile-time constants build on that.
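A trivial illustration of the problem:

```swift
// This looks like constant data, but today it is typically lowered as a
// lazily-initialized global: first access allocates an Array buffer on the
// heap and copies the elements in (unless the ObjectOutliner pass happens
// to fire in an optimized build).
let lookupTable: [UInt8] = [0x00, 0x0F, 0xF0, 0xFF]

// A C compiler would place the equivalent 'static const uint8_t table[4]'
// directly in the binary's read-only data section.
```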

5 Likes

Swift has been very deliberate in avoiding unsigned integers in API. This clearly has the unfortunate side effect of introducing unnecessary checks for negative values in code that will never see them. Does the compiler (or LLVM) have the ability to track statically-proven bounds? Surfacing that ability in the language (e.g. var startIndex: @nonnegative Int) might get the best of both worlds: the ergonomics of signed indexes with the performance characteristics of unsigned codegen.

I believe that avoiding unsigned integers has been a serious mistake, personally. I’d rather we just fixed that in a future major release.

3 Likes

Sure, but for specialised types like UnsafeBoundsCheckedBufferPointer, it isn't an issue. It is only used as a generic collection, so no callers even care what the index type is.

I've benchmarked it extensively (it is used everywhere in WebURL; we almost never use the standard library's UnsafeBufferPointer), and it does not perform any worse, despite adding bounds checking. That's a result of how it is designed and very careful coding in custom collection algorithms to ensure the compiler can reason about the checks.

I'd like all developers to benefit from that, so I think it is worth considering adding it (or something like it) to the standard library as an alternative to UBP.

Anyway, we probably shouldn’t get too far off track in this thread.

There’s currently significant effort being put into UnsafePointer and friends. If the standard library can add bounds checking to the existing types without impacting performance, why introduce another type, forcing clients to decide which one to use and convert between them when someone else has concluded differently?

I agree that the idea of overhauling Swift’s standard library to use unsigned integers is likely out of scope for this discussion, but I am interested to know whether it’s feasible to add attributes (like @nonnegative) that hook up to LLVM features which are currently unexploited. If, as @Karl argues, superfluous bounds checks are such a critical bottleneck, then it might be worth factoring into the roadmap.

Or heck, maybe an attribute isn’t necessary, and hooking into such functionality would silently benefit all non-resilient and @usableFromInline code.

Are these checks equivalent?

    let x: Int = 1
    let check1 = x >= 0 && x <= 100
    let check2 = UInt(bitPattern: x) <= 100

If they are then signed index bounds checking could be as fast as unsigned, no?

Yes but that only lets you eliminate the trap for the lower bound if you know the lower bound is zero. That’s true for Array, but not for ArraySlice, yet both have Int indexes.

Wouldn't the relevant check for array slice be i >= a && i <= b regardless of whether indices are signed or unsigned?
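Sketching what I mean (and noting that, if a <= b, the pair can still fold into a single unsigned comparison by offsetting, same as the zero-lower-bound case):

```swift
// Slice bounds; assume a <= b.
let a = 10, b = 100

func twoSided(_ i: Int) -> Bool { i >= a && i <= b }

// Offsetting by 'a' folds both tests into ONE unsigned comparison:
// any i < a wraps around to a huge UInt and fails the check.
func oneCompare(_ i: Int) -> Bool {
  UInt(bitPattern: i &- a) <= UInt(bitPattern: b &- a)
}

// twoSided(i) == oneCompare(i) for every Int i
```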