A roadmap for improving Swift performance predictability: ARC improvements and ownership control

I don’t think anyone is ever going to be using move regularly. In fact, I think it should be moved inside a caseless enumeration with a name like Binding, à la MemoryLayout, so it is easier to understand when seen for the first time.

1 Like

Ah I thought that was the point of the consuming keyword. But I probably misunderstood.

I wonder how things will turn out. I had to deal with a C++ library (not written by me), and move() calls were all over the place; it was a bit overwhelming.

One of the fundamental goals of Swift is making the default the best option whenever possible. It’s a bit declarative in that respect.

Tools like these fall into two categories: overruling the default in edge cases where it isn’t desired, and adopting stricter limitations to make APIs clearer and/or enable even more aggressive optimizations. If there is an obvious optimization to be made that has absolutely no tradeoff, I expect the compiler to do it for me.

1 Like

That's quite the claim. I saw the recent proposal and immediately thought of ways I might start using it. :sweat_smile:

It seems a number of concerns are being brought up about approachability of these constructs. With respect, I don't think it's all entirely warranted. Someone can correct me if I'm wrong, but my impression of this proposal is that it encapsulates things that would be used primarily in lower-level-ish code that is very concerned about performance, where Swift currently doesn't cut it. I would imagine that most (ie >50% of) developers and apps wouldn't need any of this. But it's important for library developers and those who end up seeing performance issues that Swift can't reasonably get around.

3 Likes

A couple of things I'd like to add, on further reflection:

One of the goals I had when making WebURL (which I don't think I've mentioned before) was that it should be 100% Swift. No C shims at all. I also want it to be the fastest implementation that exists anywhere, including being faster than Chromium/WebKit (which are written in C++). Lofty goals, for sure, but I wanted to see if it was even possible in Swift, and where the language might need to improve to make it possible.

And to that end, I have encountered a couple of areas where it does feel like I'm fighting the language/compiler, and I think they are also worth considering as part of this effort. Personally, ARC is very low on the list of things I struggle with, compared to any of these issues:

1. Implicit initialization of optionals.

This has been discussed a couple of times on the forums (e.g. What is a ‘variable initialization expression’?). Basically, for stored properties, writing var x: Foo? versus var x: Optional<Foo> can have a huge performance impact.
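To make the difference concrete, here's a minimal illustration (Record is a made-up type): only the sugared spelling receives an implicit '= nil' variable initialization expression, so the sugared property is conceptually written twice during init.

```swift
struct Record {
  var a: String?           // sugared spelling: gets an implicit '= nil'
                           // variable initialization expression
  var b: Optional<String>  // explicit spelling: no implicit initializer

  init(a: String, b: String) {
    self.a = a  // 'a' was already implicitly set to nil, then overwritten here
    self.b = b  // 'b' is initialized exactly once
  }
}
```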

Since these efforts are aimed at improving performance predictability, I think this should also be in scope.

2. Excessive calls to "outlined init with take".

Sometimes, loading a stored property which is an optional can be... just incredibly expensive. See this forum post for more information and Godbolt examples: Expensive calls to “outlined init with take”. I included some benchmark results at the time which showed a 17% (!) performance improvement on an already-optimised library by writing my own getters using MemoryLayout APIs.

I think this is more of a bug than a problem with the language, but again - performance predictability. It would be nice if I could use getters again.

3. Bounds-checking.

One of the other goals is that the library should be memory-safe. We've been a bit fuzzy about this in the past, just claiming that things should be 'safe by default', but I consider it a key feature for any library written in Swift. One of the reasons memory safety is such a big deal these days is that programmers are not very good at manually proving that memory accesses are safe - really, every access should be checked, and we should rely on static verification by the compiler, rather than manual inspection, to eliminate those checks.

There are a number of techniques that can help with that. One is to use buffers with unsigned integers as indexes (implementation and detailed notes here). I would love to see something like that in the standard library.
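The core of the idea looks something like this (a minimal sketch, not the linked implementation): a collection view over a buffer whose Index is UInt. Because an unsigned index can never be negative, the usual two-sided bounds check collapses to a single comparison.

```swift
struct UnsignedIndexBuffer<Element>: Collection {
  private let base: UnsafeBufferPointer<Element>
  init(_ base: UnsafeBufferPointer<Element>) { self.base = base }

  var startIndex: UInt { 0 }
  var endIndex: UInt { UInt(base.count) }
  func index(after i: UInt) -> UInt { i &+ 1 }

  subscript(position: UInt) -> Element {
    // 'position >= startIndex' is true by construction (startIndex is 0 and
    // 'position' is unsigned), so this one comparison is the whole check.
    precondition(position < endIndex, "Index out of bounds!")
    return base[Int(bitPattern: position)]
  }
}

// Usage:
let array = [10, 20, 30]
let sum = array.withUnsafeBufferPointer { buf in
  UnsignedIndexBuffer(buf).reduce(0, +)
}
// sum == 60
```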

Another technique is to change how you write collection algorithms. One of the things I see quite a lot in Swift code (including the standard library) is this:

var i = startIndex
while i != endIndex {
  useElement(self[i])
  formIndex(after: &i)
}

This is really bad for eliminating bounds-checking. To the compiler, startIndex and endIndex are just some integers (or opaque values, almost always wrapping integers). It has no idea how they relate. But the checks in your subscript do have very specific expectations, such as:

subscript(position: Int) -> Element {
  precondition(position >= startIndex && position < endIndex, "Index out of bounds!")
  // Ok, 'position' is safe.
}

How does the compiler know that i, which starts at some value "startIndex", can be incremented to eventually arrive at some other value, "endIndex", without overflow? It doesn't. But you can help it a lot if you write your code like this instead:

var i = startIndex
while i < endIndex { // '<', not '!='
  useElement(self[i])
  formIndex(after: &i)
}

Here's a simple example showing how this one change allows bounds-checks to be eliminated where they otherwise wouldn't be: Godbolt. Note that iterateOne (using a for-loop) has 4 traps, as does iterateTwo using the != endIndex approach. iterateThree brings this down to one (!) by using < endIndex. You can eliminate even that trap by using unsigned integers.

(iterateFour is just to show that Array is full of compiler magic. The idea that it's written in Swift is a half-truth at best; you can't write something like Array in a 3rd-party library as the language is today)

This is important, because traps prohibit all kinds of optimisations and are hugely detrimental to performance. The compiler is very, very cautious about eliminating them, and won't even hoist traps out of loops (!). IMO, part of achieving predictable, high performance without sacrificing safety is making sure that bounds-checks can be predictably eliminated.

In order to achieve the best performance, I've had to rewrite a lot of algorithms from the standard library (firstIndex, lastIndex, contains, etc.) to use < or > rather than !=. That's bad. I think this should be investigated, and the standard library could do a lot better at helping you write safe, performant code.
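For example, a '<'-based rewrite of firstIndex looks like this (a sketch; fastFirstIndex is my naming, not a standard library API):

```swift
extension Collection {
  // Identical to the standard 'firstIndex(where:)', except the loop condition
  // uses '<' instead of '!=' so the compiler can relate 'i' to 'endIndex'
  // and eliminate the subscript's bounds check.
  func fastFirstIndex(where predicate: (Element) throws -> Bool) rethrows -> Index? {
    var i = startIndex
    while i < endIndex {
      if try predicate(self[i]) { return i }
      formIndex(after: &i)
    }
    return nil
  }
}
```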

4. Scalar evolution is not good enough

I don't know that much about how the optimiser works, but I believe the part which tracks integer values is known as "scalar evolution", and IIUC, most of what we have in Swift comes from LLVM.

I've noticed a lot of cases where this just isn't good enough, and even code which is carefully written to avoid bounds-checking still ends up generating traps and not getting optimised as well as I expect. Here's one of many examples, in a function I rewrote from the standard library for better performance:

internal func fastLastIndex(where predicate: (Element) -> Bool) -> Index? {
  var i = endIndex
  while i > startIndex {
    formIndex(before: &i)
  if i < endIndex, predicate(self[i]) { return i }
    // ^^^^^^^^^^^^ - should not be necessary!
  }
  return nil
}

When run on a type which uses unsigned integers for indexes, that i < endIndex check should not be necessary. But it is - otherwise I get a trap, and my loop becomes a lot more expensive.
Here's another one (SR-15209):

func test(_ front: inout Int8, _ n: Int8) {
  if front > n, n >= 0 {
    front -= n  // cannot overflow - why doesn't the compiler know?
  }
}

In Swift, we rely on bounds-checking for safety, and we rely on those checks being eliminated at compile-time for performance. Low-level code should not be omitting the checks - rather, it should be written in such a way that you can reason about its safety, with good assurances that the compiler will agree with you and thus be able to statically eliminate the checks. If we want to improve how predictably code will perform, I think we also need to improve this aspect of the optimiser so that it can handle more cases that a human can reason about.

Additionally, it might be worth adding a special kind of precondition for bounds-checking, so that (just like move), the compiler can emit special diagnostics if it can't statically prove an access is safe.

21 Likes

My problem with these constructs is that functions like move and copy are top-level, unconstrained, and ambiguously named outside this context. That’s a bad combination, especially in the standard library. It is precisely because they are very situational that their meanings need to be extremely clear.

3 Likes

I avoid raw loops wherever possible in favor of lazy composition. That way, optimizations can be made as generic as possible, thereby allowing them to be used wherever applicable. The more optimizations you do at the point of use, the harder it is to read said point of use.
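A trivial example of the style I mean (my own illustration):

```swift
let values = [3, -1, 4, -1, 5]

// Lazy composition: no intermediate arrays are allocated; the filter and
// map stages are fused into the single reduce pass.
let sumOfSquaresOfPositives = values.lazy
  .filter { $0 > 0 }
  .map { $0 * $0 }
  .reduce(0, +)
// 9 + 16 + 25 == 50
```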

I think you hit the nail on the head: the focus of performant code should be recognizing optimization opportunities that aren’t obvious. The most common example, in my experience, is skipping checks that have already been performed.

Could you submit a pull request to the standard library with those implementations? I’ll do it if you aren’t planning to.

1 Like

I'm not. I've found that contributing to the standard library is incredibly painful, so I don't bother any more.

For example: eliminating a trap in UMBP.swapAt. Pretty trivial, with a detailed example showing how it prevents effective loop unrolling. IMO it should have been accepted just on principle.

But it was met with quite a lot of resistance from one maintainer, who demanded I submit large portions of my library to the benchmark suite, and it dragged on for months. I'd like all Swift users to benefit from what I found (that's why I made the PR), but if that's going to be the process for even a little fix like this, I'd rather just fix it in my library and move on.

6 Likes

Well, I’ve got time, so I’m going to take this as permission to do so.

5 Likes

While compile-time constant values may help for rendering more complex data types into static data, it seems like there's still a lot of progress that could be made in this area without that feature. After all, this is an ability that the C language has had for decades, before compile-time evaluated expressions were a thing in C++.

The inability of Swift to express simple arrays of numeric or text literals as data without incurring a runtime initialization penalty to allocate and populate the memory or relying on the ObjectOutliner pass is a frustrating point of contention for performance-sensitive code. (In swift-protobuf, we'd like to explore table-ifying some of our code generation, but these issues make us worry that performance would be wholly unpredictable.) Unless compile-time constants are in the very near future, I would hope we can make some additional progress in that space without them as well, and then let compile-time constants build on that.
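A trivial illustration of the problem:

```swift
// This looks like constant data, but today it is typically lowered as a
// lazily-initialized global: first access allocates an Array buffer on the
// heap and copies the elements in (unless the ObjectOutliner pass happens
// to fire in an optimized build).
let lookupTable: [UInt8] = [0x00, 0x0F, 0xF0, 0xFF]

// A C compiler would place the equivalent 'static const uint8_t table[4]'
// directly in the binary's read-only data section.
```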

5 Likes

Swift has been very deliberate in avoiding unsigned integers in API. This clearly has the unfortunate side effect of introducing unnecessary checks for negative values in code that will never see them. Does the compiler (or LLVM) have the ability to track statically-proven bounds? Surfacing that ability in the language (e.g. var startIndex: @nonnegative Int) might get the best of both worlds: the ergonomics of signed indexes with the performance characteristics of unsigned codegen.

I believe that avoiding unsigned integers has been a serious mistake, personally. I’d rather we just fixed that in a future major release.

3 Likes

Sure, but for specialised types like UnsafeBoundsCheckedBufferPointer, it isn't an issue. It is only used as a generic collection, so no callers even care what the index type is.

I've benchmarked it extensively (it is used everywhere in WebURL; we almost never use the standard library's UnsafeBufferPointer), and it does not perform any worse, despite adding bounds checking. That's a result of how it is designed and very careful coding in custom collection algorithms to ensure the compiler can reason about the checks.

I'd like all developers to benefit from that, so I think it is worth considering adding it (or something like it) to the standard library as an alternative to UBP.

Anyway, we probably shouldn’t get too far off track in this thread.

There’s currently significant effort being put into UnsafePointer and friends. If the standard library can add bounds checking to the existing types without impacting performance, why introduce another type, forcing clients to decide which one to use and convert between them when someone else has concluded differently?

I agree that the idea of overhauling Swift’s standard library to use unsigned integers is likely out of scope for this discussion, but I am interested to know whether it’s feasible to add attributes (like @nonnegative) that hook up to LLVM features which are currently unexploited. If, as @Karl argues, superfluous bounds checks are such a critical bottleneck, then it might be worth factoring into the roadmap.

Or heck, maybe an attribute isn’t necessary, and hooking into such functionality would silently benefit all non-resilient and @usableFromInline code.

Are these checks equivalent?

    let x: Int = 1
    let check1 = x >= 0 && x <= 100
    let check2 = UInt(bitPattern: x) <= 100

If they are then signed index bounds checking could be as fast as unsigned, no?

Yes but that only lets you eliminate the trap for the lower bound if you know the lower bound is zero. That’s true for Array, but not for ArraySlice, yet both have Int indexes.

Wouldn't the relevant check for array slice be i >= a && i <= b regardless of whether indices are signed or unsigned?
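Sketching what I mean (and noting that, if a <= b, the pair can still fold into a single unsigned comparison by offsetting, same as the zero-lower-bound case):

```swift
// Slice bounds; assume a <= b.
let a = 10, b = 100

func twoSided(_ i: Int) -> Bool { i >= a && i <= b }

// Offsetting by 'a' folds both tests into ONE unsigned comparison:
// any i < a wraps around to a huge UInt and fails the check.
func oneCompare(_ i: Int) -> Bool {
  UInt(bitPattern: i &- a) <= UInt(bitPattern: b &- a)
}

// twoSided(i) == oneCompare(i) for every Int i
```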