A roadmap for improving Swift performance predictability: ARC improvements and ownership control

Wouldn’t that be a use for inout? IteratorProtocol doesn’t require anything with parameters to begin with.

I suspect optimizer effects will be formally established by the time Swift 7 rolls around. They seem pretty much nonnegotiable for extreme optimizations like compile-time evaluation or even automatic parallelization. Sendable already inches toward that space.

I agree: that seems like the sort of thing that you’d always want to use, except when you are reluctant to commit to that in your API. A bit like @inlinable.

1 Like

This is not part of the roadmap or any current proposal. But we do expect to order deinitialization with respect to "external function calls". And to me it only makes sense for that ordering to be bidirectional. So, a print statement in a deinitializer should be ordered relative to print statements before and after the end of the scope.
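
For example (my own sketch of the ordering being described, using a hypothetical Resource class):

class Resource {
  deinit { print("deinit") }
}

func demo() {
  do {
    let r = Resource()
    _ = r                   // keep 'r' alive up to this point
    print("inside scope")
  }
  // With the ordering described above, "deinit" would be guaranteed to print
  // here: after "inside scope" and before "after scope".
  print("after scope")
}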

I'm not really sure that @nonescaping will be able to scale to any meaningful extent unless we go for something more like Rust's lifetime system, even with the extensions described here.

A simple example, LazyMapCollection:

struct LazyMapCollection<Source: Collection, Result> {
  var source: Source
  var transform: (Source.Element) -> Result
}

As I understand it, even with these extensions, I won't be able to use a @nonescaping closure in this type. There's no way to propagate that the overall type becomes @nonescaping if the closure is.

Now, I'm using a closure here, but it could be a class, or refcounted COW storage, or a pointer to a stack buffer. The things described in this proposal are nice (don't get me wrong - these are some very welcome improvements, and I'm excited about them!)... but fundamentally, it is still quite limited. The demonstrations are quite basic, and I'm not sure how many issues it will solve in more complex, real-world code.

EDIT: Oh, and presumably you wouldn't be able to use this with a @nonescaping source collection either. So even if I could tag my function's argument as not escaping and have it stack-allocated, lots of important features wouldn't work with it -- because the compiler doesn't necessarily know that array.lazy.map { ... } won't escape the array.

Also, I'm not suggesting this is a trivial problem to solve. Rust is the only language I know of that has managed it, and it comes with some severe usability costs.

3 Likes

This is absolutely fantastic!

Are arguments to case constructors also consumed?

@Joe_Groff may I propose one possible alternative name for ref. If we want a keyword that is as short as let and var while still enforcing some special behavior, how about "use"?

use is already fairly close in meaning to borrow, so it seems to me that it could fit naturally.

Other alternative names could be "cut" or "own".

4 Likes

Could variables explicitly marked as consuming be moved automatically and require an explicit copy? That could help remove a lot of move() clutter.

Can @noImplicitCopy be used with property variables as well?

What an awesome roadmap, very exciting!

Just for my understanding, doesn't using the consuming keyword mean that you won't need to copy the marked argument? And that you therefore don't need to use move within the scope of the function?

I don't know if I have it right but this pitch doesn't seem to have consuming variables be moved automatically.

I don’t think anyone is ever going to be using move regularly. In fact, I think it should be moved inside a caseless enumeration with a name like Binding, à la MemoryLayout, so it is easier to understand when seen for the first time.

1 Like

Ah I thought that was the point of the consuming keyword. But I probably misunderstood.

I wonder how things will turn out. I had to deal with a C++ library (not written by me) and move() calls were all over the place; it was a bit overwhelming.

One of the fundamental goals of Swift is making the default the best option whenever possible. It’s a bit declarative in that respect.

Tools like these fall into two categories: overruling the default in edge cases where it isn’t desired, and adopting stricter limitations to make APIs clearer and/or enable even more aggressive optimizations. If there is an obvious optimization to be made that has absolutely no tradeoff, I expect the compiler to do it for me.

1 Like

That's quite the claim. I saw the recent proposal and immediately thought of ways I might start using it. :sweat_smile:

It seems a number of concerns are being brought up about the approachability of these constructs. With respect, I don't think that's entirely warranted. Someone can correct me if I'm wrong, but my impression of this proposal is that it covers things that would be used primarily in lower-level code that is very concerned about performance, where Swift currently doesn't cut it. I would imagine that most (i.e. >50% of) developers and apps wouldn't need any of this. But it's important for library developers and for those who run into performance issues that Swift can't reasonably get around.

3 Likes

A couple of things I'd like to add, on further reflection:

One of the goals I had when making WebURL (which I don't think I've mentioned before) was that it should be 100% Swift. No C shims at all. I also want it to be the fastest implementation that exists anywhere, including being faster than Chromium/WebKit (which are written in C++). Lofty goals, for sure, but I wanted to see if it was even possible in Swift, and where the language might need to improve to make it possible.

And to that end, I have encountered a couple of areas where it does feel like I'm fighting the language/compiler, and I think they are also worth considering as part of this effort. Personally, ARC is very low on the list of things I struggle with, compared to any of these issues:

1. Implicit initialization of optionals.

This has been discussed a couple of times on the forums, (e.g. What is a ‘variable initialization expression’?). Basically, for stored properties, writing var x: Foo? versus var x: Optional<Foo> can have a huge performance impact.
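
For illustration, with a hypothetical type of my own (not taken from that thread):

struct Foo { var value: Int }

struct Record {
  // Sugared spelling: picks up an implicit '= nil' variable initialization
  // expression, which is emitted and executed as initialization code.
  var a: Foo?

  // Unsugared spelling: no implicit initialization expression is synthesized,
  // so this property must be assigned explicitly before use.
  var b: Optional<Foo>
}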

Since these efforts are aimed at improving performance predictability, I think this should also be in scope.

2. Excessive calls to "outlined init with take".

Sometimes, loading a stored property which is an optional can be... just incredibly expensive. See this forum post for more information and Godbolt examples: Expensive calls to “outlined init with take”. I included some benchmark results at the time which showed a 17% (!) performance improvement on an already-optimised library by writing my own getters using MemoryLayout APIs.

I think this is more of a bug than a problem with the language, but again - performance predictability. It would be nice if I could use getters again.
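
As a rough sketch of that kind of workaround (the type, property, and layout here are hypothetical, not WebURL's actual code):

struct Header {
  var count: UInt32
  var flags: UInt8?
}

extension Header {
  // Hand-rolled getter: read the stored property directly from the struct's
  // raw bytes rather than going through the synthesized accessor. This is
  // only reasonable for trivial property types such as UInt8?.
  var fastFlags: UInt8? {
    withUnsafeBytes(of: self) { rawBytes in
      let offset = MemoryLayout<Header>.offset(of: \Header.flags)!
      return rawBytes.load(fromByteOffset: offset, as: UInt8?.self)
    }
  }
}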

3. Bounds-checking.

One of the other goals is that the library should be memory-safe. We've been a bit fuzzy about this in the past, just claiming that things should be 'safe by default', but I consider it a key feature for any library written in Swift. One of the reasons memory safety is such a big deal these days is that programmers are not very good at manually proving that memory accesses are safe - really, every access should be checked, and we should rely on static verification by the compiler, rather than manual inspection, to eliminate those checks.

There are a number of techniques that can help with that. One is to use buffers with unsigned integers as indexes (implementation and detailed notes here). I would love to see something like that in the standard library.
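
A minimal sketch of that idea (my own simplification, not the linked implementation):

struct UnsignedIndexBuffer<Element> {
  var base: UnsafeBufferPointer<Element>

  // Because 'position' is unsigned, there is no lower-bound check to prove
  // away; only 'position < count' is left for the optimizer to eliminate.
  subscript(position: UInt) -> Element {
    precondition(position < UInt(base.count), "Index out of bounds!")
    return base[Int(bitPattern: position)]
  }
}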

Another technique is to change how you write collection algorithms. One of the things I see quite a lot in Swift code (including the standard library) is this:

var i = startIndex
while i != endIndex {
  useElement(self[i])
  formIndex(after: &i)
}

This is really bad for eliminating bounds-checking. To the compiler, startIndex and endIndex are just some integers (or opaque values, almost always wrapping integers). It has no idea how they relate. But the checks in your subscript do have very specific expectations, such as:

subscript(position: Int) -> Element {
  precondition(position >= startIndex && position < endIndex, "Index out of bounds!")
  // Ok, 'position' is safe; 'storage' stands in for whatever backs this collection.
  return storage[position]
}

How does the compiler know that i, which starts at some value "startIndex", can be incremented to eventually arrive at some other value, "endIndex", without overflow? It doesn't. But you can help it a lot if you write your code like this instead:

var i = startIndex
while i < endIndex { // '<', not '!='
  useElement(self[i])
  formIndex(after: &i)
}

Here's a simple example showing how this one change allows bounds-checks to be eliminated where they otherwise wouldn't be: Godbolt. Note that iterateOne (using a for-loop) has 4 traps, as does iterateTwo using the != endIndex approach. iterateThree brings this down to one (!) by using < endIndex. You can eliminate even that trap by using unsigned integers.

(iterateFour is just to show that Array is full of compiler magic. The idea that it's written in Swift is a half-truth at best; you can't write something like Array in a 3rd-party library as the language is today)

This is important, because traps prohibit all kinds of optimisations and are hugely detrimental to performance. The compiler is very, very cautious about eliminating them, and won't even hoist traps out of loops (!). IMO, part of achieving predictable, high performance without sacrificing safety is making sure that bounds-checks can be predictably eliminated.

In order to achieve the best performance, I've had to rewrite a lot of algorithms from the standard library (firstIndex, lastIndex, contains, etc) to use < or > rather than !=. That's bad. I think this should be investigated, and the standard library could do a lot better to help you write safe, performant code.
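
For example, a firstIndex(where:) rewritten along those lines (my own sketch, not the library's actual code):

extension Collection {
  func fastFirstIndex(where predicate: (Element) -> Bool) -> Index? {
    var i = startIndex
    while i < endIndex {  // '<', not '!=', so the compiler can relate 'i' to 'endIndex' and drop the bounds checks
      if predicate(self[i]) { return i }
      formIndex(after: &i)
    }
    return nil
  }
}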

4. Scalar evolution is not good enough

I don't know that much about how the optimiser works, but I believe the part which tracks integer values is known as "scalar evolution", and IIUC, most of what we have in Swift comes from LLVM.

I've noticed a lot of cases where this just isn't good enough, and even code which is carefully written to avoid bounds-checking still ends up generating traps and not getting optimised as well as I expect it to. Here's one of many examples, in a function I rewrote from the standard library for better performance:

internal func fastLastIndex(where predicate: (Element) -> Bool) -> Index? {
  var i = endIndex
  while i > startIndex {
    formIndex(before: &i)
  if i < endIndex, predicate(self[i]) { return i }
    // ^^^^^^^^^^^^ - should not be necessary!
  }
  return nil
}

When run on a type which uses unsigned integers for indexes, that i < endIndex check should not be necessary. But it is - otherwise I get a trap, and my loop becomes a lot more expensive.
Here's another one (SE-15209):

func test(_ front: inout Int8, _ n: Int8) {
  if front > n, n >= 0 {
    front -= n  // cannot underflow - why doesn't the compiler know?
  }
}

In Swift, we rely on bounds-checking for safety, and we rely on those checks being eliminated at compile-time for performance. Low-level code should not be omitting the checks - rather, it should be written in such a way that you can reason about its safety, with good assurances that the compiler will agree with you and thus be able to statically eliminate the checks. If we want to improve how predictably code will perform, I think we also need to improve this aspect of the optimiser so that it can handle more cases that a human can reason about.

Additionally, it might be worth adding a special kind of precondition for bounds-checking, so that (just like move), the compiler can emit special diagnostics if it can't statically prove an access is safe.

21 Likes

My problem with these constructs is that functions like move and copy are top-level, unconstrained, and ambiguously named outside this context. That’s a bad combination, especially in the standard library. It is precisely because they are very situational that their meanings need to be extremely clear.

3 Likes

I avoid raw loops wherever possible in favor of lazy composition. That way, optimizations can be made as generic as possible, thereby allowing them to be used wherever applicable. The more optimizations you do at the point of use, the harder it is to read said point of use.
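
As a generic illustration of what I mean (not code from this thread):

let numbers = [3, -1, 4, -1, 5, -9, 2, 6]

// Raw loop: the logic is spelled out at the point of use.
var firstLargeSquare: Int? = nil
for n in numbers where n >= 0 {
  let square = n * n
  if square > 20 { firstLargeSquare = square; break }
}

// Lazy composition: the same work expressed through reusable, generic adapters.
let viaLazy = numbers.lazy
  .filter { $0 >= 0 }
  .map { $0 * $0 }
  .first { $0 > 20 }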

I think you hit the nail on the head: the focus of performant code should be recognizing optimization opportunities that aren’t obvious. The most common example, in my experience, is skipping checks that have already been performed.

Could you submit a pull request to the standard library with those implementations? I’ll do it if you aren’t planning to.

1 Like

I'm not. I've found that contributing to the standard library is incredibly painful, so I don't bother any more.

For example: eliminating a trap in UMBP.swapAt. Pretty trivial, with a detailed example showing how it prevents effective loop unrolling. IMO it should have been accepted just on principle.

But it was met with quite a lot of resistance from one maintainer, who demanded that I submit large portions of my library to the benchmark suite, and it dragged on for months. I'd like all Swift users to benefit from what I found (that's why I made the PR), but if that's going to be the process for even a little fix like this, I'd rather just fix it in my library and move on.

6 Likes

Well, I’ve got time, so I’m going to take this as permission to do so.

5 Likes

While compile-time constant values may help for rendering more complex data types into static data, it seems like there's still a lot of progress that could be made in this area without that feature. After all, this is an ability that the C language has had for decades, before compile-time evaluated expressions were a thing in C++.

The inability of Swift to express simple arrays of numeric or text literals as data without incurring a runtime initialization penalty to allocate and populate the memory or relying on the ObjectOutliner pass is a frustrating limitation for performance-sensitive code. (In swift-protobuf, we'd like to explore table-ifying some of our code generation, but these issues make us worry that performance would be wholly unpredictable.) Unless compile-time constants are in the very near future, I would hope we can make some additional progress in that space without them as well, and then let compile-time constants build on that.
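
As a concrete illustration of the gap (my own example; the comment reflects current behavior as I understand it, not a measurement from swift-protobuf):

// In C, a table like this is baked into the binary's read-only data section:
//
//   static const uint16_t table[] = { 0x0001, 0x004F, 0x0022 };
//
// The nearest Swift spelling still allocates and populates heap storage at
// run time on first access, unless the ObjectOutliner pass happens to turn
// it into a statically initialized object.
let table: [UInt16] = [0x0001, 0x004F, 0x0022]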

5 Likes