Functions that accept arrays or array slices

Karl · May 1, 2022, 3:27pm

Generic code is quite nice for performance tuning, because it allows you to focus on writing your algorithm at an abstract level against an opaque interface (say, "some RandomAccessCollection of UInts), and you can plug-in different implementations.

There is nothing about pointers that magically makes them fast. Pointers are just addresses in memory, and of course everything in your process' memory has an address. For example, Array uses pointers behind-the-scenes, but it layers validation on top to ensure memory safety, with the goal being that the optimiser will remove any redundant validation and it ends up being as fast as an unchecked pointer. There are two things it adds:

Temporal/lifetime safety through ARC. This can be an overhead, but the compiler is becoming much, much smarter about optimising it through "ownership" and other performance predictability features.
Spatial safety through bounds-checking. Ultimately this relies on the compiler's ability to reason about index values and how they evolve. Personally, I've found that Swift's use of signed integers can be very limiting here, and I've had great success addressing memory buffers using unsigned integers instead.

My own results show that the compiler is so much better at reasoning about unsigned integers, we can get much more predictable bounds-checking elimination this way. Moreover, the compiler can better understand the access pattern, allowing optimisations like automatic vectorisation.

This shouldn't be very surprising - proper collection algorithms obviously include their own bounds-checking (they don't rely on trapping at runtime!); we just need to find a way for the compiler to understand it better. I don't know if that means adding unsigned-indexed buffers to the standard library, or improving the optimiser (maybe both?).

Other than those two things, it is literally a pointer. We rely on the optimiser to remove them (and it is improving) but there are also things you can experiment with. It's an ongoing process to make those optimisations more accessible and predictable without jumping through special hoops; obviously we want idiomatic code to also be fast. So as far as portable code goes, I do actually think you can get at/close to C levels of performance - and you also do better, getting that kind of performance while also getting checked accesses to memory, which is a huge deal!

I should note that all of this is enabled by generics. The way I've been able to experiment with things like unsigned indexes at all is because my algorithms were all generic and didn't care what the actual index type was, so I could try them with an Array, and then with a wrapper which gives the Array unsigned indexes. That's why it's good to select your generic constraints carefully. Do you really need integer indexes? Remember that RandomAccessCollection gives you O(1) index construction from an integer offset anyway.

IMO, this is a really pleasing way to write code and optimise algorithms. It cleanly separates improvements to the generic algorithm itself from improvements to the data structure and how accesses are communicated to the compiler (and which validation we need it to perform for us).

As for non-portable code (processor-specific instructions), IIRC that is also something we want to add in some shape eventually. I don't think there are imminent plans for it, though. @scanon would know more.