Simultaneous min & max

Thanks for the deep dive, Steve. I appreciate learning this type of thing.

A bit off topic, but how does one programmatically discover the relevant cache size from Swift code?

It’s not exposed as a language feature; the mechanism to query it varies between systems: one might use sysctlbyname or the commpage on macOS, /sys on Linux, some other method on Windows, etc.
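As a rough sketch of what such a query could look like from Swift: the function below uses `sysctlbyname` with the `hw.l1dcachesize` key on Apple platforms and reads the `/sys/devices/system/cpu/cpu0/cache` entries on Linux. The names `l1DataCacheSize` and `parseCacheSize` are made up for this illustration, and the code assumes that CPU 0 is representative of the whole machine, which need not hold on systems with mixed core types.

```swift
import Foundation

/// Parses a sysfs cache size string such as "32K" or "1M" into bytes.
/// (Hypothetical helper for this sketch.)
func parseCacheSize(_ text: String) -> Int? {
    let trimmed = text.trimmingCharacters(in: .whitespacesAndNewlines)
    guard let last = trimmed.last else { return nil }
    switch last {
    case "K": return Int(trimmed.dropLast()).map { $0 * 1024 }
    case "M": return Int(trimmed.dropLast()).map { $0 * 1024 * 1024 }
    default: return Int(trimmed)
    }
}

/// Best-effort L1 data cache size in bytes, or nil if it cannot be determined.
/// (Hypothetical function; assumes CPU 0 is representative.)
func l1DataCacheSize() -> Int? {
    #if canImport(Darwin)
    // Zero-initialised so a shorter result still reads correctly on little-endian systems.
    var size: Int64 = 0
    var length = MemoryLayout<Int64>.size
    guard sysctlbyname("hw.l1dcachesize", &size, &length, nil, 0) == 0, size > 0 else {
        return nil
    }
    return Int(size)
    #elseif os(Linux)
    let base = "/sys/devices/system/cpu/cpu0/cache"
    for index in 0..<8 {
        let dir = "\(base)/index\(index)"
        guard
            let level = try? String(contentsOfFile: "\(dir)/level", encoding: .utf8),
            let type = try? String(contentsOfFile: "\(dir)/type", encoding: .utf8),
            let size = try? String(contentsOfFile: "\(dir)/size", encoding: .utf8)
        else { continue }
        let kind = type.trimmingCharacters(in: .whitespacesAndNewlines)
        // Accept the first level-1 cache that holds data.
        if level.trimmingCharacters(in: .whitespacesAndNewlines) == "1",
           kind == "Data" || kind == "Unified" {
            return parseCacheSize(size)
        }
    }
    return nil
    #else
    return nil  // Other platforms need their own mechanism.
    #endif
}
```

A caller would typically fall back to a conservative default when this returns nil, for example `l1DataCacheSize() ?? 4096`.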

There’s also a rich family of cache oblivious algorithms that block in such a way that they achieve (asymptotically) optimal cache locality without needing to know the precise cache size(s).

For “simple” accumulation trees like this, you don’t need an exact fit anyway. L1 caches are generally in the range of 16-128KB, so a conservative choice like 4KB will basically always work out in practice.
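To make the blocking idea concrete, here is a minimal sketch that assumes the goal is to run a separate min pass and max pass over each block while that block is still in L1. The function name `blockedMinMax` and the `blockBytes` parameter are invented for this example; it is not code from the pull request.

```swift
/// Computes the minimum and maximum by walking the array in fixed-size blocks.
/// Each block is scanned twice (once for min, once for max), and the second scan
/// should hit L1 as long as the block fits. (Hypothetical function for illustration.)
func blockedMinMax<T: Comparable>(_ values: [T], blockBytes: Int = 4096) -> (min: T, max: T)? {
    guard var lowest = values.first else { return nil }
    var highest = lowest
    // Number of elements per block; at least one so the loop always advances.
    let blockCount = Swift.max(1, blockBytes / MemoryLayout<T>.stride)
    var start = values.startIndex
    while start < values.endIndex {
        let end = Swift.min(start + blockCount, values.endIndex)
        let block = values[start..<end]
        // The block is never empty here, so both results are non-nil.
        if let blockMin = block.min(), blockMin < lowest { lowest = blockMin }
        if let blockMax = block.max(), blockMax > highest { highest = blockMax }
        start = end
    }
    return (lowest, highest)
}
```

With the default of 4096 bytes, an array of `Double` is processed 512 elements at a time, well under the L1 sizes mentioned above.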

(I do think it could be worth exposing this sort of thing through a uniform interface, but that's a better fit for the System package than for Swift or the standard library itself.)


There were two more suggestions in the pull request. I added a cleaned-up version of the two-at-a-time loop. Then I added, behind an #if conditional, a four-at-a-time loop.

I had just managed to remove the "??" operators elegantly when I had to stick them back in. An earlier post here suggested there's a diminishing-returns effect for multi-element reading, so four-at-a-time may not be worth it. Unless it helps by a large amount, I'm inclined to just remove it.
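For readers who haven't seen the two-at-a-time idea, here is a minimal sketch of the general technique, not the code from the pull request. It reads elements in pairs, compares the pair against each other first, and then compares only the smaller one with the running minimum and the larger one with the running maximum. The method name `minAndMaxPairwise` is made up for this example.

```swift
extension Sequence where Element: Comparable {
    /// Finds the minimum and maximum in one pass, reading two elements per iteration.
    /// (Hypothetical name; a sketch of the pairwise technique only.)
    func minAndMaxPairwise() -> (min: Element, max: Element)? {
        var iterator = makeIterator()
        guard var lowest = iterator.next() else { return nil }
        var highest = lowest
        while let first = iterator.next() {
            guard let second = iterator.next() else {
                // Odd element left over: compare it against both extremes directly.
                if first < lowest {
                    lowest = first
                } else if first >= highest {
                    highest = first
                }
                break
            }
            // One comparison orders the pair, then one comparison per extreme.
            let (smaller, larger) = second < first ? (second, first) : (first, second)
            if smaller < lowest { lowest = smaller }
            if larger >= highest { highest = larger }
        }
        return (lowest, highest)
    }
}
```

This costs about three comparisons per two elements instead of four. A four-at-a-time variant follows the same pattern with a small tournament among the four elements, which is where the diminishing returns show up.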
