Letting the compiler figure out access patterns & choose the data structure that best fits

So I was writing a program where I need to prepend items. I know that an array is not a suitable data structure for this use case, and that I should reach for a deque. Fine, but I was wondering to myself whether there could be a "smarter" array that handles prepends. There are two ways to go about it:

  1. Use a data structure that implements every operation in, say, O(log n), so no particular operation is slow, but nothing is fast either.

  2. Add some runtime checks to, say, the array so that it detects (at runtime) when a common but inefficient access pattern is being used, and special-cases itself for it.

Both these approaches have their issues, which is why standard libraries (Swift's or others') don't try these tricks, and just let the user (the programmer) explicitly specify what they want.

Then it occurred to me, a rather simple observation: the compiler knows enough to make the choice!

This decision tree could become quite complicated if one wants to handle all sorts of cases (just like other compiler optimizations), but at its core it is a choice between three data structures:

  • Array (the usual choice, and the default if the compiler can't deduce any predominant access pattern)
  • Set (if the compiler detects that the most common operation is checking for element existence)
  • Deque (if the compiler detects frequent prepends)

Of course, the programmer can still tell the compiler what they want explicitly. But folks who don't care could just say: give me this (for lack of a better name) "List", and please figure out for me which underlying data structure best matches the access patterns of the code I'm about to write with it.
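
To make that concrete, here's the choice I'd like the compiler to make for me, written out by hand as it has to be today. This is just an illustrative sketch; Deque comes from the swift-collections package.

    import DequeModule  // Deque is from the swift-collections package

    // The three choices a hypothetical "List" would make automatically:
    var feed: [String] = []          // Array: the general-purpose default
    feed.append("post")              // O(1) append

    var pinned = Deque<String>()     // Deque: cheap at both ends
    pinned.prepend("announcement")   // O(1) prepend

    var seen: Set<String> = []       // Set: fast membership checks
    print(seen.contains("post"))     // O(1) on average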

Since the thought is so simple, other people must have had it before, but I don't recall encountering a language that does this for me. I can see two reasons why: (a) doing this compiler-based selection might get hairy real soon in general cases, and (b) the world works fine without it too.

I don't know enough to comment on (a), but if (b) is the only reason, then I'd push a bit more. The way things are is fine for me and other existing programmers, and this isn't an issue in practice, but the situation could be improved for future programmers. I have learnt about arrays and sets and deques etc. (and am quite happy to have learnt about them), but should it be necessary for everyone to learn about them to be able to program? No, right? They can, if they want to, but if the language can abstract this choice away efficiently, it is one less roadblock for newcomers.

I wasn't sure where to post this; I hope adding the "discussion" tag is enough to indicate that I'm just trying to put this thought out there, not necessarily expecting it to translate into a pitch. Since what I'm proposing is a compile-time choice (and not a runtime one), it cannot be done at the Collections library level, which is why I didn't post it in that category.

2 Likes

I can't see how a compiler could decide this for you, as it can't tell how much any particular data structure will be used. It could try to count references to the structure and to particular operations (such as prepend), but those references might sit in a rarely executed block of code. In fact, if programmers know that doing something a particular way is expensive, they'll probably confine it to code that's called rarely, and use it more freely there.

The only way to do it would be as part of testing, where you identify which code is called most often and hence where data structure choices are non-optimal. But that's available already: you can profile code and find where it's spending most of its time, including (but not limited to) accessing data in an inefficient way.

2 Likes

The compiler can't really know the frequency of these accesses; here are two examples to better illustrate this.

Example 1. iOS Apps
Let's say you have an app with an infinite scroll backed by this "magic List". On the main screen, users see a list of posts. On another view, users can pin posts (that would be a prepend). Let's also say they can prepend a custom message up there too.

The compiler sees only one call site for .append() on the main feed, but two call sites for .prepend() on the secondary screen, because there are two types of things you can pin. Should the compiler optimise for the prepend scenario here? That would be bad, because that feature will be used far less. You can also imagine limiting the feature to premium users, but the compiler can't know it will be unavailable to most people most of the time.

There's also the problem that every user gets the same compiled app, so the compiler can't really make decisions like these that would impact everyone. You as a developer can look at analytics, profile your app, and make an educated guess, but the compiler doesn't have access to any of that.

Example 2. Swift on the server
Very similar to the first example, but maybe even simpler to visualise. Let's say you have two endpoints, one that appends to an in-memory list and one that prepends to it. How would the compiler know how often each endpoint is called?
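
A minimal sketch of the shape of the problem (the handler names are made up; only the two call sites matter):

    var log: [String] = []

    // Endpoint A: appends to the in-memory list.
    func handleAppend(_ entry: String) {
        log.append(entry)            // O(1)
    }

    // Endpoint B: prepends to the same list.
    func handlePrepend(_ entry: String) {
        log.insert(entry, at: 0)     // O(n) on Array
    }

    // Statically, the two call sites look symmetric. Whether Array or
    // Deque is the better backing store depends entirely on how traffic
    // splits between the endpoints at runtime.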

1 Like

I get what both of you are saying here, and I indeed don't know exactly how this would get implemented. Maybe the techniques needed to determine this are a research project, not something that can be done in a practical setting today.

But I don't see why it is fundamentally impossible - this is the same set of issues the compiler already solves when optimizing code. When I studied compiler design, a lot of the optimizations in the advanced classes went over my head because they used mathematics I didn't understand. But they work! In our lifetime, we've seen compilers go from strength to strength. LLVM, or the Swift compiler, doesn't use runtime profiling information when doing optimized builds; there is just so much magical-seeming stuff it can optimize by static analysis alone. There are even bigger examples, like GHC - the Haskell compiler - which (seemingly magically) transforms very abstract, high-level definitions into code that runs (almost) as fast as hand-written code.

And all this is before we even get to JIT based compilers like V8.

The point I'm trying to make is: I get that this is hard, and maybe it is hard enough that spending time on it isn't worth the effort. Agreed. But what I don't agree with is the idea that this cannot be done. Compilers can do a lot with the static analysis they already perform.

The jump from a concern about memory layout to "dynamically choose an abstract data structure" seems very odd and incorrect to me. Additionally, which implementation should it choose? There are many per abstract data type. One could even make arrays "not bad at prepends", depending on the implementation.

I don't necessarily see this as a static-optimization problem, but as a meta-analysis problem about how a program interacts with and uses its data.


This is a very theory-heavy question and seems more suited to something like the Computer Science Stack Exchange than to a specific language's implementation. Not that I don't think this discussion should be had here; just that you may get a lot more ... thorough answers from a more straightforward bunch. Definitely a question worth asking though, but IMO it should be generalized to something like: "Could a compiler/static analysis infer the best abstract data type?"

2 Likes

Alright, I went ahead and asked it there!

I quite like the question. :slight_smile:

However, the implications would be enormous if compilers were able to achieve this.

2 Likes

See one-terabyte-array-idea for arrays that could grow & shrink from both ends in O(1) time.

1 Like

Not exactly the same, but NSArray apparently has tree-like performance at large counts.

(Or at least did back in, uh, 2005 when that post was written.)

4 Likes

Thank you for sharing that; I'm getting the right sort of nostalgia reading a ridiculousfish blog post again.

The bit I found interesting, for people who won't click the link:

So it sure looks like CFArray is switching data structure implementations around 30,000 elements. And I believe there's lots more implementations that I haven't discovered, for smaller arrays and for immutable arrays.

And his conclusions:

...NSArrays are good at some things that arrays are traditionally bad at. Don't go rolling your own dequeue just because you plan to insert a lot of stuff in the beginning of an array. ...

Apple often names a class in terms of what it does, not how it does it. This is the spirit of Objective-C dynamic messaging: a message says what to do, but not how to do it. The way it does it may be different from what you expect. Which leads me to:

Don't second guess Apple, because Apple has already second guessed YOU.

(I miss Objective-C often. I like Swift too, but objc was something special...)

5 Likes

I'm surprised at the pessimism on display here from some commenters. While this is indeed not our current reality, as @mnvr notes it is not far beyond what compilers are already capable of. I'd go so far as to say it's on their current trajectory.

There are related optimisation techniques already implemented today in some compilers and situations, like transforming arrays of structs into structs of arrays (an extremely powerful technique for performance - it's one of the main ways the Carbon compiler frontend increased its performance by an order of magnitude over Clang with C++ [1]). These are not hypothetical techniques in the HPC world, for example, but rather essential ones.
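
For illustration, here's that transformation in miniature, written by hand (the point being that a compiler can, in principle, perform it mechanically):

    // Array-of-structs (AoS): each particle's fields are interleaved in
    // memory, so a loop that reads only `x` still drags `y` and `z`
    // through the cache.
    struct Particle {
        var x, y, z: Float
    }
    var particlesAoS: [Particle] = []

    // Struct-of-arrays (SoA): each field is contiguous, which is far
    // friendlier to caches and SIMD when a loop touches one field at a time.
    struct Particles {
        var xs: [Float] = []
        var ys: [Float] = []
        var zs: [Float] = []

        mutating func append(_ p: Particle) {
            xs.append(p.x)
            ys.append(p.y)
            zs.append(p.z)
        }
    }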

I think Swift already has a lot of the foundation for this. Protocols are a pivotal abstraction, for example, for "here's what I do, not how I do it". And there are obvious avenues to explore in both the automated and manually guided directions - e.g. PGO, and/or some way to specify "implementation bundles" as a kind of sum type, like typealias OrderedList = Array | Deque, to clearly define the compiler's search space of implementations.

There's even the possibility of deferring to runtime if the compiler is indecisive (although this will penalise other optimisations… still, might be worth it). Just like Swift already has plenty of 'escape hatches' to runtime for the compiler when it's too hard - or impossible - to determine the optimal choice statically.

It should also be recognised that a degree of this is already possible today - and well precedented - at runtime, through manual labour. There's nothing stopping someone writing an OrderedList in Swift right now that dynamically adjusts its storage based on observed access patterns. That approach can't achieve results as good as a compile-time choice, but it has proven quite effective in some cases, which strengthens the case for this feature.
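
A minimal sketch of what that might look like, assuming Deque from the swift-collections package and an arbitrary migration threshold (this is not a production design; a real one would, among other things, avoid the copy-on-write traffic this naive enum pattern matching incurs):

    import DequeModule  // Deque is from the swift-collections package

    struct AdaptiveOrderedList<Element> {
        private enum Storage {
            case array([Element])
            case deque(Deque<Element>)
        }

        private var storage = Storage.array([])
        private var prependCount = 0
        private let migrationThreshold = 16  // arbitrary placeholder heuristic

        mutating func append(_ element: Element) {
            switch storage {
            case .array(var a):
                a.append(element)
                storage = .array(a)
            case .deque(var d):
                d.append(element)
                storage = .deque(d)
            }
        }

        mutating func prepend(_ element: Element) {
            prependCount += 1
            switch storage {
            case .array(var a):
                if prependCount >= migrationThreshold {
                    var d = Deque(a)          // one-time O(n) migration
                    d.prepend(element)
                    storage = .deque(d)       // O(1) prepends from here on
                } else {
                    a.insert(element, at: 0)  // O(n), tolerated while rare
                    storage = .array(a)
                }
            case .deque(var d):
                d.prepend(element)
                storage = .deque(d)
            }
        }
    }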

8 Likes

It might be interesting to consider the possibility of a compiler making suggestions, as distinct from errors and warnings.

“Hi, it looks like you’re trying to prepend to an array. Would you like to switch to a deque?” — Clippy, probably

4 Likes

for what it’s worth, there is a tremendous amount of well-documented low-hanging fruit (example) that the swift compiler fails to optimize today, optimizations that can have a dramatic impact on performance. one thing that would significantly help the ecosystem right now is the ability to declare _modify accessor requirements in protocols, as the existing get/set semantics severely limit the possibilities for composable data structures.
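
for illustration, here's what _modify looks like on a concrete type today (it's an underscored, officially-unsupported accessor), versus the get/set a protocol is limited to requiring:

    // a concrete type can implement _modify today, yielding the element
    // for in-place mutation with no temporary copy:
    struct Buffer {
        private var storage: [Double] = [0, 0, 0, 0]

        subscript(i: Int) -> Double {
            get { storage[i] }
            _modify { yield &storage[i] }
        }
    }

    // but a protocol can only require { get set }, so generic code loses
    // the in-place path and falls back to a full get-then-set round trip:
    protocol MutableBuffer {
        subscript(i: Int) -> Double { get set }
    }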

I fear even longer compile times… and be careful not to accidentally solve the halting problem :wink:… But (sorry for the slight sarcasm in the previous sentences) I think your idea is a very valid one if implemented as a separate tool that gives hints and maybe offers some refactorings. I am not sure how well this could work, though; maybe it is a sensible application for AI.

4 Likes

Definitely possible, but let's be wary about jumping to conclusions. I suspect in some cases the heuristic could be not much more than "are any prepend methods used?" (for Array vs Deque), for example. Remember that the compiler doesn't have to be perfect; as with any optimisation it does, it merely has to be better than random chance (strictly speaking) and [fairly] consistent.

I'm assuming, though, that this auto-selection would be opt-in (e.g. explicitly defining a sum type of candidate collection types, and using that type's name). Automatically replacing today's Array with Deque would probably be a bad idea: there's almost certainly going to be a need to easily override the compiler's guesses in some cases, and simply using the type system as intended (i.e. being more specific about the collection type you want) seems like the most elegant way to facilitate that.

An NSArray-like approach, where there are several implementations and the standard library chooses one at runtime (and can change its choice more or less at will), is theoretically possible. However, Swift.Array inlines code that assumes you can access elements via pointer arithmetic, which means changing that would likely be ABI-breaking.

But having the compiler choose an implementation at compile time seems very difficult except in very limited cases. Basically, if your array ever gets passed outside your own function, the compiler would need to look inside those other functions to see what operations they perform (recursively) and, if that led it to decide it should use a deque, create deque-based versions of those functions. This kind of inter-function analysis is difficult even in the best circumstances, and often (when you're calling functions in libraries, for instance, so their bodies aren't available to the compiler) basically impossible.

By contrast, in the current manual situation, functions communicate the capabilities they need explicitly through the types in their signatures. If a library function needs to prepend to the collection passed to it, that function will take a Deque instead of an Array, and so you will know that your code should be adjusted to use a deque. It requires a little more work and knowledge on the programmer's part, but it works reliably under any circumstances.
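
For example (Deque here is from the swift-collections package):

    import DequeModule  // Deque is from the swift-collections package

    // The signature itself advertises the capability this function needs:
    // it prepends, so it asks for a Deque rather than an Array.
    func insertHeader(_ header: String, into lines: inout Deque<String>) {
        lines.prepend(header)  // O(1) on Deque; would be O(n) on an Array
    }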

5 Likes

Yes, but that merely means the technique is incompatible with that subset of cases. Most Swift libraries are statically linked and built from source - or at least, that's the case for a significant portion of Swift's users - and, the details of any particular build system aside, global optimisations are possible in principle.

Plus, maybe optimisations like interface inlining aren't actually the right choice, if you can only have one or the other? Subscript call overhead, for example, perhaps pales in comparison to the cost of prepending to a non-trivially-sized array (in the traditional C sense).

There are hybrid approaches, too - e.g. still permit inlining of methods, but include a runtime check of the underlying type, with fallbacks to non-inlined versions if the type doesn't match the one(s) with inlined support. Quite like how tracing JITs work, in a way. The cost of a compare-and-branch that's consistent at a given address is very small, I suspect, because it's easily predicted.

I agree it can be difficult, but I wouldn't go so far as to assert it's often impossible. There are already some surprisingly capable optimisations in the Swift compiler that can transcend a surprising number of function calls, if not module boundaries. Lazy collections rely on them for good performance, for example.

You're right to highlight that these sorts of optimisations are best-effort, essentially. That has implications on their potential language integration, such as whether a fallback option must be explicit.

P.S. This is all armchair compiler design, of course. None of this stuff is simple or easy, and we're glossing over a lot of details. But all good things start somewhere, even from naive or outlandish ideas, so let's not be too hasty to crush any dreams. :smile:

4 Likes

Doing any sort of access-pattern analysis at compile time is pretty fraught; you really need runtime trace data to do these sorts of optimizations effectively, otherwise you'll end up making changes that turn out not to be supported by actual access patterns. Doing it based on profile data, whether automatically or by generating hints for the programmer from traces, is likely to be pretty profitable, however.

2 Likes