RFC: Deque: should it have an opaque index?

Just one more thing that I really would like to get off my chest: I've noticed all kinds of performance pitfalls in the standard library -- from force-unwraps in UnsafeMutableBufferPointer.swapAt which prevent effective loop unrolling, to traps in Slice.withContiguousStorageIfAvailable, to the fact that basically everything in UnsafeRawBufferPointer uses force-unwraps (even .count!), which can prevent inlining (it's actually almost always better to pass an UnsafeRawPointer plus a separate 'count' parameter!), to the aforementioned issue initializing from slices.
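
For illustration, here's a minimal sketch of that last workaround (the checksum function is a made-up example of mine, not code from the stdlib or my project):

func checksum(_ base: UnsafeRawPointer, count: Int) -> UInt8 {
    // Passing a raw pointer and a separate count avoids going through
    // UnsafeRawBufferPointer's force-unwrapping accessors.
    var result: UInt8 = 0
    for offset in 0..<count {
        result &+= base.load(fromByteOffset: offset, as: UInt8.self)
    }
    return result
}

// The buffer-pointer-based signature it would replace:
func checksum(_ buffer: UnsafeRawBufferPointer) -> UInt8 {
    var result: UInt8 = 0
    for byte in buffer {
        result &+= byte
    }
    return result
}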

I've met a lot of resistance to every one of these changes, with demands that I honestly think are kind of unreasonable (e.g. having to add a benchmark showing an improvement just to get the force-unwrap removed from swapAt -- even though I was able to show a clear improvement in the generated assembly, in real code that I took the time to extract from my project). It's been an enormous hassle chasing people down to beg them to run benchmarks, and I'm sick of it. It's not my full-time job.

It's not really the focus of this thread, so I'd prefer not to dwell on it, other than to say that I have essentially no motivation to help improve the performance of the standard library any more. I think low-level performance is going to stay poor until some Apple employee develops an interest in improving it, because only they can cut through the bureaucracy.

7 Likes

Well, all I can tell you is that (as I’m sure you are well aware), I am very much affected by unpredictable/unstable low-level performance results, and I am just as frustrated as y’all are. All I’m saying is that these problems rarely if ever register in a typical Swift codebase as of today. (Sadly, low-level systems programming stuff is not yet a typical application of Swift; not even close. The current status quo is that the serious systems programming work is largely done in C or C++, even within the core Swift project. However, I really don’t think this will remain the case forever. And I’m looking forward to using these passionate posts to argue for prioritizing work.)

I am really sad to learn that NIO is not a fan of generic programming. To me, this sounds like a profoundly unproductive way to program, even when targeting a single type. (Which is, IIUC, a generic type anyway.) It may make sense for the internal details of a high-performance systems library, but it’s still a shame.

The Swift Algorithms and Collections packages exist to demonstrate the power of Swift’s Collection hierarchy and the viability of generic programming in all programming contexts, including low-level systems programming. As frustrated as I am with the unpredictable nature of advanced optimization work in Swift (including as it relates to implementing the high-performance implementations that Collections is supposed to provide), it is my absolute conviction that this is the right way to solve problems in Swift.

None of this latest round of exchanges has done anything to move my opinion on whether Deque has the right design for its indices. Integer offsets are far easier to explain/use/understand, more flexible and harder to accidentally misuse than any opaque index type. Opaque indices are often unavoidable, and there will be plenty of data structures that require using them — but forcing them on data structures that don’t would be a great disservice to our users.

This is a fascinating use case! Given that writev deals with piecewise contiguous byte buffers anyway, my instinct is to design the lowest-layer wrapper interface in terms of the highest common denominator of all such collections, which is UnsafeBufferPointer. (Modeling this really nicely rubs up against our current lack of a sensible bucket-o-bytes type that can safely/efficiently take ownership of a contiguous storage piece. But there are so many options to work around that!) All else being equal, I think I’d prefer to slice the unsafe buffer pointers, rather than the high-level collection.

So every fiber of my being wants to solve this problem in two layers:

@inlinable
func writev<C: PiecewiseContiguousCollection>(_ bytes: C) throws 
where C.Element == UInt8

// calls:

// opaque, non-generic
func writev(_ buffers: RawBuffers) throws

The problem of course lies in designing the argument types, especially if this was supposed to be part of a general-purpose syscall layer like Swift System. (I am extremely interested in solving this problem, and I have concrete ideas, too, but I’m not going to develop them in a forum post at 2am. :see_no_evil:)

(For a one-off use case like CircularBuffer, I think I’d still go with this same setup, except the high-level layer wouldn’t need to be generic -- it would still simply extract the contiguous pieces (UBPs or whatever) and then call into the low-level entry point to do the dirty work, like the 2GB mess.)

I’m not sure if any of this makes any sense; it’s getting late and I’m tired.

Edit: To clarify, what I mean is that I think the 2GB slicing would be more straightforward to implement on something that’s much closer to struct iovec (or even exactly that), rather than CircularBuffer itself.
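
Something like this is the shape I have in mind for the low-level side -- a hedged sketch only, with made-up names (splitForWritev, maxBytesPerWrite), not Swift System or NIO API:

let maxBytesPerWrite = Int(Int32.max)   // illustrative stand-in for writev's per-call limit

func splitForWritev(_ buffer: UnsafeRawBufferPointer) -> [UnsafeRawBufferPointer] {
    // Chop one contiguous piece into chunks the syscall will accept, so the
    // 2GB handling lives next to the iovec-level code, not in the collection.
    guard let base = buffer.baseAddress else { return [] }
    var chunks: [UnsafeRawBufferPointer] = []
    var offset = 0
    while offset < buffer.count {
        let length = Swift.min(maxBytesPerWrite, buffer.count - offset)
        chunks.append(UnsafeRawBufferPointer(start: base + offset, count: length))
        offset += length
    }
    return chunks
}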

This is a very reasonable solution within the context NIO uses it in -- the complexity of accessing elements in a custom slice does not matter in the slightest when the element type is itself a collection like ByteBuffer.

But I think it does matter for Deque, because it is designed to also perform well with scalar elements.

1 Like

There must be a misunderstanding here:

  1. I am a person and not NIO (though of course I am taking some more or less adapted examples from NIO).
  2. (Not that this matters, but) I am personally a fan of generic programming. However, in Swift it often comes with a cost (investment in code, tests, resulting performance, ...). This leads me to write purely internal algorithms that operate on just one concrete type. Nothing more, nothing less.

Agreed.

I agree, for the Swift collections packages it's absolutely the right way. And for public NIO APIs it is too, which is also what we've been doing all along.

This thread is to decide whether opaque indices make sense. And I offered my input that opaque indices unlock self-slicing, which may or may not be desirable. But if we take the base collections and do not allow self-slicing, then we are deciding that other self-slicing collections can't easily (& performantly) be built on top of them. All I was saying is that this is a decision we should make very deliberately.
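
To spell out what I mean by self-slicing, here's a toy sketch (the type and its details are made up for illustration; Foundation's Data is a real-world example of a collection whose SubSequence is itself):

struct ByteView: RandomAccessCollection {
    // Self-slicing: slicing returns another ByteView instead of Slice<ByteView>.
    typealias SubSequence = ByteView

    var storage: [UInt8]
    var startIndex: Int
    var endIndex: Int

    subscript(position: Int) -> UInt8 {
        storage[position]
    }

    subscript(bounds: Range<Int>) -> ByteView {
        ByteView(storage: storage, startIndex: bounds.lowerBound, endIndex: bounds.upperBound)
    }
}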

Is that an absolute truth or is that reflecting the current (sad) state of non-integer indices? With "sad" referring to myCollection[myCollection.index(myCollection.startIndex, offsetBy: 5)] instead of myCollection[.start + 5] or so.

I agree that right now this is the case. And right now, integer indices are better for the end-user for sure. However, opaque indices can make better building blocks.

What I am ideally looking for is a real improvement: a way to index that is both great for the end-user and works super well as a building block (one that, for example, would allow self-slicing collections).

Maybe we come to the conclusion that this is an impossible goal to achieve and we're happy with the status quo, which is Int indices & Slice<Thing>.

Totally fair, but allowing for URBP slices still makes everything go slower for the 0.1% case (which may totally be acceptable in my example because we're about to call into the kernel).

I see where you're coming from and it makes sense. The real world (and I didn't mention that) is obviously slightly more complicated and it's really a MarkedCircularBuffer<(ByteBuffer, Promise<Void>?)>. So for every write the user enqueues they can (if they choose to) also get a future that is completed when the individual write completed.

But I think we're getting slightly off-topic here, my goal was not to present an issue that I don't have another solution for. And I'm not saying what we do is the only or the best implementation. I was offering a use-case where I think a self-slicing collection could be useful. (And there are other examples)

It can absolutely be refactored to be as fast without having a self-slicing collection. The point I was trying to make is that it was handy, just something to consider before deciding that self-slicing is always stupid and should be avoided at all costs.

I'm not offended if we go down the Int+Slice<Thing> route, I'll probably be a little sad that we couldn't advance the status quo but that's 100% okay. I guess I haven't given up on the idea yet :slight_smile:.

Totally reasonable, we actually have code like that too.

1 Like

I’d pretty strongly object to this: integer indexes have been repeatedly confusing for derived collections, in that slice[0] is very often an out-of-bounds access rather than the first element of the slice, and x.lazy.filter(…)[0] might access an element that is not in the filtered collection at all. In fact, I’d say integer indexes are only more useful when you’re accessing a collection at random offsets you got from somewhere else, which for many use cases isn’t necessary at all. Unfortunately, Deque is almost certainly one of those, so I’m not sure how much this changes the discussion in practice.
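
For example (with an ArraySlice, whose indices are shared with its base):

let array = [10, 20, 30, 40]
let slice = array[2...]   // an ArraySlice that shares the base array's indices

slice[2]                  // 30 -- the first element of the slice
// slice[0]               // out of bounds: traps at runtime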

4 Likes

So to me slice[0] and x.lazy.filter(...)[0] are two very different issues.

Slicing

I agree SubSequence is probably over-optimizing for index sharing in the RandomAccessCollection case. However, I don't see how we could change how ArraySlice works without fundamentally rethinking the entire Collection hierarchy from scratch, and I am not interested in doing that. (This design wrinkle is also somewhat mitigated by the fact that Array and ArraySlice are two different types; Array is not self-slicing.)

Every Swift programmer needs to learn how Array works; it's a fundamental type. Internalizing how its slices work is part of this. I'm really not interested in fiddling with Array -- it may have been possible to tweak this in Swift 3 (though I doubt it), but I don't think it's desirable to do radical overhauls of Array’s design today. It is what it is.

So given that everyone knows Array, it seems obvious to me that random access collection types like Deque, whose API shape is essentially the same as Array's, should fully embrace the same interface. For better or worse, Array has set a very strong design precedent for random access collections; deviating from this pattern for no good reason would confuse/annoy the heck out of people.

(And no, I don't consider writev's arbitrary 2GB limit a good enough reason to consider changing Deque. An example for a good reason would be UnsafeBufferPointer's case, where pointer indices and self-slicing could have simplified some important use cases. (It also helps that pointers are strideable.) But as it happens, UnsafeBufferPointer also ended up following Array's pattern, strengthening it even more.)

Being able to take an Array algorithm and switch it to use UnsafeBufferPointer or Deque by simply replacing the type declaration is a crucially important feature -- it demonstrates that the stdlib is designed as a cohesive system, and that knowledge we learned about in one part of it translates directly to related parts.

I think it would be better if going from Array to RandomAccessCollection were just as simple; as it currently stands, it requires completely rewriting the existing code to adopt a whole different set of APIs that are far less readable. (startIndex/endIndex vs. 0/count is definitely part of this; but the real issue is having to replace perfectly cromulent things like foo[n] with the serpentine foo[foo.index(foo.startIndex, offsetBy: n)].)
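
For instance, here's the same trivial operation written against Array and then against RandomAccessCollection (the function is a made-up example of mine, just to show the mechanical rewrite):

func middleElement(of array: [Int]) -> Int? {
    // Direct integer indexing: short and readable.
    array.isEmpty ? nil : array[array.count / 2]
}

func middleElement<C: RandomAccessCollection>(of collection: C) -> C.Element? {
    // The generic version has to spell out the index arithmetic.
    guard !collection.isEmpty else { return nil }
    let middle = collection.index(collection.startIndex, offsetBy: collection.count / 2)
    return collection[middle]
}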

SE-0265 attempted to work around this on the library level with foo[.first + 3]. This feature may come back at some point (although probably not any time soon, given that AFAICS, satisfying the Core Team's request to allow anchoring offsets to arbitrary indices isn't possible to implement without major type system and/or language surgery).

However, no matter what syntax it ends up using, it needs to work with every collection type, and it needs to be obvious whether a particular expression works with raw indices or offset expressions. Which means that expressions like array[42] or deque[i + 42] must continue to refer to raw indices; otherwise SE-0265 would be massively source breaking. Which in turn means that while SE-0265 would definitely make it easier to write generic code, it would still look different to most code that operated on a direct Array -- which I fully expect to continue to use convenient direct indexing.
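
To make that concrete, here's a rough sketch of the kind of offset-based subscript I mean -- the names and shape here are mine, not the actual SE-0265 design:

struct OffsetFromStart {
    var distance: Int
    static var first: OffsetFromStart { OffsetFromStart(distance: 0) }
    static func + (lhs: OffsetFromStart, rhs: Int) -> OffsetFromStart {
        OffsetFromStart(distance: lhs.distance + rhs)
    }
}

extension Collection {
    // Offset expressions use a distinct type, so raw-index subscripts like
    // array[42] keep their existing meaning.
    subscript(bound: OffsetFromStart) -> Element {
        self[index(startIndex, offsetBy: bound.distance)]
    }
}

let letters = ["a", "b", "c", "d", "e"]
letters[.first + 3]   // "d"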

Please also remember that up to a few months ago, I was firmly in the Every Collection Must Have an Opaque Index Type camp, and in fact I remember I used to enthusiastically encourage the NIO team to adopt it as a design principle. Deque started out with an opaque index -- I tried extremely hard to make it work, even after it became blindingly obvious it was the wrong design. So I changed my mind, and I'm unlikely to change it back -- I've learned my lesson.

LazyFilterSequence

On the other hand, LazyFilterSequence has an indefensibly bad design (on multiple points), and we need to fix it. It's not a fundamental type like Array is; the minor source compatibility cost of tearing it down and replacing it with something sensible is well worth it to stop living with its terrible pattern. (Because it is in the stdlib, people assume that it's a good pattern to follow, so it is actively damaging the overall health of the project. I don't like having traps like that in the stdlib.)
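
To illustrate one of those points, here's a minimal example of the x.lazy.filter(…)[0] behavior mentioned earlier (the values are made up):

let numbers = [1, 2, 3, 4, 5]
let evens = numbers.lazy.filter { $0.isMultiple(of: 2) }

evens.first       // 2, as expected
// The filtered collection shares its base's indices, and subscripting does not
// re-check the predicate, so a valid-looking index can produce an element that
// isn't in the filtered collection at all:
evens[0]          // 1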

2 Likes

My point wasn't necessarily "don't use an integer index", just that there are some ways in which opaque indexes are "harder to misuse" than integer indexes. I don't think I'd have bothered to cut in at this point except that "hard to misuse" is one of our goals for API design, and I don't think integer indexes automatically win that one.

(I agree that opaque indexes are harder to use, period, which comes up when you are doing tasks that operate on offsets or single elements. There are fewer such tasks than people think there are, but that doesn't mean they don't exist. I also agree that they are harder to explain and understand.)

1 Like

Indices are pointers. The only way to prevent index misuse is to never use raw indices.

I don't think a RandomAccessCollection can do anything to prevent accidentally using an index that refers to the wrong element. It doesn't matter how hard a RAC tries to implement custom index validation -- the protocol requirements embed the possibility of using integer offsets; indeed, they encourage it.

For random-access collections, any index validation scheme beyond a simple range check achieves nothing -- except (1) it introduces a potential performance issue and (2) it makes it harder to use the collection correctly. (For those who use the original index type instead of converting to/from simple integer offsets.)
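
As a toy illustration (not Deque's actual implementation -- just a made-up ring-buffer-ish type), this is the entirety of the validation that's worth doing:

struct ToyRingBuffer<Element>: RandomAccessCollection {
    private var storage: [Element]
    private var head: Int

    init(_ elements: [Element], head: Int = 0) {
        self.storage = elements
        self.head = head
    }

    var startIndex: Int { 0 }
    var endIndex: Int { storage.count }

    subscript(position: Int) -> Element {
        // A plain range check is all we can usefully verify; nothing here can
        // tell whether `position` refers to the element the caller meant.
        precondition(position >= 0 && position < storage.count, "Index out of range")
        return storage[(head + position) % storage.count]
    }
}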

(Important note: none of the discussion in this topic applies to Collection types that aren't random-access. Opaque indices and extremely careful index validation are absolutely crucial for those. When the Collections package gets its first non-random-access collection type, I guarantee it'll have an opaque index with custom invalidation rules.)

2 Likes