RFC: Deque: should it have an opaque index?

I would argue that opaque indices are generally a very good thing because they can prevent accidental misuse of data structures. Arguably, Deque could do very well with an opaque index.

SwiftNIO contains the CircularBuffer which has (pretty much) the same semantics as Deque. One difference is that CircularBuffer uses opaque indices, and we haven't had any issues with it. In fact, we had some spare bits in the opaque index which we could use for a rudimentary index check. Essentially, if the CircularBuffer contains fewer than 2^24-1 elements, we store the current count in the spare bits of the index. That's super cool because you can detect (in debug mode) very common misuses such as:

var d = CircularBuffer<Int>()
d.append(1)
d.append(2)

let startIndex = d.startIndex
d.popFirst() // <- index mutation!

print(d[startIndex]) // used an index that has been invalidated by mutation
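A minimal sketch of how such a debug check could be wired into an opaque index. All names here are made up, and Array stands in for the ring buffer; this is not CircularBuffer's actual implementation:

```swift
// Hypothetical sketch: the index carries spare "check bits" recording the
// collection's count at the time the index was created. A debug-mode assert
// in the subscript then catches indices that survived a mutation.
struct OpaqueIndex: Comparable {
  let offset: Int          // logical position in the buffer
  let checkBits: UInt32    // count of the collection when the index was made

  static func == (lhs: OpaqueIndex, rhs: OpaqueIndex) -> Bool {
    lhs.offset == rhs.offset
  }
  static func < (lhs: OpaqueIndex, rhs: OpaqueIndex) -> Bool {
    lhs.offset < rhs.offset
  }
}

struct MiniBuffer<Element> {
  private var storage: [Element] = []

  private func makeIndex(_ offset: Int) -> OpaqueIndex {
    // Only counts below 2^24 - 1 fit in the spare bits; larger counts
    // disable the check.
    let bits: UInt32 = storage.count < (1 << 24) - 1 ? UInt32(storage.count) : .max
    return OpaqueIndex(offset: offset, checkBits: bits)
  }

  var startIndex: OpaqueIndex { makeIndex(0) }
  var endIndex: OpaqueIndex { makeIndex(storage.count) }

  subscript(index: OpaqueIndex) -> Element {
    // Debug-mode check: trap if the collection mutated since index creation.
    assert(index.checkBits == .max || index.checkBits == UInt32(storage.count),
           "index was invalidated by a mutation")
    return storage[index.offset]
  }

  mutating func append(_ element: Element) { storage.append(element) }

  mutating func popFirst() -> Element? {
    storage.isEmpty ? nil : storage.removeFirst()
  }
}
```

With this sketch, the misuse above would trap in debug builds, because the saved index's check bits no longer match the count after popFirst().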

This occurred to me when I was reimplementing CircularBuffer on top of Deque.

3 Likes

Deque's initial implementation had opaque indices, and I tried very hard to make it work.

Unfortunately, it felt pointlessly pedantic, and it actually interfered with the understandability of the API -- because it obscured how deque indices actually work. I've eventually come to the conclusion that unless there is some strong reason not to do so, random access collections should always just use array-style integer offsets as their index. It's to the point that I consider it a bug that RandomAccessCollection doesn't have an Index == Int requirement.

(Conversely, I feel equally strongly that non-random-access collections that use integer indices should be abolished. Especially if the indices aren't even consecutive integers -- looking at you, LazyFilterCollection. ಠ_ಠ)

RandomAccessCollection effectively requires Index to be equivalent to integers, anyway.

That's super cool because you can detect (in debug mode) very common misuses such as:

var d = CircularBuffer<Int>()
d.append(1)
d.append(2)

let startIndex = d.startIndex
d.popFirst() // <- index mutation!

print(d[startIndex]) // used an index that has been invalidated by mutation

In my view, there is no misuse here. Using integer indices makes it abundantly clear that the only index that is actually invalidated here is the original endIndex. The original startIndex may now be pointing to a different element, but it is still a perfectly valid index. Wrapping the integer offsets into a custom type doesn't change this, it merely obscures how this works. (The general-purpose Deque shouldn't artificially force-invalidate its indices -- the way array-style indices work is a feature, not a bug. This isn't the same situation as Set and Dictionary, where bucket indices may or may not accidentally end up pointing to a valid element even after invalidation: in Array/Deque/OrderedSet/OrderedDictionary, indices are guaranteed to follow a very straightforward logic, and people using such types generally write their code to rely on this.)

I think that artificially forcing people to regenerate their indices after every mutation is, in general(!), the wrong approach. Of course, in specific use cases, one may want to encourage more careful indexing, and in those specific cases it is certainly still possible to wrap Deque in a custom collection type that adds its own index validation logic. (Why even provide indices though?)

FWIW, if I was given a random-access collection that insisted on me spelling foo[n] as foo[foo.index(foo.startIndex, offsetBy: n)], I'd either go look for a different implementation, or I'd just set up something to work around the unhelpful design, like this extension:

extension RandomAccessCollection {
  /// Complexity: O(1)
  internal subscript(_offset offset: Int) -> Element {
    self[self.index(self.startIndex, offsetBy: offset)]
  }
}
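In use, the offset-based subscript does the right thing even on a slice whose indices don't start at zero (the extension is restated here so the snippet stands alone):

```swift
// Offset-based element access for any random-access collection.
extension RandomAccessCollection {
  /// Complexity: O(1)
  subscript(_offset offset: Int) -> Element {
    self[self.index(self.startIndex, offsetBy: offset)]
  }
}

let slice = [10, 20, 30, 40].dropFirst()   // ArraySlice with indices 1, 2, 3
// slice[1] is 20 (slice indices carry over from the base collection),
// but offsets always start at 0:
let second = slice[_offset: 1]             // 30
```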

(I'm pretty sure I'd have argued for the same Deque design even if we were able to conveniently access elements at integer offsets from the start of any collection. SE-0265 was returned for revision 17 months ago -- we have some ideas on how to incorporate the review recommendations, but nothing that is obviously the right approach, and frankly adding complicated language sugar to simplify indexing isn't a top priority for anyone right now.)

5 Likes

(Disclaimer: this emphatically does not mean that I'm in favor of self-sliced top-level collection types with integer indices. I'm merely arguing that random-access collections should follow in Array's footsteps -- I'm not trying to convince anyone to emulate Data.)

1 Like

I think that you and I aren’t reading the example the same way. By my read the invalidated index in this context is startIndex, which is definitely not safe to use to subscript the collection any longer.

I don't think so -- in Array's case, startIndex is the constant 0, and it's perfectly fine to save it out and keep reusing it, even after removing the element that was in the location it originally addressed. (As long as there are enough elements remaining, of course.) Deque works the same way, and I think all RandomAccessCollections should do the same.

I expect the code sample to print 2. It would be surprising if anything else happened!
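Translated to Array, the same sequence of operations indeed prints 2:

```swift
// Array-style indexing: the saved startIndex (0) remains a valid index after
// the removal; it now addresses what used to be the second element.
var d = [1, 2]
let startIndex = d.startIndex   // 0, and it stays 0 after the mutation
d.removeFirst()
print(d[startIndex])            // prints "2"
```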

Indices represent positions in the collection; they aren't just addressing a particular element. (E.g., consider that MutableCollection's subscript setter is required not to invalidate any index, even though it changes the element that corresponds to one of them. One major problem with RangeReplaceableCollection is that it allows its mutations to invalidate indices, without providing any way to keep your position in the collection -- RRC mutations ought to all return fresh new indices around the subrange that they modified.)

Now in theory we could say that Deque's indices should address specific slots in the storage buffer, so popping the first element would increment the startIndex (wrapping to zero if necessary). However:

  1. This would violate the substitutability requirement(*) of Equatable. Two Deques that compare equal could have different indices, so they wouldn't be substitutable with each other.

    Not that Equatable's substitutability requirement is anything other than a vague preference in practice. It certainly doesn't have a precise formal definition, and we aren't taking it very seriously, even in the stdlib -- e.g., Set and Dictionary are flagrantly violating it when it comes to indices (or element ordering), and the world has not ended yet. :wink:

  2. It would make it overly complicated to implement the index type's Comparable requirement.

  3. Most importantly, it would make Deque work surprisingly differently from its traditional implementations elsewhere -- without, as far as I can tell, providing any meaningful practical benefit.
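To illustrate point 2: with indices that address storage slots, even the basic "which index comes first" question requires unwrapping the raw slot numbers through the buffer's head position. A sketch (free functions standing in for index methods; not any library's actual code):

```swift
// In a ring buffer with head = 6 and capacity = 8, the logical order of
// storage slots is 6, 7, 0, 1, 2, 3, 4, 5 -- so raw slot numbers cannot be
// compared directly.
func logicallyPrecedes(_ a: Int, _ b: Int, head: Int, capacity: Int) -> Bool {
  // Unwrap a storage slot back into a logical offset before comparing.
  func offset(_ slot: Int) -> Int {
    slot >= head ? slot - head : slot + capacity - head
  }
  return offset(a) < offset(b)
}
```

Note that slot 7 logically precedes slot 0 here, even though 7 > 0 numerically; a naive Comparable implementation on raw slots would get this wrong.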

1 Like

I disagree with that in general; it may hold true most of the time for RandomAccessCollections, but it doesn't work if your index represents a variable-length range from an underlying collection. In the code snippet, startIndex would have the correct starting position, but not the correct length.

Agreed.

"Indices are positions" isn't some novel concept I just randomly invented to support an argument -- it is literally the definition of Collection.Index:

A type that represents a position in the collection.

Valid indices consist of the position of every element and a "past the end" position that's not valid for use as a subscript argument.

In the case of a collection whose elements are variable-length slices of underlying storage (like String), the Index consists of a storage range, so it represents a position in the collection -- it's just that this position may not make logical sense in a particular collection instance. This doesn't change the fact that the index is a position! (Note though that in the example above, I'm only concerned with RandomAccessCollection types, which are always able to use integer indices -- indeed, RAC explicitly requires O(1) conversions between integer offsets and whatever index type the collection may be using.)

Similarly, in tree-based collections, an index is usually a path in the tree -- e.g. represented by a sequence of child indices within each node along the way. Paths represent logical locations within a tree -- so indices are positions even there. A path may represent a position that doesn't exist in a tree, but this doesn't change the fact that it represents a position -- it may just be a dangling reference (a.k.a. an invalidated index).

In Array's case, the mapping from indices to elements feels very natural -- indices are simply offsets from the beginning of the collection (which is probably the most convenient way we can identify a storage location in a flat buffer), and the way mutations interact with how the contents of the array map to these offsets seems obvious to me. If I remove the second element, the third and subsequent items will each slide to a position that's one slot closer to the start, so what was originally the third element will then occupy the second position. The position (a.k.a. index) itself did not change, and I feel absolutely no shame about continuing to use it.
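The sliding behavior described above, concretely:

```swift
// Removing the second element: the third and subsequent items each slide one
// slot toward the start, so the original third element now occupies the
// second position. Index 1 itself did not change.
var a = ["a", "b", "c", "d"]
a.remove(at: 1)
// a == ["a", "c", "d"], and index 1 now addresses "c"
```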

All I'm arguing is that I think it would be very much desirable for any collection that can possibly work like an array to actually embrace the same indexing semantics.

2 Likes

<Light bulb moment after reading RFC: Deque: should it be self-slicing>

AIUI, Johannes is arguing that there is another, equally valid, way to represent locations in a flat storage buffer -- essentially by having indices store direct pointers to the underlying memory. And he is right! (I forgot that CircularBuffer worked that way. :see_no_evil: I think the three points I raised above are still valid, it's just that there is nothing "theoretical" about this approach.)

In Deque's case, having indices store an offset from the beginning of the storage buffer (rather than from the logical beginning of the collection) would at first glance eliminate a (fairly predictable) branch in the subscript implementation -- however, the same branch would just come back in the precondition that protects against out-of-bounds indices, so performance-wise, this would be a wash. (Or maybe even a slight overall pessimization, if we consider that the extra branches would also seep into things like index(after:), which is literally just index + 1 in the current Deque.)
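For illustration, the branch in question is the wrap in the mapping from a logical offset to a storage slot. A sketch (not Deque's actual code):

```swift
// Map a logical offset (0-based from the logical first element) to a slot in
// the ring buffer's storage. `head` is the storage slot of the logical first
// element. The ternary is the (fairly predictable) wrap branch.
func storageSlot(ofOffset offset: Int, head: Int, capacity: Int) -> Int {
  precondition(offset >= 0 && offset < capacity, "offset out of bounds")
  let slot = head + offset
  return slot < capacity ? slot : slot - capacity   // the wrap branch
}
```

Storage-slot indices would skip this mapping in the subscript itself, but the out-of-bounds precondition would then need the equivalent arithmetic, so nothing is saved overall.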

Pointers-as-indices enable self-slicing, which seems largely undesirable to me in general -- because it leads to people unknowingly holding on to tiny slices of a huge collection, preventing it from getting released. Forcing people to make a copy is a good thing sometimes.

However, there is one particular, very high-profile case where self-slicing and non-integer indices would've made perfect sense for a RandomAccessCollection: UnsafeBufferPointer. Using UnsafePointers as indices would have been a terrific choice for it! It would have allowed it to be self-slicing, eliminating the clumsy rebasing: initializers. The memory leak above isn't relevant to UBP, because it doesn't own its memory, so there is no danger whatsoever of people accidentally retaining large buffers.
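The rebasing dance looks like this today, using the real UnsafeBufferPointer(rebasing:) initializer:

```swift
// Because UnsafeBufferPointer is not self-slicing, a range subscript yields
// Slice<UnsafeBufferPointer<Int>>; getting a buffer pointer back requires an
// explicit rebasing conversion.
let values = [10, 20, 30, 40, 50]
let sum = values.withUnsafeBufferPointer { buffer -> Int in
  let middle = buffer[1..<4]                       // Slice<UnsafeBufferPointer<Int>>
  let rebased = UnsafeBufferPointer(rebasing: middle)
  return rebased.reduce(0, +)                      // 20 + 30 + 40
}
// sum == 90
```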

I think my adjusted position is that range-replaceable random-access collections ought to always use integer offsets as indices.

3 Likes

I think you misunderstand - I never argued that indices are not positions; only that they are not only positions. They may indeed address a particular element, or be bound to the state of the collection in some way that isn't exposed through the collection protocol.

Pointers as indices are not compatible with copy-on-write data types in general, unless you really do want to invalidate all indices on every mutation. Swift also makes it very difficult to escape a pointer value of any storage that's under ARC (it's all just scoped access to inner pointers via withUnsafeBufferPointer-style methods).

And presumably as a corollary that range-replaceable RACs should not be self-slicing.

I think it would be very beneficial, as part of the Collections effort, to have you and the rest of the stdlib team write down what it is you believe the “rules of Collections” are, because it is abundantly clear that the community does not have a consistent understanding.

To my mind, the idea that SomeArbitraryCollection.popFirst() should only invalidate endIndex seems to be at odds with reality. It can do a few things, depending on what SomeArbitraryCollection actually is:

  1. If SomeArbitraryCollection is self-slicing, popFirst will return an object of type Self and the old startIndex will no longer be valid. This is per the rules of slices, which say that indices into a slice of T must point to the same elements as they would in the parent.
  2. If SomeArbitraryCollection is not self-slicing, popFirst will be a linear-time operation that you probably didn’t want to call, but in this case what happens to the indices is undefined (you literally construct a brand new collection whose indices may be totally different).
  3. Unless you’re Deque, which overrides popFirst, even though it isn’t self-slicing, to provide a faster code path.

This is not exactly a clear situation! It would be really, really good if the Swift team could commit to writing down how things should work, and Collections seems to be the right place to do this.

7 Likes

I don’t think self-slicing should be an option for any collection other than:

  1. SubSequence types, where the potential risk of a memory leak is low — it’s quite obvious that values of these may be tiny slices in a huge collection.
  2. Cases where the collection value does not own its storage (like UnsafeBufferPointer) — in these cases, the memory leak simply isn’t possible.

I expect I’ll start to draw up some guidelines at some point in the package documentation — but I’m a very unreliable oracle. (These discussions are extremely valuable for hashing out / refining some of the details that will probably end up there.)

The problem of course is that there are no hard and fast rules; beyond the “formal” protocol requirements (which I’m sad to admit are sometimes ignored when inconvenient — hopefully we’ll be able to fix some of these), there are only opinions, sometimes backed by hard-earned experience, sometimes less so. :see_no_evil: (For example, I was very strongly convinced that opaque indices are always the right way to go (and was quite vocal about it too), until I tried to sit down and actually use the resulting implementations in the first cut of what became the collections package. And even then I kept trying to make it work for weeks and weeks, because I was so invested in the idea.)

Given all this, sitting down and thinking deep thoughts about what might be the most sensible design for an Ideal Collection Implementation seems far less productive an idea to me than… well, sitting down and actually implementing more collection types. Incidentally, just the three collections we currently have in swift-collections already highlighted several areas where the existing collection hierarchy might be in need of some work:

  • Deque (and my old B-tree work) underscore the need for a better way to handle piecewise contiguous collections, especially when it comes to copying data between them. (I have a reasonable-looking idea to extend IteratorProtocol to replace/fix the (unpleasantly limited, underscored) Sequence._copyContents requirement we have now.)
  • We probably need a PermutableCollection protocol so that ordered collections can take advantage of generic algorithms that belong in the shuffle/sort/partition family.
  • We may want to introduce a RangeRemovableProtocol to capture the parts of RRC that can be implemented by ordered collections. (This may be of limited usefulness in general, though.)
  • We will need some sort of dictionary protocol. Or not! We’ll see clearer when we have more than two dictionary types.
  • It would be nice to have a UniqueCollection protocol, providing the semantic requirement that elements are unique. (Set, Dictionary, Dictionary.Keys, OrderedSet, OrderedSet.UnorderedView, OrderedDictionary, OrderedDictionary.Elements (plus their subsequence types) might be enough examples to justify the protocol.)
  • SetAlgebra does not seem very good for anything but Set and OptionSet. Maybe it will work out for SortedSet :crossed_fingers:. However, we may want to consider adding a lightweight “lookupable” protocol that simply adds stricter upper bounds on the complexity of firstIndex(of:)/lastIndex(of:)/contains(_:).

Imagine how many more ideas we’d have if we added three more collections. (E.g., what if we made a serious attempt at implementing a SortedBag? How would it fit in the protocol hierarchy?)

Most of these ideas can and should be prototyped in package form. (Although some are more difficult than others to fully develop this way — IteratorProtocol additions in particular might be too annoying to do that way.)

As for popFirst:

EDIT: aargh, I wrote a whole section thinking about removeFirst, and calling it popFirst. It’s been a long day. :upside_down_face: I removed it and replaced it with something more relevant:

First of all, popFirst should be a RangeReplaceableCollection requirement (or at least an extension method expressed in terms of removeFirst). I think it is fairly obviously a bug that it isn’t, whether or not it was an intentional choice.

Second, because popFirst was omitted from RRC, there are no requirements about its behavior. Set and Dictionary certainly make no attempt to emulate the generic extension’s indexing invalidation in their own implementations. Someone could implement a collection type where popFirst returned the tenth element, changing nothing — and they wouldn’t violate or break anything but common sense.

That said, in generic contexts over Collection where Self == SubSequence, we can rely on the fact that popFirst resolves to the stdlib’s extension method that is (currently) expressed in terms of slicing. However, this isn’t true in cases where we’re calling popFirst on a concrete type.
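For example, this is the generic context in question, with ArraySlice as the concrete type satisfying Self == SubSequence:

```swift
// In a generic context constrained to SubSequence == Self, popFirst resolves
// to the stdlib's extension method, which is expressed in terms of slicing.
func consumeFirst<C: Collection>(_ c: inout C) -> C.Element?
where C.SubSequence == C {
  c.popFirst()
}

var slice = [1, 2, 3][...]        // ArraySlice is its own SubSequence
let first = consumeFirst(&slice)
// first == Optional(1); slice == [2, 3]
```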

Additionally, the default popFirst does not document what indices it invalidates, so technically callers should assume that it may invalidate all of them — like any other under-documented mutation operation. (Note: this is just a pedantic observation; we obviously wouldn’t risk breaking code by changing the stdlib’s current implementation in an incompatible way, unless we find a very good reason to do so.)

With all that said, I think I agree that it would probably be rude for a self-slicing type to shadow the standard popFirst with different semantics for no good reason. But no requirements apply here, so even the flimsiest of excuses can be considered good enough reason to do so. (And, of course, this doesn’t apply to Deque’s popFirst, because Deque isn’t self-slicing, so its implementation isn’t shadowing anything. Self-slicing generally isn’t a good idea anyway. :clown_face:)

Remarks:

Beware, SubSequence == Self does not imply that popFirst has O(1) complexity — this does not follow from slicing/indexing requirements. It’s sometimes a reasonable assumption, but it’s just that. For example, the best promise we can make about Substring.popFirst() is that it takes O(n) time, where n is the size of the substring’s UTF-8 encoding.

The rules of index invalidation vary wildly between data structures. There are no universal rules, each data structure is a unique, beautiful butterfly that should be allowed to fly wherever its wings take it. I recommend implementing Array-like indexing for most cases of RandomAccessCollection, because I’d very much prefer if we’d at least kept the trivial cases consistent — and in RAC’s case there is usually no point in doing anything more complicated.

5 Likes

First of all, thank you so much for taking all this time laying out your thoughts on these topics. This is super useful!

Right, and I think that the fact that one can forget how a certain collection works is one of the real issues in Swift today. Every collection comes up with its own rules about what's okay and what isn't (re indices). You were thinking in Deque and I was giving an example in CircularBuffer. Both types are kinda the same, and yet what's allowed and what isn't is totally different. And still, both are valid collections. Now throw Data/LazyFilterCollection into the ring and it gets even more fun :slight_smile:.

Because things are so complicated (in many cases you'll have to read the code / all the docs to really know what's okay and what isn't), I'm in the "opaque indices for everything, no index re-use after mutation unless explicitly allowed" camp. This forces programmers to use .startIndex (instead of 0), and that'll work on all collections. Of course, we need better syntax (but we need that anyway because of String).

That is a good point. I guess we have indices for two reasons:

  • the Collection conformance which allows you to use a bunch of API
  • for gathering writes we often want to look at the first N elements and then remove M (< N) elements after the gathering write told us how much it consumed. That's just easier with indices (but it can of course also be done without them, by removing and re-adding the remaining elements).
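A sketch of that gathering-write pattern, with Array standing in for CircularBuffer (names made up):

```swift
// Inspect a batch of queued bytes, then drop exactly as many as the
// (simulated) gathering write reports it consumed.
var pending: [UInt8] = [1, 2, 3, 4, 5]

// Look at up to N elements without removing them.
let batch = pending.prefix(4)

// Suppose the gathering write consumed only 3 of them.
let consumed = 3

// Remove exactly the consumed elements; the rest stay queued.
pending.removeFirst(consumed)
// pending == [4, 5]
```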

I totally get this point, but it still feels to me like we may be standardising Int&0-based indexing because we haven't found a way to spell [thing.index(thing.startIndex, offsetBy: distance)] better. We'll need this anyway because String is arguably quite important. We have rejected a number of useful things because a certain language feature is/may be coming, and in this case I'm just not sure if Int&0 is really the best, given that in general Collections don't start at 0 (because they may be slices of Int-indexed ones).

Yes, what you describe can definitely be an issue. But not being able to pass on a slice (of say an Array, String, ...) to a function that has been written to accept Array/String/... is a big developer UX issue in my mind (all this String(string.dropFirst(n))).

The only way out really is to code up everything on protocols (i.e. Collection/StringProtocol), but then getting acceptable performance gets hard very fast, and often you'll end up implementing everything on top of Unsafe*Pointer because that was the concrete type you could reach using withContiguousStorageIfAvailable from the collection protocol.
Also, once you implement your function on Collection, you can no longer assume Int or 0-based indices. Worse, you may write a function on Collection where Index == Int and assume 0-based indices. Then you write a few test cases with Array/Deque, which work, and then it crashes at runtime once you pass in a slice that doesn't start from 0. That feels almost like the Data situation.

I guess you can also add a new overload/method for Slice<Array/Deque<...>> or Substring but that's no fun either.

If (that's a big if of course) you don't hold onto the self-sliced collection for long, then self-slicing collections can be very handy. You can often stay in the land of concrete types and still get good performance (okay, with generics you'll still need to learn about inlining, inlining thresholds, specialisation, ...).

Just for the record, I definitely don't want to argue for everything being self-slicing. But a self-slicing data structure can be a great building block because it allows being wrapped by something that is either self-slicing or not self-slicing. Something that isn't self-slicing can't be easily wrapped by something that wants to be self-slicing. For example: CircularBuffer allows you to get either behaviour:

  • allocate & copy to get a CircularBuffer: CircularBuffer(circularBuffer[subRange])
  • no allocate/copy (may hold on to too much memory): circularBuffer[subRange]

whereas with Deque we have

  • allocate & copy to get a Deque: Deque(deque[subRange])
  • no allocate/copy to get a Deque: impossible. deque[subRange] gets us a Slice<Deque<...>>

Yup :cry:.

The problem with this argument is that a user doesn’t think of a collection as being an RRC; they think of it as a concrete Array or a Deque. They’ll likely have no idea if Array is an RRC or not; many Swift programmers have in fact never heard of RRC. In my experience, the way you learn about RRC is when you code up something that you think can work on all collections, then you dive into the various collection protocols and find RRC and implement your algorithm with it.

All that said, I haven't made up my mind yet what really is best, but the current situation in Swift isn't great, I think. To allow more than one concrete data type (which most of the time doesn't include its own slices!) you have to use pretty generic protocols, and once you're in the land of Swift's collection protocols everything gets a lot harder: performance, indexing, slicing, slow default implementations for operations that you didn't know you inherited, ...

4 Likes

Yes, I don't disagree. I'm not looking for a document that outlines the platonic ideals. I'm looking for a document that outlines both hard-and-fast rules (e.g. when must indices not be invalidated, when may they be) as well as what the IETF calls Best Current Practice. Basically, what should I think about?

We have lots of material for such a document. For example, a Collection SHOULD NOT be self-slicing and integer indexed, because it confuses the heck out of people who use it. Or consider index type selection: you've outlined lots of guidance you might want to give about these choices.

I guess what I'm saying is: I'm not asking for you to sit down and do a lot of deep thinking. I'm just asking you to write down your thought process when you come to design a new collection. You think more about collection design than almost anyone else in the Swift community, and it'd be really nice if folks could benefit from your experience without having to bother you directly every time.

8 Likes

This is an excellent description of some of the painful points when using collections in Swift. I wonder if, for types which actually can support self-slicing, they could offer some alternative APIs (perhaps a separate initializer or closure-scoped instance), so that even slices have efficient access to the entire API offered by the concrete type and functions which haven't been written with slices in mind. For instance, it is actually possible to create a String using the same storage as a Substring, so some of the prototypes for shared string APIs include offering the ability to efficiently "promote" a Substring to a String.

If Deque is built such that it supports only presenting a portion of its backing storage (instead of assuming it should present the entire thing), it could offer a Deque.init(sharingStorageWith: Deque.SubSequence). Alternatively, if we had generic extensions, it should be possible to write:

extension<Element> Slice where Base == Deque<Element> {
  var dequeWithSharedStorage: Deque<Element> {
    Deque(sharingStorageWith: self)
  }
}

myDeque.dropFirst().dropLast()./* ... */.dequeWithSharedStorage

It would still be a bit of a syntax burden, but it would be reduced, I think. You can do something like this today with clever use of protocols (as people do to extend Array<Optional<T>>).
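A sketch of that protocol trick as it could work today, with Array standing in for Deque and the storage-sharing initializer simulated by a copy (all names hypothetical):

```swift
// A helper protocol approximates the generic extension: any conforming base
// type promises it can be rebuilt from one of its own slices.
protocol DequeLike: RandomAccessCollection {
  init(sharingStorageWith slice: SubSequence)
}

extension Array: DequeLike {
  // Array cannot actually share storage with its slices, so this copies;
  // a real Deque built for this could make it ~free.
  init(sharingStorageWith slice: ArraySlice<Element>) {
    self.init(slice)
  }
}

extension RandomAccessCollection {
  // Promote a slice back to its DequeLike base type.
  func promoted<T: DequeLike>(to _: T.Type) -> T where T.SubSequence == Self {
    T(sharingStorageWith: self)
  }
}

let deque = [1, 2, 3, 4, 5]                       // Array standing in for Deque
let middle = deque.dropFirst().dropLast()         // ArraySlice<Int>
let promoted = middle.promoted(to: [Int].self)    // [2, 3, 4]
```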

As others have noted, self-slicing can be surprisingly expensive as it keeps larger allocations around. There is plenty of experience across the industry which indicates that it is best not to self-slice. But it can make collections much easier to use, so while I don't think it should be the default, collections should try to provide some interface to do it, if their implementations allow.

1 Like

This actually sounds like a great idea! I think a data structure that isn't self-slicing (to prevent accidental misuse) but offers a (constant-time) way to convert a slice to the regular collection type could really alleviate the UX problems that we have right now, without introducing different performance problems (alloc & copy everywhere).

For Deque, that'd be let middle: Slice<Deque<...>> = deque.dropFirst().dropLast(), but there'd be a Deque(nameToBeBikeShedSomethingSomethingSlice: middle) which is ~free. This would also imply opaque indices of course, because otherwise we'd run into the "sometimes 0-indexed" problem.

1 Like

Yep, there is immense complexity here, and I do understand the desire to simplify things. But it gets tricky, because there is no One True Way to design a data structure. The Index type and its behavior is extremely closely tied to the exact data structure that the collection implements -- e.g. the differences between how various collections invalidate their indices merely reflect the deep differences between the underlying data structures.

(To add insult to injury, it is generally impossible to implement 100% reliable index validation -- it's the same problem as trying to keep track of the provenance of unsafe pointers, validating that the original buffer is still around and that the access is within bounds on each dereferencing operation. To put it in another way, indices are just pointers in a different disguise. Luckily in this weaker form, pointer mistakes are far less catastrophic.)

For engineers who write code that uses collection types, the right thing to do is to always prefer to use (generic or concrete) higher-level algorithms, rather than writing raw loops that operate on raw indices.

For engineers implementing new Collection types, the right thing to do is to emulate the behavior of a standard collection type whenever this is possible. Most Swift programmers know how an Array works, and are able to use it correctly -- so it seems like a good idea for a collection type that could work like an array to actually do work like one, unless there is a really good reason not to do so. I think this is the crux of my argument against RandomAccessCollections with custom indices. (One example for a really good reason would be the UnsafeBufferPointer case.)

So my problem with CircularBuffer's design is that it forces people to learn about its unusual indexing semantics, for some benefit I'm not seeing yet -- what is the really good reason for CircularBuffer to work the way it does?

Ah, I think there is an assumption here that needs to be uprooted and incinerated, to prevent it from spreading even further.

If we're operating on the level of raw indices, we always need to read the documentation. (And only the documentation: the code's behavior may change without warning; promises about index invalidation are forever.)

If we're lucky, the documentation will tell us that indices are simply integer offsets from the start of the collection (or, in the case of subsequence types, the startIndex). In this case, we can stop reading relatively quickly, and simply apply our preexisting knowledge about how arrays work.

In all other cases, we have to be exceedingly careful about not assuming anything about index validation beyond what is promised in the documentation. This means that we need to repeatedly consult the documentation every single time we call a mutating method, interpreting the text in the narrowest possible way, and continuously check and double check that we aren't accidentally holding on to a (potentially) invalidated index.

This is even more important when we're writing code that's generic over one of the mutable collection protocols. For example, if a mutation method does not promise anything about what indices it invalidates, then the only correct assumption is that it will invalidate all of them.

Mistakes can lead to corrupt results rather than a clear trap, because completely reliable index validation is generally impossible/impractical. (And just because an index happens to be valid doesn't mean that it addresses the element we want.)

Again, the best approach is to never deal with raw indices and naked loops.

I don't think this is a particularly helpful position -- for a RandomAccessCollection, it results in an annoyingly pedantic interface, for no good reason whatsoever. I believe it is safe to assume that Swift programmers know how arrays work; there is no need to protect them against the dangers of integer offsets.

The problem with the idea of introducing new syntax to express offset-based indexing in collection types is that I've come to believe that it would simply be a high-effort way to encourage people to write even more code that operates on raw indices, rather than using collection algorithms. (With the added bonus of a potential performance trap.)

I think the effort/resources that would be required to implement new syntax for this would be far better spent on adding even more algorithms, especially ones dealing with pattern matching & replacement. So reviving the offset-based addressing proposal is very low on my personal todo list.

That said, this opinion is somewhat shaken whenever I'm implementing a generic algorithm over RandomAccessCollection and I feel frustrated about the completely pointless verbosity of its indexing operations.

This is why protocols exist -- the best APIs do not take concrete collection types. I expect APIs that insist on e.g. taking an Array will stick out more once other collection types become universally available (and in widespread use). The Collections package will help get to that point, I expect. (String is admittedly somewhat of a special case though.) Well-designed APIs should be convenient to use.

I see where you're coming from, and I agree, but I suspect this may be slightly overstating things. Yep, generic methods often work better when marked @inlinable, but unless one is working on a component where performance is absolutely crucial (like some parts of the stdlib, the System package, or NIO), in most cases the generic implementation can simply convert the argument into an Array and forward the call to an opaque implementation. In many cases, it's also fine to simply expose a non-inlinable generic.

I guess I just don't view slices as anything other than short-term, mostly read-only views. (Mostly read-only, because we cannot currently implement in-place mutations through the range subscript, so slice mutations are riddled with CoW copies. Short-term, because holding onto a slice for too long would be wasting memory.)

To me, the slice being a different type is a feature -- it allows the actual collection to not deal with maintaining (and checking) a separate set of bounds (which would add a nontrivial cost to every collection).

My proposed guideline is for implementors for collection types, not regular Swift programmers who merely use them. People implementing collections need to be familiar with the collection protocols.

The intended benefit of this guideline for "normal" Swift programmers is that a large percentage of collection types become easier to understand, because they behave in a consistent way, without arbitrary differences, and without hidden gotchas.

As a regular Swift programmer, I find it much easier to follow Array's index invalidation rules than it is to follow RangeReplaceableCollection's. Saying that indices are invalidated on every mutation is emphatically not helpful API design -- it converts what would be straightforward array-like code to become an exercise in tip-toeing across a maze of deadly traps. (This is very well justified in the case of RRC, although we could've done a better job in designing its APIs -- having to lose one's place in the collection every time after inserting an element is not great.)
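A small example of what makes Array's rules easy to follow (my own illustration): an integer offset never becomes mysteriously invalid; after a mutation it simply addresses whichever element now occupies that position:

```swift
var numbers = [10, 20, 30, 40]
let i = 2              // a plain offset into the collection
numbers[0] = 99        // element mutation: offsets are unaffected
numbers.remove(at: 0)  // size change: the offset still "works" --
                       // it now addresses the element that shifted into slot 2
```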

CircularBuffer's custom indexing rules are halfway between Array and RangeReplaceableCollection, which is even worse. (It is documented to only invalidate indices on size changes, but then it supposedly invalidates all of them. IIUC, the implementation is slightly different, though: if the buffer returns to its original size, then the invalidated indices become valid again -- so as usual with custom indices, the rules aren't fully enforced at runtime.) CircularBuffer entices me to make use of the fact that I can change contents without invalidating indices -- which is a very array-like thing to do -- but then it pulls the rug out from under me by forcing me to regenerate indices after every push and pop. Is all this complexity really worth it?

1 Like

I disagree. For many public APIs, probably yes (though even that I don't think is an absolute), but internally I have had much better results coding against concrete types. Swift's performance story is incredibly shaky, and I can't justify the time to always code against a protocol internally. The performance will likely be bad if I only use what RAC guarantees, and it's even harder with RRC.

And I don't really buy that at some point we'll have so many ready-made algorithms on *Collection that we never have to code our own. And it does come up that I'd like to run the same algorithm against ConcreteCollection and Slice<ConcreteCollection>. Without self-slicing collections, I'm left with these options:

  1. code against the right *Collection protocol: unless I assume that it's only ever going to be called with ConcreteCollection or Slice<ConcreteCollection>, which is bad, I'm in a world of pain (see your own points above). I can possibly add some assertions about the type?
  2. copy & paste the code: also bad

With (the option of) self-slicing, I can just code against ConcreteCollection and make use of the index guarantees that ConcreteCollection provides me in the docs. Happy times :slight_smile: .
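For readers unfamiliar with the term: a self-slicing collection is one whose SubSequence is the collection type itself, so slicing returns the same concrete type and code written against the concrete type also runs on its slices. A minimal hypothetical sketch (the type and its name are my invention, not NIO's CircularBuffer):

```swift
// Hypothetical self-slicing random-access collection: SubSequence == Self.
struct OffsetBuffer: RandomAccessCollection {
    typealias SubSequence = OffsetBuffer

    var storage: [Int]
    var startIndex: Int
    var endIndex: Int

    init(_ elements: [Int]) {
        self.storage = elements
        self.startIndex = 0
        self.endIndex = elements.count
    }

    subscript(position: Int) -> Int {
        precondition(position >= startIndex && position < endIndex)
        return storage[position]
    }

    // Slicing narrows the bounds but keeps the concrete type.
    subscript(bounds: Range<Int>) -> OffsetBuffer {
        precondition(bounds.lowerBound >= startIndex && bounds.upperBound <= endIndex)
        var slice = self
        slice.startIndex = bounds.lowerBound
        slice.endIndex = bounds.upperBound
        return slice
    }
}
```

The trade-off discussed above applies: every instance now carries (and checks) its own bounds, whereas a separate Slice type confines that cost to actual slices.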

1 Like

Where is this animosity coming from? In general, for passing around data, the right move is to be generic over Sequence. Is it really that difficult to type angle brackets? (It's arguably boilerplatey, but it can't be much more difficult than figuring out efficient but safe index validation schemes for self-slicing types, and fighting all the inevitable indexing bugs.)

As for RAC & RRC: RAC is essentially isomorphic to a read-only array; it promises excellent performance for all indexing operations, and as far as I can tell, this promise does work out in practice. (I don't know what to think about RRC. Outside of actual RRC extensions, I honestly never really felt a desire to constrain a method argument to RRC -- its methods are mutating, so this would only make sense on inout arguments, which are a bit whiffy.)

Why would performance be bad? Specialization is often important, and across modules, it can indeed take some work, but for an internal generic, it comes for "free".

Unspecialized generics are not that slow, either, especially if all they do is copy things out of a sequence. (Sequence has hooks to make that fast, even in a generic context. These optimize for contiguous collections, but we can and should fix that. I believe the fix is mostly a library level change, so it's relatively easy as things go.)

OK. But I still don't really get why you need to pass around circular buffer slices so often. There is nothing wrong with it per se, but if the goal is raw performance, wouldn't it be more productive to copy the data out into a contiguous buffer before processing it? (Especially with the current implementation, where each element is wrapped in an Optional. Memcpying into a flat buffer can't be slower than laboriously extracting individual elements...)

Are you ever mutating these CircularBuffer self-slices? That'd probably be a big no-no in any high-performance code. But with self-slicing, how can you prevent it from happening?

1 Like

Usually when I need to write some algorithm, I know what collection it will be used on when the program runs. It's often very targeted. There are many problems that I'd buy into if I made an internal algorithm more generic than necessary. A selection of problems:

  • because of issues like this, that, this, that, or this (which is just a selection of issues I filed; there are many more on the team with similar ones), you have to look at the generated assembly and write targeted micro-benchmarks and allocation counter tests if you want performance. Generic algorithms mean more of all that.
  • assessing the assembly/SIL gets harder with generics because you'll end up with multiple implementations, so finding the one that actually runs is harder (yes, this is literally about finding the actual implementation)
  • you can't necessarily stay on the "known fast" path for a collection because you want to make it compatible with all collections. Take for example this issue. I always assumed that buffer.first! ~= buffer[buffer.startIndex] but it turned out that for weird optimiser reasons, the former was much slower than the latter for specific element types...
  • extra boilerplate
  • testing is much harder. If I code against a protocol, I'd like to verify that I actually only used guarantees given by the protocol but there aren't any good tools for that. I'd need to write some deliberately weird collections to test that my generic impl actually works. This is all possible BUT costs time and increases the likelihood of bugs.
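On the testing point above: one way to exercise a generic implementation is to feed it a deliberately weird collection, e.g. one whose integer indices don't start at zero, to flush out code that silently assumes array-like offsets. A hypothetical sketch (ShiftedCollection is my invention, not an existing tool):

```swift
// Hypothetical test helper: wraps a base collection and shifts all indices,
// so generic code that assumes indices start at 0 fails loudly under test.
struct ShiftedCollection<Base: RandomAccessCollection>: RandomAccessCollection
    where Base.Index == Int
{
    var base: Base
    let shift = 1000

    var startIndex: Int { base.startIndex + shift }
    var endIndex: Int { base.endIndex + shift }

    subscript(position: Int) -> Base.Element {
        base[position - shift]  // traps if the caller used a raw 0-based offset
    }
}
```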

"Not that slow" is a bit like "fast": neither is necessarily fast enough. Are you thinking about _copyToContiguousArray and withContiguousStorageIfAvailable? Yeah, they work; however, a lot of things need to fall into place. In ByteBuffer's slow path for copying a Sequence where Element == UInt8, we for example have this lovely function

    @inline(never)
    @inlinable
    @_specialize(where Bytes == CircularBuffer<UInt8>)
    mutating func _setSlowPath<Bytes: Sequence>(bytes: Bytes, at index: _Index) -> _Capacity where Bytes.Element == UInt8 {

just to not make performance too terrible. This is called from a public API, which is why we accept Sequence where Element == UInt8. And I'm totally cool with us doing the work to optimise our code if we need the generics. All I'm saying is that making your code generic comes at a high cost that is IMHO not worth it if you're writing an algorithm that runs on precisely one data structure in practice and is internal anyway.

I don't think I said we need them super often but I do think I said it makes CircularBuffer a great building block. You can use it to build collection and sometimes being able to be self-slicing may be a big plus.

For CircularBuffer I believe we basically have one use case. Often we store buffered things in a CircularBuffer, and frequently, when there's further processing, you want to hand part of a CircularBuffer to another function which does some extra calculations.
For example, writev has a weird limitation that you must not ask it to write more than 2 GB in one call. So if NIO has a CircularBuffer<ByteBuffer> of writes to be written, it can be useful to get a temporary slice of CircularBuffer<ByteBuffer> which is just the first N chunks that sum up to less than 2 GB. This sliced CircularBuffer will only live very briefly, and in 99% of the cases the slice is the whole CB.

Of course, I could express this as Slice<CircularBuffer<ByteBuffer>>, but then I'd pay for the double index-checking (in Slice and in CircularBuffer) all the time. But having more than 2 GB of outstanding writes enqueued is (hopefully) a very rare situation, so I'd like to not pay for it if I can. On top of doing all the index checking twice, this emits a non-trivial amount of extra code which gets you closer to the inlining threshold, so it may make something else a lot slower just because you switched from CircularBuffer<ByteBuffer> to Slice<CircularBuffer<ByteBuffer>>.

And yes, if this is the only example in the world, we could also pass a CircularBuffer<ByteBuffer> and a "write only up to X" integer around. Or make use of the value semantics and remove the elements that are too many. But nothing feels as easy as

func doSomeWritev(_ bytes: CircularBuffer<ByteBuffer>) { ... }

// this is simplified and not the real code, just illustration
let writtenUpTo = circularBufferOfBytes.giveMeEverythingUpTo2GB { buffer in 
   doSomeWritev(buffer)
}
circularBufferOfBytes.removeFirst(writtenUpTo)

If we don't think we ever need self-slicing collections, then I agree we don't need to make the stdlib's building block collection self-slicing either :slight_smile: . I'm just offering some ideas where self-slicing may be useful.

It depends; in NIO, when sending data, the answer is no. The kernel is happy to accept a list of scattered blocks to write, so why would we allocate & copy it into one contiguous chunk? On the receiving end, the story is often quite different because the user eventually needs it contiguous. And again, the key here is being composable. Not every program benefits from using scattered writes, but many will.

I don't think we do.

Yes, the current implementation is meant to go away (I didn't see the point in going *Unsafe* when Deque is just around the corner). I was hoping to base it on Deque and have less tricky code to maintain. Built on top of CircularBuffer, we also have MarkedCircularBuffer, so we often compose collections into more powerful ones.

3 Likes

One more thing: writing the contents of a slice to contiguous storage (in the form of UnsafeMutableBufferPointer) is significantly slower than writing a non-slice. (Bug report).

The reason is that UMBP.initialize calls the secret _copyContents(initializing:) hook, which returns an iterator. In order for Slice to forward this to its base collection, it would need to transform the iterator returned by the base's implementation into Slice's own iterator -- which is not possible in the general case. It also means that it can't use withContiguousStorageIfAvailable -- which is incredibly depressing.

I've seen noticeable performance improvements by using the following version, which, unlike the standard library's version, can optimise down to a memcpy:

extension UnsafeMutableBufferPointer {

  /// Initializes the buffer’s memory with the given elements.
  ///
  /// When calling the initialize(from:) method on a buffer b, the memory referenced by b must be uninitialized or the Element type must be a trivial type.
  /// After the call, the memory referenced by this buffer up to, but not including, the returned index is initialized.
  /// The buffer must contain sufficient memory to accommodate source.underestimatedCount.
  ///
  /// The returned index is the position of the element in the buffer one past the last element written.
  /// If source contains no elements, the returned index is equal to the buffer’s startIndex.
  /// If source contains an equal or greater number of elements than the buffer can hold, the returned index is equal to the buffer’s endIndex.
  ///
  /// This is like the standard library's `initialize(from:)` method, except that it doesn't return an iterator of remaining elements from the source,
  /// and thus is able to be significantly faster for sources which implement `withContiguousStorageIfAvailable`.
  ///
  @inlinable
  internal func fastInitialize<S>(from source: S) -> Int where S: Sequence, S.Element == Element {
    // UMBP.initialize(from:) is slow with slices. https://bugs.swift.org/browse/SR-14491
    let _bytesWritten = source.withContiguousStorageIfAvailable { srcBuffer -> Int in
      guard let srcAddress = srcBuffer.baseAddress else { return 0 }
      let bytesToWrite = Swift.min(count, srcBuffer.count)
      self.baseAddress?.initialize(from: srcAddress, count: bytesToWrite)
      return bytesToWrite
    }
    if let bytesWritten = _bytesWritten {
      return bytesWritten
    }
    return initialize(from: source).1
  }
}
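For illustration, a possible call site for the helper above (my sketch): copying from an ArraySlice, where the withContiguousStorageIfAvailable fast path applies:

```swift
let source = Array(0..<1_000)[100..<200]  // an ArraySlice: contiguous storage
let buffer = UnsafeMutableBufferPointer<Int>.allocate(capacity: source.count)
defer { buffer.deallocate() }
// Takes the contiguous branch, which boils down to a single bulk copy.
let written = buffer.fastInitialize(from: source)
```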

So yes - I agree with your point that Swift's performance story is quite shaky. You can get performance close to C (minus bounds checking, required for memory safety), but it requires months and months and months and months of hard work, benchmarking, disassembly, and lots of time poking around the standard library's source code. I wouldn't say that it's simple or easy.

2 Likes