ArraySlice.capacity: any legitimate use cases?

Ben_Cohen · September 7, 2018, 10:18pm

Now that we have all the language features we need, I'm working on a proposal to eliminate the ArraySlice type and replace it with just a compatibility typealias ArraySlice<T> = Slice<Array<T>> instead. Hopefully this will both simplify the standard library and help make things clearer for users.

To do this, we need to make sure that all the capabilities of ArraySlice come from protocols rather than methods directly on the type. For example, to give Slice<Array> the withUnsafeBufferPointer method, we will need to introduce a ContiguouslyStored protocol. Then we can write extension Slice: ContiguouslyStored where Base: ContiguouslyStored. For the most part, this is all good stuff – people often ask for a way to write generic code over different types that have a withUnsafeBufferPointer method. But we also have to do it this way, because you can't write extension Slice where Base == Array: we don't have parameterized extensions yet.
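To make the shape of that concrete, here is a minimal sketch, not the proposed design. The protocol body, the requirement name withContiguousBuffer, and the helper sum are all invented for illustration; only the protocol name ContiguouslyStored and the conditional conformance on Slice come from the description above.

```swift
// Hypothetical protocol: the requirement name `withContiguousBuffer` is an
// assumption made for this sketch, not an existing standard library API.
protocol ContiguouslyStored: Collection {
    func withContiguousBuffer<R>(
        _ body: (UnsafeBufferPointer<Element>) throws -> R
    ) rethrows -> R
}

// Array forwards to its existing withUnsafeBufferPointer method.
extension Array: ContiguouslyStored {
    func withContiguousBuffer<R>(
        _ body: (UnsafeBufferPointer<Element>) throws -> R
    ) rethrows -> R {
        try withUnsafeBufferPointer(body)
    }
}

// The conditional conformance: a slice of contiguous storage is contiguous.
extension Slice: ContiguouslyStored where Base: ContiguouslyStored {
    func withContiguousBuffer<R>(
        _ body: (UnsafeBufferPointer<Element>) throws -> R
    ) rethrows -> R {
        // Translate the slice's bounds into offsets within the base's buffer.
        let offset = base.distance(from: base.startIndex, to: startIndex)
        let length = count
        return try base.withContiguousBuffer { buffer in
            try body(UnsafeBufferPointer(rebasing: buffer[offset..<(offset + length)]))
        }
    }
}

// Generic code written once against the protocol (hypothetical helper).
func sum<C: ContiguouslyStored>(_ c: C) -> Int where C.Element == Int {
    c.withContiguousBuffer { $0.reduce(0, +) }
}

let middle = Slice(base: [1, 2, 3, 4, 5], bounds: 1..<4)
let total = sum(middle)   // sums the elements 2, 3, 4
```

Note that nothing here mentions Array inside the Slice extension, which is the point: the capability arrives through the protocol rather than through a constraint like Base == Array.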

One challenge that has come up is the capacity property. This doesn't exist on any protocols today – just concretely on some types like Array, Set, and ArraySlice.

If we did want to put it on the protocol, the logical one would be RangeReplaceableCollection, which already has a reserveCapacity method. It could have a default implementation that returned count, or maybe zero for non-random-access collections.
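As a rough sketch of that option: since a requirement can't be added to RangeReplaceableCollection from outside the standard library, the example below uses a hypothetical refinement protocol, CapacityReporting, as a stand-in. The protocol name and the spareCapacity helper are assumptions for illustration only.

```swift
// Hypothetical stand-in for a `capacity` requirement on
// RangeReplaceableCollection itself.
protocol CapacityReporting: RangeReplaceableCollection {
    var capacity: Int { get }
}

extension CapacityReporting {
    // Default implementation: report no spare storage beyond the elements.
    var capacity: Int { count }
}

// Array satisfies the requirement with its existing concrete property.
extension Array: CapacityReporting {}

// Slice has no capacity of its own, so it picks up the default.
extension Slice: CapacityReporting where Base: RangeReplaceableCollection {}

// Generic code can now ask about capacity (hypothetical helper).
func spareCapacity<C: CapacityReporting>(_ c: C) -> Int {
    c.capacity - c.count
}

var numbers = [1, 2, 3]
numbers.reserveCapacity(100)
let arraySpare = spareCapacity(numbers)                               // at least 97
let sliceSpare = spareCapacity(Slice(base: numbers, bounds: 0..<2))   // 0 via the default
```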

But the question is: is capacity actually useful at all? Are there practical use cases, other than maybe debugging and experimentation, where you need to query the capacity in order to make a decision as part of an algorithm? If not, it could probably be deprecated and dropped from ArraySlice entirely (and maybe from Array too).

Anyone got any use cases out there?

hlovatt · September 7, 2018, 10:47pm

Never used it!

Slava_Pestov · September 8, 2018, 5:20am

I agree that capacity is not terribly useful, but I think we should also be careful about breaking extensions of ArraySlice in general. A GitHub search turns up a couple of occurrences, for example https://github.com/griotspeak/Rational/blob/15c02f905cf8bc72b4132c5b55782c8dc2e5c5e4/Carthage/Checkouts/SwiftCheck/Sources/WitnessedArbitrary.swift#L88.

Maybe we can convince @Douglas_Gregor to implement parametrized extensions, even without the syntax -- if the only way to use the feature is via a generic typealias.

Or maybe this is a source break we can live with. Probably in most cases it would be easy to just delete the extension or rewrite it in terms of Slice.

anthonylatsis · September 8, 2018, 8:45am

I don't see the fundamental functional difference between having func<T> .... where Base == Array<T> and extension<T> ... where Base == Array<T> in adding withUnsafeBufferPointer to Slice that would require us to use a protocol (solely for this case). When we implement parametrized extensions, we can simply move the method to an extension, if that is preferred, without breaking source whatsoever.
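For reference, the method-level form looks roughly like this. It is only a sketch: the body, which forwards to ArraySlice's existing method, is an assumption about how one might implement it today.

```swift
extension Slice {
    // The generic parameter T lives on the method, so the same-type
    // constraint Base == Array<T> can be written without a parameterized
    // extension.
    func withUnsafeBufferPointer<T, R>(
        _ body: (UnsafeBufferPointer<T>) throws -> R
    ) rethrows -> R where Base == Array<T> {
        // Subscripting the base array with the slice's bounds yields an
        // ArraySlice, which already provides withUnsafeBufferPointer.
        try base[startIndex..<endIndex].withUnsafeBufferPointer(body)
    }
}

let window = Slice(base: [10, 20, 30, 40], bounds: 1..<3)
let copied = window.withUnsafeBufferPointer { Array($0) }   // copies the elements 20, 30
```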

DevAndArtist · September 8, 2018, 8:56am

Yes please

xwu · September 8, 2018, 9:16am

With ABI stability, that wouldn't be possible, unless I'm mistaken. Moreover, I'd imagine there could be other uses to ContiguouslyStored (or maybe ContiguousStorage?) down the line; it legitimately reflects a characteristic of certain collection types and not others.

Karl · September 8, 2018, 10:41am

Hallelujah!

I think the parameterised extension stuff is a red herring. This protocol would be independently useful.

For example, I have some code which slices frames of streaming bytes. It's important that the frames (and their slices) are all contiguous, since they can then be interpreted as strings by a C regex library. The slicer needs to work with the Unsafe*Pointer types, and also Foundation.Data - the unifying abstraction being that they are both contiguously-stored collections. Additionally, some of the slicing predicates can work more efficiently by rebinding their elements to UInt16 or UInt32, which necessarily requires contiguous storage.

Sometimes the slicer needs to buffer between frames. To achieve that, I also found it useful to add an AnyContiguouslyStoredCollection<Element> type-erased wrapper, in the style of AnyBidirectionalCollection or AnyRandomAccessCollection.

Returning to capacity: I can't think of any use-cases. I'm not even certain what that would mean for a slice - is it the capacity of the underlying collection?

I would note that Slice<T> inherits the default implementation of reserveCapacity(), which does nothing.

Ben_Cohen · September 8, 2018, 3:59pm

There is a reasonable definition. If the slice is of the end of the collection, then it's the capacity of the underlying collection less the elements missing from the front. That's because if the underlying base is uniquely referenced, the slice can use that spare capacity. Otherwise, it should just be the size of the slice.
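Spelled out as arithmetic, that rule might look like the sketch below. The function and its parameters are hypothetical, and it assumes zero-based integer offsets into the base; it models the definition only and is not how ArraySlice computes anything.

```swift
// Hypothetical model of the rule described above.
func proposedSliceCapacity(
    baseCapacity: Int,
    baseCount: Int,
    sliceBounds: Range<Int>,
    baseIsUniquelyReferenced: Bool
) -> Int {
    // Only a slice that runs to the end of the base can grow into the
    // base's spare capacity, and only if nothing else shares the storage.
    let reachesEnd = sliceBounds.upperBound == baseCount
    if reachesEnd && baseIsUniquelyReferenced {
        // Base capacity less the elements missing from the front.
        return baseCapacity - sliceBounds.lowerBound
    }
    // Otherwise the slice can claim no more than its own elements.
    return sliceBounds.count
}

// A base with capacity 16 holding 10 elements:
let tailUnique = proposedSliceCapacity(
    baseCapacity: 16, baseCount: 10, sliceBounds: 4..<10, baseIsUniquelyReferenced: true)
let middleUnique = proposedSliceCapacity(
    baseCapacity: 16, baseCount: 10, sliceBounds: 4..<8, baseIsUniquelyReferenced: true)
let tailShared = proposedSliceCapacity(
    baseCapacity: 16, baseCount: 10, sliceBounds: 4..<10, baseIsUniquelyReferenced: false)
```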

That whole code needs rework, as part of this.

Ben_Cohen · September 8, 2018, 4:00pm

It's not a red herring, it's the point of this thread. The details of ContiguouslyStored itself, useful as it is, should be discussed in a different thread.

amosavian · September 8, 2018, 8:32pm

Ben, how about discriminating Data slice from Data itself, as we had for Substring vs String?

DevAndArtist · September 8, 2018, 8:42pm

I wonder if we could deprecate the old Data slices API in favor of the new Slice type. I'd be in favor of that.

Karl · September 9, 2018, 7:24pm

That assumes that the first element is laid out at the start of the underlying capacity. Your Collection could still be contiguous while reserving spare capacity at the start for prepending elements.

Basically the only time you need to care about the capacity is when you're inserting a large number of elements across several calls, and you want to pre-allocate the storage for them. If you called replaceSubrange() on a slice and increased the number of elements inside of it, reallocation or not would depend on the underlying collection's capacity.
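A small sketch of that pre-allocation pattern on a plain Array (the chunk data is made up for illustration). Note that it only ever calls reserveCapacity; nothing in the algorithm needs to read capacity back.

```swift
// Elements arrive across several calls; the total is known up front.
let chunks = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
let expectedTotal = chunks.reduce(0) { $0 + $1.count }

var combined: [Int] = []
// Reserve once so the appends below can reuse the same storage.
combined.reserveCapacity(expectedTotal)

for chunk in chunks {
    combined.append(contentsOf: chunk)
}
```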

So... I guess maybe that's what Slice should return, even if some (and likely most) of its capacity is occupied by elements outside of the slice. That also means that, in a generic context, a Collection's count isn't necessarily the occupied capacity.

What I meant by that is that I hope nobody's looking at parameterised extensions as an alternative to a ContiguouslyStored protocol. I have a protocol like that in several projects and would love for it to be part of the standard library.

As for hashing out the details of that protocol? Let's do it. I'm ready when you are

Ben_Cohen · September 9, 2018, 8:00pm

That protocol was an example of functionality that wouldn't need parameterized extensions. It is features of Array that aren’t in a protocol today and might not belong in one in future, such as capacity, that would need parameterized extensions.

Karl · November 16, 2018, 2:33pm

How does this relate to Contiguous Collection Protocols - #17 by Ben_Cohen - will it be a part of that proposal, or will there be another one to make ArraySlice a typealias?