[Pitch] OutputSpan

For the append method, would the output span's count always begin at zero, because the initialized prefix isn't available to it?

I can only find utf8 properties on the String, Substring, Character, and Unicode.Scalar types.

If spans over tuples are invalid because of the padding issue, then I think spans over a single element might also be invalid, and therefore CollectionOfOne shouldn't be supported.

The current solution to this is that the finalize method requires you to pass in again the original pointer, which prevents you from entirely reassigning the OutputSpan. (I asked earlier what happens if you don't call "finalize" and I think that wasn't addressed.)

1 Like

I missed that question earlier. In OutputSpan, the deinit deinitializes the count elements in the prefix of the memory. Calling finalize() is the way to circumvent this. In OutputRawSpan there is no deinit.

1 Like

Regarding the naming of the append() operations…

The starting point we have for append() is the RangeReplaceableCollection protocol, which assumes Copyable elements. Of course we expect that the elements are copied, but it is annotated so that we expect the sequence to be consumed:
mutating func append<S: Sequence>(contentsOf newElements: __owned S). (__owned is a precursor of consuming.)

In the generalized Collection's world that supports non-copyable elements, we have to consider 4 possible combinations of copyability and non-copyability. The source container can be Copyable/~Copyable, and the element can be be Copyable/~Copyable. For the specific case of appending a set of elements, this can mean this:

Element: Copyable Element: ~Copyable
Source: Container & Copyable [a] consume or copy source (borrow elements only)
Source: Container & ~Copyable [b] borrow or consume source [c] consume source

The [a] case is like the Sequence situation, where consuming the source is a useful default in order to minimize reference counts. The [b] case's behaviour is different depending upon whether the source should be deinitialized after the operation. A MutableSpan source has a constant number of elements, so its elements must be copied. An OutputSpan source can be deinitialized, and its elements could be moved from the source to the destination. The difference might be representable as a refined protocol, such as a "consumable container", but when using a consumable container the intention might still be to copy the elements without consuming the container. Finally, the [c] case requires the elements to be moved from the source to the destination, also representing a consumed source.

In parallel to this, we also need to consider the case of unsafe memory, whose initialization state is manually managed via UnsafeMutableBufferPointer. UnsafeMutableBufferPointer can be used as a Sequence, and requires a different label or function name in order to be used as a consumed source and deinitialized. In the existing standard library API, this different function name is UnsafeMutableRawBufferPointer.moveInitialize(). The initial pitch used the name moveAppend() for this functionality, but it would be really nice to have the same prefix in order to get good auto-completion hints.

Given the table above (and the paragraph that follows it,) it is tempting to think that we could reuse RangeReplaceableCollection's append(contentsOf:) to represent all of the "consume source" cases for OutputSpan, including Sequence. However, this would break down in particular for UnsafeMutableBufferPointer, since it would make it impossible differentiate between just copying the elements out of it, or moving its elements out (and deinitializing its memory).

I'm arriving at the following names:

extension OutputSpan {
  mutating func append(copying: consuming some Sequence<Element>)
  mutating func append(copying: Span<Element>)
  mutating func append(copying: UnsafeBufferPointer<Element>)
  
  //Eventually:
  mutating func append(copying: borrowing some Container<Element>)
}

extension OutputSpan where Element: ~Copyable {
  mutating func append(consuming: inout OutputSpan<Element>)
  mutating func append(consuming: UnsafeMutableBufferPointer<Element>)
  
  //Eventually:
  mutating func append(consuming: consuming some ConsumableContainer<Element>)
  mutating func append(consuming: inout some RangeReplaceableContainer<Element>)
}
2 Likes

I'm using similar names in the RigidArray/DynamicArray prototypes, but my experience using them has not been great:

  1. I think there is a clear distinction between appending items by consuming a container and appending items by removing the elements from a container. The distinction is particularly important for resource-constrained cases, as the latter allows us to reuse the source container's storage without any heap allocation traffic.

    I've been highlighting this distinction by using a distinct label (moving) for the second case:

    mutating func append<
      C: ConsumableContainer<Element> & ~Copyable & ~Escapable
    >(consuming source: consuming C)
    mutating func append<
      C: RangeReplaceableContainer<Element> & ~Copyable & ~Escapable
    >(moving source: inout C)
    
  2. I find that using the same copying label for both the Sequence-based and the (eventual) Container-based variants is problematic, as types can (and will!) conform to both protocols. To use the same label, we therefore are forced to add an overload to disambiguate the two:

    mutating func append<
      C: Container<Element> & ~Copyable & ~Escapable
    >(copying: borrowing C)
    mutating func append(copying: borrowing some Sequence<Element>)
    mutating func append<
      C: Container<Element> & Sequence<Element>
    >(copying: borrowing C)
    

    This needs to be done for every operation that has a generic argument like this. This is quite unwieldy to do, because we'll want the third overload to forward to one of the other two, but the most practical way to do that is to implement the actual algorithm in an underscored fourth method, and have the relevant public API forward to it. Doing that feels like fighting against Swift. It seems preferable to keep these distinct by using distinct labels.

  3. A similar problem applies to the container we're appending to -- it can (and often will) conform to both the existing RangeReplaceableCollection protocol and its equivalent in the noncopyable domain (let's call it RangeReplaceableContainer). RangeReplaceableCollection already comes with an append(contentsOf:) operation that takes a Sequence -- it seems best not to have both that and a new append(copying:) synonym for the same, especially as we've seen in point 3, the latter causes problems elsewhere.

While using these prototypes, I keep switching back and forth between using the copying and contentsOf labels for the Sequence/Collection case. I prefer the clarity of copying, but as the type author I deeply dislike having to deal with the extra overloads, and as a type client it causes discomfort that I cannot select which algorithm I want to use.

Provided that we do not make language-level changes to make it easier to disambiguate between these flavors, I suggest the following naming scheme:

extension OutputSpan where Element: ~Copyable {
  mutating func append<
    C: ConsumableContainer<Element> & ~Copyable & ~Escapable
  >(consuming: consuming C)
  mutating func append<
    C: RangeReplaceableContainer<Element> & ~Copyable & ~Escapable
  >(moving: inout C)
}

extension OutputSpan /* where Element: Copyable */ {
  mutating func append<
    C: Container<Element> & ~Copyable & ~Escapable
  >(copying: borrowing C)
  mutating func append(contentsOf: some Sequence<Element>)
}

I think the labeling we apply here generalizes well to the standard insert and replaceSubrange methods, and any other generic algorithm that transfers items between container types:

Sequence/Collection Borrowable container Consumable container Range-replaceable container
append(contentsOf:) append(copying:) append(consuming:) append(moving:)
insert(contentsOf:at:) insert(copying:at:) insert(consuming:at:) insert(moving:at:)
replaceSubrange(_:with:) replaceSubrange(_:copying:) replaceSubrange(_:consuming:) replaceSubrange(_:moving:)

Obviously, we aren't shipping Container/RangeReplaceableContainer/ConsumableContainer protocols yet. But we will definitely have something along those lines, and OutputSpan should not make premature decisions about its API that will make it more difficult to integrate them into the language later.

There will be (many) types that support all of these four ways of transferring elements out of them. It seems desirable to allow the calling code to select which flavor it wants to use, every time it performs an operation, and distinguishing their names is a good way to do that.

Note: There is another way to express these operations, where there is but one append/insert/replaceSubrange method, using OutputSpan as the choke point:

protocol RangeReplaceAbleContainer {
  // Append items to a container by directly initializing underlying 
  // storage. `body` is potentially called multiple times to initialize
  // successive chinks of piecewise contiguous storage. It is expected 
  // to fully initialize the output span it is given, every time it is 
  // called.
  mutating func append<E: Error>(
    count: Int, 
    by body: (OutputSpan<Element>) throws(E) -> Void
  ) throws(E) -> Void {...}
}

var items: some RangeReplaceableContainer<Int> = ...
// append the values 0 ..< 1000 to items
var value = 0
items.append(count: 1000) { span in 
  assert(value + span.count <= 1000)
  for i in span.indices {
    span.append(value)
    value += 1
  }
}

While this is extremely useful as a low-level plumbing primitive, I do not believe this would be acceptable as the primary user-facing way to append/insert/replace things in containers -- the generic operations are far more convenient.

Extra twist: dealing with the lack of slicing

Noncopyable containers are not expected to be sliceable, and that induces even more flavors for these. It is quite reasonable to expect that I'll be able to append items 5..<10 in one container onto another:

// Naming scheme is just an example
target.append(copying: 5 ..< 10, in: source)
target.insert(moving: 2 ..< 7, in: source, at: 5)
target.replaceSubrange(4 ..< 8, copying: 3 ..< 5, in: source)

Additionally, our existing RangeReplaceableCollection operations also support inserting the contents of a container into itself. This usually triggers expensive copy-on-write copying, but it does work -- and it does not translate at all to the noncopyable universe. (The Law of Exclusivity disallows passing borrows/inout references of an entity to its own mutating methods.) So to allow us to copy/move items within the same container, we'll also need to invent new operations that are dedicated specifically to that.

target.append(copying: 4 ..< 12) // bad: can become ambiguous if Index == Element
target.insert(copying: 4 ..< 12, at: 2) // bad, for the same reason
target.insert(moving: 4 ..< 12, at: 2) // bad, for the same reason
target.copySubrange(4 ..< 12, to: 2) // okay
target.moveSubrange(4 ..< 12, to: 2) // okay

Note that existing types like Array will certainly want to offer all these same operations, so that we can efficiently append/insert items into them by explicitly copying/moving/consuming items from arbitrary source containers. If we do not make language-level changes, then the only way to do this is to add all these variants, and have them pop up in code completion lists even for people who aren't interested in the borrowing/consuming/moving distinction.

It seems like we do need to provide (at least some of) these new variants, to make Swift a viable language for crucial new use cases that require them.

But how many variants of append are too many? At which point will people start to get overwhelmed with choice?

Typically types already provide several overloads of these operations to implement fast paths for specific argument types; those already appear in API docs and completion lists. But we're looking at multiplying the amount of variants by 4 or more.

Is there anything the language can do to avoid forcing libraries to blow up their member namespaces like this?

11 Likes

The distinction between emptying a container and consuming a container is a good one, and I'd missed that. Given that option, I do agree that there are 4 variants to consider.

It seems like the ideal situation would be able to use overloads that have the same name and label, but differ in how they use the container, whether borrowing, mutating or consuming. We would then have:

extension OutputSpan {
  mutating func append(contentsOf: some Sequence<Element>)
  mutating func append(contentsOf: borrowing some Container<Element>)
  mutating func append(contentsOf: inout some RangeReplaceableContainer<Element>)
  mutating func append(contentsOf: consuming some ConsumableContainer<Element>)
}

Unfortunately the language does not allow us to differentiate between all of these uses. We can and are allowed to distinguish between the inout version and another thanks to the mandatory & sigil. If a type were to conform to both RangeReplaceableContainer and ConsumableContainer, distinguishing between the borrowing and consuming variant would not be possible without a change in the language, such as adding a keyword (or another sigil.)

The library-only solution, then, is to have the four labels:

extension OutputSpan {
  mutating func append(contentsOf: some Sequence<Element>)
  mutating func append(copying: borrowing some Container<Element>)
  mutating func append(moving: inout some RangeReplaceableContainer<Element>)
  mutating func append(consuming: consuming some ConsumableContainer<Element>)
}

I'll change the proposal to this form.

This issue also extends to MutableSpan, where the operation is update rather than append. Unfortunately we didn't account for this in that proposal, but we do have the ability to amend these names, and we should do so alongside OutputSpan in this proposal.

The proposal now clarifies that if finalize isn't called, initialized elements are deinitialized. Does that include the initializedCount prefix that's specified in the OutputSpan initializer?

1 Like

It doesn't. All of the storage that's passed to an OutputSpan instance becomes the responsibility of that OutputSpan until consumed or destroyed.

The thing that's a little awkward is that since you can remove elements from an OutputSpan, not finalizing the span doesn't mean things are returned to the state they were before; it means anything you added past the original capacity will be removed, but other changes will stick. This is more of a question for once we have edit and friends, though.

2 Likes