Preparing the iteration ABI for move-only types

We eventually want Swift to be able to support move-only types, and we want the core iteration APIs to be ready to support data structures containing move-only values. Particularly, we want it to be possible for fundamental data structures like Array, Dictionary, and Set to be generalized to allow
move-only types in a future version of Swift. This means, at minimum, that future move-only implementations of these data structures must be able to conform to the Sequence and Collection protocols we ship with Swift 5 for backward compatibility. We should also anticipate the language and
library changes that will be necessary to support move-only types and consider how they'll work when backward-deployed to code compiled with Swift 5. inevitably require additions to the standard library.

Limitations of Sequence iteration with move-only types

As it currently stands, the basic iteration protocol in Swift is the Sequence protocol:

protocol Sequence {
  associatedtype Iterator: IteratorProtocol
  func makeIterator() -> Iterator
}

protocol IteratorProtocol {
  associatedtype Element
  mutating func next() -> Element?
}

and a for loop desugars into code that makes an iterator for the sequence and then pulls values out of the iterator using its next method:

for x in xs {
  <body>
}

// is equivalent to:

do {
  var iterator = xs.makeIterator()
  while let x = iterator.next() {
    <body>
  }
}

Sequence is inadequate for a number of reasons that are already being discussed, but on top of all that ongoing discussion, it's also unusable in its current form with move-only types:

  • The Iterator implementation in practice requires duplicating some or all of the information in the original Sequence, which is not possible if the Sequence cannot be copied without consuming the original sequence.
  • The iteration method next() returns each element by value. This is impossible for sequences whose elements can't be copied, unless the original sequence is consumed by the iteration, since returning by value would require moving those values out of the holding sequence.

As written today, a move-only type cannot conform to the Sequence protocol in most practical cases due to the need for the Iterator to copy information from the Sequence value. At minimum, we must make the makeIterator requirement __consuming in anticipation of move-only types so that a move-
only implementation can satisfy the requirement by consuming the sequence to produce the iterator.

With the current protocol design, a move-only Sequence would only support single-pass consuming iteration, but future versions of Swift could add requirements to the protocol to support multi-pass iteration. At that time, existing copyable implementations of Sequence would be able to automatically
conform to the new requirements with a default implementation written in terms of the existing makeIterator interface, but new move-only implementations may have to implement both interfaces.

Collections and move-only types

The Collection protocol is somewhat better prepared for move-only types, since it provides indexing and subscripting as traversal and access primitives. The index manipulation requirements on Collection like index(after:) are designed to all take the original collection as input in addition to the index being manipulated, and likewise subscripting takes both the collection and index as inputs, which means that most index implementations do not need to duplicate any information from their parent collection. We're working on adopting a coroutine-based model for property and subscript accesses, which
will mean that property and subscript reads can be done as borrows. For move-only collections, we could define an alternative expansion of for loops for collections in terms of indexes and subscripts:

for x in xs {
  <body>
}

// could be expanded to:

do {
  var index = xs.startIndex
  let end = xs.endIndex
  while index != end {
    // Pretend that `with` is a statement that extends an access to a property
    // over multiple statements
    with let x = xs[index] {
      <body>
    }
    index = xs.index(after: index)
  }
}

Since indexes do not need to copy the original collection, and subscripting will be able to yield borrowed values out of the collection without formally copying them, this would be a workable iteration model for move-only collections. Index-based iteration could also be used to extend for loops to support mutating elements on MutableCollections.

On the other hand, index-based iteration is suboptimal in a number of ways. In unspecialized code, it would require three virtual calls per iteration to test, subscript, and advance the index,
compared to one next() call for the current iterator design. There's also more per-iteration validation overhead in subscripting than there would be in an ideal IteratorProtocol implementation, since an iterator
can be written to assume in-bounds access, whereas an isolated subscript must validate the given index every time. We don't strictly need to do anything today for Collection to accommodate future move-only implementations, but we may want to anticipate and mitigate the performance issues if we don't.

The ideal solution: Generators?

Generators are a popular feature in other programming languages that make it easy to define sequences by writing imperative code that yield-s each element of the sequence in order. They would also be a natural fit for the proposed ownership and move-only type model, since a running generator can hold on to
a borrow of a parent sequence while yielding access to its elements without requiring copies. For instance, one could efficiently iterate through an Array of move-only values without requiring per-iteration bounds checks or intermediary copies of the array value:

func generateArray<T>(xs: [T]) yields T {
  xs.withUnsafeBufferPointer {
    var address = $0.startAddress

    for i in 0..<$0.count {
      yield address.unsafelyUnwrapped.pointee
      address += 1
    }
  }
}

A future version of the Sequence protocol could introduce generator requirements that supersede the makeIterator requirement for consuming and borrowing iteration:

protocol Sequence: moveonly {
  /* existing requirements */
  consuming func generateConsuming() yields consumed Element
  func generate() yields Element
}

For compatibility with existing copyable Sequences and Collections, we can provide default implementations of these generator requirements in terms of the existing requirements. We can also provide default implementations for move-only Collections, though move-only Sequences need explicit implementation of the new interface:

extension Sequence: moveonly where Element: moveonly {
  // A default implementation of generateConsuming() can be given
  // in terms of the makeIterator() model
  consuming func generateConsuming() yields consumed Element {
    var iterator = makeIterator()
    while let element = iterator.next() {
      yield element
    }
  }
}

extension Sequence: copyable where Element: copyable {
  // A default implementation of generateConsuming() can be given
  // in terms of the makeIterator() model, but only for copyable types.
  // This is a bit inefficient due to the need to copy the
  // sequence and the elements.
  func generate() yields Element {
    var iterator = makeIterator()
    while let element = iterator.next() {
      yield element
    }
  }  
}

extension Collection: moveonly where Element: moveonly {
  // For collections, we can provide a default implementation of
  // generate() even for move-only implementations in terms of indexing
  // and subscripting.
  //
  // This would be inefficient when the default implementation is used
  // with binaries compiled for Swift 5, due to the need for virtual
  // calls to manipulate indexes and access the subscript. Recompiling
  // code ought to at least get a specialized implementation of
  // `generate`, though.
  func generate() yields Element {
    var index = startIndex
    let end = endIndex
    while index != end {
      yield self[index]
      index = index(after: index)
    }
  }
}

Generators are not likely to happen in the Swift 5 timeframe, both because they're a large new language feature and because getting the efficiency and interactions we want with generators and the ownership model likely itself requires move-only types to model the continuations for suspended generators, but they're a good model for seeing how iteration could work in the future of Swift.

Preparing Collection iteration to be efficient with move-only types

As noted above, we can extend the Collection protocol we have today to be compatible with move-only collections in the future using generators, but the default implementation would be inefficient with binaries not recompiled to specialize the new requirements. We could perhaps mitigate this in the compiler
by favoring the Swift 5 interfaces to expand for loops with types known to have been compiled for Swift 5. On the other hand, we can somewhat anticipate what a generator-based ABI would look like and prepare for it by putting hidden requirements in the Sequence and Collection protocols today:

protocol Sequence {
  /* existing requirements... */

  associatedtype _Generator: _GeneratorProtocol
  func _generate() -> _Generator
  subscript(_advancing generator: inout _Generator) -> Element
}

extension Sequence {
  func _generate() -> _SequenceGenerator<Self> {
    return _SequenceGenerator(iterator: makeIterator())
  }

  subscript(_advancing generator: inout _SequenceGenerator<Self>) -> Element {
    yield generator.value!
    generator.value = generator.iterator.next()
  }
}

protocol _GeneratorProtocol {
  var done: Bool { get }
}

struct _SequenceGenerator<S: Sequence>: _GeneratorProtocol {
  var iterator: S.Iterator
  var value: S.Element?

  var done: Bool { return value == nil }

  init(iterator: S.Iterator) {
    self.iterator = iterator
    self.value = self.iterator.next()
  }
}

protocol Collection: Sequence
    where _Generator == _CollectionGenerator<Self> {
  /* existing requirements... */

  func _generate() -> _CollectionGenerator<Self>
  subscript(_advancing generator: inout _CollectionGenerator<Self>) -> Element
}

extension Collection {
  func _generate() -> _CollectionGenerator<Self> {
    return _CollectionGenerator(done: startIndex == endIndex,
                                start: startIndex,
                                end: endIndex)
  }

  subscript(_advancing generator: inout _CollectionGenerator<Self>) -> Element {
    read {
      precondition(!generator.done)
      let cur = generator.start
      yield self[cur]
      generator.start = index(after: generator.start)
      generator.done = generator.start == generator.end
    }
    modify {
      precondition(!generator.done)
      let cur = generator.start
      yield &self[cur]
      generator.start = index(after: generator.start)
      generator.done = generator.start == generator.end
    }
  }
}

struct _CollectionGenerator<C: Collection> {
  var done: Bool
  var start, end: C.Index
}

By putting the per-iteration logic for testing, advancing, and subscripting indexes behind the Sequence vtable, where it can be specialized for each implementation, we can reduce the collection-based for loop to one virtual call per iteration:

for x in xs {
  <body>
}

// could be expanded to:

do {
  var generator = xs._generate()
  while !generator.done {
    // Pretend that `with` is a statement that extends an access to a property
    // over multiple statements
    with let x = xs[_advancing: &generator] {
      <body>
    }
  }
}

When we add move-only types and generators, a default generator implementation could then be added to Collection on top of these existing entry points as well with less overhead.

Summary

The one thing we must do for future move-only data structures to be compatible with today's protocols is make the Sequence.makeIterator() consuming, since most implementations need to carry state from the sequence value into the iterator value. With that done, we can introduce new requirements to future versions of the Sequence and Collection protocols to better support move-only types, though the default implementations of those requirements will be inefficient when applied to existing binaries. We may want to consider mitigating that future inefficiency by adding private requirements to Sequence and Collection today.

19 Likes

Would any of this look different in @dabrahams's hypothetical world where Sequence is just a convenient way of defining a custom Collection? Perhaps that is too speculative to be relevant, though.

I don’t think so. If makeIterator becomes a requirement on Collection, it still would need to be made consuming in order to be implementable by move-only collections. The makeIterator interface would only work as a shorthand for defining copyable collections, since it’s inherently a consuming interface, whereas indexing and subscripting are borrowing.

cc @John_McCall, @Ben_Cohen

I did a pass through the protocols that Array, Dictionary, and Set conform to, and tried tagging the requirements as __consuming where that would be necessary for move-only implementations:

https://github.com/apple/swift/pull/16894

In the process, I found a couple of potentially problematic protocol requirements that would not be implementable by move-only implementations:

  • Collection.first is a property requirement of type Element?. For most collection implementations that store non-optional Elements, this wouldn't be implementable, since you'd somehow have to form a temporary Optional<Element> value without modifying the collection or copying the element out of it.
  • SetAlgebra.insert has the signature mutating func insert(_ newMember: Element) -> (inserted: Bool, memberAfterInsert: Element), where the return tuple contains either the new member inserted, or the old member already in the set. This is not implementable for a move-only element since the value would have to be copied out of the set to be returned.

There are a number of things we could still do:

  • Do nothing now. When move-only types get added in future Swift, we can constrain the existing requirements to require Element: Copyable, and add new requirements that work with move-only types. This would mean we'd need to implement "constrained requirements" as a new generics feature to support move-only types, though.
  • Remove these requirements from the protocols. Collection.first is a customization point with a default implementation, so we could remove the protocol requirement without breaking source compatibility, possibly at the cost of some efficiency for some implementations. SetAlgebra.insert is however a core part of the set API, so we would need to add new APIs to replace it. This would be source-breaking, though, and we have very little time to introduce a new API and go through a deprecation cycle on the old API before ABI stability.
  • We can special-case the handling of borrowed Optional<T> so that it can refer to a borrowed T in place. This would be useful not only for first, but for dictionary subscripting and similar APIs that want to provide access to a value only when it exists (such as a replacement for the behavior of SetAlgebra.insert conditionally returning the existing equivalent value in a set if one exists). I know John really wants to do this, but it's complicated… If we weren't able to implement special borrowed representations in time for Swift 5's ABI, we could still maybe introduce them later, and introduce duplicate entry points for the old unspecialized ABI (which would only need to accommodate copyable types, since that's all that exists in Swift 5).

Do we have any other options?

2 Likes

.first is one of the dubious customization points that probably should be dropped.

Would it make a difference if it were a set as well as a get computed property? That's been considered in the past as separately useful. Then maybe getting it for a move-only collection could also use the set part to consume the collection.

I guess this would need to be converted to return Element?, with nil for nothing displaced.

1 Like

Removing the customization point would at least make dealing with the modeling problem retroactively a little easier. Making it a get/set property would not help much, though, since you still ought to be able to get the first element by borrowing from a read-only borrowed collection.

AIUI, insert does not displace the existing element if one exists; it instead returns the existing value. If this were a Collection, you could have an alternative API for insert that returns the Index at which the new element was inserted or an existing element already lives, though SetAlgebra does not appear to have an index concept.

I guess another question is whether SetAlgebra needs to accommodate move-only types. If it maintains its implicit copying requirement, then the existing API isn't a long-term compatibility problem.

Ideally SetAlgebra could model non-enumerable sets, such as a set defined by a closures. This kind of set couldn't have an index type.

Unfortunately, that doesn't work in practice though anyway, because SetAlgebra defines a handful of requirements that are impossible to define for that kind of intentional set (e.g. isSubset).

@huon notes that a consuming randomElement on Collection would not be particularly useful. Would it make sense to change the protocol requirement to be randomIndex, producing a random valid, non-end Index for the collection, and make randomElement be an extension method instead?

3 Likes

The randomIndex approach seems like a good work-around, but that doesn't seem like can't work for prefix/suffix, etc. which also seem somewhat surprising for the default, obvious name "prefix" (etc.) to be consuming.

I think that the best API for a move-only prefix / suffix would be something like:

__consuming func split(after: Index) -> (prefix: SubSequence, suffix: SubSequence)

and / or a variant like split(before: Index) etc.

This way even after the original collection is consumed, all of the move-only elements are still available.

2 Likes

We've come up with some other design problems that could impede forward compatibility between Collection and future move-only types. There are a number of operations that produce wrapper collections, like indices and lazy, and some of these are protocol requirements. For example, the default implementation of indices in its current form needs to take a copy of the collection in order to be able to manipulate the collection's indices, something like this:

extension Collection {
    var indices: IndicesCollection<Self> {
         return IndicesCollection(base: self) // need to copy or consume self here
    }
}

struct IndicesCollection<Base: Collection>: Collection {
    var base: Base
}

But this implementation would not be valid for a move-only collection without consuming the original collection. However, it wouldn't be particularly useful for indices to be the final thing you can do with a collection, since you need the original collection to do anything with the index values you get from indices. Property implementations can now yield borrowed values using accessor coroutines, which are naturally scoped to the parent value, so we could conceivably have a different default implementation of indices for move-only types:

extension Collection where Self: !Copyable { // strawman syntax
    var indices: IndicesCollectionMoveOnly<Self> {
        withUnsafePointer(to: self) {
            yield IndicesCollectionMoveOnly(base: $0) // we can safely vend the pointer here, since it's moveonly and borrowed so can't escape
        }
    }
}
struct IndicesCollectionMoveOnly<Base: Collection & !Copyable>: Collection & !Copyable {
    var base: UnsafePointer<Base>
}

which would mean that collection types supporting both move-only and copyable elements need to conform two different ways for move-only and copyable collections, which is not great for generic code that would want to work over all instances of such a type. This problem affects every wrapper collection to some degree, but it's a forward compatibility concern specifically for indices and subscript(Range<Index>) -> SubSequence because these are requirements of the Collection protocol. Protocols can introduce new requirements with default implementations in new library versions, so there are a couple of ways we could deal with the existing requirement in the present time:

  • Leave the protocol requirements as-is, and in the future, make them a "conditional requirement" only for copyable types. @Douglas_Gregor has some philosophical objections to any kind of "conditional requirement" feature, but because copyability is a fundamental property of types and not a general protocol conformance that can be added post-hoc, I think that at least makes the feature easier to introduce in this limited case.
  • Make theses requirements __consuming for now. Like I said, this would make them effectively useless for move-only types, but they would at least be unconditionally implementable, so we would be less likely to be blocked by implementing other language features in order to implement move-only types.

In either situation, we ought to still be able to introduce new, more general requirements that work with all types in a future Swift that supported move-only types, once that feature is designed and implemented.

There's another thorny issue having to do with the mutating subscript(Range<Index>) -> SubSequence requirement that MutableCollection and RangeReplaceableCollection introduce. Ideally, mutation through a slice like this would be done in-place when possible, but because Slice and other typical implementations of SubSequence must wrap the original collection's buffer, copy-on-write triggers because there are two independent references. Even with move-only types, this is tricky, because an exclusive, mutating borrow could allow the borrowed slice to be swapped with another move-only slice value unrelated to the original value, unless we had some way of marking mutating borrows as "noescape" or explicitly lifetime-scoped to their parent values. Beyond just basic support for move-only types, It is likely that providing a good experience with strong performance guarantees for divide-and-conquer sorts of mutating algorithms on collections at all may need a new protocol design, perhaps taking advantage of move-only types to enforce safe partitioning in the manner of Rust's Rayon library.

If we were to try to make in-place modification through slices possible at least in limited situations today, @John_McCall has proposed a number of possible solutions. We could say that the Slice that you receive from a range subscript operation starts out with a non-owning reference, but that any copy or move of that initial slice causes the reference to become strong. A similar technique might in theory also address the borrow/consume issue with nonmutating wrappers like indices; if the indices wrapper type could start out with a non-owning reference to its parent collection, but upgrade that to an independent copy if the wrapper is copied, then one implementation using that one type might be sharable among move-only and copyable types (since you would only ever be in the borrowed state with a move-only type, whereas the promotion from borrow to independent copy when copying or moving the value provides the necessary generalization for copyable values).

Although an interesting theoretical design, this would be a fairly big fundamental change to Swift's implementation model, in which copies currently do not alter the state of copied values in any but very controlled ways (like reference counting), so it's unlikely that we'd be able to implement this in the time we have remaining for Swift 5. Furthermore, even with all that work, we could still end up with an unsatisfying answer to the in-place slicing problem, since the in-place performance guarantee would still come with many caveats: resizing, copying, or other operations on a slice could force the reference to become strong unexpectedly and defeat in-place modification, leaving users still wanting a stricter in-place modification interface with stronger guarantees, which we would only be able to deliver in the future with the support of move-only types. It therefore isn't clear that it's worth expending large effort now for a design that'd still be a compromise. Our best short-term effort might best be spent only ensuring that our existing protocol designs don't actively impede move-only types from conforming in the future.

5 Likes

Just wanted to write this down: Joe, @Ben_Cohen, and I had a bit further discussion on this yesterday, which led to the idea that a property like reversed really wants to be a sort of "forwarding" property: if you have a borrowed collection, you can get a borrowed reversed collection; if you have a mutable (inout) collection, you can get a mutable reversed collection; if you have a collection you own, you can get a reversed collection you own. Or, in code:

func borrow<Element: !Copyable>(_ items: __shared [Element]) {
  print(items.reversed.first) // Okay, we're just borrowing "x".
}

func mutate<Element: DefaultInit & !Copyable>(_ items: inout [Element]) {
  let lastIndex = items.reversed.startIndex
  items.reversed[lastIndex] = Element() // Not implemented today, but not totally unreasonable.
}

func consume<Element: Closeable & !Copyable>(_ items: __owned [Element) {
  for __owned item : items.reversed {
    item.close() // Destroys the element, so we must be destroying the entire Array.
  }
}

As Joe noted above, you can do this kind of thing by yielding out of a property implementation instead of just returning a value, as long as the value you yield isn't copyable. This leads to the idea of properties potentially having three accessors to support move-only types: read, which borrows self; modify, which treats self as mutating / inout; and __consuming get, which can return an independent value but consumes self in the process. And since these are independent entry points, the ones we don't use today can be added as additional protocol requirements for Collection in the future in a backwards-compatible way, as long as we're very careful about how the default implementation works / when it can get called. (The three of us worked out that either of Joe's bullet-point proposals above can support this.)

By the way, with copyable types, __consuming get can be implemented as read-then-copy, and conversely read can be implemented as copy-then-get. That's how the default implementations work. Move-only types are the case where having all three really matters.

P.S. Note that we can still have nonmutating modify or nonmutating set for things like UnsafeMutablePointer.pointee, and mutating get or mutating read for things like lazy properties on structs. There's no __consuming set because setters can't return anything, and so you've consumed the value for nothing.

P.P.S. This isn't sufficient to solve the Slice problem on its own, by the way, because you want Slices to be mutated in-place when possible, but also for it to be okay to re-assign a Slice in a RangeReplaceableCollection like Array without touching the rest of the elements.

4 Likes