Pitch: Introduce `for borrow` and `for inout` to provide non-copying collection iteration

beccadax · January 13, 2023, 7:38pm

Hello Evolution friends,

As part of our ongoing work on non-copyable types—not to mention a few other things—I'd like to present a pitch which adds for borrow and for inout loops to Swift. for borrow is mostly useful for noncopyable types and other performance-critical scenarios, but for inout actually fills a significant gap in the language because mutating a collection with a standard for loop is surprisingly hard to do correctly. Here's an excerpt:

Proposed solution

We propose extending SE-NNNN's introduction of borrow and inout bindings by adding support for inout and borrow bindings to for loops:
// sum the contents of a C++ vector that's too expensive to copy
var sum = 0
for borrow n in cpp_number_vector {
    sum += n
}

// square all `numbers`
for inout n in &numbers {
    n *= n
}
These features require the loop's collection to be borrowable or inout-able and to conform to Collection or MutableCollection, respectively. They use an index-based strategy to access the elements, which desugars to something like this:
// sum the contents of a C++ vector that's too expensive to copy
var sum = 0

do {
    borrow $collection = cpp_number_vector
    var $i = $collection.startIndex
    while $i < $collection.endIndex {
        defer { $collection.formIndex(after: &$i) }
        borrow n = $collection[$i]
        
        // Loop body goes here
        sum += n
    }
}

// square all `numbers`
do {
    inout $collection = &numbers
    var $i = $collection.startIndex
    while $i < $collection.endIndex {
        defer { $collection.formIndex(after: &$i) }
        inout n = &$collection[$i]
        
        // Loop body goes here
        n *= n
    }
}
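
For comparison, the same index-based strategy can be approximated in today's Swift without the new bindings. This is only a sketch: `mutateEach` is a hypothetical helper name that is not part of the pitch, and it uses a closure taking an `inout` element in place of the proposed `inout n` binding.

```swift
// Hypothetical helper (not part of the pitch): walk the indices and hand
// each element to the closure in place, without copying it out and back.
extension MutableCollection {
    mutating func mutateEach(_ body: (inout Element) throws -> Void) rethrows {
        var i = startIndex
        while i != endIndex {
            try body(&self[i])      // in-place access through the subscript
            formIndex(after: &i)    // advance, as in the desugaring above
        }
    }
}

// square all `numbers`
var numbers = [1, 2, 3, 4]
numbers.mutateEach { n in n *= n }
print(numbers)
```

Unlike the proposed loop, a closure-based helper like this cannot `break`, `continue`, or `return` from the enclosing function, which is part of why a language-level loop is attractive.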

The full pitch is available as a gist. I look forward to your feedback!

Philippe_Hausler · January 13, 2023, 7:44pm

would it make sense to borrow from AsyncSequence iteration too? e.g. for try await borrow item in items?

John_McCall · January 13, 2023, 7:46pm

I think a borrowed for loop should be the default, and you should have to specifically request a consuming for loop.

ksluder · January 13, 2023, 7:48pm

Any chance we can make the standard for loop “just work” when iterating over a var that conforms to MutableCollection? The transformation seems to be generally applicable, no?

Edit: I see this is covered in the full gist.

Ben_Cohen · January 13, 2023, 7:51pm

Probably, but that probably would need to be a Swift 6-gated change, no? Like it or not there’s lots of mutation of the iterated thing going on out there…

I guess this proposal could lock that in now. But a separate proposal is probably more appropriate.

beccadax · January 13, 2023, 7:54pm

That would be affected by an even more severe form of the problems described in the “Use the new Collection iteration strategy more aggressively” alternative. Basically, there are both tighter exclusivity requirements and (for certain types) performance regressions to worry about if you do that, and that’s probably a deal-breaker.

Joe_Groff · January 13, 2023, 7:57pm

Even if we do adopt the borrowed iteration mechanism as the default for loop behavior in some cases, I think we'd still want to have some difference in behavior around access scopes between an unannotated for loop and an explicitly annotated for borrow loop. To me it makes sense that for borrow x in object.ivar might perform a prolonged read access on object.ivar, since I'm asking it to borrow, but it might be surprising if for x in object.ivar didn't semantically copy object.ivar, allowing other code to write to object.ivar without affecting the iteration, even if it uses the borrowed iteration mechanism on the copy.
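
As an illustration of the copy semantics described here, this small sketch shows what an unannotated for loop does today with an Array: the loop iterates over the value the collection had when the loop started, so a write made inside the body does not change which elements are visited. The variable names are made up for the example.

```swift
var items = [1, 2, 3]
var visited: [Int] = []

for x in items {
    // Writing to `items` here does not disturb the iteration,
    // because the loop is working from its own copy.
    if x == 1 { items.append(99) }
    visited.append(x)
}

print(visited)  // the three original elements
print(items)    // the original elements plus the appended 99
```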

tbkka · January 13, 2023, 8:00pm

I've gone back and forth on whether we should change the default for..in loop to use borrowing.

Were we starting Swift over from scratch, I probably would make a lot of things borrow that are currently copies. And the idea that a language change at that level might magically speed up a large amount of existing code is tantalizing. But defensive copying is pretty ingrained in the language at this point. And copying does reduce the likelihood of exclusivity problems, so it seems like a natural default for most programmers (who don't have special performance concerns) that the basic for..in should copy the collection and the elements.

The big remaining question I have is whether we should use index-based iteration for for borrow..in and for inout..in or whether we should have a noncopyable borrowing iterator protocol to use for these cases. Iterators have an advantage that they can carry state through the iteration, reducing the need for validity checks. But I also feel there should be a high bar for adding Yet Another Iteration Protocol.
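
To make the two strategies concrete, here is a rough sketch in current Swift of both ways of walking the same array. The first mirrors what a plain for..in does with an iterator; the second mirrors the index-based approach from the pitch. The variable names are only for illustration.

```swift
let words = ["a", "b", "c"]

// Iterator-based: the iterator carries its own position as state.
var iterator = words.makeIterator()
var viaIterator: [String] = []
while let word = iterator.next() {
    viaIterator.append(word)
}

// Index-based: the loop holds an index and goes back to the
// collection's subscript for every element.
var viaIndex: [String] = []
var i = words.startIndex
while i != words.endIndex {
    viaIndex.append(words[i])
    words.formIndex(after: &i)
}

print(viaIterator == viaIndex)
```

Both loops visit the same elements in the same order; the difference is where the traversal state lives and how often the collection itself has to be consulted.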

Jumhyn · January 13, 2023, 8:06pm

At a level above just Swift, I personally consider mutating the collection you're iterating over to be a pattern that is confusing enough to just outright refuse to compile. IMO all of:

It's undefined behavior
It crashes at runtime
It doesn't affect the iteration
It affects the iteration as though you were just indexing into the collection

are surprising in their own ways, and it should probably just be banned unless the user somehow specifies what they want to happen, e.g. by writing for x in copy object.ivar or something.

tbkka · January 13, 2023, 8:10pm

I don't think so. It doesn't intuitively make a lot of sense to "borrow" something from a non-replayable sequence. (The sequence can't do anything with the element when you return it other than destroy it.)

I feel like we do need more clarity around the relationship between "sequences", "iterators", and "collections" and how to make them work efficiently for non-replayable data sources (e.g., I/O or random number generators that yield ownership as they're traversed) vs. stored collections (that retain ownership). In my mind, for borrow and for inout are specifically for the latter case.

tbkka · January 13, 2023, 8:12pm

For for borrow and for inout, we could make the loop be an "access" of the collection for exclusivity purposes. That would prohibit mutating the collection while iterating it.

While I think you may be right about plain for..in loops as well, that would likely break a lot of code so would at a minimum require some very careful transitional planning.

Jumhyn · January 13, 2023, 8:14pm

Yeah, definitely. Just trying to express that the source break would be the primary concern for me rather than it being surprising if the exclusivity were enforced at compile time.

ksluder · January 13, 2023, 8:15pm

The situation @Joe_Groff brings up is what happens if you concurrently mutate object.ivar from outside the iteration:

import Foundation

final class C: @unchecked Sendable {
  private var _strings: [String] = []
  private let lock = NSLock()  // guards every access to _strings
  var strings: [String] {
    get { lock.withLock { _strings } }
    set { lock.withLock { _strings = newValue } }
  }
}

let object = C()
object.strings = ["hello", "world"]
Task.detached {
  object.strings = ["mutated"]
}
Task.detached {
  for string in object.strings {
    print(string)
  }
}

This program is well-formed, and prints either hello world or mutated.

John_McCall · January 13, 2023, 8:15pm

So, with borrowed iteration and copyable types, there are actually two independent questions here: whether for loops should borrow the collection by default, and whether they should use an iteration strategy that allows them to borrow elements. For value-semantics collections (i.e. almost all of them), the source compatibility issues turn solely on whether the collection is borrowed: even if you have an owned copy of the collection, you can still get much better performance guarantees by doing a borrowing iteration.

I feel like that idea — making iteration default to borrowing elements when iterating a Collection, but still generating an owned copy of the collection — is a pretty good compromise position that preserves source compatibility while also extending pretty naturally in two directions:

Clients who want to eliminate the copy of the collection can always just use the borrow operator in the collection expression.
It's completely reasonable to default to borrowing move-only collections. In fact, this would be the same default evaluation rule as when values are passed as borrowed parameters: move-only values are borrowed, copyable values are copied and then borrowed.

ksluder · January 13, 2023, 8:17pm

I think it’s worth stating explicitly that your key insight seems to be that it’s safe for the loop to do whatever it wants with a copy of the collection it creates.

John_McCall · January 13, 2023, 8:21pm

Right, the place I feel we want to end up here for Collection is calling a new generator-function requirement that yields borrowed values. We can make the default implementation of that generator use index-based iteration, but obviously in a lot of cases we can do better than that in order to eliminate validity checks during the iteration. Of course, that requires us to add generator functions; until we have that, I think it's a reasonable short-term approach to just inline an index-based iteration.

The biggest risk here with changing the default behavior is that there could be a semantics break if the collection type has an incorrect conformance to Collection/Sequence.

Paul_Cantrell · January 13, 2023, 8:22pm

I love the proposed for inout functionality! Swift structs have a few downsides:

They make awkward or impossible many local refactorings that are easy with traditional reference-based OOP.
They create verbose repetition of subexpressions: if foo.bar[baz].zot.x < 0 { foo.bar[baz].zot.x = -foo.bar[baz].zot.x }
They lead to index-juggling that defeats the purpose of for-each loops and similar newer-than-C abstractions.

This proposal plus inout local variables stands to fix all of that. Hooray!
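
For context, the usual way to avoid repeating a long access path today is to route it through a function with an inout parameter. This is just a sketch: the types `Foo`, `Bar`, and `Zot` and the helper `makeNonNegative` are invented to match the shape of the expression above.

```swift
// Invented types matching the shape of foo.bar[baz].zot.x
struct Zot { var x: Int }
struct Bar { var zot: Zot }
struct Foo { var bar: [Bar] }

// The access path is written once at the call site; the function
// then works on the value in place through its inout parameter.
func makeNonNegative(_ value: inout Int) {
    if value < 0 { value = -value }
}

var foo = Foo(bar: [Bar(zot: Zot(x: -5))])
let baz = 0
makeNonNegative(&foo.bar[baz].zot.x)
print(foo.bar[baz].zot.x)
```

An inout local binding would let the same thing be written inline, without having to invent a helper function for every such case.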

In addition to the discussion in the gist, I’d naively expect for var x to copy the collection values, so that assigning to x in the loop body does not affect the referenced collection. That’s analogous to how var works elsewhere.
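
This matches what for var already does in current Swift, as the following small sketch shows (variable names are arbitrary): assigning to the loop variable changes only the loop-local copy.

```swift
let values = [1, 2, 3]
var scaled: [Int] = []

for var x in values {
    x *= 10            // changes only the loop-local copy of the element
    scaled.append(x)
}

print(values)  // unchanged
print(scaled)  // each element multiplied by 10
```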

I continue to be nervous about the terminology inout in its broadening usage, but can’t think of a better name.

fclout · January 13, 2023, 8:24pm

Completely in favor of this, with a note that the syntax we use for the binding introducer (for inout foo, for borrow foo, etc) should match what is decided in borrow and inout declaration keywords.

As far as whether the default for is borrowing or not, IMO, Swift should try to borrow if it can prove that exclusivity rules are upheld, and it shouldn't otherwise, and developers can explicitly use for borrow foo if they want a diagnostic when that wouldn't be possible or if they want to take the risk that dynamic rules are still upheld.

Paul_Cantrell · January 13, 2023, 8:26pm

To elaborate on that slightly:

inout makes sense to me in a function: the function is metaphorically a box, or maybe a room with a door, and the value both goes in and out of it. We already use that “function is a contained space” metaphor with other terms: “input / output,” “pass a value in to a function,” “return from a function” (as if it is a place one visits), etc. But in an inout local var or loop, what is the value metaphorically going in and out of? The scope, I guess? The variable? It seems metaphorically murky. It’s not a term I’d immediately intuit if I encountered it in the wild without prior knowledge, and it’s one I’d find moderately awkward to explain to students.

As I said, I can’t think of a better word that means “reference that points back to a value whose lifecycle is not tied to this particular variable declaration,” so I can live with it! But if there are any brilliant ideas out there, now is the time.

Jumhyn · January 13, 2023, 8:28pm

It might also be nice with all the related ownership features if there were a way to get a diagnostic (and a way to silence it) whenever the compiler cannot guarantee static exclusivity and has to fall back to dynamic enforcement. Perhaps borrow would always guarantee static exclusivity and borrow! can be used to allow dynamic exclusivity checks?