Pitch: Introduce `for borrow` and `for inout` to provide non-copying collection iteration

John_McCall · January 13, 2023, 8:35pm

I think the obvious keyword is mutating, e.g. for mutating elt in &collection, mutating foo = &x.property, etc. It would also work as a replacement for inout in parameters if the Language Workgroup ever decides that it's worth the trouble to deprecate that.

Jumhyn · January 13, 2023, 8:35pm

I share this concern about extending the use of inout to other positions, but I think it's probably safe to say that the keywords we end up using with for-in loops will mirror the keywords we choose for the general declarations so probably makes sense to litigate this aspect over in the other thread.

Nevin · January 13, 2023, 8:39pm

This is a bit off-topic, but one thing I have often wanted is the ability to create “identifier aliases” like so:

alias x = foo.bar[baz].zot.x
if x < 0 { x = -x }

It’s not a copy, not a borrow, not a value, not a reference, it’s literally just an alias. Another name for something. An alternative spelling.

tbkka · January 13, 2023, 8:54pm

The precise keywords here are obviously not final until the Language Work Group weighs in.

A lot of alternatives have been discussed as part of the thread on the borrow and inout local declarations: Pitch: `borrow` and `inout` declaration keywords (My personal favorite is some variation of "mutate" or "mutating", but I trust the LWG to make a good decision and to make sure that aligns across all the related language features.)

The "as if rule" allows the optimizer to make any such changes that provide the same visible behavior. Over time, I'm certain the optimizer folks will be looking at how to take advantage of borrowing whenever it makes things faster without breaking the behavior. This is one reason I'm happy to wait on possibly changing the behavior of plain for..in until we get a little more experience with the backing machinery.

The other thread mentioned above is proposing a way to do exactly this (just replace "alias" with "inout" in your example to match that other pitch). The various rules about it being a "borrowing reference" are really nothing more than a particular set of compiler checks and constraints which ensure that such "aliases" are actually safe. In fact, an early draft of that proposal used the word "alias" here instead of "inout", and it's still one of the possible spellings being considered.

JuneBash · January 13, 2023, 8:54pm

I'm very excited about this proposal, as I was with the related declarations proposal. I share the concerns I did there, that inout and borrow as declaration keywords aren't as clear as they could be, and I would prefer something like the following:

var collection = SomeCollection()

for element in collection {
  // immutable copy of element in collection
}
for var element in collection {
  // mutable copy of element
}
for borrow element in &collection {
  // immutable reference to collection element
}
for borrow var element in &collection {
  // mutable reference to collection element
}

Given the feedback in the other thread, I suspect this is a non-starter, but I wanted to put it out there regardless.

tbkka · January 13, 2023, 9:00pm

Oh, yes, of course, generator-based iteration does seem like the ideal solution.

And you make a good point that when we do that, we can have a default implementation of that generator-based protocol that mimics the index-based iteration we're proposing to introduce now. That answers a lot of my concerns about whether this short-term solution might become problematic in the long term.

rauhul · January 13, 2023, 9:03pm

Just chiming in to say I think this model is easier for me to read and I think easier to teach

lorentey · January 13, 2023, 9:17pm

This is good!

My one worry is that forcing iteration to use indices is going to add unacceptable overhead.

A side effect of the pitch is that it eliminates a crucial Collection customization point for forward iteration: the possibility to customize the Iterator type. It proposes to instead always use the (non-consuming equivalent of) IndexingIterator.

IndexingIterator tends to perform measurably/significantly worse than a custom iterator type for all but the simplest data structures. (I.e., basically anything that isn't just a contiguous array -- and even for something "simple" like Array, IndexingIterator only works as well as it does because of relatively fragile optimizations, heroically hand-tuned over the years. None of these optimizations translate to, say, collection types with non-inlinable members.)

The problem is that index validity isn't modeled in the type system -- it needs to be verified at runtime. Indices are effectively pointers into the internal structure of the collection value, and they are inherently fragile -- the slightest structural mutation can invalidate them. However, nothing prevents us from continuing to use an invalidated index, or even to apply a valid index from one collection to some completely unrelated other collection instance. Therefore, Collection operations that take indices must always treat them with suspicion, and carefully check validity every single time -- otherwise they risk hazards such as out of bounds accesses.

This means that a for-in loop that relies on indexing will do completely unnecessary index validation work, repeated multiple times during each iteration -- subscript[i] and formIndex(after: &i) will both trigger such checks, even if (as I hope) the language will reject attempts to independently mutate the container during these new-style for borrow and for inout loops.

In contrast, the current (consuming) iterators can safely assume that the underlying container will not be mutated while they are active, so they can shortcut all of these checks.

Beyond eliminating validation steps, we also have the freedom to design iterators that include additional information that optimizes the task at hand but that may not be appropriate to include in a general-purpose index. For instance, iterators in tree-based collections sometimes include direct pointers to every node on the current path, while indices often only carry a single pointer to the "current" node (if that -- node pointers are rather tricky to reliably validate). Or, in the case of something like Set, the iterator can carry state that is cheaper to update than recreating it from scratch in each iteration.

To give a specific example:

let n = 1_000_000
var set = Set(0 ..< n)

let clock = ContinuousClock()
let d = clock.measure {
    var c = 0
#if USE_CUSTOM_ITERATOR
    var it = set.makeIterator()
    while let v = it.next() {
        c &+= v
    }
#else
    var i = set.startIndex
    while i < set.endIndex {
        c &+= set[i]
        set.formIndex(after: &i)
    }
#endif
    precondition(c == n &* (n - 1) / 2)
}
print(d)

$ swiftc -O foo.swift && ./foo
0.003566333 seconds
$ swiftc -DUSE_CUSTOM_ITERATOR -O foo.swift && ./foo
0.001064959 seconds

Even for something simple like Set, the custom iterator is more than three times faster than equivalent code that's using indices. (This gap feels suspiciously large, to be honest, but even if some of it is due to us not spending enough effort on optimizing performance, it still demonstrates how much easier it is to write a fast iterator than to optimize indices.)

I would therefore be sad if we lost this customization hook. This would be all the more painful if we were planning to switch the default for in syntax to mean a for borrow even for copyable collections/elements -- we'd be inducing a performance regression.

(I expect for borrow would make up a large part of this gap if the Set's Element was a nontrivial type -- not having to retain strong references will be a significant win. But I don't expect for borrow will do much for trivial types.)

As I understand it, the new for borrow/for inout constructs would inherently prevent independent mutations of the underlying container -- this time as a proper, language-level restriction. Can we somehow use this guarantee to introduce a new, non-consuming iterator construct?

Joe_Groff · January 13, 2023, 9:22pm

In the fullness of time, we should be able to—either using generator coroutines like John mentioned above, and/or augmenting Sequence with new requirements for a lifetime-bounded iterator type that can statically rely on the original sequence being borrowed for the duration of the iterator's lifetime. Any new requirement like that would need to have a default implementation written in terms of existing requirements, though, and it seems to me that we'd need to use indexes to get the right semantics for those default implementations in terms of existing collections today. So I think there's a path forward to a better borrowing iteration model in the future even if we accept this proposal today.

David_Sweeris · January 13, 2023, 9:39pm

A strong +1 from me on the feature, although I think I might agree both with @Paul_Cantrell’s concerns WRT the inout spelling and with @John_McCall that mutating is the obvious alternative.

lorentey · January 13, 2023, 9:47pm

That's fair.

Introducing the customization hook in a future update would run into availability headaches: for-in loops would need to look at the deployment target and unconditionally revert to using indexing if it's too low. This could be surprising, but it's still much better than keeping things as is.

To try to bridge the gap, perhaps we could also consider adding new unsafe-unchecked Collection requirements to increment & dereference an index, where the caller is fully responsible for its validity. (Then again, that could lead to folks unwisely deciding to use those by default, and it would still preclude using a custom state for iteration.)

Dante-Broggi · January 13, 2023, 9:58pm

I think I like inout for for inout loops, even though the metaphor is more "out-in", it takes a value out of the collection and them puts it back in, but "out-in" looks wrong and inout is already a keyword which is close enough.

1-877-547-7272 · January 13, 2023, 10:52pm

+1, this seems like a great change! It's a bit unclear how this change will interact with result builders though.

beccadax · January 13, 2023, 11:39pm

I've made minor revisions to the pitch:

Added @ksluder's "automatic inout" suggestion to the alternatives
Added a clarification about the supposed use of defer prompted by a discussion on Mastodon
Added a note to the Swift 6 alternative pointing out that we could change the default in a separate proposal
Added a note to the section about borrowing/mutating iterators about library evolution limitations
Moved some of the technical language from the motivation section into detailed design instead
Expanded motivation section's discussion of the existing copying behavior
Linked to pitch thread, now that it's up
Prose fixes and formatting improvements

I haven't attempted to fully address some of the bigger discussions in this thread that don't seem to have settled yet:

Copying a Collection and borrowing its elements
Borrowing/mutating iterators
Unsafe indexing of Collections

I very much agree. Whatever for does here should match if and switch and so on.

beccadax · January 14, 2023, 12:06am

This would be complicated to implement today because the code fragments the for loop uses to iterate over its input are generated very early, in CSGen, and the type of the collection you're iterating over is not yet known at that point. (That used to be SILGen's responsibility, but Pavel moved it into Sema recently, I believe because it made it easier to support iterating over Collection existentials.) That seems fixable, though, especially if we can defer it until non-copyable types are actually a thing.

Unfortunately, we can't make MutableCollection refine this generator-based protocol, even if we provide a default implementation. I would love to find a good way around that issue.

Nevin:

This is a bit off-topic, but one thing I have often wanted is the ability to create “identifier aliases” like so:
alias x = foo.bar[baz].zot.x
if x < 0 { x = -x }
It’s not a copy, not a borrow, not a value, not a reference, it’s literally just an alias. Another name for something. An alternative spelling.

Did you know that Swift supports computed local variables?

var x: TypeOfX {
    get { foo.bar[baz].zot.x }
    set { foo.bar[baz].zot.x = newValue }
}

I don't think it would do anything different from a normal for loop—the results of each statement in the body would be passed to buildBlock(_:), and then the results of all the buildBlock(_:) calls would be passed as an array to buildArray(_:).

tbkka · January 14, 2023, 1:51am

I was thinking of it the other way around: We would have a default implementation of the generator-based protocol that could wrap an arbitrary MutableCollection. Then we could use index iteration (over MutableCollection) in the near term and migrate to the generator-based protocol in the future. That future implementation would use the generator-based protocol always -- either a custom implementation or the default implementation wrapping MutableCollection. Future collections could provide either one and still work correctly -- in particular, it would be possible for a library author to implement only the generator API and get for-loop support without having to implement all of MutableCollection.

xwu · January 14, 2023, 2:06am

This change, by the way, is breaking existing in-the-wild user code in certain corner cases—I'm trying to locate the extant bug but it's not coming up at the moment; by memory it has something to do with types where the protocol-conforming makeIterator overload isn't utterable on the instance without casting to an existential box.

Besides that it's a less-than-ideal state of affairs, the related point for this thread is that we should try to ensure that the generated code for for borrow...in, etc. (which by necessity may have to differ from that for plain for...in) doesn't further result in additional use cases where something is iterable with one kind of loop but not another.

Karl · January 14, 2023, 2:22pm

I very much agree with this sentiment. The features themselves are fine; I don't have much/anything to add to the discussion WRT to functionality, I don't think.

But I am slightly concerned about the syntax. Aesthetics and verbosity are factors, but more importantly, each of these concepts needs to feel like part of one coherent ownership story - if you know how one thing works, and you need something a bit different, you should be able to reasonably guess how that other thing is spelled. I'm not sure that the proposed keywords feel connected enough, and I worry that people won't pick it up as easily or that it won't be very intuitive for them.

If I have an algorithm which uses a for loop, but now I want to accept collections of non-copyable elements, for borrow is not an entirely obvious leap, IMO. If it then turns out I need to mutate those elements, going from there to for inout seems even less obvious.

There have been several interesting ideas for alternative syntax in other threads, such as from @TeamPuzel, @JuneBash, and others. I'm not sure if I have a favourite, but I hope the language workgroup gives serious consideration to those suggestions.

GreatApe · January 14, 2023, 2:24pm

I for one definitely didn't know this. Perhaps a bit OT but it would be nice if we had a way to define computed variables (and even subscripts) without having to repeat the underlying expression:

var x: TypeOfX {
    getSet { foo.bar[baz].zot.x }
}

subscript(index: i) -> TypeOfX {
    getSet { foo.bar[i].zot.x }
}

As to the topic, I love the concept but find at least the inout notation a bit surprising, since it's usually not a usage site keyword. Shouldn't it be simply:

for &x in allExes { 
    x += 1
}

or

for x in &allExes { ... }

ksluder · January 14, 2023, 5:15pm

GreatApe:

As to the topic, I love the concept but find at least the inout notation a bit surprising, since it's usually not a usage site keyword. Shouldn't it be simply:
for &x in allExes { 
    x += 1
}
or
for x in &allExes { ... }

As mentioned above, this is tied to the question of what’s being borrowed: the collection, and/or each element within the collection?