Pitch: Introduce `for borrow` and `for inout` to provide non-copying collection iteration

So, with borrowed iteration and copyable types, there are actually two independent questions here: whether for loops should borrow the collection by the default, and whether they should use an iteration strategy that allows them to borrow elements. For value-semantics collections — i.e. almost all of them — the source compatibility issues turn solely on whether the collection is borrowed: even if you have an owned copy of the collection, you can still get much better performance guarantees by doing a borrowing iteration.

I feel like that idea — making iteration default to borrowing elements when iterating a Collection, but still generating an owned copy of the collection — is a pretty good compromise position that preserves source compatibility while also extending pretty naturally in two directions:

  • Clients who want to eliminate the copy of the collection can always just use the borrow operator in the collection expression.
  • It's completely reasonable to default to borrowing move-only collections. In fact, this would be the same default evaluation rule as when values are passed as borrowed parameters: move-only values are borrowed, copyable values are copied and then borrowed.
8 Likes

I think it’s worth stating explicitly that your key insight seems to be that it’s safe for the loop to do whatever it wants with a copy of the collection it creates.

1 Like

Right, the place I feel we want to end up here for Collection is calling a new generator-function requirement that yields borrowed values. We can make the default implementation of that generator use index-based iteration, but obviously in a lot of cases we can do better than that in order to eliminate validity checks during the iteration. Of course, that requires us to add generator functions; until we have that, I think it's a reasonable short-term approach to just inline an index-based iteration.

The biggest risk here with changing the default behavior is that there could be a semantics break if the collection type has an incorrect conformance to Collection/Sequence.

2 Likes

I love the proposed for inout functionality! Swift structs have a downside:

  • They make awkward or impossible many local refactorings that are easy with traditional reference-based OOP.
  • They create verbose repetition of subexpressions: if foo.bar[baz].zot.x < 0 { foo.bar[baz].zot.x = -foo.bar[baz].zot.x }
  • They lead to index-juggling that defeats the purpose of for-each loops and similar newer-than-C abstractions.

This proposal plus inout local variables stands to fix all of that. Hooray!

In addition to the discussion in the gist, I’d naively expect for var x to copy the collection values, so that assigning to x in the loop body does not affect the referenced collection. That’s analogous to how var works elsewhere.

I continue to be nervous about the terminology inout in its broadening usage, but can’t think of a better name.

7 Likes

Completely in favor of this, with a note that the syntax we use for the binding introducer (for inout foo, for borrow foo, etc) should match what is decided in borrow and inout declaration keywords.

As far as whether the default for is borrowing or not, IMO, Swift should try to borrow if it can prove that exclusivity rules are upheld, and it shouldn't otherwise, and developers can explicitly use for borrow foo if they want a diagnostic when that wouldn't be possible or if they want to take the risk that dynamic rules are still upheld.

2 Likes

To elaborate on that slightly:

inout makes sense to me in a function: the function is metaphorically a box, or maybe a room with a door, and the value both goes in and out of it. We already use that “function is a contained space” metaphor with other terms: “input / output,” “pass a value in to a function,” “return from a function” (as if it is a place one visits), etc. But in an inout local var or loop, what is the value metaphorically going in and out of? The scope, I guess? The variable? It seems metaphorically murky. It’s not a term I’d immediately intuit if I encountered it in the wild without prior knowledge, and it’s one I’d find moderately awkward to explain to students.

As I said, I can’t think of a better word that means “reference that points back to a value whose lifecycle is not tied to this particular variable declaration,” so I can live with it! But if there are any brilliant ideas out there, now is the time.

4 Likes

It might also be nice with all the related ownership features if there were a way to get a diagnostic (and a way to silence it) whenever the compiler cannot guarantee static exclusivity and has to fall back to dynamic enforcement. Perhaps borrow would always guarantee static exclusivity and borrow! can be used to allow dynamic exclusivity checks?

1 Like

I think the obvious keyword is mutating, e.g. for mutating elt in &collection, mutating foo = &x.property, etc. It would also work as a replacement for inout in parameters if the Language Workgroup ever decides that it's worth the trouble to deprecate that.

18 Likes

I share this concern about extending the use of inout to other positions, but I think it's probably safe to say that the keywords we end up using with for-in loops will mirror the keywords we choose for the general declarations so probably makes sense to litigate this aspect over in the other thread.

3 Likes

This is a bit off-topic, but one thing I have often wanted is the ability to create “identifier aliases” like so:

alias x = foo.bar[baz].zot.x
if x < 0 { x = -x }

It’s not a copy, not a borrow, not a value, not a reference, it’s literally just an alias. Another name for something. An alternative spelling.

3 Likes

The precise keywords here are obviously not final until the Language Work Group weighs in.

A lot of alternatives have been discussed as part of the thread on the borrow and inout local declarations: Pitch: `borrow` and `inout` declaration keywords (My personal favorite is some variation of "mutate" or "mutating", but I trust the LWG to make a good decision and to make sure that aligns across all the related language features.)

The "as if rule" allows the optimizer to make any such changes that provide the same visible behavior. Over time, I'm certain the optimizer folks will be looking at how to take advantage of borrowing whenever it makes things faster without breaking the behavior. This is one reason I'm happy to wait on possibly changing the behavior of plain for..in until we get a little more experience with the backing machinery.

The other thread mentioned above is proposing a way to do exactly this (just replace "alias" with "inout" in your example to match that other pitch). The various rules about it being a "borrowing reference" are really nothing more than a particular set of compiler checks and constraints which ensure that such "aliases" are actually safe. In fact, an early draft of that proposal used the word "alias" here instead of "inout", and it's still one of the possible spellings being considered.

1 Like

I'm very excited about this proposal, as I was with the related declarations proposal. I share the concerns I did there, that inout and borrow as declaration keywords aren't as clear as they could be, and I would prefer something like the following:

var collection = SomeCollection()

for element in collection {
  // immutable copy of element in collection
}
for var element in collection {
  // mutable copy of element
}
for borrow element in &collection {
  // immutable reference to collection element
}
for borrow var element in &collection {
  // mutable reference to collection element
}

Given the feedback in the other thread, I suspect this is a non-starter, but I wanted to put it out there regardless. :sweat_smile:

5 Likes

Oh, yes, of course, generator-based iteration does seem like the ideal solution.

And you make a good point that when we do that, we can have a default implementation of that generator-based protocol that mimics the index-based iteration we're proposing to introduce now. That answers a lot of my concerns about whether this short-term solution might become problematic in the long term.

3 Likes

Just chiming in to say I think this model is easier for me to read and I think easier to teach

2 Likes

This is good!

My one worry is that forcing iteration to use indices is going to add unacceptable overhead.

A side effect of the pitch is that it eliminates a crucial Collection customization point for forward iteration: the possibility to customize the Iterator type. It proposes to instead always use the (non-consuming equivalent of) IndexingIterator.

IndexingIterator tends to perform measurably/significantly worse than a custom iterator type for all but the simplest data structures. (I.e., basically anything that isn't just a contiguous array -- and even for something "simple" like Array, IndexingIterator only works as well as it does because of relatively fragile optimizations, heroically hand-tuned over the years. None of these optimizations translate to, say, collection types with non-inlinable members.)

The problem is that index validity isn't modeled in the type system -- it needs to be verified at runtime. Indices are effectively pointers into the internal structure of the collection value, and they are inherently fragile -- the slightest structural mutation can invalidate them. However, nothing prevents us from continuing to use an invalidated index, or even to apply a valid index from one collection to some completely unrelated other collection instance. Therefore, Collection operations that take indices must always treat them with suspicion, and carefully check validity every single time -- otherwise they risk hazards such as out of bounds accesses.

This means that a for-in loop that relies on indexing will do completely unnecessary index validation work, repeated multiple times during each iteration -- subscript[i] and formIndex(after: &i) will both trigger such checks, even if (as I hope) the language will reject attempts to independently mutate the container during these new-style for borrow and for inout loops.

In contrast, the current (consuming) iterators can safely assume that the underlying container will not be mutated while they are active, so they can shortcut all of these checks.

Beyond eliminating validation steps, we also have the freedom to design iterators that include additional information that optimizes the task at hand but that may not be appropriate to include in a general-purpose index. For instance, iterators in tree-based collections sometimes include direct pointers to every node on the current path, while indices often only carry a single pointer to the "current" node (if that -- node pointers are rather tricky to reliably validate). Or, in the case of something like Set, the iterator can carry state that is cheaper to update than recreating it from scratch in each iteration.

To give a specific example:

let n = 1_000_000
var set = Set(0 ..< n)

let clock = ContinuousClock()
let d = clock.measure {
    var c = 0
#if USE_CUSTOM_ITERATOR
    var it = set.makeIterator()
    while let v = it.next() {
        c &+= v
    }
#else
    var i = set.startIndex
    while i < set.endIndex {
        c &+= set[i]
        set.formIndex(after: &i)
    }
#endif
    precondition(c == n &* (n - 1) / 2)
}
print(d)
$ swiftc -O foo.swift && ./foo
0.003566333 seconds
$ swiftc -DUSE_CUSTOM_ITERATOR -O foo.swift && ./foo
0.001064959 seconds

Even for something simple like Set, the custom iterator is more than three times faster than equivalent code that's using indices. (This gap feels suspiciously large, to be honest, but even if some of it is due to us not spending enough effort on optimizing performance, it still demonstrates how much easier it is to write a fast iterator than to optimize indices.)

I would therefore be sad if we lost this customization hook. This would be all the more painful if we were planning to switch the default for in syntax to mean a for borrow even for copyable collections/elements -- we'd be inducing a performance regression.

(I expect for borrow would make up a large part of this gap if the Set's Element was a nontrivial type -- not having to retain strong references will be a significant win. But I don't expect for borrow will do much for trivial types.)

As I understand it, the new for borrow/for inout constructs would inherently prevent independent mutations of the underlying container -- this time as a proper, language-level restriction. Can we somehow use this guarantee to introduce a new, non-consuming iterator construct?

11 Likes

In the fullness of time, we should be able to—either using generator coroutines like John mentioned above, and/or augmenting Sequence with new requirements for a lifetime-bounded iterator type that can statically rely on the original sequence being borrowed for the duration of the iterator's lifetime. Any new requirement like that would need to have a default implementation written in terms of existing requirements, though, and it seems to me that we'd need to use indexes to get the right semantics for those default implementations in terms of existing collections today. So I think there's a path forward to a better borrowing iteration model in the future even if we accept this proposal today.

7 Likes

A strong +1 from me on the feature, although I think I might agree both with @Paul_Cantrell’s concerns WRT the inout spelling and with @John_McCall that mutating is the obvious alternative.

4 Likes

That's fair.

Introducing the customization hook in a future update would run into availability headaches: for-in loops would need to look at the deployment target and unconditionally revert to using indexing if it's too low. This could be surprising, but it's still much better than keeping things as is.

To try to bridge the gap, perhaps we could also consider adding new unsafe-unchecked Collection requirements to increment & dereference an index, where the caller is fully responsible for its validity. (Then again, that could lead to folks unwisely deciding to use those by default, and it would still preclude using a custom state for iteration.)

I think I like inout for for inout loops, even though the metaphor is more "out-in", it takes a value out of the collection and them puts it back in, but "out-in" looks wrong and inout is already a keyword which is close enough.

2 Likes

+1, this seems like a great change! It's a bit unclear how this change will interact with result builders though.