Solving the mutating slice CoW problem

dabrahams · April 11, 2020, 11:56pm

Summary: [SR-12524] Straightforward mutating divide-and-conquer algorithms cause COW · Issue #54967 · apple/swift · GitHub describes a problem we've had since the earliest days of Swift. For years there has been talk of solving it with language features, and in fact we got one of the necessary features in coroutine accessors. I think I can now demonstrate it's the only feature we needed, and I'll propose we incorporate this capability into the standard library.

Slicing in the Standard Library Design

Consider this implementation of a simple QuickSort:

extension BidirectionalCollection
    where Self : MutableCollection, Element : Comparable
{
  /// Sorts the elements of `self` from least to greatest.
  mutating func quickSort() {
    if isEmpty { return }
    let lastIndex = indices.last!
    let pivot = self[lastIndex]
    let p = self[..<lastIndex].partition { $0 >= pivot }  // <==
    swapAt(lastIndex, p)
    self[..<p].quickSort()                                // <==
    self[index(after: p)...].quickSort()                  // <==
  }
}

In the three marked lines, a mutating operation is invoked on a contiguous subrange of self. This kind of algorithm composition is extremely common, and important to support well.

To achieve composability, most algorithm APIs take arguments indicating the subrange to be operated on. While this approach is workable, it adds a lot of API complexity, especially to algorithms that operate on multiple collections (example). The Swift standard library was designed to avoid that pattern by:

Making slicing a first-class operation
Making algorithms generic so they could operate on slices
Supporting in-place slice mutation

So in swift, instead of threading extra arguments through every single algorithm API, we can just operate on slices. It's a beautiful story, right?

The Problem

Unfortunately, there's a fly in the ointment: slicing a collection, even for mutation, means there is a reference for the collection being sliced and a reference for the slice itself, which prevents the storage from being mutated in place. Instead, the existence of two references causes new storage allocation and a copy, the instant the slice is mutated. To keep mutating algorithms efficient, then, we are forced to pass an explicit subrange to each one.

Basis of Solution

When an ephemeral slice is created for mutation, its storage is being “borrowed” from the source of the slice. In that condition, it's actually safe to mutate the elements in the slice. If it weren't for the fact that there are other elements, that need to be preserved, in the original collection we could simply release the slice's reference early. Instead, we can remember that the second reference to the storage is due to a borrow, and allow single-element subscripting of the slice to proceed even when there are (exactly) two references. And when slicing the slice in that borrowed condition, we can, effectively, release the new slice's reference early to maintain the two-reference condition. This approach opens the door to just the kind of mutation we want to allow, without disturbing the logic that makes CoW correct everywhere else.

Implementation

A complete implementation of this idea for ContiguousArray is here. I encourage you to play with it and see what you think. I believe this is sufficient to prove the viability of the concept for the standard library, modulo ABI issues.

Considerations and Future Directions

“Remembering” that the slice is borrowed requires setting a bit in the buffer, which can be checked later. It's possible to imagine future language features that might make it possible to eliminate (some of?) this dynamism and do more things statically. I certainly hope for a better answer someday, but for now I'll take this.
I haven't analyzed the ABI stability implications; I figured “library insiders” might be in a better position to do that than I am. The buffer is already using the bit I chose, for something else, but it could become available if Foundation was using Swift's data structures instead of the other way 'round in bridged scenarios.
This technique takes a bit too much hackery today to be broadly implementable by users, and AFAIK it can't be implemented just once by the library. I would love to find a more general solution.
Parallel mutation of pairs of non-overlapping slices should be supportable by the same basic technique; just keep the reference count at 2 when slicing. I didn't implement it because I am not sure I can prevent temporary reference count increases, which would foil the in-place optimization, but I think this problem can be solved if the core language and library cooperate.

jrose · April 12, 2020, 12:58am

I'm glad we're getting closer to solving this problem—you convinced me how serious it is a long time ago!

I went on a little bit of a side trip trying to figure out if ArraySlice could just not hold a reference to the proper buffer class, and just use the pointers and offsets. But that doesn't work because if you end up with two ArraySlices live at the same time, you need to know that you've no longer got a uniquely-referenced buffer.

My naive understanding, though, is that your solution would have the same problem in reverse. A divide-and-conquer algorithm is going to take a slice of a slice, and that's still going to reference the backing storage, which means the reference count will be up to three (and then more) in the inner invocations. Did I miss something?

The low-level alternative I can think of is kind of wacky, but I can't see any problems with it at the moment: during _modify, if the buffer is uniquely referenced, temporarily assign an empty array to self, and restore it afterwards. This applies likewise in the range-based subscript _modify of ArraySlice, though someone would have to think through what to do with the other fields of the slice in the mean time.

This solution likewise doesn't scale to parallel mutation, unfortunately. I can't see an answer to parallel mutation at all that uses the existing ArraySlice type, but since parallel mutation doesn't exist yet maybe we can come up with a new type that sidesteps some of these problems.

dabrahams · April 12, 2020, 2:02am

This:

Or just look at the code

jrose · April 12, 2020, 2:06am

Ooops, I did indeed miss this, thanks for pointing me at the right part. Any reason we can't do that at the top level, and avoid the need for isDuallyReferenced? [EDIT: though we still need your borrowing flag, to avoid overwriting the rest of the buffer.] Or do my slightly-less-horrific thing if it does work?

(On the other hand, I congratulate you on inventing antimatter references. )

dabrahams · April 12, 2020, 2:09am

Yes, maintaining only a single reference doesn't keep this property:

If the slice is grown by the mutation, for example, it will release its buffer, and with only a single reference that's going to deallocate it and anything it was sliced out of.

jrose · April 12, 2020, 2:10am

>< I'm not thinking things through at all well today. Thanks, Dave.

dabrahams · April 12, 2020, 2:16am

No problem; it can be hard to think about! Fortunately, the dually-referenced thing actually does make reasoning easier, once you accept it, because you know the top level of the stack of slices accounts for one reference.

We could ask whether ARC could get smart enough to cause trouble for this scheme, but, essentially, it didn't get too smart for the existing CoW semantics because we decided we were supporting those semantics. I think we could, equally well, make a commitment to keep ARC consistent with this scheme.

Michael_Gottesman · April 12, 2020, 5:04am

When an ephemeral slice is created for mutation, its storage is being “borrowed” from the source of the slice. In that condition, it's actually safe to mutate the elements in the slice. If it weren't for the fact that there are other elements, that need to be preserved, in the original collection we could simply release the slice's reference early. Instead, we can remember that the second reference to the storage is due to a borrow, and allow single-element subscripting of the slice to proceed even when there are (exactly) two references. And when slicing the slice in that borrowed condition, we can, effectively, release the new slice's reference early to maintain the two-reference condition. This approach opens the door to just the kind of mutation we want to allow, without disturbing the logic that makes CoW correct everywhere else.

Dave, can you draw this out in pseudo-code/provide an explicit example. I think that would be helpful.

dabrahams · April 12, 2020, 5:10am

I think the actual changes explain the logic very well: I put a lot of effort into making sure that they were clear and easy to understand. Did you look?

Michael_Gottesman · April 12, 2020, 6:06am

I haven't ingested all of the semantics but please be aware that you are using APIs on unmanaged that insert retains, releases that can not be eliminated by the optimizer and are viewed sort of like an arbitrary unknown function call.

dabrahams · April 12, 2020, 6:24am

Yes, I am aware. If you look back in the revision history there’s a version that doesn’t use unmanaged; it’s just ugly and harder to understand.

dabrahams · April 12, 2020, 7:17pm

For what it's worth, I've made some slides of what happens on the “happy path” of this scheme. I don't want to pre-emptively make a whole lot more of these slides, but as people ask questions I can update them as necessary (including covering “unhappy paths”).

Andrew_Trick · April 14, 2020, 12:07am

We could ask whether ARC could get smart enough to cause trouble for this scheme, but, essentially, it didn't
get too smart for the existing CoW semantics because we decided we were supporting those semantics

The semantics required for CoW are formalized in the language:

A local object is released after its last use
An 'owned' AnyObject argument release one reference
An 'inout' AnyObject argument may release one reference

I think that's it. isUniquelyReferenced only works because the compiler thinks it can release its argument.

There's no formalism that allows the implementation to track the number of variables that may have referenced an object. The compiler will break any such assumptions in entirely unpredictable ways. It's not like we have a set of discrete optimizations, X, Y, and Z, and if we just turn off Z we'll be safe.

The only sure-fire way to ensure an object has two references is to pass two distinct inout addresses, each of which is known to hold the same object reference. In this case, you would need both the ArrayBuffer and the SliceBuffer's address at the same time.

So, some amount of language support is needed. If not a language feature, then at least a builtin, possibly specific to the Array implementation. Without it, this is what will eventually happen (if you don't mind my pseudo-code):

var array: inout

wasUnique = isUniquelyReference(&array) // true

var rhs = ArraySlice(array._buffer)     // rhs refcount never mutated
                                        // retain/release elided

if wasUnique {                          // true
  array._storage.isBorrowed = true
}

yield &rhs
  
  var sliceCopy = rhs
  sliceCopy[x] = y                     // Thinks it's borrowing
                                       // since refcount==2

// end yield  

if wasUnique { array._storage.isBorrowed = false }

Another broken scenerio would be a read-only sliceCopy that escapes.

We could force ArraySlice to always retain its storage. This might have noticable performance implications--we could devise an experiment find out.

Assuming the performance is acceptable, I don't know how to actually implement that with existing APIs--we can't use Unmanaged for the ArraySlice without struct-destructors. It would be sufficient to add an opaque builtin and SIL instruction that simply takes AnyObject and produces a new one without the compiler knowing that they are in fact the same reference.

The complexity of doing that probably needs to be weighed against some sort of copy-on-escape support, which would have both the performance and semantic properties that we want.

dabrahams · April 14, 2020, 2:02am

Sorry, but that can't possibly be it, because it never mentions retains! For the CoW optimization (not copying uniquely-referenced buffers) to work, something needs to guarantee that references are retained, or it would break things like this:

let a = [1, 2, 3]
var b = a
b[0] = 99 // if there isn't a retain for b, this writes on a
assert(a != b)

isUniquelyReferenced only works because the compiler thinks it can release its argument.

I think you mean, “Using isUniquelyReferenced to avoid a copy only works because the compiler…”. A crucial element of isUniquelyReferenced “working” is that it reports no false positives, and that has to do with retains.

But that aside, I'm afraid I don't understand what you're getting with the rest of that sentence (“thinks it can release its argument”). Partly that's due to ambiguous writing: do “it” and “its” refer to the same thing (isUniquelyReferenced), or does “it” refer to the compiler?

Furthermore, isUniquelyReferenced can be implemented using ObjectIdentifier, which makes the issue of whether it can release its argument moot. If what you mean is that the check only succeeds because the compiler doesn't insert needless retains on an inout argument, then I agree with that (and it's missing from your list of formalisms)—but that doesn't sound like what you're saying.

If you really did cite all the relevant formalisms, the formalisms are insufficient to guarantee that CoW works today, in at least the two ways that I've pointed out (and will repeat in bullets below).

Regardless, nothing I've done or written is predicated on tracking the number of variables that may have referenced an object, beyond assumptions that already have to be true for CoW to be working today, namely:

An object's reference count will never be 1 unless there's only one variable (whose lifetime could not have ended already) referencing it. That guarantees we preserve value semantics while still doing the Cow optimization.
When an existing lvalue is passed inout, no needless retains are added. That makes it possible to detect the unique reference and do the CoW optimization.

Whatever formalisms may be written down, the two above must be what are effectively implemented, or value semantics and the CoW optimization are already broken.

Again, that statement simply can't be true, at least not without some qualification, or existing code is broken, as I illustrated in my first example.

In this case, you would need both the ArrayBuffer and the SliceBuffer's address at the same time

Well, I don't know about “addresses”—this is high-level Swift after all —but we do have both the ArrayBuffer and the SliceBuffer passed inout at the same time at all points where it's important to detect that there are two references.

So, some amount of language support is needed. If not a language feature, then at least a builtin, possibly specific to the Array implementation. Without it, this is what will eventually happen (if you don't mind my pseudo-code):
var array: inout

wasUnique = isUniquelyReference(&array) // true

var rhs = ArraySlice(array._buffer)     // rhs refcount never mutated

The only way you can get in that situation is if the compiler is able to prove that the lifetime of either the array or the slice could have been ended (rule 1 above). I'm pretty certain it's actually impossible for the compiler to prove that if the slice is mutated or copied, because their values will both be observed.

Another broken scenerio would be a read-only sliceCopy that escapes.

I don't think that breaks either, for the same reasons. I worked through both of those scenarios before posting here, and you can find attempts to stress those situations in my test code (which I realize doesn't prove anything).

We could force ArraySlice to always retain its storage. This might have noticable performance implications--we could devise an experiment find out.

Hah, the reason this whole thing is needed in the first place is that ArraySlice does always retain its storage for mutation, so I'm pretty sure you will observe no change from that experiment!

Not at all sure what you're getting at here. As I mentioned to Michael, you don't need Unmanaged to make this scheme work sufficiently for the non-parallel-mutation case. An ugly demonstration is here, but it uses only safe constructs.

That's all pretty vague so far; if you know a better way to get the right results, and could lay out the details, and someone's going to implement it, I'm all for it. Otherwise, we've waited 5 years for this to be fixed and here's a way that I, at least, believe could be replaced when and if something better comes along.

Andrew_Trick · April 14, 2020, 9:20am

Every time I mention a release, that obviously implies a matching retain.

I think you mean, “Using isUniquelyReferenced to avoid a copy only works because the compiler…”. A crucial element of isUniquelyReferenced “working” is that it reports no false positives, and that has to do with retains.

That's the opposite of what I meant. The only salient properties of isUniquelyReferenced that prevent the compiler from removing necessary copies (preventing false positives) are that isUniquelyReferenced may release the reference addressed by isUniquelyReferenced's inout argument, and isUniquelyReferenced can never be fully inlined (the builtin also takes an inout reference).

A crucial element of isUniquelyReferenced “working” is that it reports no false positives, and that has to do with retains.

Every time I mention a release, that obviously implies a retain.

The only semantic property an individual retain has today is that one must exist for each potential release, i.e. to provide an owned reference. As soon as we can determine a reference cannot be released within its lifetime, then the retain has no purpose. I think you want semantics on retains that have something to do with local variable scope.

The only way I can think of attaching additional semantics to a retain is by introducing a "forced copy" feature. In my previous response, I explained that the compiler could provide such a feature without much difficultly, but that using the feature in ArraySlice.init might have a performance penalty.

But that aside, I'm afraid I don't understand what you're getting with the rest of that sentence (“thinks it can release its argument”). Partly that's due to ambiguous writing: do “it” and “its” refer to the same thing ( isUniquelyReferenced ), or does “it” refer to the compiler?

isUniquelyReferenced may release the reference addressed by isUniquelyReferenced's inout argument, as far as the compiler is concerned.

Furthermore, isUniquelyReferenced can be implemented using ObjectIdentifier , which makes the issue of whether it can release its argument moot.

The burden of proof is on us when introducing an API to explain why it is correct in terms the compiler's model of language semantics. If we can't write down enforceable rules for SIL that the implementation must follow, then none of us can predict how some combination of implementation choices will violate our expectations. Nonetheless, your ObjectIdentifier example is interesting and educational:

There are two answers to why the ObjectIdentifier example works in practice, depending how much of the code the compiler can see at one time (depending on inlining).

Builtin.isUnique forces an in-memory copy of a reference by virtue of taking its address inout.

The same is true for any wrapper around Builtin.isUnique that takes the reference's address inout.

In this example, when everything is inlined, the fact that you're casting to and from an opaque pointer simply does not matter. The compiler sees through it.

Now, when Builtin.isUnique forces the lvalues x1, and x2 into memory locations, it effectively forces a copy. Relating that to your situation, if both the Array and ArraySlice are mutated, then your "Solving the mutating slice CoW implementation" does work fine in practice and in theory.

In the two incorrect scenarios that I pointed out, sliceCopy is not mutated, and is not forced into a distinct memory location.

Nested retains are not currently eliminated, even when they are blantantly redundant.

If we carefully control inlining so that everything except the non-mutating liveObjectIsUniquelyReferenced is inlined, then the ObjectIdentifier example becomes interesting.

In this case, x2 is no longer passed @inout to any opaque code. There's no reason for the implementation to generate a copy.

So why does the example still work today even with worst-case inlining? (I actually tried this.) The SIL optimizer is extraordinarily bad at removing redundant nested retains to the same object. All it takes is one escape or unknown use of the reference to prevent copy elimination. This inability to optimize seemingly obvious redundancy is a problem with the representation that will disappear in due time. So, when _liveObjectIsUniquelyReferenced is not inlined, there just isn't any fundamental reason that your ObjectIdentifier example works. And more importantly, I can't think of any SIL-level requirement that would preserve the current behavior short of "the compiler never removes nested copies of the same reference".

There's no formalism that allows the implementation to track the number of variables that may have referenced an object.

If you really did cite all the relevant formalisms, the formalisms are insufficient to guarantee that CoW works today, in at least the two ways that I've pointed out (and will repeat in bullets below).

Regardless, nothing I've done or written is predicated on tracking the number of variables that may have referenced an object, beyond assumptions that already have to be true for CoW to be working today , namely:

An object's reference count will never be 1 unless there's only one variable (whose lifetime could not have ended already) referencing it. That guarantees we preserve value semantics while still doing the Cow optimization.

SIL has no way to represent that formalism and doesn't need to because reference counts aren't directly observable. It is only the act of observing a variable's uniqueness, via Builtin.isUnique, that forces the variable's reference to be distinct from any other instances of the same reference. This makes it appear to the programmer "as-if" each variable has its own reference, but only for the purpose of uniqueness checking. This model does not provide any way to count higher than one. As I explained, the only way to observe a reference count of 2 would be to (a) observe two independent variables at the same time (b) create a new language feature that forces variables to have distinct reference counts even when they aren't being observed.

Also, even if this uniqueness property were modeled the way you seem to think it is, I don't know how you get from "An object's reference count will never be 1..." to "An object's reference count will never be 2...".

The only sure-fire way to ensure an object has two references is to pass two distinct inout addresses, each of which is known to hold the same object reference.

Again, that statement simply can't be true, at least not without some qualification, or existing code is broken, as I illustrated in my first example.

Let's make sure we don't confuse the statement "a reference count is greater than one" with "a reference count is exactly 2". That said, in this example, the array clearly has reference count==2 before it is copied:

let a = [1, 2, 3]
var b = a
b[0] = 99 // if there isn't a retain for b, this writes on a
assert(a != b)

And in this example, the array might have refcount==2 or 3 when it is copied. It's not determined by language semantics:

let a = [1, 2, 3]
let b = a
var c = b
c[0] = 99
assert(a != c)
_ = b

In this case, you would need both the ArrayBuffer and the SliceBuffer's address at the same time

Well, I don't know about “addresses”—this is high-level Swift after all —but we do have both the ArrayBuffer and the SliceBuffer passed inout at the same time at all points where it's important to detect that there are two references.

This is roughly how isDuallyReferenced would need to be written today, without any additional support (e.g. without some way to force copies).

Builtin.isDuallyReferenced(_ object1: inout AnyObject, _ object2: inout AnyObject) -> Bool {
  do {
    _precondition(object1 == object2)
  }
  return swift_isDuallyReference(Builtin.addressof(&object1),
                                 Builtin.addressod(&object2))
}

I just don't see how it would work for you. The address of the Array struct, which holds reference #1, is not available during ArraySlice._modify.

So, some amount of language support is needed. If not a language feature, then at least a builtin, possibly specific to the Array implementation. Without it, this is what will eventually happen (if you don't mind my pseudo-code):
var array: inout

wasUnique = isUniquelyReference(&array) // true

var rhs = ArraySlice(array._buffer)     // rhs refcount never mutated
The only way you can get in that situation is if the compiler is able to prove that the lifetime of either the array or the slice could have been ended (rule 1 above). I'm pretty certain it's actually impossible for the compiler to prove that if the slice is mutated or copied, because their values will both be observed.

Between the creation of the rhs slice and its last use, there is no way that array's storage can be released. Consequently, the compiler could remove the retain of the non-mutating ArraySlice rhs. I can elaborate on this more if we need to.

Another broken scenerio would be a read-only sliceCopy that escapes.

I don't think that breaks either, for the same reasons. I worked through both of those scenarios before posting here, and you can find attempts to stress those situations in my test code (which I realize doesn't prove anything).

It would be interesting to know why those two scenarios are working in your tests. I would need to put more time into investigating them.

We could force ArraySlice to always retain its storage. This might have noticable performance implications--we could devise an experiment find out.

Hah, the reason this whole thing is needed in the first place is that ArraySlice does always retain its storage for mutation, so I'm pretty sure you will observe no change from that experiment!

ArraySlice does retain its storage when it is mutated. If the compiler does its job, then ArraySlice does not retain its storage when

the ArraySlice is not checked for uniqueness (not mutated)
the ArraySlice lifetime is limited to the Array's lifetime
the Array is not checked for uniqueness within the ArraySlice's lifetime

(A fake uniqueness check on either the ArraySlice or on the array within the ArraySlice's lifetime would be a way to force the copy without any special Builtin, I was just afraid that the compiler would eliminate uniqueness check itself if it's not used)

Assuming the performance is acceptable, I don't know how to actually implement that with existing APIs--we can't use Unmanaged for the ArraySlice without struct-destructors.
[/quote]

Not at all sure what you're getting at here. As I mentioned to Michael, you don't need Unmanaged to make this scheme work sufficiently for the non-parallel-mutation case. An ugly demonstration is here, but it uses only safe constructs.

I'm ruling out the possiblity of implementing manual reference counting on ArraySlice in case that isn't already obvious.

It would be sufficient to add an opaque builtin and SIL instruction that simply takes AnyObject and produces a new one without the compiler knowing that they are in fact the same reference.

The complexity of doing that probably needs to be weighed against some sort of copy-on-escape support, which would have both the performance and semantic properties that we want.

That's all pretty vague so far; if you know a better way to get the right results, and could lay out the details, and someone's going to implement it, I'm all for it. Otherwise, we've waited 5 years for this to be fixed and here's a way that I, at least, believe could be replaced when and if something better comes along.

I haven't tried to design copy-on-escape, so I just don't know how difficult it will be.

Again, we can take a copy-always approach instead of a copy-on-escape approach, but it could hurt performance.

[EDIT] It occurs to me why there may be some misunderstanding. When I say "forcing a copy", I mean copying the array storage reference, not copying the contents of the array. At the SIL level, this translates into forcing a retain.

So, in ContiguousArray.subscript(bounds:), we need some way to ensure that this variable initialization always results in a retain. Conceptually:

var rhs = ArraySlice(_buffer: _buffer[bounds])
blackhole(&rhs._buffer._owner)

When I say "there may be a performance penalty" I mean that there will be extra retain/release calls when slicing an array, and some optimizations may be inhibited.

sveinhal · April 14, 2020, 2:31pm

Although my understanding is limited — maybe even because of it — I enjoy this conversation! I'm learning a lot from it :-)

jrose · April 14, 2020, 5:36pm

At this point I'm also a spectator, so here's my attempt at summarizing (partly because I feel like Andy and Dave are talking past each other a little bit):

Dave has implemented a borrowing system that works with the current runtime implementation of ARC. Andy is concerned that it's not compatible with the current compile-time implementation of ARC. Those two are related, but not the same; in particular, isUnique has to do both of the things they're talking about:

At compile time, it has to appear as though it can replace the reference it takes inout so that the compiler doesn't make the lifetime of other references depend on this one.
At run time, it has to identify whether there are other references that don't depend on this one.

(The other thing we're talking about that may not be clear to observers is that "making CoW work" doesn't usually mean "copying a buffer only when it's mutated" when it's standard library and Swift compiler devs talking about it; it's "avoid copying a buffer to mutate it if nobody else is depending on the contents of the buffer".)

The problem with isDuallyReferenced is that it could indicate one of the good situations:

one reference for the single variable holding an Array
one reference for the slice currently being modified

or:

one reference for the single variable holding an Array
~~one reference for the slice currently being modified~~ (removed by explicit code, whether an explicit release/retain or a temporary reassignment)
one reference for the sub-slice currently being modified

but it could also be:

one reference for the variable holding an Array
~~one reference for the slice currently being modified~~ (removed by compiler because, say, withUnsafeMutableBuffer is called on the slice and now all the refcount operations are inlinable)
one reference for another variable also holding that Array

or:

one reference for the single variable holding an Array
one reference for the slice currently being modified
~~one reference for a copy of the first slice that is still live~~ (removed by compiler because it can see that the second slice doesn't outlive the first, or for that matter the Array)

This last scenario is especially bad because copying a value in Swift doesn't provide the opportunity to run any extra code.

Also, it wasn't immediately obvious to me that these two bits go together:

(Perhaps the form of that "extra retain" in SIL will be something like copy_value_opaque, where the optimizer can't assume the input and output values have anything to do with each other.)

I was hoping by the time I'd written all this up I'd have also thought of a good way to move forward, but I think this is where we're at:

We can't guarantee unique referencing because we have to hold on to the original buffer contents. (I made that mistake already.)
We can't guarantee that multiple references will not result in something that looks like "dual referencing" / unique-in-practice-because-all-owners-are-accounted-for without giving up on certain optimizations forever. ("forever" because ArraySlice is heavily inlinable.) It'd still be better than where we are today, though.

Does that sound right?

Andrew_Trick · April 14, 2020, 7:35pm

Yes, that's what I have in mind, without giving it too much thought. That would be sufficient to guarantee a retain. There may be more clever ways to accomplish it, but I'm afraid they would also be more fragile.

I was hoping by the time I'd written all this up I'd have also thought of a good way to move forward, but I think this is where we're at:

We can't guarantee unique referencing because we have to hold on to the original buffer contents. (I made that mistake already.)

We can't guarantee that multiple references will not result in something that looks like "dual referencing" / unique-in-practice-because-all-owners-are-accounted-for without giving up on certain optimizations forever. ("forever" because ArraySlice is heavily inlinable.) It'd still be better than where we are today, though.

Does that sound right?

Overall, your post is an excellent summary, and your conclusions match my current understanding.

dabrahams · April 14, 2020, 11:00pm

[Edit: I forgot to say,] Andy, thank you for your patient engagement.

That's not obvious to me, especially in the case of “ An 'inout' AnyObject argument may release one reference.” That would seem to imply that an inout argument would be retained one more time than it actually is.

The only salient properties of isUniquelyReferenced that prevent the compiler from removing necessary copies (preventing false positives) are that isUniquelyReferenced may release the reference addressed by isUniquelyReferenced's inout argument, and isUniquelyReferenced can never be fully inlined (the builtin also takes an inout reference).

Okay. That's a really weak foundation on which to build the system whose semantics we have today, but I can accept that it's coded that way.

The only semantic property an individual retain has today is that one must exist for each potential release, i.e. to provide an owned reference.

When you say ”An 'inout' AnyObject argument may release one reference” is that not a “potential release?” This language even more strongly implies one more retain for inout arguments than we actually observe.

As soon as we can determine a reference cannot be released within its lifetime, then the retain has no purpose. I think you want semantics on retains that have something to do with local variable scope.

I want the semantics that I claimed must already be implemented.

An object's reference count will never be 1 unless there's only one variable (whose lifetime could not have ended already) referencing it.
When an existing lvalue is passed inout , no needless retains are added.

This has nothing to do with localness or scope (which is about name lookup, right?) of variables, but does have to do with general variable lifetime. Also, we can add that the compiler can do whatever it wants with reference counts as long as they follow these rules when observed by blessed/builtin operations like isUniquelyReferenced; that doesn't change my story at all.

The only way I can think of attaching additional semantics to a retain is by introducing a "forced copy" feature. In my previous response, I explained that the compiler could provide such a feature without much difficultly, but that using the feature in ArraySlice.init might have a performance penalty.

I agree that such forcing sounds on the face of it like it would be costly, especially if you mean to do it unconditionally.

The burden of proof is on us when introducing an API to explain why it is correct in terms the compiler's model of language semantics.

Sure. I hope you don't imagine that I'm beyond suggesting that the compiler's model of language semantics might be wrong, though

If we can't write down enforceable rules for SIL that the implementation must follow, then none of us can predict how some combination of implementation choices will violate our expectations.

Absolutely agreed! Let's write down rules that support both value semantics and optimization! That means representing things at a level closer to the user model, like “semantic ARC” has been promising to do. I think the retains you can't eliminate are those on mutable copies of a reference (except where the compiler can prove that those copies aren't accessed, of course). [EDIT] That is a little conservative, but it's approaching rightness. I need to think about this some more.

Nonetheless, your ObjectIdentifier example is interesting and educational:

Cool, let's discuss!

There are two answers to why the ObjectIdentifier example works in practice, depending how much of the code the compiler can see at one time (depending on inlining).

Builtin.isUnique forces an in-memory copy of a reference by virtue of taking its address inout .

By “forces an in-memory copy” I presume you mean it is forced to exist, rather than occur. That is, if the reference was already in memory, I presume no copy is needed.

The same is true for any wrapper around Builtin.isUnique that takes the reference's address inout .

Sure. Didn't you just say that Builtin.isUnique is opaque to the compiler? It would seem that an opaque thing being passed the reference's address at all (e.g. not inout) should be enough to force the reference into memory.

In this example, when everything is inlined, the fact that you're casting to and from an opaque pointer simply does not matter. The compiler sees through it.

Sorry, I don't see any OpaquePointer in this example. Do you just mean, “a pointer?” When you say “casting” are you referring to the use of withMemoryRebound? FWIW, when I wrote that code I was not trying to hide anything from the compiler. The only tricky thing was to avoid passing references around by value, which could interfere with the observation of uniqueness, at least in unoptimized builds. Note also this code was written before the calling convention changed from owned to guaranteed.

Now, when Builtin.isUnique forces the lvalues x1 , and x2 into memory locations, it effectively forces a copy. Relating that to your situation, if both the Array and ArraySlice are mutated, then your "Solving the mutating slice CoW implementation" does work fine in practice and in theory.

In the two incorrect scenarios that I pointed out, sliceCopy is not mutated, and is not forced into a distinct memory location.

It's not exactly clear what you mean by “are mutated.” Mutating the slice mutates the logical value of the Array, and we're already in the context of a mutating method on the array, so in both those senses, they both are mutated. But perhaps you really mean “if the shallow bits of both the Array and ArraySlice are mutated…?“ That appears to be the implication of the last sentence, because your first example definitely mutates sliceCopy, semantically.

Nested retains are not currently eliminated, even when they are blantantly redundant.

Huh, is that sometimes, or always? I was under the impression that at least some nested retain/release pairs were eliminated by optimization.

If we carefully control inlining so that everything except the non-mutating liveObjectIsUniquelyReferenced is inlined, then the ObjectIdentifier example becomes interesting.

In this case, x2 is no longer passed @inout to any opaque code. There's no reason for the implementation to generate a copy.

So why does the example still work today even with worst-case inlining? (I actually tried this.) The SIL optimizer is extraordinarily bad at removing redundant nested retains to the same object. All it takes is one escape or unknown use of the reference to prevent copy elimination. This inability to optimize seemingly obvious redundancy is a problem with the representation that will disappear in due time. So, when _liveObjectIsUniquelyReferenced is not inlined, there just isn't any fundamental reason that your ObjectIdentifier example works. And more importantly, I can't think of any SIL-level requirement that would preserve the current behavior short of "the compiler never removes nested copies of the same reference".

Ewww. I think you've convinced me that might be what would be required within the model of the compiler as you've described it. I still think a better model is possible.

An object's reference count will never be 1 unless there's only one variable (whose lifetime could not have ended already) referencing it. That guarantees we preserve value semantics while still doing the Cow optimization.

SIL has no way to represent that formalism and doesn't need to because reference counts aren't directly observable. It is only the act of observing a variable's uniqueness, via Builtin.isUnique, that forces the variable's reference to be distinct from any other instances of the same reference. This makes it appear to the programmer "as-if" each variable has its own reference, but only for the purpose of uniqueness checking.

That's all that matters, for this rule's purpose. I don't care whether the reference count is 1 when I'm not observing it.

This model does not provide any way to count higher than one.

Sorry, I just don't see how that can be true. If it can't count higher than one, isUnique could always return true. If what you mean to say is that it provides no way to distinguish meaningfully between a count of 2 and a count of 3, then I can accept that. And, since my scheme treats 2 specially, I could see how that could be a problem. My thinking was that it was conservative, so the optimization might fail to kick in but soundness could not be violated… but I could be wrong about that.

As I explained, the only way to observe a reference count of 2 would be to (a) observe two independent variables at the same time

Would you kindly define “independent” and ”at the same time”?

(b) create a new language feature that forces variables to have distinct reference counts even when they aren't being observed.

Is that (a) and (b) or (a) or (b)?

Also, even if this uniqueness property were modeled the way you seem to think it is, I don't know how you get from "An object's reference count will never be 1..." to "An object's reference count will never be 2...".

Aye, there's the rub.

Let's make sure we don't confuse the statement "a reference count is greater than one" with "a reference count is exactly 2". That said, in this example, the array clearly has reference count==2 before it is copied:
let a = [1, 2, 3]
var b = a
b[0] = 99 // if there isn't a retain for b, this writes on a
assert(a != b)

s/the array/the buffer/ and we can agree. The array itself is copied in the 2nd line. And I don't know of anything you've said that prevents the buffer's reference count being >2 at that point.

And in this example, the array might have refcount==2 or 3 when it is copied. It's not determined by language semantics:
let a = [1, 2, 3] // 1
let b = a         // 2
var c = b         // 3
c[0] = 99         // 4
assert(a != c)    // 5
_ = b            // 6

I take your point (though I think you need to replace line 6 with an actual observation of b to really make it).

Yes, the reference count can be 2 on line 4, but we're not in an inout slice context so the borrowed bit wouldn't be set here and that's harmless to my scheme.

In this case, you would need both the ArrayBuffer and the SliceBuffer's address at the same time

Well, I don't know about “addresses”—this is high-level Swift after all —but we do have both the ArrayBuffer and the SliceBuffer passed inout at the same time at all points where it's important to detect that there are two references.

This is roughly how isDuallyReferenced would need to be written today, without any additional support (e.g. without some way to force copies).
Builtin.isDuallyReferenced(_ object1: inout AnyObject, _ object2: inout AnyObject) -> Bool {
  do {
    _precondition(object1 == object2)
  }
  return swift_isDuallyReference(Builtin.addressof(&object1),
                                 Builtin.addressof(&object2))
}

So, by saying the addresses need to be taken “at the same time” you mean they both need to be passed to a function that's opaque to the compiler and therefore might do two releases?

I just don't see how it would work for you. The address of the Array struct, which holds reference #1, is not available during ArraySlice._modify.

Well, this is all at the level of awful hackery at this point, but… the “borrowed” bit is what tells you that they have the same value.

The only way you can get in that situation is if the compiler is able to prove that the lifetime of either the array or the slice could have been ended (rule 1 above). I'm pretty certain it's actually impossible for the compiler to prove that if the slice is mutated or copied, because their values will both be observed.

Between the creation of the rhs slice and its last use, there is no way that array's storage can be released. Consequently, the compiler could remove the retain of the non-mutating ArraySlice rhs . I can elaborate on this more if we need to.

No, thanks. It's very clear to me why that's true under the existing model.

Hah, the reason this whole thing is needed in the first place is that ArraySlice does always retain its storage for mutation, so I'm pretty sure you will observe no change from that experiment!

ArraySlice does retain its storage when it is mutated. If the compiler does its job, then ArraySlice does not retain its storage when

the ArraySlice is not checked for uniqueness (not mutated)

the ArraySlice lifetime is limited to the Array's lifetime

the Array is not checked for uniqueness within the ArraySlice's lifetime

is that an “and” list or an “or” list?

(A fake uniqueness check on either the ArraySlice or on the array within the ArraySlice's lifetime would be a way to force the copy without any special Builtin, I was just afraid that the compiler would eliminate uniqueness check itself if it's not used)

We could always pass its result to some opaque builtin ;-)

Not at all sure what you're getting at here. As I mentioned to Michael, you don't need Unmanaged to make this scheme work sufficiently for the non-parallel-mutation case. An ugly demonstration is here, but it uses only safe constructs.

I'm ruling out the possiblity of implementing manual reference counting on ArraySlice in case that isn't already obvious.

My goodness, man; I hope so! My code simply swaps the sub-slice's bounds into self and yields that, to avoid taking another reference.

It would be sufficient to add an opaque builtin and SIL instruction that simply takes AnyObject and produces a new one without the compiler knowing that they are in fact the same reference.

I think I understand why you were suggesting that now.

The complexity of doing that probably needs to be weighed against some sort of copy-on-escape support, which would have both the performance and semantic properties that we want.

I haven't tried to design copy-on-escape, so I just don't know how difficult it will be.

If this thread turns out to be another in a long line of examples where I implement something so badly that it disgusts real maintainers into doing it right, I promise I won't be insulted. Just sayin'.

[EDIT] It occurs to me why there may be some misunderstanding. When I say "forcing a copy", I mean copying the array storage reference, not copying the contents of the array. At the SIL level, this translates into forcing a retain.

So, in ContiguousArray.subscript(bounds:) , we need some way to ensure that this variable initialization always results in a retain. Conceptually:
var rhs = ArraySlice(_buffer: _buffer[bounds])
blackhole(&rhs._buffer._owner)
When I say "there may be a performance penalty" I mean that there will be extra retain/release calls when slicing an array, and some optimizations may be inhibited.

I'd always love to do better.

Nevin · April 15, 2020, 2:53pm

I am very much a spectator, and to be honest I am surprised that recursively mutating a slice doesn’t already avoid copying the buffer.

My (now obviously incorrect) mental model goes something like this:

var a = Array(...)

// For this code, the compiler either retains the buffer,
// or statically proves that `a` is never used again:
var b = a[...]
b.sort()

// For this code, the compiler never retains the buffer.
// If the buffer is not uniquely referenced, then it makes
// a new buffer for `a` and operates directly on that:
a[...].sort()

Within the sort() call, my mental model is similar. Recursive calls to self[...].sort() also do not retain the buffer. If the buffer is uniquely referenced, they operate on it directly. Otherwise, they make a new buffer for self, which somehow is promulgated all the way up to make a new buffer for a. (Maybe the slice actually holds a reference-to-the-reference held by a?)

In this model, no matter how deep the recursion goes, none of the slice-and-immediately-mutate lines will ever bump the retain count. They always operate on the same buffer as a, either by verifying it is uniquely referenced, or if not then by making a new buffer for a first.

• • •

This thread has made it quite clear that my understanding is not accurate. My best guess (which is probably also wrong) as to why it doesn’t work like this, has to do with ensuring simultaneous overlapping slice-mutations are well-behaved.

That could be fixed up with an “Already being mutated” flag similar to Dave’s. That way while in the middle of an a[...].sort() call, recursive calls to self[...].sort() would work, but trying to mutate a directly (or through a new slice) would hit that flag. Unlike Dave’s plan though, this would work in conjunction with isUniquelyReferenced rather than isDuallyReferenced.

• • •

I’m sure there are good reasons why it wasn’t implemented like this, but that’s basically how I thought it worked.