Solving the mutating slice CoW problem

To be able to mutate the base via the slice like this, you need to handle two things: the wipeout of the slice, and the possible escape of the slice. In both cases, I think it's a question of detecting that it has happened and incrementing the reference count in the base at that point to protect it (in the case of wipeout, not releasing the base when it should continue to exist; in the case of escape, preserving the value semantics of the base). If you do that, I don't think you need to worry about preventing reassignment to the slice (which I don't think is possible – it's pretty baked into Swift that you can always do that).

Hi Dmitri!

While that's true, I don't think it captures the reasons for our problem. I think what you just described is a characteristic of anything that can be written through a property or subscript. For example, here's a String property doing that:

struct X { var s = "abcde" }
var x = X()

// `String` acts as independent value
var t = x.s
t.removeLast() // doesn't affect x

// `String` acts as “mutable borrow” that passes writes through.
x.s.removeLast() // affects x

Making it performant with CoW

When the target of the write is implemented with CoW, getting good performance depends on the property/subscript access not prematurely claiming an additional reference. When the target is already stored in memory somewhere, the compiler happens to oblige us by not claiming that reference, provided the accessor either:

  • yields the target's address from a _modify (see the sketch below), or
  • is the accessor synthesized by the compiler for a regular stored property
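
Here's a minimal sketch of the first case, assuming a hypothetical wrapper type (this is illustration, not standard-library code): the _modify accessor yields the stored value's own address, so a write through the property mutates the CoW buffer in place without claiming an extra reference.

struct Wrapper {
  private var storage: [Int] = []

  var values: [Int] {
    get { storage }
    _modify {
      // Yields the address of `storage` itself; the caller mutates the
      // array in place and this accessor claims no extra reference to
      // its buffer.
      yield &storage
    }
  }
}

var w = Wrapper()
w.values.append(1) // goes through _modify; the buffer stays uniquely referenced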

The issue for slicing…

…is two^H^Hhreefold: [EDITED]

  1. When you slice an array, the Slice instance isn't already in memory anywhere, so it has to be constructed on the stack. At least today, that construction eagerly claims an additional reference to the buffer.

  2. If we could avoid claiming a reference for the yielded slice, when a slice is reshaped:

     a[3...7].removeLast()
    

    there is code that will deallocate elements belonging to the array if it finds the slice to be uniquely-referenced. I'm not sure how much of that code is compiled into clients already, but that could be a problem if the only test is uniqueness.

  3. If you could avoid claiming a reference for the yielded slice, you'd still need to prevent the array buffer from being deallocated in situations like this

     extension ArraySlice { 
       mutating func reassign() { self = ArraySlice() } 
     }
     a[3...7].reassign()
    

Years ago, before we ever got pinning as it was eventually formulated, @Joe_Groff and I designed a similar system that solved the problems of this thread, but that design wasn't adopted. I suggest that while it may have been a relief to see the bit gone, the bit (or something equivalent) was always needed to produce the semantics we want.

If you want to solve the problem without an extra bit you need to deal with all three parts of the issue for slicing. Solving the first part (the extra reference) would mean a way for the slicing subscript to initialize and destroy that inout Base without retaining it. However, I don't know how anyone can solve the other two parts of the problem, especially the third part, if there's only a single reference.

But I didn't think that through and maybe I'm missing something.

I mean, you're not missing anything in saying “it would be awesome.” The problem is making that feature do what's needed. As far as I can tell, I've proven that you can't solve the problem using only uniqueness tests and no additional bits.

Yes, the idea is that this is a new representation that would take a "borrowed" copy of the Base. That borrow would mean a +0 copy (solving your issue #1), and would need to be scoped to reserve exclusive access to the Base. It would somehow need to detect the borrowed reference dropping to zero and not free it, but instead just return it (solving issue #3). For #2, after the yield the Base needs code to examine the slice and reconcile, much like the code you linked to does. Generalizing the fast path, where it detects nothing needs doing because it was mutated in place, is hard though for a generic Slice implementation.

That's not what the code I linked to is doing. As a byproduct of detecting uniqueness in the slice it is releasing references to array elements outside the slice's bounds.

Anyhow, this still doesn't explain how the proposal can work. You need to be able to store a copy of this slice somewhere in the escaping case, and that copy needs to be +1, and needs to release its reference when destroyed. I'm not sure what inout is actually supposed to mean in this context, but if it means “don't retain another reference,” that's not intrinsic to the base property of all instances; it's intrinsic to the context (slicing subscript) where this one instance was created¹ — the slice that escapes does need to retain another reference, and in every other way needs to act like something that doesn't have inout on its base.

As I've been saying, when this thing crosses a visibility boundary in the compiler, you need some way to know how to handle it when it's being mutated. Imagine f is a mutating method, opaque to the compiler.

var a = [ ... ]
a[3...7].f()      // 1
var b = a[3...7]
a = []
b.f()             // 2

In both cases 1 and 2, f should observe that the buffer is uniquely referenced. If f happens to reassign self, in case 1 it must not release the buffer but in case 2 it must release the buffer. Something needs to communicate the need for that different behavior into the body of f. The information is not present in the type system, so it has to be communicated dynamically.

The only other possibility is that f (dynamically) communicates the potential release back up to the call site, where it's known whether we're in an inout slicing context.

Either way, there's dynamic information needed.


¹ That's why Erik's proposed notation doesn't make sense to me.


Here is the explanation I promised. This post only relates to the soundness of the implementation in GitHub - dabrahams/DivideAndConquer: Demonstrates a fix for the mutating divide-and-conquer COW problem, which I'll refer to below as "DivideAndConquer". Responding in-line point-by-point won't converge in a useful place. Instead I wanted to deal with this specific aspect coherently in one place. This post has no bearing on other important issues, notably:

  • a path to generalizing the design to other collections

  • guaranteeing non-copying behavior at -Onone

  • incurring Array implementation complexity--the initial
    implementation costs are insignificant relative to the long-term
    costs

Let's keep those issues in separate posts.

Fix for Divide-And-Conquer

+++ b/Sources/DivideAndConquer/ContiguousArray.swift

   public subscript(bounds: Range<Int>) -> ArraySlice<Element> {
     get {
       _checkIndex(bounds.lowerBound)
       _checkIndex(bounds.upperBound)
       return ArraySlice(_buffer: _buffer[bounds])
     }
     _modify {
       _checkIndex(bounds.lowerBound)
       _checkIndex(bounds.upperBound)

       let wasUnique = _buffer.isUniquelyReferenced()
       var rhs = ArraySlice(_buffer: _buffer[bounds])
+      // Any runtime function that the compiler can't make any
+      // assumptions about and that takes the address of 'rhs' will
+      // work here.
+      strongRefCount(&rhs)

Test Case Requiring Fix

@inline(never)
func testEscapeWithTripleReference(array: inout ContiguousArray<Int>) -> ArraySlice<Int> {
1:  return copyAndMutate(slice: &array[0..<4])
}

@inline(__always)
func copyAndMutate(slice: inout ArraySlice<Int>) -> ArraySlice<Int> {
  // Why was `slice` passed inout? Maybe it used to be
  // `slice[0..<1].reverse()` which is inlined to nothing.
2:  var sliceCopy = slice
3:  sliceCopy[0..<2].reverse()
4:  return sliceCopy
}

var array = DivideAndConquer.ContiguousArray(0..<5)
let slice = testEscapeWithTripleReference(array: &array)
assert(array[0] != slice[0])

Explanation of the Bug

I was not actually able to break the above test, even after much massaging. However, I believe SIL rules allow the case to break, and I think ARC optimization based on OSSA form after inlining would be sufficient.

Here is the sequence of operations, where 'owner' always refers to the CoW storage object:

  1. ContiguousArray.subscript(bounds:)_modify

a. wasUnique = array.isUnique() = true
b. 'array.owner.isBorrowed' = true
c. retain 'slice.owner'

  2. Copy ArraySlice

d. retain 'sliceCopy.owner'

  3. ArraySlice.subscript(bounds:)_modify

e. isDuallyReferenced() = false (refcount==3)
f. borrowedOwner = nil
g. .reverse() copies storage
h. borrowedOwner == nil

  4. return 'sliceCopy'

i. retain 'returnedSlice.owner'
j. release 'sliceCopy.owner'


Optimization: remove the retain/release of 'sliceCopy' based on proving all of these conditions:

  • 'sliceCopy' is never checked for uniqueness (or mutated in any way)
  • 'array' outlives 'sliceCopy'
  • 'array.owner' is immutable during the lifetime of 'sliceCopy'

Now we have:

a. wasUnique = array.isUnique() = true
b. 'array.owner.isBorrowed' = true
c. retain 'slice.owner'
d. reuse 'sliceCopy.owner'
e. isDuallyReferenced() = true (refcount==2)
f. release 'borrowedOwner' (refcount==1)
g. .reverse() performs in-place mutation
h. retain 'borrowedOwner' (refcount==2)
i. retain 'returnedSlice.owner' (refcount==3)
j. ignore 'sliceCopy.owner'

This clobbers the original array.

Explanation of the Bug Fix

ContiguousArray.subscript(bounds:)_modify now takes the address of the mutable ArraySlice that it yields. This guarantees that its 'owner' reference is opaque to the compiler. I used a call to strongRefCount(&rhs), but let's generalize this concept to _blackHole(inout AnyObject). We need two guarantees:

  • The compiler assumes that '_blackHole' may release its inout argument.

  • The compiler assumes that '_blackHole' cannot be eliminated.

The compiler can now never eliminate this operation:

c. retain 'slice.owner'

...because the retained reference may immediately be released by '_blackHole'. Even after the call to '_blackHole', the compiler cannot share the representation of 'array.owner' and 'slice.owner', because it has no way to prove that they are the same value.
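
To make the shape of that barrier concrete, here's a sketch. The name, generic signature, and @inline(never) are placeholders rather than a real API (the actual fix calls the existing strongRefCount(&rhs) runtime function), and the guarantees above only hold if the body is genuinely opaque to the optimizer, e.g. a runtime entry point rather than a stub the optimizer can see into:

@inline(never)
func _blackHole<Object: AnyObject>(_ object: inout Object) {
  // From the caller's perspective, this call may release and reassign
  // `object`, so the retain that produced it cannot be removed and its
  // representation cannot be folded into another reference.
}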

Consequences of the Bug Fix

For ArraySlice rvalues:

  • Implementation: Array.subscript(bounds:).get with no corresponding writeback

  • Returned ArraySlice may or may not be retained.

  • ArraySlice storage is copied on mutation (after copying to an lvalue).

  • Unaffected by the DivideAndConquer fix.

For ArraySlice lvalues today:

  • Implementation: Array.subscript(bounds:).get + Array.subscript(bounds:).set

  • Returned ArraySlice is retained and its storage is copied whenever
    it is mutated (after optimization)

  • Returned ArraySlice may not be retained if it is not mutated
    e.g. array[0..<1].reverse() never mutates the slice

For ArraySlice lvalues with (fixed) DivideAndConquer:

  • Implementation: Array.subscript(bounds:)._modify

  • Returned ArraySlice is retained whenever it is mutated (after optimization)

  • Storage is never copied unless the slice is copied.

  • Returned ArraySlice may be retained even if it is not mutated,
    e.g. array[0..<1].reverse() requires a retain that can't be fully eliminated.

DivideAndConquer trades off an extra retain in a corner case (the divide-and-conquer base case) for eliminating a copy of the storage in the common case (ignoring the other overhead of borrowed-status checks).

Explanation of Compile-Time Reference Counting

Axiom 1: At a single program point, any two instances of the same value (identical values) can share a representation.

Theorem 1: for any reference-counted type, checking uniqueness requires apparent mutability of the reference. Otherwise, its reference may share its representation with other instances of the same value.
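
The standard library's real check already has this shape: isKnownUniquelyReferenced(_:) takes its argument inout. A minimal CoW sketch using it (Buffer and CoWValue are hypothetical types, for illustration only):

final class Buffer {
  var elements: [Int]
  init(_ elements: [Int]) { self.elements = elements }
}

struct CoWValue {
  private var buffer = Buffer([])

  mutating func append(_ x: Int) {
    // The check requires `&buffer`: without apparent mutation of the
    // reference, the compiler would be free to let it share a
    // representation with other instances of the same value.
    if !isKnownUniquelyReferenced(&buffer) {
      buffer = Buffer(buffer.elements) // copy storage before writing
    }
    buffer.elements.append(x)
  }
}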

Commentary: The source of confusion around this stems from the variable scopes that appear evident at the source level. But the presence of those scopes does not subvert the rules above.

While it is true that these two forms may have different semantics in some respects:

Form 1:

let x = foo()
bar(x)
_ = x

Form 2:

bar(foo())

Those semantics have no impact on whether the implementation can share the representations of foo's returned value, the value in variable x, and bar's argument. Simply put, a single reference will suffice. This is fundamental to the compiler's model of the language.

Axiom 2: A reference passed as an argument (whether owned or inout) may be released no more than once without at least one retain for every additional release.

ARC can be reduced to a counting problem where each operation has a net effect on its operand's reference count.

var obj: AnyObject

retain(obj):  +1
release(obj): -1
apply(obj):   -1..<infinite (possible owned convention)
apply(&obj):  -1..<infinite (reassignment is a release)

The only constraint on the compile-time implementation is proving that an object's reference count does not reach zero before its last use. It's the same exercise required for manual reference counting.

Theorem 2: The only fully general way to check refcount==2 is by simultaneously mutating two distinct references to the same object. For example:

Builtin.isDuallyReferenced(_ object1: inout AnyObject, _ object2: inout AnyObject) -> Bool {
  _precondition(object1 === object2)
  return swift_isDuallyReferenced(Builtin.addressof(&object1),
                                  Builtin.addressof(&object2))
}

You may be thinking, "can't the compiler have special magic operations?"

Yes, the compiler can have all the magic it wants, but that does not change any of these facts. Those magic operations exist within APIs, and those APIs must adhere to the calling conventions of the language. The only way out of the rules I've put forth above would be a language level calling convention that provides for a single reference argument to be net released more than once (without corresponding retains). I'm comfortable stating that we will never have such a convention.

Axiom 3: for any CoW type, mutability implies uniqueness.

Theorem 3: From Theorem 1 and Axiom 3. For any CoW type, mutability and uniqueness are mutually implied. These conditions can be used interchangeably. (I find this useful to simplify the discussion.)

Let's apply these rules to DivideAndConquer.

The ArraySlice does not have access to the Array when checking isDuallyReferenced(). With access to only one reference, isDuallyReferenced() does not prove anything on its own.

To rely on isDuallyReferenced(), we must guarantee that ArraySlice "owns" a distinct reference to storage. That can be guaranteed if any of these statements are shown true:

  • the ArraySlice is checked for uniqueness (i.e. mutated)--and this
    check cannot be removed.

  • the ArraySlice's lifetime surpasses the Array's lifetime

  • the Array is checked for uniqueness within the ArraySlice's
    lifetime--and this check cannot be removed.

We do know that the Array will be accessed again after ArraySlice's lifetime. That does not help.

We do not otherwise have any control over Array. Therefore, the only option is forcing ArraySlice to be apparently mutated regardless of how much information the compiler has. Note that this is exactly equivalent to forcing the ArraySlice's owner reference to be copied, which is exactly equivalent to forcing a retain. A Builtin.forceRetain(inout AnyObject) would have the same effect.

Conclusion

DivideAndConquer is a working implementation of in-place ArraySlice mutation, given a fairly nonintrusive fix.

isDuallyReferenced(inout AnyObject), as written, should not be exposed as a user-level API.


Hey, thanks so much for writing all this down, and so clearly! Most of it makes complete sense to me and I have no questions or even nits to pick for the most part, but there are a couple, and I got lost towards the end…

I had to google OSSA and found this and this, neither of which is obviously final. Wish I had known about that document at the time, but in fairness I probably would have been too busy to absorb it. Is there an official version somewhere?

Theorem 1: for any reference-counted type, checking uniqueness requires apparent mutability of the reference.

Here I'd like to suggest we use the term apparent possible mutation (or just apparent mutation as you use later) of the reference since mutability is a type system concept. IIUC it's not the mutability of the reference in that sense you're talking about, but the inability for the compiler to prove that it isn't actually mutated, right?

Anyway, in light of how I understand(?) that, I got lost here:

Axiom 3: for any CoW type, mutability implies uniqueness.

It took me a long time to connect this “mutability” to the “apparent mutability” above, in which I hope I'm headed down the right path; before that I didn't have a clue. But even after I rephrase it as “apparent mutation of the reference” I can't make sense of it. It only seems axiomatically true if I read it as “mutation of the buffer referred to by the reference,” which is a very different thing and now I'm pretty sure I'm just guessing. Can you help me out with this one?

The only way out of the rules I've put forth above would be a language level calling convention that provides for a single reference argument to be net released more than once (without corresponding retains). I'm comfortable stating that we will never have such a convention.

Agreed, we won't. But it's not obvious to me that we couldn't come up with a new set of rules, provable to preserve the semantics of existing code, while making some new things possible. I realize it's speculation on the order of “couldn't we solve this with some [magic] language feature?” but I would like to understand how fundamentally true you hold your statement to be.

Regarding your conclusions, they're pretty encouraging. The excess retain/release on a buffer that is mutable but never actually mutated under rather bizarre circumstances does not seem like it could have much impact on real programs. And, not that I'm counting on it, but this seems like one of those places where—with a different set of rules like what I referred to above—it might be possible for the compiler to prove that the operations could be eliminated.

It's a little disturbing to me that under the current rules, the fix depends on an opaque, possibly-retaining, possibly-releasing, possibly-mutating, possibly-escaping use of the reference, and very disturbing that isUniquelyReferenced depends on the same opacity. It just seems intuitively like we must be robbing the compiler of more important information than necessary around these operations that are so crucial to performance and are in fact pure, and that it would be better to give them some kind of special status in the rules. Am I wrong to be concerned?

Final question: when you wrote,

Let's keep those issues in separate posts.

Did you mean “in separate threads,” or would you be happy to deal with those issues in follow-up posts here?

Thanks again!
-Dave

Good!

but there are a couple, and I got lost towards the end…

Your questions don't sound like you were lost.

I had to google OSSA and found this and this, neither of which is obviously final. Wish I had known about that document at the time, but in fairness I probably would have been too busy to absorb it. Is there an official version somewhere?

If we had an official spec, it would be referred to from the SIL spec... but it's not there yet. The working draft you found is probably the closest thing.

I think I first introduced the term in this post from 2017.

Michael just gave a talk on OSSA at LLVM Dev. I see a link to the video but no slides.

Anyway, this is the same point I made earlier about the compiler failing to remove nested retains and releases to the same reference. It's a problem with the SIL representation that's gradually being fixed but requires migrating the entire optimizer. It's been an ongoing activity for a while. The timeline for a feature like this has absolutely nothing to do with the complexity of the feature and everything to do with the prior amount of technical debt and maintenance burden of existing code.

Theorem 1: for any reference-counted type, checking uniqueness requires apparent mutability of the reference.

Here I'd like to suggest we use the term apparent possible mutation (or just apparent mutation as you use later) of the reference since mutability is a type system concept. IIUC it's not the mutability of the reference in that sense you're talking about, but the inability for the compiler to prove that it isn't actually mutated, right?

Correct. I meant mutation.

Anyway, in light of how I understand(?) that, I got lost here:

Axiom 3: for any CoW type, mutability implies uniqueness.

It took me a long time to connect this “mutability” to the “apparent mutability” above, in which I hope I'm headed down the right path; before that I didn't have a clue. But even after I rephrase it as “apparent mutation of the reference” I can't make sense of it. It only seems axiomatically true if I read it as “mutation of the buffer referred to by the reference,” which is a very different thing and now I'm pretty sure I'm just guessing. Can you help me out with this one?

Again, I should have said "mutation" here, not mutability. I'm speaking as if the "CoW value" is represented within its buffer. This is the definition of CoW, nothing more.

The only way out of the rules I've put forth above would be a language level calling convention that provides for a single reference argument to be net released more than once (without corresponding retains). I'm comfortable stating that we will never have such a convention.

Agreed, we won't. But it's not obvious to me that we couldn't come up with a new set of rules, provable to preserve the semantics of existing code, while making some new things possible. I realize it's speculation on the order of “couldn't we solve this with some [magic] language feature?” but I would like to understand how fundamentally true you hold your statement to be.

This is a problem of communicating any magic up through the layers of the API. We sometimes deal with this by adding "@_semantic" tags to APIs. But...key point...even though we can imbue certain source constructs with extra semantics, all the disparate, shady parts of the compiler that don't know about those special semantics still need to function correctly. A bit like adding a new case to a library enum. So, adding a new ownership passing convention requires about the same level of implementation complexity regardless of how it's surfaced.

Regarding your conclusions, they're pretty encouraging. The excess retain/release on a buffer that is mutable but never actually mutated under rather bizarre circumstances does not seem like it could have much impact on real programs. And, not that I'm counting on it, but this seems like one of those places where—with a different set of rules like what I referred to above—it might be possible for the compiler to prove that the operations could be eliminated.

We're talking about different issues. Above, I'm explaining why isDuallyReferenced is not a suitable general-purpose API unless it's rewritten to take two @inout arguments. You are right, it would be possible to invent some compiler magic to allow the extra retain of your ArraySlice to be optimized away. But, all the code that accesses the copied ArraySlice would need to be visible (e.g. fully inlined). Otherwise, we're stuck at the argument convention problem again.

It's a little disturbing to me that under the current rules, the fix depends on an opaque, possibly-retaining, possibly-releasing, possibly-mutating, possibly-escaping use of the reference, and very disturbing that isUniquelyReferenced depends on the same opacity. It just seems intuitively like we must be robbing the compiler of more important information than necessary around these operations that are so crucial to performance and are in fact pure, and that it would better to give them some kind of special status in the rules. Am I wrong to be concerned?

If the lowest-level compiler builtin is exposed, then yes the compiler can recognize it. That is why I did not simply propose adding an unused uniqueness check instead of the _blackHole (or any other as-of-yet unknown runtime call). Some piece of intelligence exists in the compiler that is smart enough to know that an unused uniqueness check can be removed. And another piece of intelligence says they can be hoisted, and combined, and so on, based on complicated rules. That doesn't come close to covering the complexity. The important optimizations happen at a higher level of semantics, "make-mutable-and-unique", which statically guarantees that subsequent checks can be eliminated...

But, again, there are all the other disparate, shady parts of the compiler, including parts that have not yet been written. If that code sees a value being passed to a call, and knows that the call cannot modify the value, then it can share the value's representation. That's fundamental to how values work.

To fix that problem, you would need to change SIL so that uniqueness checking is somehow part of the argument passing convention that every piece of code in the compiler needs to deal with. But, at the source level, whenever you want to write a wrapper around some uniqueness-checking function, you'll still need to take the reference @inout for correctness--otherwise you've just hidden the fact that the reference can't share its representation. So all the complexity of making SIL more "uniqueness aware" would not really change anything.

Of course, another valid approach to giving a value "apparent/potential mutation" is returning the new value, rather than mutating in-place: (newReference, isUnique) = isUnique(originalReference). That approach has the disadvantage of potential false negatives because the reference may be copied, but I don't think it changes anything for the purpose of this discussion.

A completely different approach to this problem would be to give retains stronger semantics at the SIL level. Currently, the compiler only needs to guarantee that objects are alive up to their last use. We could instead say that retains cannot be eliminated merely because they redundantly retain the same reference. To eliminate any retain, the compiler would need to prove that no code can observe the reference count between that retain and some corresponding "paired" release. We don't currently represent "paired" releases, but OSSA will fix that. This would be a poor model for ARC optimization for at least two reasons:

  • References, like any value, can be arbitrarily "aliased" across many
    instances. So there's no way to determine whether something happens
    to a referenced object just by looking at the uses of that instance
    of the reference.

  • In general, we don't want to model a critical optimization such that
    it depends on proving a negative condition throughout arbitrarily
    complex code that is otherwise unrelated to the thing being
    optimized.

Final question: when you wrote,

Let's keep those issues in separate posts.
Did you mean “in separate threads,” or would you be happy to deal with those issues in follow-up posts here?

I meant that I wanted to bracket the scope of this post and all of its in-line responses. I feel like there's a threshold of understanding that needs to be reached before in-line responses will converge. It looks like we've reached that point regarding implementation correctness.


No video link on that page, but Google found it here.

Yeah, my (mis-)understanding from talking with @Michael_Gottesman was that the ARC part of that project had been completely implemented.

OK, as to where I got confused, let me apply our corrections, a little more precision, and my own emphasis to what you've established:

  • Theorem 1: for any reference-counted type, checking uniqueness requires apparent mutation of the bits of the reference.
  • Axiom 3: for any CoW type, mutation of a part of its value accessed through the reference implies uniqueness.

I'm confused because then you go on to

Theorem 3: From Theorem 1 and Axiom 3. For any CoW type, mutability and uniqueness are mutually implied. These conditions can be used interchangeably. (I find this useful to simplify the discussion.)

I don't see how this theorem follows from the foregoing. It appears to be relying on the linguistic conflation of the bits of the reference with the value accessed through the reference to establish some kind of if and only if relationship. In fact, the two have an inverse relationship: if the contents of the buffer are going to be mutated, the bits of the reference won't, and vice-versa.

Another reason I'm having trouble seeing the connection is it seems to conflate “apparent mutation” with mutability. If you ignore the supposedly-supporting statements T1 and A3, which are about apparent mutation, and take T3's use of “mutability” as correct, T3 makes total sense. I just don't see how it rests on T1/A3 in any way.

It's not obvious to me that we couldn't come up with a new set of rules, provable to preserve the semantics of existing code, while making some new things possible. I realize it's speculation on the order of “couldn't we solve this with some [magic] language feature?” but I would like to understand how fundamentally true you hold your statement to be.

This is a problem of communicating any magic up through the layers of the API. We sometimes deal with this by adding "@_semantic" tags to APIs. But...key point...even though we can imbue certain source constructs with extra semantics, all the disparate, shady parts of the compiler that don't know about those special semantics still need to function correctly. A bit like adding a new case to a library enum. So, adding a new ownership passing convention requires about the same level of implementation complexity regardless of how it's surfaced.

OK, so this is “merely” a matter of implementation complexity and technical debt, rather than fundamental.

Regarding your conclusions, they're pretty encouraging. The excess retain/release on a buffer that is mutable but never actually mutated under rather bizarre circumstances does not seem like it could have much impact on real programs. And, not that I'm counting on it, but this seems like one of those places where—with a different set of rules like what I referred to above—it might be possible for the compiler to prove that the operations could be eliminated.

We're talking about different issues. Above, I'm explaining why isDuallyReferenced is not a suitable general purpose API unless it's rewritten to take two @inout arguments.

I'm not sure what you mean by “we're talking about different issues.” I was referring to the parts of your conclusions where you say that DivideAndConquer can be made correct under the existing SIL rules with only a small change, and where you analyze what would happen with existing and recompiled binaries if we were to ship it as a change to the Array implementation. I do understand what you said about isDuallyReferenced, but “we're talking about different issues” seems to imply you think I was confused about something or conflating some things, and I don't see it.

You are right, it would be possible to invent some compiler magic to allow the extra retain of your ArraySlice to be optimized away. But, all the code that accesses the copied ArraySlice would need to be visible (e.g. fully inlined). Otherwise, we're stuck at the argument convention problem again.

Yes, since the retain is only extra in this corner case when there's no actual mutation, optimizing it out requires that the compiler can see that there's no actual mutation. I get that.

It's a little disturbing to me that under the current rules, the fix depends on an opaque, possibly-retaining, possibly-releasing, possibly-mutating, possibly-escaping use of the reference, and very disturbing that isUniquelyReferenced depends on the same opacity. It just seems intuitively like we must be robbing the compiler of more important information than necessary around these operations that are so crucial to performance and are in fact pure, and that it would be better to give them some kind of special status in the rules. Am I wrong to be concerned?

[snip discussion of implementation complexity]

…At the source level, whenever you want to write a wrapper around some uniqueness-checking function, you'll still need to take the reference @inout for correctness--otherwise you've just hidden the fact that the reference can't share its representation. So all the complexity of making SIL more "uniqueness aware" would not really change anything.

I ask that we set implementation complexity aside for the moment because IMO we currently have a fairly weak foundation for interesting value semantics, and it seems as though incremental steps like building “uniqueness awareness” on that weak foundation could be far more complex than establishing a model that accounts for deeper semantic relationships.

I'm thinking “CoW and purity-awareness” is where we really want to be. For example, colleagues of mine were wondering why accessing a local let-bound Array's count in a closure results in the whole Array being captured. The fundamental reason, IIUC, is that the count is stored in the Array's buffer, which could be shared, and nothing tells the compiler that it acts just as though it were an independent value stored in the Array's shallow bits. Until those truths are encoded into SIL at a fundamental level, it seems inevitable that we'll have to chase all the “disparate, shady parts of the compiler that don't know about those special semantics.”
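
For concreteness, the capture situation looks something like this (the names are hypothetical):

func makeCounter() -> () -> Int {
  let measurements = Array(repeating: 0.0, count: 1_000_000)
  // The closure only needs an Int, but because `count` lives in the
  // array's CoW buffer, it captures `measurements` and keeps the whole
  // buffer alive.
  return { measurements.count }
}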

Of course, another valid approach to giving a value "apparent/potential mutation" is returning the new value, rather than mutating in-place: (newReference, isUnique) = isUnique(originalReference). That approach has the disadvantage of potential false negatives because the reference may be copied, but I don't think it changes anything for the purpose of this discussion.

Of course. And the potential for reference copying could be largely eliminated by annotations, but I agree it doesn't change anything. If the check could be written this way, where _isUnique was known by the compiler to be pure, it might change things a little (though I think it's awful):

let isUnique = withExtendedLifetime(oldReference) {
  let (newReference, r) = _isUnique(ObjectIdentifier(oldReference))
  oldReference = newReference
  return r
}

A completely different approach to this problem would be to give retains stronger semantics at the SIL level. Currently, the compiler only needs to guarantee that objects are alive up to their last use. We could instead say that retains cannot be eliminated merely because they redundantly retain the same reference. To eliminate any retain, the compiler would need to prove that no code can observe the reference count between that retain and some corresponding "paired" release.

[snip why it wouldn't work]

Yeah, that seems way too low-level again.

Warning: compiler optimization amateur hour starts now
What about this: if we encode into SIL the idea of (independent) values that include bits accessible through references (i.e. CoW types), then you can say that retains can be eliminated for lifetime-dominated copies of a value when the compiler can prove the copies aren't potentially mutated. Among other things, that could allow the knowledge of immutability of lets and function parameters to be leveraged at the SILGen level, before optimization even starts.


I want to clarify up front, I don't think I've ruled out the possibility of making the compiler more aware of CoW semantics. The only thing I wanted to absolutely rule out was exposing isDuallyReferenced as an independent API.

It's more that the new ARC work needs to be fully integrated. A whole lot of inlining and other optimizations need to take place before we see new ARC optimization handling the kind of complexity of the Array implementation... and those other optimizations currently destroy OSSA form.

I don't see how this theorem follows from the foregoing. It appears to be relying on the linguistic conflation of the bits of the reference with the value accessed through the reference to establish some kind of if and only if relationship. In fact, the two have an inverse relationship: if the contents of the buffer are going to be mutated, the bits of the reference won't, and vice-versa.

I'm mainly pointing out that checking uniqueness and buffer mutability always look the same at a regular call site.

It is true that apparent mutation of the reference bits does not need to imply mutation of the storage. I'm saying that implication is present without any special information at a call site.

The point that I keep trying to make unsuccessfully is that we can't transitively apply additional semantics upward when those semantics place additional requirements on the caller:

@_semantics("I don't really mutate any bits!")
func checkUniqueness(a: @inout CoWContainer) -> Bool

func abstraction(a: @inout CoWContainer) -> Bool {
  return checkUniqueness(&a)
}

At the point of calling abstraction, the compiler cannot see the implementation and cannot distinguish between apparent mutation of the reference bits vs. mutation of the storage.

And the inverse, pretending we had a "move":

@_semantics("I observe a reference count!")
func checkUniqueness(a: AnyObject) -> Bool 

func abstraction(a: AnyObject) -> Bool {
  return checkUniqueness(move(a))
}

At the point of calling abstraction, the compiler cannot see the implementation and cannot know that a's representation can't be shared with another reference.

Another reason I'm having trouble seeing the connection is it seems to conflate “apparent mutation” with mutability. If you ignore the supposedly-supporting statements T1 an A3, which are about apparent mutation, and take T3's use of “mutability” as correct, T3 makes total sense. I just don't see how it rests on T1/A3 in any way.

I hope I answered this above. I understand that your mutability has to do with whether we see a local variable or class property declared 'let'. Within the compiler's implementation of CoW, I think of "mutability" as the state of the CoW storage after uniqueness checking and buffer copying, which only happens during the process of mutation. But I'll try not to mix those up anymore.

This is a problem of communicating any magic up through the layers of the API. We sometimes deal with this by adding "@_semantic" tags to APIs. But...key point...even though we can imbue certain source constructs with extra semantics, all the disparate, shady parts of the compiler that don't know about those special semantics still need to function correctly. A bit like adding a new case to a library enum. So, adding a new ownership passing convention requires about the same level of implementation complexity regardless of how it's surfaced.

OK, so this is “merely” a matter of implementation complexity and technical debt, rather than fundamental.

Well, it's also a matter of ABI. And, for the record, I can't think of anything I would change about our argument conventions if we were starting over. I think what we're just dancing around here is that we need the type system to communicate CoW semantics to the compiler. If the compiler knows that the value being passed is or is not "CoWish" it could make extra assumptions about what happens on the other side.

I'm thinking “CoW and purity-awareness” is where we really want to be. For example, colleagues of mine were wondering why accessing a local let-bound Array's count in a closure results in the whole Array being captured. The fundamental reason, IIUC, is that the count is stored in the Array's buffer, which could be shared, and nothing tells the compiler that it acts just as though it were an independent value stored in the Array's shallow bits. Until those truths are encoded into SIL at a fundamental level, it seems inevitable that we'll have to chase all the “disparate, shady parts of the compiler that don't know about those special semantics.”

Interesting problem. It would be possible with the right builtins and optimizer magic to make this problem go away in some cases. But I typically fall on the side of designing for predictable behavior. Whenever CoW behavior is at stake, we should look for solutions that pop out of the type system, and don't involve whole program analysis. This is what @Ben_Cohen meant by a mandatory (-Onone) solution.

Of course. And the potential for reference copying could be largely eliminated by annotations, but I agree it doesn't change anything. If the check could be written this way, where _isUnique was known by the compiler to be pure, it might change things a little (though I think it's awful):

let isUnique = withExtendedLifetime(oldReference) {
let (newReference, r) = _isUnique(ObjectIdentifier(oldReference))
oldReference = newReference
return r
}

Once @Erik_Eckstein fixes this, the optimizer will convert checks to this form behind-the-scenes whenever possible, where the uniqueness check will look like a pure function, even to all the disparate shady compiler parts.

What about this: if we encode into SIL the idea of (independent) values that include bits accessible through references (i.e. CoW types), then you can say that retains can be eliminated for lifetime-dominated copies of a value when the compiler can prove the copies aren't potentially mutated. Among other things, that could allow the knowledge of immutability of lets and function parameters to be leveraged at the SILGen level, before optimization even starts.

I think we should surface more CoW information to enable additional "mandatory" optimizations. I only push back when it looks like we need to change semantics in situations where the compiler doesn't have that CoW information.

I can think of two approaches to add more CoW awareness in SIL.

#1 New compiler builtins within the CoW container implementations and @semantic attributes

Here I get hung up on the fact the argument conventions limit any semantics that we want to apply transitively across API levels.

When passing references, the type system and argument conventions alone don't convey what can happen to the referent... whether its refcount is observed or its storage mutated.

We have a similar problem with purity awareness. It would have been nice if some level of function purity was the default language semantic. Instead we rely very heavily on @inlinable attributes and whole-module analysis.

We can still do well with this approach at -O with careful attention to inlining. Using new builtins, @Erik_Eckstein is designing intelligence that allows the optimizer to know that CoW buffers are effectively immutable, when not being locally mutated. It should work as long as the CoW implementation is sufficiently inlined.

But... when CoW storage copies are at stake, we should design for predictability and guaranteed, -Onone behavior. This approach isn't good enough for mandatory optimization.

#2 Surface information about CoW data types in the type system

I do think surfacing type information is promising for mandatory optimization as long as it doesn't depend on changing something fundamental about values and argument conventions.

Whenever it has full type information (no properties hidden behind libraries) it would be possible for the compiler to understand CoW semantics at any level of the API. But, we have generic types and library types. So, I get hung up for the same reason.

In both approaches we need to accept that the compiler doesn't always know it's dealing with a CoW type.

We will need to define new argument passing conventions and special value semantics for move-only types. The move-only atomics case is similar in that we'll have something that appears immutable in its declaration, but is pinned to an address, and can apparently mutate at SIL level. But that all hinges on making any containing type conform to a special protocol... and that won't happen for CoW types.

So, I imagine there are lots of things we can improve with CoW semantics. All I know is that relying on a "dual-reference" check requires apparent mutation of the ArraySlice's reference, and that apparent mutation could probably be optimized away with compiler magic when we see there is no actual mutation.


Since you both begin and end with that point, I guess I need to clarify up front: I have absolutely no interest in exposing isDuallyReferenced as a public API, or even in using it internally. As near as I can tell, the whole system can be built using just the extra bit in the ABI. isDuallyReferenced just happens to be what makes things simple with the current state of ARC, and will be a suboptimal way to handle it when ARC gets smarter. Ideally, we use the extra bit to denote that we can't deallocate the buffer or write outside the slice even when it's uniquely referenced.

A whole lot of inlining and other optimizations need to take place before we see new ARC optimization handling the kind of complexity of the Array implementation... and those other optimizations currently destroy OSSA form.

I'll take your word for it for now. My (fallible) instinct tells me that if we make the information in SIL high-level enough, the only needless CoWs left to eliminate with inlining are the rare cases where the guaranteed calling convention needlessly holds a reference across the mutation of a copy of a function parameter.

I'm mainly pointing out that checking uniqueness and buffer mutability always look the same at a regular call site.

Sorry, I'm still having trouble pinning down what you mean here. There is no check for buffer mutability other than uniqueness, so of course checking for one looks like checking for the other. That seems too obvious to be what you are saying, so I must be missing something.

Of course it can't. The semantic annotation isn't present in the signature of abstraction, so the distinction is lost. The same goes for passing information downward in the call chain. It should be clear that I understand these points if you've been reading my arguments about why fixing this problem means encoding some information dynamically. Or are you making some other point that I'm missing?

Another reason I'm having trouble seeing the connection is it seems to conflate “apparent mutation” with mutability. If you ignore the supposedly-supporting statements T1 an A3, which are about apparent mutation, and take T3's use of “mutability” as correct, T3 makes total sense. I just don't see how it rests on T1/A3 in any way.

I hope I answered this above.

I'm afraid not, but maybe it's not important to see how you get to T3. If all it means is that “the buffer is eligible for mutation if and only if it is unique” then I'm happy to take that as an axiom.

I understand that your mutability has to do with whether we see a local variable or class property declared 'let'. Within the compiler's implementation of CoW, I think of "mutability" as the state of the CoW storage after uniqueness checking and buffer copying, which only happens during the process of mutation. But I'll try not to mix those up anymore.

It's not “my mutability,” it's the language's mutability. When I used the name “makeMutableAndUnique” in the standard library I was writing more casually and I had the benefit of comments to explain what I meant. For this discussion, we need clear terminology, and the language has already defined “mutable” so let's come up with words for what you mean. Best I can do is above, “eligible for mutation,” or maybe “dynamically mutable.” Any other ideas?

I think what we're just dancing around here is that we need the type system to communicate CoW semantics to the compiler.

Yes please; it's only what I've been asking for since 2013 :wink:.

Then we're in violent agreement; I've been saying we should represent value semantics in the type system for years. I know what Ben meant and I always agreed we should aim for that.

Once @Erik_Eckstein fixes this, the optimizer will convert checks to this form behind-the-scenes whenever possible, where the uniqueness check will look like a pure function, even to all the disparate shady compiler parts.

:+1:

I think we should surface more CoW information to enable additional "mandatory" optimizations. I only push back when it looks like we need to change semantics in situations where the compiler doesn't have that CoW information.

I'm afraid I don't know what you mean by “change semantics”

I can think of two approaches to add more CoW awareness in SIL.

#1 New compiler builtins within the CoW container implementations and @semantic attributes

To me, it seems obviously too low level again.

#2 Surface information about CoW data types in the type system

I do think surfacing type information is promising for mandatory optimization as long as it doesn't depend on changing something fundamental about values and argument conventions.

Whenever it has full type information (no properties hidden behind libraries) it would be possible for the compiler to understand CoW semantics at any level of the API. But, we have generic types and library types. So, I get hung up for the same reason.

Sorry to pick a nit, but it's not “CoW semantics,” it's “value semantics.” As @Joe_Groff has made clear, that's just an informal way to talk about purity. I hope we haven't locked ourselves into an ABI corner where we have to write off the ability to represent purity; if we have, I'll need to start thinking about the next version of Swift's ABI and how we're going to get there. But it seems to me that if we can retrofit move-only types, we ought to be able to retrofit purity.

In both approaches we need to accept that the compiler doesn't always know it's dealing with a CoW type.
We will need to define new argument passing conventions and special value semantics for move-only types. The move-only atomics case is similar in that we'll have something that appears immutable in its declaration, but is pinned to an address, and can apparently mutate at SIL level. But that all hinges on making any containing type conform to a special protocol... and that won't happen for CoW types.

…because?

If we're talking ABI changes, one thing that would be of general benefit for value types would be a "copy on escape" calling convention, where functions that receive value-type arguments would presume them to be allocated on the stack and be required to implicitly copy them to the heap if they want to extend the lifetime of the value for any reason (which would be the equivalent of a retain for a buffer that is already on the heap), similar to how blocks in Objective-C must be _Block_copy-ed if they are escaped. This would allow escape analysis and stack promotion to be much more aggressive. However, it might also be something we could apply to the in-place-slicing problem—if we treat the slice value yielded to a mutation as being copy-on-escape, and we consider operations that would force a realloc of the buffer to be escapes, then that seems like it would allow the original slice value to be a plain in-place borrow of the original buffer, since any attempt to escape or grow the slice would trigger a copy as well.


How is this different from what the change from owned to guaranteed calling convention accomplished?

The owned-to-guaranteed convention only affects the inline storage of the type. If we pass an Array value as a guaranteed argument, then the callee has to copy the reference to escape it, but it doesn't currently need to also trigger a copy of the buffer the reference points at. The parallel you raise is insightful, though; copy-on-escape would be in a sense extending the +0/+1 ownership convention from the inline storage transitively to its COWed buffers.


I guess I'm still confused. If the value escapes, it gets copied. If it contains a reference, the copy implies a retain. Then the buffer isn't uniquely-referenced anymore and can't be deallocated or written on. Why would we resort to eagerly copying the contents of the buffer?


In the case of non-mutating arguments, it's a matter of optimization tradeoffs. It's likely that most value-semantic functions don't escape their arguments in the common case, so it would be nice to be able to stack-allocate buffers in more cases. Forcing the few functions that do make escaping copies of their arguments to copy the buffer would allow other code to perform better. I haven't thought as much about the tradeoffs when it comes to in-place mutating slices, but it's interesting that a similar technique might be able to make in-place slicing work without changing the surface language model.


Okay, I do see a distant relationship to the topic at hand, but it mostly seems entirely different to me. For one thing, it's not related to mutation.

[snip a bunch of my crazy but irrelevant ideas about how this could be wedged into the ABI]

Agreed, it's interesting.

Let me ask you, though: to get this to be reliable across visibility boundaries, wouldn't it require encoding "this parameter escapes" pervasively in function/method signatures?

My goal for this discussion was to make the compile-time model for reference counting clear enough that we could reach an understanding about why isDuallyReferenced is only very optimistically usable by a reference count implementation expert, and in your case, why the _blackHole fix is needed for correctness.

I think we're past that, but now side-tracked by my claim that uniqueness checking requires apparent mutation of the reference bits--at least if you travel far enough up the API where arbitrary compiler magic within the standard library no longer reaches. I'm standing by that claim, short of either

  • changing the compile-time model of reference counting in a way that's sure to inhibit optimization
  • or making it impossible to wrap a CoW type within an arbitrary struct without forcing it to conform to some CoWContainer protocol. That's a language change I'm not ready to debate.

Our energy is now better spent on how to generalize a solution to the in-place slice mutation problem for any Collection. In my mind, the approaches for in-place slice mutation fall into one of these strategies...

  1. Statically enforced in-place-only mutation via a new slice type

Such a slice cannot be reassigned, is not RangeReplaceable, and cannot even be copied (normally) without copying the buffer storage.

This would (I think) be sufficient for typical Divide-And-Conquer algorithms. It would cause some confusion when we talk about mutability of a CoW value. The value's reference bits would be immutable, but its storage would be mutable.

I imagine it would be impossible to integrate with existing collections, and I suspect you aren't interested in something like this. I don't really want to discuss it further. I only bring this approach up to clearly distinguish it from the other approaches...

  2. Copy-on-escape values

This requires somehow surfacing a distinction, as part of a variable's attribute, between a "locally pinned" CoW object vs. a "persistently addressable" CoW object.

A new argument convention would allow passing locally pinned objects, presumably using a bitwise copy that doesn't retain. Copying a locally pinned object would eagerly copy the buffer storage into persistently addressable memory.

This sounds exciting for optimizing local allocation. The design and implementation complexity is unknown to me. I'm concerned that it falls apart in practice because it requires a viral rewrite of function signatures.

  3. Dynamically retained in-place slice mutation

This is what some people are now calling "@inout properties". I think it is the generalized form of what you implemented.

As you have correctly argued, we cannot support in place slice mutation and allow the slice's storage to be reallocated/reassigned without some dynamic bit controlling a potential extra release when mutation ends.

I think of this dynamic bit as indicating that a property is in some +0 state. You could say that the property is a "borrowed reference".

If we had such a +0 bit, then all we need are two explicit compiler or stdlib builtins: one that performs the +0 borrow and one that performs a +0 check. And we need type system and runtime support for recognizing +0 values when they are copied or destroyed.

Naturally, copying the borrowed reference needs to produce a new reference without that bit set, and destroying the borrowed reference would not perform a release.

Everything else is handled by existing uniqueness checks. If the buffer storage is replaced, then the +0 bit will naturally no longer be set, and the original storage will be released at the end of the mutation scope after an explicit check for the +0 bit.
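
A hedged model of that bookkeeping, written with `Unmanaged` so the retains and releases are explicit; `Buffer`, `BufferRef`, `borrow`, `copy`, and `destroy` are invented names standing in for the builtins and runtime hooks, not an actual implementation:

    final class Buffer {}

    struct BufferRef {
        var ref: Unmanaged<Buffer>
        var isBorrowed: Bool        // the dynamic "+0 bit", stored alongside the reference

        // The +0 borrow: a bitwise copy of the base's reference. No retain
        // is performed, so the base's uniqueness is preserved.
        static func borrow(_ buffer: Buffer) -> BufferRef {
            BufferRef(ref: .passUnretained(buffer), isBorrowed: true)
        }

        // Copying a borrowed reference yields an ordinary +1 reference with
        // the bit cleared, so escaped copies behave like normal values.
        func copy() -> BufferRef {
            BufferRef(ref: ref.retain(), isBorrowed: false)
        }

        // Destroying a borrowed reference is a no-op; only owned (+1)
        // references perform a release.
        func destroy() {
            if !isBorrowed { ref.release() }
        }
    }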


I think others will be better able to discuss the design in more detail, how to surface these things in the language, and how to handle ABI breaks.


Yes

No, we're not side-tracked by that. Many messages ago I wrote that I accept that's how the system works right now, and I understood at that point why it works that way. It's a natural artifact of the system having been focused on dealing with the issues of ordinary reference-counted class instances rather than the special properties of mutable slicing. And I'm not criticizing that, either, in case it's not obvious. I did say it forms a pretty weak foundation for getting the semantics we want from CoW types, and I still believe that.

That's fine; I think debating it is out of scope here. To be clear for when we come back to this area, I'm not interested in the first bullet. I am interested in a variant of the second one, where I think a workable model is based on propagating “value semantics” (actually purity) rather than “CoW,” and doing so implicitly for structs except where operations are explicitly annotated as impure.

Oh, but I am very interested in something like this. When considering this problem I was lamenting that we had made slices range replaceable until I realized there was no way to suppress assignment. I didn't start here because it's a source-breaking change and because it requires language features (non-assignability) that we don't currently have, so I don't view it as something that can be done in the short term. Are you sure you don't want to discuss it?

Specifically, what I think would be interesting is that lvalues produced via the range subscript are just such non-assignable, non-resizable instances (hah, that's ironic; a non-assignable lvalue!), and that you get something like what Slice is today otherwise. I think it's important that we can do things like make arrays of slices and I suspect non-assignability would make that impossible.
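
A hedged illustration of how that might read at the use site (the commented-out lines compile today, but would be rejected under this hypothetical model):

    var a = Array(0..<10)
    a[3...7].sort()                      // in-place mutation through the range subscript: still allowed
    // a[3...7] = ArraySlice([0, 0, 0])  // reassigning the slice lvalue: rejected
    // a[3...7].removeLast()             // resizing the slice: rejected
    let escaped = a[3...7]               // rvalue use would still yield something like today's ArraySlice
    print(escaped)                       // [3, 4, 5, 6, 7]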

Unless you do it dynamically, the cost of which I think could hide in the noise. I started to write this in my response to Joe, but then deleted it:

• Reserve a magic refcount value to mean “I'm allocated on the stack and you can find a pointer to code for moving me to the heap just before me on the stack.”
• Make retains of escaping values recognize the magic refcount and do the copy.
• Obviously requires an ABI change to write back the copied reference.
• Does this extend beyond CoW buffers to regular class instances in the presence of multithreading? I'm not sure.
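
To make the bullets above concrete, a purely illustrative model of that retain path (every name here is invented; the real mechanism would live in the runtime, and the write-back of `ref` is the ABI change mentioned above):

    let stackSentinel = -1   // hypothetical reserved refcount value

    struct ObjectHeader {
        var refCount: Int
        // "code for moving me to the heap just before me on the stack"
        var moveToHeap: () -> UnsafeMutableRawPointer
    }

    // Retain on an escaping value: if the sentinel is found, copy the
    // object to the heap and write the new reference back to the caller.
    func retainOnEscape(_ ref: inout UnsafeMutableRawPointer,
                        _ header: inout ObjectHeader) {
        if header.refCount == stackSentinel {
            ref = header.moveToHeap()
        } else {
            header.refCount += 1
        }
    }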

Glad someone has finally acknowledged that! It's hard to tell if some people are still clinging to the idea that the impossible can be achieved purely through language features. As long as that's the case, there's no hope for a solution.

That said, I don't see what a language feature can add to the story here, or how any language feature produces a generalization of what I implemented.

I'm pretty sure what you're outlining here is equivalent to what I've been saying will work provided the compiler “does its job,” with the advantage of guaranteeing that the job gets done :grinning:.

IIUC,

• +0 borrow is used to create a +0 copy of the reference for the slice. It sets the bit but doesn't claim a reference, preserving uniqueness.
• Releasing a reference with a refcount of 1 but the +0 bit set is a no-op, preventing slice assignment from deallocating the buffer.

But some things are unclear still:

Wait, are you suggesting the bit is part of the reference representation (e.g. the low address bit) and not stored in the memory it points to (e.g. with the refcount)? Or is it the other way around, and you actually meant “copying the buffer pointed to by the borrowed reference” (in which case it makes sense that the two references see a different +0 bit)? I suspect the former, since you didn't mention the need for an "unborrow" primitive, but then your “explicit check for the +0 bit” mentioned below seems like it would only be needed if it was the latter.

If you're talking about storing the bit in the reference itself, I'm not 100% sure about the tradeoffs, since you have to strip the bit to use or copy the pointer, but it does potentially save an atomic operation for unborrow.

I still think this code could be a problem if it's been compiled into clients.


I think a workable model is based on propagating “value semantics” (actually purity) rather than “CoW,” and doing so implicitly for structs except where operations are explicitly annotated as impure.

I'm on board with purity as an opt-in attribute for both functions and types (too late for opt-out!). I expect pure types to be contained within impure types, though. So, when the compiler sees a pure type, it can make additional assumptions, but it can't assume values of that type will always be handled differently from normal values--there will be times the compiler can't see that a value is pure based on its static type.

Specifically, what I think would be interesting is that lvalues produced via the range subscript are just such non-assignable, non-resizable instances (hah, that's ironic; a non-assignable lvalue!), and that you get something like what Slice is today otherwise. I think it's important that we can do things like make arrays of slices and I suspect non-assignability would make that impossible.

Oh, it is interesting that we have the same view on this. Swift will probably need to support an immutable (non-reassignable) pass-by-address argument convention for other situations where copies must be avoided. I've heard people refer to this as "@inout lets". So I don't think reassignability is a showstopper. I think it's just design work that we haven't had time for and largely overlaps with move-only types.

I don't know the feasibility of redesigning the Collection APIs around some new non-RangeReplaceable slice.

Wait, are you suggesting the bit is part of the reference representation (e.g. the low address bit) and not stored in the memory it points to (e.g. with the refcount)? Or is it the other way around, and you actually meant “copying the buffer pointed to by the borrowed reference” (in which case it makes sense that the two references see a different +0 bit)? I suspect the former, since you didn't mention the need for an "unborrow" primitive, but then your “explicit check for the +0 bit” mentioned below seems like it would only be needed if it was the latter.

I'm talking about storing a bit alongside the reference, just as you have done, but potentially using the spare bits via an enum.

The "runtime support" is a custom value-witness for copy and destroy. The copy resets the borrowed bit in the copied value. The destroy does not release the borrowed reference.

The +0 check is only needed at the end of Array.subscript._modify to release the original storage in the event that the borrowed reference was replaced. But that's probably equivalent to checking the identity of the reference, so it's nothing new.