Solving the mutating slice CoW problem

This is not a debatable sliding scale. It needs to be a feature that is guaranteed even at -Onone, via language mechanics and mandatory optimization passes, for it to be considered solved and act as a basis for how to design collection APIs in the standard library and beyond. It is not something that can just be an optimization possibility in -O that may or may not happen (even if it happens "most of the time"), or something that only works on non-resilient types with fully inlinable implementations.

Again, how modify works should be the guide here. Elements are yielded at +0, and this is guaranteed by the language and happens even in -Onone.
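
For concreteness, here's a minimal sketch of that +0 yield, using the underscored _modify spelling that today's compilers accept; the Box type and its names are purely illustrative:

struct Box {
  private var storage = [10, 20, 30]

  subscript(i: Int) -> Int {
    get { storage[i] }
    _modify {
      // The element is yielded in place, at +0: the caller mutates it
      // through the yielded address, and no temporary copy is made.
      yield &storage[i]
    }
  }
}

var b = Box()
b[1] += 5 // mutates storage[1] in place through the coroutine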

That's not to say we can't experiment, and even land improvements incrementally as a nice-to-have property of certain types (though we need to be careful to explain that it doesn't generalize, similar in nature to indices being a performance trap when moving to generic code). But there's a clear line at which point we can use it to design APIs, and that point is that there is a clear guaranteed model for when divide-and-conquer slice mutation will not trigger CoW, ever, if you follow a simple pattern when writing a collection, including building on top of other collections generically.
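
To ground the terminology, here's a toy instance of the divide-and-conquer pattern in question (the algorithm and names are illustrative only); with today's implementation, each nested &a[range] mutation can claim an extra reference to the buffer and force a copy of the storage:

func reverseInPlace(_ a: inout ArraySlice<Int>) {
  guard a.count > 1 else { return }
  a.swapAt(a.startIndex, a.endIndex - 1)
  // Recurse on the interior. This nested slice mutation is where the extra
  // reference can be claimed today, forcing a copy of the shared storage.
  reverseInPlace(&a[(a.startIndex + 1)..<(a.endIndex - 1)])
}

var numbers = [1, 2, 3, 4, 5]
reverseInPlace(&numbers[...]) // numbers is now [5, 4, 3, 2, 1]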

(Additionally, since the "win" of implementing it only for specific types is so much lower than solving the problem in full, the bar for avoiding complexity in the optimizer is higher, and the lengths to which we might go to work around the ABI problems diminish)

Isn't it a little presumptuous to unilaterally declare anything non-debatable? Regardless, I obviously disagree. There is a huge potential community of users for Swift that do not care at all about resilience or non-inlinable implementations, and if we fail to serve them well, Swift will not grow significantly beyond Apple platforms. Speaking for my project, which falls squarely into the domain of HPC, there are many places we'd be willing to accept some performance cliffs in -Onone if it allows us to ship a cleaner, more elegant API that performs optimally at -O. In our case, we're competing with Python for code cleanliness, so the bar is extremely high. Another way to look at it: if we had to worry about all of the potential performance cliffs that exist in Swift today, my project would have to give up on Swift, because we'd never be able to deliver sufficiently high-level abstractions.

Whether the standard library has to start exposing inelegant APIs is a separate issue, but given that all the important parts of the standard library are actually non-resilient, it's not at all obvious to me, especially since the extent of ARC optimizations performed at -Onone is completely fungible.

So, while I agree that an ideal solution has all the properties you cite, it's very debatable how far we need to get to produce substantial value.

2 Likes

Going back to the question about missing language features.

Could the issue be that we want Slice to sometimes represent a separate independent value, and sometimes a mutable borrow that passes writes through?

@Michael_Gottesman mentioned local inout variables. What about stored inout properties? I think a stored inout property could capture the notion of a mutable borrow of the original collection. Therefore, we could implement a slice that passes mutations through to the original collection and is capable of being fully reassigned as:

enum MutableSlice<Base: Collection> {
  case Original(base: inout Base)
  case Replaced(base: Base)
}

I don't yet have a good idea of how to connect it to the existing subscript that returns a Slice, which definitely represents a separate value today. We could add another subscript (distinguished by an argument label) to Collection that returned this MutableSlice.

Of course, in order for an instance of this MutableSlice to be safely used, we should enforce the exclusivity rule. IDK if it would be possible without significantly increasing the complexity of the planned ownership model.

3 Likes

I'm against implementing this in the runtime because it adds a lot of complexity. I remember what a relief it was when we were finally able to remove pinning. It would be really unfortunate if we re-added something like that.

I like @gribozavr's idea. I'm not sure why "fully reassign" is actually needed (the compiler could prevent it). It would be awesome if a slice could be represented very cleanly as

struct MutableSlice<Base: Collection> {
  let base: inout Base
  let start: Base.Index
  let end: Base.Index
}

But I didn't think that through and maybe I'm missing something.

1 Like

To be able to mutate the base via the slice like this, you need to handle two things: wipeout of the slice, and possible escape of the slice. In both cases, I think it's a question of detecting that it has happened and incrementing the reference count in the base at that point to protect it (in the case of wipeout, so that the base isn't released when it should continue to exist; in the case of escape, to preserve the value semantics of the base). If you do that, I don't think you need to worry about preventing reassignment of the slice (which I don't think is possible – it's pretty baked into Swift that you can always do that).
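
To make the escape case concrete, here's a small sketch (saved and escapeSelf are made-up names for illustration); whatever mechanism we use, the escaped copy has to behave as an independent value afterwards:

var saved: ArraySlice<Int> = []

extension ArraySlice where Element == Int {
  // Illustrative only: copies the slice somewhere that outlives the mutation.
  mutating func escapeSelf() { saved = self }
}

var a = [1, 2, 3, 4, 5]
a[1...3].escapeSelf()
a[2] = 99
// saved must still read [2, 3, 4]: value semantics require the escaped
// copy to be unaffected by later writes to a.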

Hi Dmitri!

While that's true, I don't think it captures the reasons for our problem. I think what you just described is a characteristic of anything that can be written through a property or subscript. For example, here's a String property doing that:

struct X { var s = "abcde" }
var x = X()

// `String` acts as independent value
var t = x.s
t.removeLast() // doesn't affect x

// `String` acts as “mutable borrow” that passes writes through.
x.s.removeLast()  

Making it performant with CoW

When the target of the write is implemented with CoW, getting good performance depends on the property/subscript access not prematurely claiming an additional reference. When the target is already stored in memory somewhere, the compiler happens to oblige us by not claiming that reference, provided the accessor is either:

  • a _modify that yields the target's address, or
  • the accessor synthesized by the compiler for a regular stored property

The issue for slicing…

…is two^H^Hhreefold: [EDITED]

  1. When you slice an array, the Slice instance isn't already in memory anywhere, so it has to be constructed on the stack. At least today, that construction eagerly claims an additional reference to the buffer.

  2. If we could avoid claiming a reference for the yielded slice, when a slice is reshaped:

     a[3...7].removeLast()
    

    there is code that will deallocate elements belonging to the array if it finds the slice to be uniquely-referenced. I'm not sure how much of that code is compiled into clients already, but that could be a problem if the only test is uniqueness.

  3. If you could avoid claiming a reference for the yielded slice, you'd still need to prevent the array buffer from being deallocated in situations like this

     extension ArraySlice { 
       mutating func reassign() { self = ArraySlice() } 
     }
     a[3...7].reassign()
    

Years ago, before we ever got pinning as it was eventually formulated, @Joe_Groff and I designed a similar system that solved the problems of this thread, but that design wasn't adopted. I suggest to you that while it may have been a relief to you to see the bit gone, the bit (or something equivalent) was always needed to produce the semantics we want.

If you want to solve the problem without an extra bit you need to deal with all three parts of the issue for slicing. Solving the first part (the extra reference) would mean a way for the slicing subscript to initialize and destroy that inout Base without retaining it. However, I don't know how anyone can solve the other two parts of the problem, especially the third part, if there's only a single reference.

But I didn't think that through and maybe I'm missing something.

I mean, you're not missing anything in saying “it would be awesome.” The problem is making that feature do what's needed. As far as I can tell, I've proven that you can't solve the problem using only uniqueness tests and no additional bits.

Yes, the idea is that this is a new representation that would take a "borrowed" copy of the Base. That borrow would mean a +0 copy (solving your issue #1), and would need to be scoped to reserve exclusive access to the Base. It would somehow need to detect the borrowed reference dropping to zero and not free it, but instead just return it (solving issue #3). For #2 – after the yield the Base needs code to examine the slice and reconcile, much like that code you linked to does. Generalizing the fast path, where it detects nothing needs doing because it was mutated in place, is hard though for a generic Slice implementation.

That's not what the code I linked to is doing. As a byproduct of detecting uniqueness in the slice it is releasing references to array elements outside the slice's bounds.

Anyhow, this still doesn't explain how the proposal can work. You need to be able to store a copy of this slice somewhere in the escaping case, and that copy needs to be +1, and needs to release its reference when destroyed. I'm not sure what inout is actually supposed to mean in this context, but if it means “don't retain another reference,” that's not intrinsic to the base property of all instances; it's intrinsic to the context (slicing subscript) where this one instance was created¹ — the slice that escapes does need to retain another reference, and in every other way needs to act like something that doesn't have inout on its base.

As I've been saying, when this thing crosses a visibility boundary in the compiler, you need some way to know how to handle it when it's being mutated. Imagine f is a mutating method, opaque to the compiler.

var a = [ ... ]
a[3...7].f()      // 1
var b = a[3...7]
a = []
b.f()             // 2

In both cases 1 and 2, f should observe that the buffer is uniquely referenced. If f happens to reassign self, in case 1 it must not release the buffer but in case 2 it must release the buffer. Something needs to communicate the need for that different behavior into the body of f. The information is not present in the type system, so it has to be communicated dynamically.

The only other possibility is that f (dynamically) communicates the potential release back up to the call site, where it's known whether we're in an inout slicing context.

Either way, there's dynamic information needed.


¹ That's why Erik's proposed notation doesn't make sense to me.

2 Likes

Here is the explanation I promised. This post only relates to the soundness of the implementation in GitHub - dabrahams/DivideAndConquer ("Demonstrates a fix for the mutating divide-and-conquer COW problem"), which I'll refer to below as "DivideAndConquer". Responding in-line point-by-point won't converge in a useful place. Instead I wanted to deal with this specific aspect coherently in one place. This post has no bearing on other important issues, notably:

  • a path to generalizing the design to other collections

  • guaranteeing non-copying behavior at -Onone

  • incurring Array implementation complexity--the initial
    implementation costs are insignificant relative to the long-term
    costs

Let's keep those issues in separate posts.

Fix for Divide-And-Conquer

+++ b/Sources/DivideAndConquer/ContiguousArray.swift

   public subscript(bounds: Range<Int>) -> ArraySlice<Element> {
     get {
       _checkIndex(bounds.lowerBound)
       _checkIndex(bounds.upperBound)
       return ArraySlice(_buffer: _buffer[bounds])
     }
     _modify {
       _checkIndex(bounds.lowerBound)
       _checkIndex(bounds.upperBound)

       let wasUnique = _buffer.isUniquelyReferenced()
       var rhs = ArraySlice(_buffer: _buffer[bounds])
+      // Any runtime function that the compiler can't make any
+      // assumptions about and that takes the address of 'rhs' will
+      // work here.
+      strongRefCount(&rhs)

Test Case Requiring Fix

@inline(never)
func testEscapeWithTripleReference(array: inout ContiguousArray<Int>) -> ArraySlice<Int> {
1:  return copyAndMutate(slice: &array[0..<4])
}

@inline(__always)
func copyAndMutate(slice: inout ArraySlice<Int>) -> ArraySlice<Int> {
  // Why was `slice` passed inout? Maybe it used to be
  // `slice[0..<1].reverse()` which is inlined to nothing.
2:  var sliceCopy = slice
3:  sliceCopy[0..<2].reverse()
4:  return sliceCopy
}

var array = DivideAndConquer.ContiguousArray(0..<5)
let slice = testEscapeWithTripleReference(array: &array)
assert(array[0] != slice[0])

Explanation of the Bug

I was not actually able to break the above test, even after much massaging. However, I believe SIL rules allow the case to break, and I think ARC optimization based on OSSA form after inlining would be sufficient.

Here is the sequence of operations, where 'owner' always refers to the CoW storage object:

  1. ContiguousArray.subscript(bounds:)_modify

a. wasUnique = array.isUnique() = true
b. 'array.owner.isBorrowed' = true
c. retain 'slice.owner'

  2. Copy ArraySlice

d. retain 'sliceCopy.owner'

  3. ArraySlice.subscript(bounds:)_modify

e. isDuallyReferenced() = false (refcount==3)
f. borrowedOwner = nil
g. .reverse() copies storage
h. borrowedOwner == nil

  4. return 'sliceCopy'

i. retain 'returnedSlice.owner'
j. release 'sliceCopy.owner'


Optimization: remove the retain/release of 'sliceCopy' based on proving all of these conditions:

  • 'sliceCopy' is never checked for uniqueness (or mutated in any way)
  • 'array' outlives 'sliceCopy'
  • 'array.owner' is immutable during the lifetime of 'sliceCopy'

Now we have:

a. wasUnique = array.isUnique() = true
b. 'array.owner.isBorrowed' = true
c. retain 'slice.owner'
d. reuse 'sliceCopy.owner'
e. isDuallyReferenced() = true (refcount==2)
f. release 'borrowedOwner' (refcount==1)
g. .reverse() performs in-place mutation
h. retain 'borrowedOwner' (refcount==2)
i. retain 'returnedSlice.owner' (refcount==3)
j. ignore 'sliceCopy.owner'

This clobbers the original array.

Explanation of the Bug Fix

ContiguousArray.subscript(bounds:)_modify now takes the address of the mutable ArraySlice that it yields. This guarantees that its 'owner' reference is opaque to the compiler. I used a call to strongRefCount(&rhs), but let's generalize this concept to _blackHole(inout AnyObject). We need two guarantees:

  • The compiler assumes that '_blackHole' may release its inout argument.

  • The compiler assumes that '_blackHole' cannot be eliminated.

The compiler can now never eliminate this operation:

c. retain 'slice.owner'

…because the retained reference may immediately be released by '_blackHole'. Even after the call to '_blackHole', the compiler cannot share the representation of 'array.owner' and 'slice.owner', because it has no way to prove that they are the same value.
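
To illustrate the shape of such a barrier, here's a rough sketch. '_blackHole' is the placeholder name from above, not a real API, and @inline(never) alone would not actually provide the required opacity (a real implementation has to be a runtime function the optimizer treats as fully opaque); this only conveys the signature and the contract:

// Placeholder only, not a real API. The contract is that the compiler must
// assume the callee may release or reassign its inout argument, and must
// never eliminate the call.
@inline(never)
func _blackHole(_ object: inout AnyObject) {
  // Intentionally empty; only the opacity of the call matters.
}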

Consequences of the Bug Fix

For ArraySlice rvalues:

  • Implementation: Array.subscript(bounds:).get with no corresponding writeback

  • Returned ArraySlice may or may not be retained.

  • ArraySlice storage is copied on mutation (after copying to an lvalue).

  • Unaffected by DivideAndConquer.

For ArraySlice lvalues today

  • Implementation: Array.subscript(bounds:).get + Array.subscript(bounds:).set:

  • Returned ArraySlice is retained and its storage is copied whenever
    it is mutated (after optimization)

  • Returned ArraySlice may not be retained if it is not mutated
    e.g. array[0..<1].reverse() never mutates the slice

For ArraySlice lvalues with (fixed) DivideAndConquer

  • Implementation: Array.subscript(bounds:)._modify

  • Returned ArraySlice is retained whenever it is mutated (after optimization)

  • Storage is never copied unless the slice is copied.

  • Returned ArraySlice may be retained even if it is not mutated,
    e.g. array[0..<1].reverse() requires a retain that can't be fully eliminated.

DivideAndConquer trades off an extra retain in a corner case (the divide-and-conquer base case) for eliminating a copy of the storage in the common case (ignoring the other overhead of borrowed-status checks).

Explanation of Compile-Time Reference Counting

Axiom 1: At a single program point, any two instances of the same value (identical values) can share a representation.

Theorem 1: for any reference-counted type, checking uniqueness requires apparent mutability of the reference. Otherwise, its reference may share its representation with other instances of the same value.
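
The standard library's existing uniqueness check is a concrete instance of this: the reference has to be passed inout, i.e. with apparent mutation, even though a successful check doesn't logically change anything:

final class Storage {}

var ref = Storage()
let unique = isKnownUniquelyReferenced(&ref) // the reference must be passed inout
// 'unique' is true here: 'ref' holds the only reference to the object.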

Commentary: The source of confusion around this stems from the variable scopes that appear evident at the source level. But the presence of those scopes does not subvert the rules above.

While it is true that these two forms may have different semantics in some respects:

Form 1:

let x = foo()
bar(x)
_ = x

Form 2:

bar(foo())

Those semantics have no impact on whether the implementation can share the representations of foo's returned value, the value in variable x, and bar's argument. Simply put, a single reference will suffice. This is fundamental to the compiler's model of the language.

Axiom 2: A reference passed as an argument (whether owned or inout) may be released no more than once without at least one retain for every additional release.

ARC can be reduced to a counting problem where each operation has a net effect on its operand's reference count.

var obj: AnyObject

retain(obj):  +1
release(obj): -1
apply(obj):   -1..<infinite (possible owned convention)
apply(&obj):  -1..<infinite (reassignment is a release)

The only constraint on the compile-time implementation is proving that an object's reference count does not reach zero before its last use. It's the same exercise required for manual reference counting.

Theorem 2: The only fully general way to check refcount==2 is by simultaneously mutating two distinct references to the same object. For example:

Builtin.isDuallyReferenced(_ object1: inout AnyObject, _ object2: inout AnyObject) -> Bool {
  _precondition(object1 === object2)
  return swift_isDuallyReferenced(Builtin.addressof(&object1),
                                  Builtin.addressof(&object2))
}

You may be thinking, "Can't the compiler have special magic operations?"

Yes, the compiler can have all the magic it wants, but that does not change any of these facts. Those magic operations exist within APIs, and those APIs must adhere to the calling conventions of the language. The only way out of the rules I've put forth above would be a language level calling convention that provides for a single reference argument to be net released more than once (without corresponding retains). I'm comfortable stating that we will never have such a convention.

Axiom 3: for any CoW type, mutability implies uniqueness.

Theorem 3: From Theorem 1 and Axiom 3. For any CoW type, mutability and uniqueness are mutually implied. These conditions can be used interchangeably. (I find this useful to simplify the discussion).

Let's apply these rules to DivideAndConquer.

The ArraySlice does not have access to the Array when checking isDuallyReferenced(). With access to only one reference, isDuallyReferenced() does not prove anything on its own.

To rely on isDuallyReferenced(), we must guarantee that ArraySlice "owns" a distinct reference to storage. That can be guaranteed if any of these statements are shown true:

  • the ArraySlice is checked for uniqueness (i.e. mutated)--and this
    check cannot be removed.

  • the ArraySlice's lifetime surpasses the Array's lifetime

  • the Array is checked for uniqueness within the ArraySlice's
    lifetime--and this check cannot be removed.

We do know that the Array will be accessed again after ArraySlice's lifetime. That does not help.

We do not otherwise have any control over Array. Therefore, the only option is forcing the ArraySlice to be apparently mutated regardless of how much information the compiler has. Note that this is exactly equivalent to forcing the ArraySlice's owner reference to be copied, which is exactly equivalent to forcing a retain. A Builtin.forceRetain(inout AnyObject) would have the same effect.

Conclusion

DivideAndConquer is a working implementation of in-place ArraySlice mutation, given a fairly nonintrusive fix.

isDuallyReferenced(inout AnyObject), as written, should not be exposed as a user-level API.

4 Likes

Hey, thanks so much for writing all this down, and so clearly! Most of it makes complete sense to me and I have no questions or even nits to pick for the most part, but there are a couple, and I got lost towards the end…

I had to google OSSA and found this and this, neither of which is obviously final. Wish I had known about that document at the time, but in fairness I probably would have been too busy to absorb it. Is there an official version somewhere?

Theorem 1: for any reference-counted type, checking uniqueness requires apparent mutability of the reference.

Here I'd like to suggest we use the term apparent possible mutation (or just apparent mutation as you use later) of the reference since mutability is a type system concept. IIUC it's not the mutability of the reference in that sense you're talking about, but the inability for the compiler to prove that it isn't actually mutated, right?

Anyway, in light of how I understand(?) that, I got lost here:

Axiom 3: for any CoW type, mutability implies uniqueness.

It took me a long time to connect this “mutability” to the “apparent mutability” above, in which I hope I'm headed down the right path; before that I didn't have a clue. But even after I rephrase it as “apparent mutation of the reference” I can't make sense of it. It only seems axiomatically true if I read it as “mutation of the buffer referred to by the reference,” which is a very different thing and now I'm pretty sure I'm just guessing. Can you help me out with this one?

The only way out of the rules I've put forth above would be a language level calling convention that provides for a single reference argument to be net released more than once (without corresponding retains). I'm comfortable stating that we will never have such a convention.

Agreed, we won't. But it's not obvious to me that we couldn't come up with a new set of rules, provable to preserve the semantics of existing code, while making some new things possible. I realize it's speculation on the order of "couldn't we solve this with some [magic] language feature?" but I would like to understand how fundamentally you hold your statement to be true.

Regarding your conclusions, they're pretty encouraging. The excess retain/release on a buffer that is mutable but never actually mutated under rather bizarre circumstances does not seem like it could have much impact on real programs. And, not that I'm counting on it, but this seems like one of those places where—with a different set of rules like what I referred to above—it might be possible for the compiler to prove that the operations could be eliminated.

It's a little disturbing to me that under the current rules, the fix depends on an opaque, possibly-retaining, possibly-releasing, possibly-mutating, possibly-escaping use of the reference, and very disturbing that isUniquelyReferenced depends on the same opacity. It just seems intuitively like we must be robbing the compiler of more important information than necessary around these operations that are so crucial to performance and are in fact pure, and that it would better to give them some kind of special status in the rules. Am I wrong to be concerned?

Final question: when you wrote,

Let's keep those issues in separate posts.

Did you mean “in separate threads,” or would you be happy to deal with those issues in follow-up posts here?

Thanks again!
-Dave

Good!

but there are a couple, and I got lost towards the end…

Your questions don't sound like you were lost.

I had to google OSSA and found this and this, neither of which is obviously final. Wish I had known about that document at the time, but in fairness I probably would have been too busy to absorb it. Is there an official version somewhere?

If we had an official spec, it would be referred to from the SIL spec... but it's not there yet. The working draft you found is probably the closest thing.

I think I first introduced the term in this post from 2017.

Michael just gave a talk on OSSA at LLVM Dev. I see a link to the video but no slides.

Anyway, this is the same point I made earlier about the compiler failing to remove nested retains and releases to the same reference. It's a problem with the SIL representation that's gradually being fixed but requires migrating the entire optimizer. It's been an ongoing activity for a while. The timeline for a feature like this has absolutely nothing to do with the complexity of the feature and everything to do with the prior amount of technical debt and maintenance burden of existing code.

Theorem 1: for any reference-counted type, checking uniqueness requires apparent mutability of the reference.

Here I'd like to suggest we use the term apparent possible mutation (or just apparent mutation as you use later) of the reference since mutability is a type system concept. IIUC it's not the mutability of the reference in that sense you're talking about, but the inability for the compiler to prove that it isn't actually mutated, right?

Correct. I meant mutation.

Anyway, in light of how I understand(?) that, I got lost here:

Axiom 3: for any CoW type, mutability implies uniqueness.

It took me a long time to connect this “mutability” to the “apparent mutability” above, in which I hope I'm headed down the right path; before that I didn't have a clue. But even after I rephrase it as “apparent mutation of the reference” I can't make sense of it. It only seems axiomatically true if I read it as “mutation of the buffer referred to by the reference,” which is a very different thing and now I'm pretty sure I'm just guessing. Can you help me out with this one?

Again, I should have said "mutation" here, not mutability. I'm speaking as if the "CoW value" is represented within its buffer. This is the definition of CoW, nothing more.

The only way out of the rules I've put forth above would be a language level calling convention that provides for a single reference argument to be net released more than once (without corresponding retains). I'm comfortable stating that we will never have such a convention.

Agreed, we won't. But it's not obvious to me that we couldn't come up with a new set of rules, provable to preserve the semantics of existing code, while making some new things possible. I realize it's speculation on the order of "couldn't we solve this with some [magic] language feature?" but I would like to understand how fundamentally you hold your statement to be true.

This is a problem of communicating any magic up through the layers of the API. We sometimes deal with this by adding "@_semantic" tags to APIs. But...key point...even though we can imbue certain source constructs with extra semantics, all the disparate, shady parts of the compiler that don't know about those special semantics still need to function correctly. A bit like adding a new case to a library enum. So, adding a new ownership passing convention requires about the same level of implementation complexity regardless of how it's surfaced.

Regarding your conclusions, they're pretty encouraging. The excess retain/release on a buffer that is mutable but never actually mutated under rather bizarre circumstances does not seem like it could have much impact on real programs. And, not that I'm counting on it, but this seems like one of those places where—with a different set of rules like what I referred to above—it might be possible for the compiler to prove that the operations could be eliminated.

We're talking about different issues. Above, I'm explaining why isDuallyReferenced is not a suitable general purpose API unless it's rewritten to take two @inout arguments. You are right, it would be possible to invent some compiler magic to allow the extra retain of your ArraySlice to be optimized away. But, all the code that accesses the copied ArraySlice would need to be visible (e.g. fully inlined). Otherwise, we're stuck at the argument convention problem again.

It's a little disturbing to me that under the current rules, the fix depends on an opaque, possibly-retaining, possibly-releasing, possibly-mutating, possibly-escaping use of the reference, and very disturbing that isUniquelyReferenced depends on the same opacity. It just seems intuitively like we must be robbing the compiler of more important information than necessary around these operations that are so crucial to performance and are in fact pure, and that it would better to give them some kind of special status in the rules. Am I wrong to be concerned?

If the lowest-level compiler builtin is exposed, then yes, the compiler can recognize it. That is why I did not simply propose adding an unused uniqueness check instead of the _blackHole (or any other as-of-yet unknown runtime call). Some piece of intelligence exists in the compiler that is smart enough to know that an unused uniqueness check can be removed. And a piece of intelligence says they can be hoisted, and combined, and so on, based on complicated rules. That doesn't come close to covering the complexity. The important optimizations happen at a higher level of semantics, "make-mutable-and-unique", which statically guarantees that subsequent checks can be eliminated...

But, again, there are all the other disparate, shady parts of the compiler, including parts that have not yet been written. If that code sees a value being passed to a call, and knows that the call cannot modify the value, then it can share the value's representation. That's fundamental to how values work.

To fix that problem, you would need to change SIL so that uniqueness checking is somehow part of the argument passing convention that every piece of code in the compiler needs to deal with. But, at the source level, whenever you want to write a wrapper around some uniqueness-checking function, you'll still need to take the reference @inout for correctness--otherwise you've just hidden the fact that the reference can't share its representation. So all the complexity of making SIL more "uniqueness aware" would not really change anything.

Of course, another valid approach to giving a value "apparent/potential mutation" is returning the new value, rather than mutating in-place: (newReference, isUnique) = isUnique(originalReference). That approach has the disadvantage of potential false negatives because the reference may be copied, but I don't think it changes anything for the purpose of this discussion.

A completely different approach to this problem would be to give retains stronger semantics at the SIL level. Currently, the compiler only needs to guarantee that objects are alive up to their last use. We could instead say that retains cannot be eliminated merely because they redundantly retain the same reference. To eliminate any retain, the compiler would need to prove that no code can observe the reference count between that retain and some corresponding "paired" release. We don't currently represent "paired" releases, but OSSA will fix that. This would be a poor model for ARC optimization for at least two reasons:

  • References, like any value, can be arbitrarily "aliased" across many
    instances. So there's no way to determine whether something happens
    to a referenced object just by looking at the uses of that instance
    of the reference.

  • In general, we don't want to model a critical optimization such that
    it depends on proving a negative condition throughout arbitrarily
    complex code that is otherwise unrelated to the thing being
    optimized.

Final question: when you wrote,

Let's keep those issues in separate posts.
Did you mean “in separate threads,” or would you be happy to deal with those issues in follow-up posts here?

I meant that I wanted to bracket the scope of this post and all of its in-line responses. I feel like there's a threshold of understanding that needs to be reached before in-line responses will converge. It looks like we've reached that point regarding implementation correctness.

1 Like

No video link on that page, but Google found it here.

Yeah, my (mis-)understanding from talking with @Michael_Gottesman was that the ARC part of that project had been completely implemented.

OK, as to where I got confused, let me apply our corrections, a little more precision, and my own emphasis to what you've established:

  • Theorem 1: for any reference-counted type, checking uniqueness requires apparent mutation of the bits of the reference.
  • Axiom 3: for any CoW type, mutation of a part of its value accessed through the reference implies uniqueness.

I'm confused because then you go on to

Theorem 3: From Theorem 1 and Axiom 3. For any CoW type, mutability and uniqueness are mutually implied. These conditions can be used interchangeably. (I find this useful to simplify the discussion).

I don't see how this theorem follows from the foregoing. It appears to be relying on the linguistic conflation of the bits of the reference with the value accessed through the reference to establish some kind of if and only if relationship. In fact, the two have an inverse relationship: if the contents of the buffer are going to be mutated, the bits of the reference won't, and vice-versa.

Another reason I'm having trouble seeing the connection is it seems to conflate "apparent mutation" with mutability. If you ignore the supposedly-supporting statements T1 and A3, which are about apparent mutation, and take T3's use of "mutability" as correct, T3 makes total sense. I just don't see how it rests on T1/A3 in any way.

It's not obvious to me that we couldn't come up with a new set of rules, provable to preserve the semantics of existing code, while making some new things possible. I realize it's speculation on the order of "couldn't we solve this with some [magic] language feature?" but I would like to understand how fundamentally you hold your statement to be true.

This is a problem of communicating any magic up through the layers of the API. We sometimes deal with this by adding "@_semantic" tags to APIs. But...key point...even though we can imbue certain source constructs with extra semantics, all the disparate, shady parts of the compiler that don't know about those special semantics still need to function correctly. A bit like adding a new case to a library enum. So, adding a new ownership passing convention requires about the same level of implementation complexity regardless of how it's surfaced.

OK, so this is “merely” a matter of implementation complexity and technical debt, rather than fundamental.

Regarding your conclusions, they're pretty encouraging. The excess retain/release on a buffer that is mutable but never actually mutated under rather bizarre circumstances does not seem like it could have much impact on real programs. And, not that I'm counting on it, but this seems like one of those places where—with a different set of rules like what I referred to above—it might be possible for the compiler to prove that the operations could be eliminated.

We're talking about different issues. Above, I'm explaining why isDuallyReferenced is not a suitable general purpose API unless it's rewritten to take two @inout arguments.

I'm not sure what you mean by “we're talking about different issues.” I was referring to the parts of your conclusions where you say that DivideAndConquer can be made correct under the existing SIL rules with only a small change, and where you analyze what would happen with existing and recompiled binaries if we were to ship it as a change to the Array implementation. I do understand what you said about isDuallyReferenced, but “we're talking about different issues” seems to imply you think I was confused about something or conflating some things, and I don't see it.

You are right, it would be possible to invent some compiler magic to allow the extra retain of your ArraySlice to be optimized away. But, all the code that accesses the copied ArraySlice would need to be visible (e.g. fully inlined). Otherwise, we're stuck at the argument convention problem again.

Yes, since the retain is only extra in this corner case when there's no actual mutation, optimizing it out requires that the compiler can see that there's no actual mutation. I get that.

It's a little disturbing to me that under the current rules, the fix depends on an opaque, possibly-retaining, possibly-releasing, possibly-mutating, possibly-escaping use of the reference, and very disturbing that isUniquelyReferenced depends on the same opacity. It just seems intuitively like we must be robbing the compiler of more important information than necessary around these operations that are so crucial to performance and are in fact pure, and that it would better to give them some kind of special status in the rules. Am I wrong to be concerned?

[snip discussion of implementation complexity]

…At the source level, whenever you want to write a wrapper around some uniqueness-checking function, you'll still need to take the reference @inout for correctness--otherwise you've just hidden the fact that the reference can't share its representation. So all the complexity of making SIL more "uniqueness aware" would not really change anything.

I ask that we set implementation complexity aside for the moment because IMO we currently have a fairly weak foundation for interesting value semantics, and it seems as though incremental steps like building “uniqueness awareness” on that weak foundation could be far more complex than establishing a model that accounts for deeper semantic relationships.

I'm thinking “CoW and purity-awareness” is where we really want to be. For example, colleagues of mine were wondering why accessing a local let-bound Array's count in a closure results in the whole Array being captured. The fundamental reason, IIUC, is that the count is stored in the Array's buffer, which could be shared, and nothing tells the compiler that it acts just as though it were an independent value stored in the Array's shallow bits. Until those truths are encoded into SIL at a fundamental level, it seems inevitable that we'll have to chase all the “disparate, shady parts of the compiler that don't know about those special semantics.”
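
Here's a minimal sketch of the capture behavior I mean, assuming today's rules (names are illustrative):

func makeCounter() -> () -> Int {
  let a = Array(0..<1_000_000)
  // Reading only a.count still captures the Array value itself, and with it
  // the entire (possibly shared) buffer, because count is stored in the buffer.
  return { a.count }
}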

Of course, another valid approach to giving a value "apparent/potential mutation" is returning the new value, rather than mutating in-place: (newReference, isUnique) = isUnique(originalReference). That approach has the disadvantage of potential false negatives because the reference may be copied, but I don't think it changes anything for the purpose of this discussion.

Of course. And the potential for reference copying could be largely eliminated by annotations, but I agree it doesn't change anything. If the check could be written this way, where _isUnique was known by the compiler to be pure, it might change things a little (though I think it's awful):

let isUnique = withExtendedLifetime(oldReference) {
  let (newReference, r) = _isUnique(ObjectIdentifier(oldReference))
  oldReference = newReference
  return r
}

A completely different approach to this problem would be to give retains stronger semantics at the SIL level. Currently, the compiler only needs to guarantee that objects are alive up to their last use. We could instead say that retains cannot be eliminated merely because they redundantly retain the same reference. To eliminate any retain, the compiler would need to prove that no code can observe the reference count between that retain and some corresponding "paired" release.

[snip why it wouldn't work]

Yeah, that seems way too low-level again.

Warning: compiler optimization amateur hour starts now
What about this: if we encode into SIL the idea of (independent) values that include bits accessible through references (i.e. CoW types), then you can say that retains can be eliminated for lifetime-dominated copies of a value when the compiler can prove the copies aren't potentially mutated. Among other things, that could allow the knowledge of immutability of lets and function parameters to be leveraged at the SILGen level, before optimization even starts.

1 Like

I want to clarify up front, I don't think I've ruled out the possibility of making the compiler more aware of CoW semantics. The only thing I wanted to absolutely rule out was exposing isDuallyReferenced as an independent API.

It's more that the new ARC work needs to be fully integrated. A whole lot of inlining and other optimizations need to take place before we see new ARC optimization handling the kind of complexity of the Array implementation... and those other optimizations currently destroy OSSA form.

I don't see how this theorem follows from the foregoing. It appears to be relying on the linguistic conflation of the bits of the reference with the value accessed through the reference to establish some kind of if and only if relationship. In fact, the two have an inverse relationship: if the contents of the buffer are going to be mutated, the bits of the reference won't, and vice-versa.

I'm mainly pointing out that checking uniqueness and buffer mutability always look the same at a regular call site.

It is true that apparent mutation of the reference bits does not need to imply mutation of the storage. I'm saying that implication is present without any special information at a call site.

The point that I keep trying to make unsuccessfully is that we can't transitively apply additional semantics upward when those semantics place additional requirements on the caller:

@_semantics("I don't really mutate any bits!")
func checkUniqueness(a: @inout CoWContainer) -> Bool

func abstraction(a: @inout CoWContainer) -> Bool {
  return checkUniqueness(&a)
}

At the point of calling abstraction, the compiler cannot see implementation and cannot distinguish between apparent mutation of the reference bits vs. mutation of the storage.

And the inverse, pretending we had a "move":

@_semantics("I observe a reference count!")
func checkUniqueness(a: AnyObject) -> Bool 

func abstraction(a: AnyObject) -> Bool {
  return checkUniqueness(move(a))
}

At the point of calling abstraction, the compiler cannot see implementation and cannot know that a's representation can't be shared with another reference.

Another reason I'm having trouble seeing the connection is it seems to conflate "apparent mutation" with mutability. If you ignore the supposedly-supporting statements T1 and A3, which are about apparent mutation, and take T3's use of "mutability" as correct, T3 makes total sense. I just don't see how it rests on T1/A3 in any way.

I hope I answered this above. I understand that your mutability has to do with whether we see a local variable or class property declared 'let'. Within the compiler's implementation of CoW, I think of "mutability" as the state of the CoW storage after uniqueness checking and buffer copying, which only happens during the process of mutation. But I'll try not to mix those up anymore.

This is a problem of communicating any magic up through the layers of the API. We sometimes deal with this by adding "@_semantic" tags to APIs. But...key point...even though we can imbue certain source constructs with extra semantics, all the disparate, shady parts of the compiler that don't know about those special semantics still need to function correctly. A bit like adding a new case to a library enum. So, adding a new ownership passing convention requires about the same level of implementation complexity regardless of how it's surfaced.

OK, so this is “merely” a matter of implementation complexity and technical debt, rather than fundamental.

Well, it's also a matter of ABI. And, for the record, I can't think of anything I would change about our argument conventions if we were starting over. I think what we're just dancing around here is that we need the type system to communicate CoW semantics to the compiler. If the compiler knows that the value being passed is or is not "CoWish" it could make extra assumptions about what happens on the other side.

I'm thinking “CoW and purity-awareness” is where we really want to be. For example, colleagues of mine were wondering why accessing a local let-bound Array's count in a closure results in the whole Array being captured. The fundamental reason, IIUC, is that the count is stored in the Array's buffer, which could be shared, and nothing tells the compiler that it acts just as though it were an independent value stored in the Array's shallow bits. Until those truths are encoded into SIL at a fundamental level, it seems inevitable that we'll have to chase all the “disparate, shady parts of the compiler that don't know about those special semantics.”

Interesting problem. It would be possible with the right builtins and optimizer magic to make this problem go away in some cases. But I typically fall on the side of designing for predictable behavior. Whenever CoW behavior is at stake, we should look for solutions that pop out of the type system, and don't involve whole program analysis. This is what @Ben_Cohen meant by a mandatory (-Onone) solution.

Of course. And the potential for reference copying could be largely eliminated by annotations, but I agree it doesn't change anything. If the check could be written this way, where _isUnique was known by the compiler to be pure, it might change things a little (though I think it's awful):

let isUnique = withExtendedLifetime(oldReference) {
  let (newReference, r) = _isUnique(ObjectIdentifier(oldReference))
  oldReference = newReference
  return r
}

Once @Erik_Eckstein fixes this, the optimizer will convert checks to this form behind-the-scenes whenever possible, where the uniqueness check will look like a pure function, even to all the disparate shady compiler parts.

What about this: if we encode into SIL the idea of (independent) values that include bits accessible through references (i.e. CoW types), then you can say that retains can be eliminated for lifetime-dominated copies of a value when the compiler can prove the copies aren't potentially mutated. Among other things, that could allow the knowledge of immutability of lets and function parameters to be leveraged at the SILGen level, before optimization even starts.

I think we should surface more CoW information to enable additional "mandatory" optimizations. I only push back when it looks like we need to change semantics in situations where the compiler doesn't have that CoW information.

I can think of two approaches to add more CoW awareness in SIL.

#1 New compiler builtins within the CoW container implementations and @semantic attributes

Here I get hung up on the fact the argument conventions limit any semantics that we want to apply transitively across API levels.

When passing references, the type system and argument conventions alone don't convey what can happen to the referent... whether its refcount is observed or its storage mutated.

We have a similar problem with purity awareness. It would have been nice if some level of function purity was the default language semantic. Instead we rely very heavily on @inlinable attributes and whole-module analysis.

We can still do well with this approach at -O with careful attention to inlining. Using new builtins, @Erik_Eckstein is designing intelligence that allows the optimizer to know that CoW buffers are effectively immutable, when not being locally mutated. It should work as long as the CoW implementation is sufficiently inlined.

But... when CoW storage copies are at stake, we should design for predictability and guaranteed -Onone behavior. This approach isn't good enough for mandatory optimization.

#2 Surface information about CoW data types in the type system

I do think surfacing type information is promising for mandatory optimization as long as it doesn't depend on changing something fundamental about values and argument conventions.

Whenever it has full type information (no properties hidden behind libraries) it would be possible for the compiler to understand CoW semantics at any level of the API. But, we have generic types and library types. So, I get hung up for the same reason.

In both approaches we need to accept that the compiler doesn't always know it's dealing with a CoW type.

We will need to define new argument passing conventions and special value semantics for move-only types. The move-only atomics case is similar in that we'll have something that appears immutable in its declaration, but is pinned to an address, and can apparently mutate at SIL level. But that all hinges on making any containing type conform to a special protocol... and that won't happen for CoW types.

So, I imagine there are lots of things we can improve with CoW semantics. All I know is that relying on a "dual-reference" check requires apparent mutation of the ArraySlice's reference, and that apparent mutation could probably be optimized away with compiler magic when we see there is no actual mutation.

1 Like

Since you both begin and end with that point, I guess I need to clarify up front: I have absolutely no interest in exposing isDuallyReferenced as a public API, or even in using it internally. As near as I can tell, the whole system can be built using just the extra bit in the ABI. isDuallyReferenced just happens to be what makes things simple with the current state of ARC, and will be a suboptimal way to handle it when ARC gets smarter. Ideally, we use the extra bit to denote that we can't deallocate the buffer or write outside the slice even when it's uniquely referenced.

A whole lot of inlining and other optimizations need to take place before we see new ARC optimization handling the kind of complexity of the Array implementation... and those other optimizations currently destroy OSSA form.

I'll take your word for it for now. My (fallible) instinct tells me that if we make the information in SIL high-level enough, the only needless CoWs left to eliminate with inlining are the rare cases where the guaranteed calling convention needlessly holds a reference across the mutation of a copy of a function parameter.

I'm mainly pointing out that checking uniqueness and buffer mutability always look the same at a regular call site.

Sorry, I'm still having trouble pinning down what you mean here. There is no check for buffer mutability other than uniqueness, so of course checking for one looks like checking for the other. That seems too obvious to be what you are saying, so I must be missing something.

Of course it can't. The semantic annotation isn't present in the signature of abstraction, so the distinction is lost. The same goes for passing information downward in the call chain. It should be clear that I understand these points if you've been reading my arguments about why fixing this problem means encoding some information dynamically. Or are you making some other point that I'm missing?

Another reason I'm having trouble seeing the connection is it seems to conflate "apparent mutation" with mutability. If you ignore the supposedly-supporting statements T1 and A3, which are about apparent mutation, and take T3's use of "mutability" as correct, T3 makes total sense. I just don't see how it rests on T1/A3 in any way.

I hope I answered this above.

I'm afraid not, but maybe it's not important to see how you get to T3. If all it means is that “the buffer is eligible for mutation if and only if it is unique” then I'm happy to take that as an axiom.

I understand that your mutability has to do with whether we see a local variable or class property declared 'let'. Within the compiler's implementation of CoW, I think of "mutability" as the state of the CoW storage after uniqueness checking and buffer copying, which only happens during the process of mutation. But I'll try not to mix those up anymore.

It's not “my mutability,” it's the language's mutability. When I used the name “makeMutableAndUnique” in the standard library I was writing more casually and I had the benefit of comments to explain what I meant. For this discussion, we need clear terminology, and the language has already defined “mutable” so let's come up with words for what you mean. Best I can do is above, “eligible for mutation,” or maybe “dynamically mutable.” Any other ideas?

I think what we're just dancing around here is that we need the type system to communicate CoW semantics to the compiler.

Yes please; it's only what I've been asking for since 2013 :wink:.

Then we're in violent agreement; I've been saying we should represent value semantics in the type system for years. I know what Ben meant and I always agreed we should aim for that.

Once @Erik_Eckstein fixes this, the optimizer will convert checks to this form behind-the-scenes whenever possible, where the uniqueness check will look like a pure function, even to all the disparate shady compiler parts.

:+1:

I think we should surface more CoW information to enable additional "mandatory" optimizations. I only push back when it looks like we need to change semantics in situations where the compiler doesn't have that CoW information.

I'm afraid I don't know what you mean by "change semantics".

I can think of two approaches to add more CoW awareness in SIL.

#1 New compiler builtins within the CoW container implementations and @semantic attributes

To me, it seems obviously too low level again.

#2 Surface information about CoW data types in the type system

I do think surfacing type information is promising for mandatory optimization as long as it doesn't depend on changing something fundamental about values and argument conventions.

Whenever it has full type information (no properties hidden behind libraries) it would be possible for the compiler to understand CoW semantics at any level of the API. But, we have generic types and library types. So, I get hung up for the same reason.

Sorry to pick a nit, but it's not "CoW semantics," it's "value semantics." As @Joe_Groff has made clear, that's just an informal way to talk about purity. I hope we haven't locked ourselves into an ABI corner where we have to write off the ability to represent purity; if we have, I'll need to start thinking about the next version of Swift's ABI and how we're going to get there. But it seems to me that if we can retrofit move-only types, we ought to be able to retrofit purity.

In both approaches we need to accept that the compiler doesn't always know it's dealing with a CoW type.
We will need to define new argument passing conventions and special value semantics for move-only types. The move-only atomics case is similar in that we'll have something that appears immutable in its declaration, but is pinned to an address, and can apparently mutate at SIL level. But that all hinges on making any containing type conform to a special protocol... and that won't happen for CoW types.

…because?

If we're talking ABI changes, one thing that would be of general benefit for value types would be a "copy on escape" calling convention, where functions that receive value-type arguments would presume them to be allocated on the stack and be required to implicitly copy them to the heap if they want to extend the lifetime of the value for any reason (which would be the equivalent of a retain for a buffer that is already on the heap), similar to how blocks in Objective-C must be _Block_copy-ed if they are escaped. This would allow escape analysis and stack promotion to be much more aggressive. However, it might also be something we could apply to the in-place-slicing problem—if we treat the slice value yielded to a mutation as being copy-on-escape, and we consider operations that would force a realloc of the buffer to be escapes, then that seems like it would allow the original slice value to be a plain in-place borrow of the original buffer, since any attempt to escape or grow the slice would trigger a copy as well.

4 Likes

How is this different from what the change from owned to guaranteed calling convention accomplished?

The owned-to-guaranteed convention only affects the inline storage of the type. If we pass an Array value as a guaranteed argument, then the callee has to copy the reference to escape it, but it doesn't currently need to also trigger a copy of the buffer the reference points at. The parallel you raise is insightful, though; copy-on-escape would be in a sense extending the +0/+1 ownership convention from the inline storage transitively to its COWed buffers.
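
For example, here's a small sketch of that current behavior, using nothing beyond today's semantics:

var stash: [Int] = []

func escapeIt(_ a: [Int]) {
  // 'a' arrives at +0 under the guaranteed convention. Escaping it copies the
  // Array value, which retains the buffer reference, but no element storage
  // is duplicated.
  stash = a
}

escapeIt(Array(0..<1_000)) // cheap: one retain, no copy of the elements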

1 Like

I guess I'm still confused. If the value escapes, it gets copied. If it contains a reference, the copy implies a retain. Then the buffer isn't uniquely-referenced anymore and can't be deallocated or written on. Why would we resort to eagerly copying the contents of the buffer?

1 Like

In the case of non-mutating arguments, it's a matter of optimization tradeoffs. It's likely that most value-semantic operations don't escape their arguments in the common case, so it would be nice to be able to stack-allocate buffers in more cases. Forcing the few functions that do make escaping copies of their arguments to copy the buffer would allow other code to perform better. I haven't thought as much about the tradeoffs when it comes to in-place mutating slices, but it's interesting that a similar technique might be able to make in-place slicing work without changing the surface language model.

1 Like