Solving the mutating slice CoW problem

Michael_Gottesman · April 15, 2020, 6:52pm

After reading the proposal, I am very against it.

The reason why is that this is advocating for a solution that bakes into the ABI magic bits to create semantics due to missing language features. Before ABI was fixed, I was ok with this sort of thing (since we could remove these sorts of things before ABI is finalized as we did with the pinning bit). Now I am very against embedding anything like this into the ABI UNLESS, the proposer can prove that there aren't additional language features that would make this unnecessary.

For instance, Dave, what if we were in a world where we had move only types, struct destructors, and inout locals? Would that change the solution here? Or if one had a magic wand for language features that would not conflict with the current language, what would you need to do this?

dabrahams · April 18, 2020, 7:33pm

Well, I couldn't blame you for being against it on the basis that it isn't quite right, which I think @Andrew_Trick has convinced me of (even if there's a lot of what he's said I'm still waiting for clarification on). Correctness might be salvageable, or not.

I do, however, object to framing this in terms of a requirement to prove something that seems obviously unprovable. There are an infinite number of imaginable language features that hypothetically solve this problem… especially when you're going to allow for “magic wands!” The problem here is that nobody knows how make such wands to produce the result we need, much less build the wands themselves. It would be much fairer for me to demand that you prove that that the proposed ABI changes are in fact “due to missing language features” by providing a single example specification and implementation that solves the problem without ABI changes.

I think there is a more productive way to look at it:

If a human like you or I can look at a piece of code and figure out what the compiler/library should be doing with reference counts, copies, and mutation, there should be a way to encode that understanding in language semantics and features with clearly specified meaning.

I agree with that, and that it would make for a better solution, which is why I'm excited to pursue this part of the conversation:

Ideally, this process would lead to an answer without ABI changes, though I think it's conceivable we'd to arrive at the end of the process having demonstrated that implementing the rules we came up with requires such changes. At that point I think it would be reasonable to demand a disproof by counterexample from anyone objecting to the ABI change.

P.S. I also don't know what you mean by “inout locals” and have only a vague idea of what you mean by “struct destructors” (which—at least as I understand them—don't change the picture one whit).

Michael_Gottesman · April 18, 2020, 11:31pm

@dabrahams Sorry if I was unclear, but I wasn't suggesting infinite possibilities. In fact, I suggested a few possible techniques explicitly: move only types, struct destructors (provided by move only types), and the ability to create a binding that in a sense is like an inout argument but over a region of code instead of over a function.

dabrahams · April 19, 2020, 10:16pm

But Michael, it's irrelevant what you were suggesting. Meeting your demand to “prove that there aren't additional language features that would make this unnecessary” means proving something about an infinite set of unknown possible language features. And please don't say I'm just taking you too literally; solving this problem is going to require precise communication. If that's not your actual condition for accepting burning something into the ABI, what is it, exactly?

Even just limited to your suggested features, I think the condition is pretty unreasonable—how does one prove that a given tool couldn't be used to solve some problem? But I'll try anyway…

The problem under discussion is how to make the use of existing APIs (roughly, subscripts and swapAt), more efficient. These APIs have to work in existing code on types that don't use any new features, so the only way new features could be used to solve these problems is inside the implementation of the APIs concerned.
Move-only types don't allow new semantics to be implemented, they just allow us to enforce certain semantics with the type system. It's easy to see by doing a mental code transformation: just make the hypothetical move-only type copyable, take the code that would be in its destructor and instead call it explicitly in the right places. Similarly, struct destructors don't add an ability to create new semantics, they just allow certain things to be done implicitly rather than explicitly. But we don't care how explicit the standard library's implementation needs to be in order to solve this problem.
I still don't know enough about your "inout locals" idea to be sure, but in the only reasonable implementation of such a feature that I can imagine, they don't add any new semantics either: any “region of code” in which you could create an inout local could instead have been coded as a call to a helper function with a regular inout argument.

Most of all, I don't think any of these features get at what I see as the fundamental issue:

When an array has been inout-sliced:

we want to allow the slice buffer to be mutated in-place without copying, but
can't allow the buffer to be freed if the slice is assigned or resized.

Today, 1. requires uniqueness and 2. requires non-uniqueness at the point of mutation, which at least proves to me that uniqueness is insufficient information to use as the sole criterion for making decisions about whether a mutation can be done in-place. That's why my solution adds a bit.

I actually think I know how to solve the problem without relying on the ability to observe refcounts > 2, which I think is at the core of @Andrew_Trick's objection. But it still needs a bit to store the information that an array buffer is being inout-sliced. ¹ As of today I don't see a way around adding dynamic information, because the ultimate mutation being applied to the slice may not be visible to the compiler so the extra information can't reach the site of the mutation statically. That's about as close as I can come to a proof that ABI changes are needed, so maybe I did it after all. Q.E.D.?

¹ I suspect it will be hard to demonstrate because ARC isn't making all the optimizations I think Andy told me it should (“if it's doing its job”) but I'll try to code it. [Update] yeah, it's as I suspected: the compiler eagerly issues a retain for the slice, and no optimizations we have today seem to be able to move that retain even as far as the yield.

Alexandre_Lopoukhine · April 22, 2020, 2:07am

I'd just like to push back a tiny bit on making this specific to slicing Array, as I'd love to be able to do similar transformations on slices of things like matrices that may involve strides and being non-contiguous.

dabrahams · April 22, 2020, 2:21am

Oh, absolutely. That's the gist of my fourth consideration:

This technique takes a bit too much hackery today to be broadly implementable by users, and AFAIK it can't be implemented just once by the library. I would love to find a more general solution.

The point of demonstrating it on arrays is to prove that it can be done in a way that's practically useful, not to limit it. Ideally, if we solve the problems, the solutions can be packaged into easily-reused library components.

ben-cohen · April 22, 2020, 3:32am

As it stands though, I don't see a path for that with this technique even if we make it more user-facing, since it is based around knowledge of the buffer.

To me, the ultimate success criteria to consider this problem solvable is "does it compose"? That is, if I build a new MutableCollection type based on top of Array (or ultimately a generic Collection), could that new collection automatically acquire non-CoWing slices?

For comparison, this is how the (as yet only pitched, though implemented) design of modify works. If you implement your MutableCollection in terms of another Collection, using it as storage, and use modify instead of set for your subscript (a fairly straightforward user ask), then your custom type gains the same capability as Array wrt avoiding CoW with element mutation when backed by a collection that supports it.

As it stands, this won't be the case for non-CoW slicing. By default Slice takes a full copy of the parent, which will copy the inner array. We would need a design for Slice that accommodated this new technique, generically, on top of any other arbitrary Collection. I don't think this can be done by library changes + new runtime functions alone. I think it needs additional compiler features (as modify did), though what I still don't know.

Once we have a path here there are still possibly-insurmountable challenges:

this probably can't be done in an ABI stable way, so may be completely closed off to us
we also would need to be confident that this didn't require such additional complexity in the optimizer that it had negative knock on effects, and also would need to be robust i.e. can't fall off a cliff if, say, called from unspecialized/non-inlinable code.

I would dearly love this problem to be solved – but I think we have to be clear-eyed about the chances. At some point if we cannot solve it, we need to pivot to accepting the approach of passing around ranges explicitly and start adopting this approach widely. While this would be less elegant, and add some unfortunate foot guns (that maybe we could flag to users with static analysis), it may be better to go down that road clearly and sooner rather than continuing to hope in vain indefinitely. I'm not sure at what point we draw that line tho.

dabrahams · April 22, 2020, 5:24am

Technically it's based on being able to observe a bit associated with the buffer instance. That could be in a side table or it could be folded into the refcount header, which still has not been baked into the ABI. That's where the old "pinned" bit was; it could be brought back.

Yes, that's the ideal place to end up, for sure, and is part of what I meant by finding “something more general.” Array is a first step, and a platform for understanding the problem, a journey I think we've only just started. We certainly can't get to the ideal composable answer without completing that journey.

That's…one way to put it I guess, but it blurs some distinctions, IMO making the situation sound much more hopeless than it actually is. Being more precise and filling in some details:

Slice contains a copy of the thing it was sliced out of, which (AFAICT according to Andrew because the compiler isn't “doing its job”) means the slice claims a second reference to the buffer, which means that the first time the slice is mutated, the buffer will be copied.

I'm not at all certain that doesn't fall out of solving the problem for the underlying collection, with no API changes to Slice. That scenario is certainly consistent with how I currently understand the problem (which might still be wrong of course).

I don't think this can be done by library changes + new runtime functions alone. I think it needs additional compiler features (as modify did), though what I still don't know.

this probably can't be done in an ABI stable way, so may be completely closed off to us

I don't know why you think these things; IMO it's just much too early to judge what's going to be needed, because we still don't have a solid understanding of the problem. One step at a time.

we also would need to be confident that this didn't require such additional complexity in the optimizer that it had negative knock on effects, and also would need to be robust i.e. can't fall off a cliff if, say, called from unspecialized/non-inlinable code.

There's a lot of room for debate about how much performance degradation is acceptable in the presence of such optimization barriers. There are plenty of places today where going from -Onone to -O or non-inlinable to inlinable makes the difference in whether a CoW occurs. That isn't a good reason to penalize code that could be optimized.

I would dearly love this problem to be solved – but I think we have to be clear-eyed about the chances. At some point if we cannot solve it, we need to pivot to accepting the approach of passing around ranges explicitly and start adopting this approach widely. While this would be less elegant, and add some unfortunate foot guns (that maybe we could flag to users with static analysis), it may be better to go down that road clearly and sooner rather than continuing to hope in vain indefinitely.

I agree!

I'm not sure at what point we draw that line tho.

Me neither, but IMO that point is not now, when we are only just getting a bead on the nature of the problem. Unless there's been a massive effort to solve it that I don't know about in the past four months, we've barely tried to work on it at all, as a group.

ben-cohen · April 22, 2020, 5:20pm

This is not a debatable sliding scale. It needs to be a feature that is guaranteed even at -Onone, via language mechanics and mandatory optimization passes, for it to be considered solved and act as a basis for how to design collection APIs in the standard library and beyond. It is not something that can just be an optimization possibility in -O that may or may not happen (even if it happens "most of the time"), or something that only works on non-resilient types with fully inlinable implementations.

Again, how modify works should be the guide here. Elements are yielded at +0, and this is guaranteed by the language and happens even in -Onone.

That's not to say we can't experiment, and even land improvements incrementally as a nice-to-have property of certain types (though we need to be careful to explain that it doesn't generalize, similar in nature to indices being a performance trap when moving to generic code). But there's a clear line at which point we can use it to design APIs, and that point is that there is a clear guaranteed model for when divide-and-conquer slice mutation will not trigger CoW, ever, if you follow a simple pattern when writing a collection, including building on top of other collections generically.

(Additionally, since the "win" of implementing it only for specific types is so much lower than solving the problem in full, the bar for avoiding complexity in the optimizer is higher, and the lengths to which we might go to work around the ABI problems diminish)

dabrahams · April 22, 2020, 5:59pm

Isn't it a little presumptuous to unilaterally declare anything non-debatable? Regardless, I obviously disagree. There is a huge potential community of users for Swift that do not care at all about resilience or non-inlinable implementations, and if we fail to serve them well, Swift will not grow significantly beyond Apple platforms. Speaking for my project, which falls squarely into the domain of HPC, there are many places we'd be willing to accept some performance cliffs in -Onone if it allows us to ship a cleaner, more elegant API that performs optimally at -O. In our case, we're competing with Python for code cleanliness, so the bar is extremely high. Another way to look at it: if we had to worry about all of the potential performance cliffs that exist in Swift today, my project would have to give up on Swift, because we'd never be able to deliver sufficiently high-level abstractions.

Whether the standard library has to start exposing inelegant APIs is a separate issue, but given that all the important parts of the standard library are actually non-resilient, it's not at all obvious to me, especially since the extent of ARC optimizations performed at -Onone is completely fungible.

So, while I agree that an ideal solution has all the properties you cite, it's very debatable how far we need to get to produce substantial value.

gribozavr · April 24, 2020, 11:23am

Going back to the question about missing language features.

Could the issue be that we want Slice to sometimes represent a separate independent value, and sometimes a mutable borrow that passes writes through?

@Michael_Gottesman mentioned local inout variables. What about stored inout properties? I think a stored inout property could capture the notion of a mutable borrow of the original collection. Therefore, we could implement a slice that passes mutations through to the original collection and is capable of being fully reassigned as:

enum MutableSlice<Base: Collection> {
  case Original(base: inout Base)
  case Replaced(base: Base)
}

I don't yet have a good idea of how to connect it to existing subscript that returns a Slice, which definitely represents a separate value today. We could add another subscript (distinguished by an argument label) to Collection that returned this MutableSlice.

Of course, in order for an instance of this MutableSlice to be safely used, we should enforce the exclusivity rule. IDK if it would be possible without significantly increasing the complexity of the planned ownership model.

Erik_Eckstein · April 24, 2020, 12:58pm

I'm against the implementation in the runtime because it adds much complexity. I can remember how much a relief it was when we eventually could remove pinning. It would be really unfortunate if we would re-add something like that.

I like @gribozavr's idea. I'm not sure why "fully reassign" is actually needed (the compiler could prevent it). It would be awesome if a slice could be represented very cleanly as

struct MutableSlice<Base: Collection> {
  let base: inout Base
  let start: Base.Index
  let end: Base.Index
}

But I didn't think that through and maybe I'm missing something.

ben-cohen · April 24, 2020, 2:33pm

To be able to mutate the base via the slice like this, you need to handle two things: the wipeout of the slice, and the possible escape of the slice. In both cases, I think it's a question of detecting it has happened and incrementing the reference counts in the base at that point to protect the it (in the case of wipeout: not leading to releasing the base when it should continue to exist; and in the case of escape, preserving the value semantics of the base). If you do that, I don't think you need to worry about preventing reassignment to the slice (which I don't think is possible – it's pretty baked into swift that you can always do that).

dabrahams · April 24, 2020, 10:45pm

Hi Dmitri!

While that's true, I don't think it captures the reasons for our problem. I think what you just described is a characteristic of anything that can be written through a property or subscript. For example, here's a String property doing that:

struct X { var s = "abcde" }
var x: X

// `String` acts as independent value
var t = x.s
t.removeLast() // doesn't affect x

// `String` acts as “mutable borrow” that passes writes through.
x.s.removeLast()

Making it performant with CoW

When the target of the write is implemented with CoW, getting good performance depends on the property/subscript access not prematurely claiming an additional reference. When the target is already stored in memory somewhere, the compiler happens to oblige us by not claiming that reference, provided the accessor is either:

yielding the target's address from a _modify, or
is the accessor synthesized by the compiler for regular stored properties

The issue for slicing…

…is two^H^Hhreefold: [EDITED]

When you slice an array, the Slice instance isn't already in memory anywhere, so it has to be constructed on the stack. At least today, that construction eagerly claims an additional reference to the buffer.
If we could avoid claiming a reference for the yielded slice, when a slice is reshaped:
```
 a[3...7].removeLast()
```
there is code that will deallocate elements belonging to the array if it finds the slice to be uniquely-referenced. I'm not sure how much of that code is compiled into clients already, but that could be a problem if the only test is uniqueness.
If you could avoid claiming a reference for the yielded slice, you'd still need to prevent the array buffer from being deallocated in situations like this
```
 extension ArraySlice { 
   mutating func reassign() { self = ArraySlice() } 
 }
 a[3...7].reassign()
```

dabrahams · April 24, 2020, 10:57pm

Years ago, before we ever got pinning as it was eventually formulated, @Joe_Groff and I designed a similar system that solved the problems of this thread, but that design wasn't adopted. I suggest to you that while it may have been a relief to you to see the bit gone, the bit (or something equivalent) was always needed to produce the semantics we want.

Erik_Eckstein:

I like @gribozavr's idea. I'm not sure why "fully reassign" is actually needed (the compiler could prevent it). It would be awesome if a slice could be represented very cleanly as
struct MutableSlice<Base: Collection> {
  let base: inout Base
  let start: Base.Index
  let end: Base.Index
}

If you want to solve the problem without an extra bit you need to deal with all three parts of the issue for slicing. Solving the first part (the extra reference) would mean a way for the slicing subscript to initialize and destroy that inout Base without retaining it. However, I don't know how anyone can solve the other two parts of the problem, especially the third part, if there's only a single reference.

But I didn't think that through and maybe I'm missing something.

I mean, you're not missing anything in saying “it would be awesome.” The problem is making that feature do what's needed. As far as I can tell, I've proven that you can't solve the problem using only uniqueness tests and no additional bits.

ben-cohen · April 24, 2020, 11:21pm

Yes, the idea is this is a new representation that would take a "borrowed" copy of the Base. That borrow would mean a +0 copy (solving your issue #1), and would need to be scoped to reserve exclusive access of the Base. It would somehow need to detect the borrowed reference dropping to zero and not free it, but instead just return it (solving issue #3). For #2 – after the yield the Base needs code to examine the slice and reconcile, much like that code you linked to does. Generalizing the fast path, where it detects nothing needs doing because it was mutated in place, is hard though for a generic Slice implementation.

dabrahams · April 24, 2020, 11:58pm

That's not what the code I linked to is doing. As a byproduct of detecting uniqueness in the slice it is releasing references to array elements outside the slice's bounds.

Anyhow, this still doesn't explain how the proposal can work. You need to be able to store a copy of this slice somewhere in the escaping case, and that copy needs to be +1, and needs to release its reference when destroyed. I'm not sure what inout is actually supposed to mean in this context, but if it means “don't retain another reference,” that's not intrinsic to the base property of all instances; it's intrinsic to the context (slicing subscript) where this one instance was created¹ — the slice that escapes does need to retain another reference, and in every other way needs to act like something that doesn't have inout on its base.

As I've been saying, when this thing crosses a visibility boundary in the compiler, you need some way to know how to handle it when it's being mutated. Imagine f is a mutating method, opaque to the compiler.

var a = [ ... ]
a[3...7].f()      // 1
var b = a[3...7]
a = []
b.f()             // 2

In both cases 1 and 2, f should observe that the buffer is uniquely referenced. If f happens to reassign self, in case 1 it must not release the buffer but in case 2 it must release the buffer. Something needs to communicate the need for that different behavior into the body of f. The information is not present in the type system, so it has to be communicated dynamically.

The only other possibility is that f (dynamically) communicates the potential release back up to the call site, where it's known whether we're in an inout slicing context.

Either way, there's dynamic information needed.

¹ That's why Erik's proposed notation doesn't make sense to me.

Andrew_Trick · April 25, 2020, 2:37am

Here is the explanation I promised. This post only relates to the soundness of the implementation in GitHub - dabrahams/DivideAndConquer: Demonstrates a fix for the mutating divide-and-conquer COW problem, which I'll refer to below as "DivideAndConquer". Responding in-line point-by-point won't converge in a useful place. Instead I wanted to deal with this specific aspect coherently in one place. This post has no bearing on other important issues, notably:

a path to generalizing the design to other collections
guaranteeing non-copying behavior at -Onone
incurring Array implementation complexity--the initial
implementation costs are insiginificant relative to the long-term
costs

Let's keep those issues in separate posts.

Fix for Divide-And-Conquer

+++ b/Sources/DivideAndConquer/ContiguousArray.swift

   public subscript(bounds: Range<Int>) -> ArraySlice<Element> {
     get {
       _checkIndex(bounds.lowerBound)
       _checkIndex(bounds.upperBound)
       return ArraySlice(_buffer: _buffer[bounds])
     }
     _modify {
       _checkIndex(bounds.lowerBound)
       _checkIndex(bounds.upperBound)

       let wasUnique = _buffer.isUniquelyReferenced()
       var rhs = ArraySlice(_buffer: _buffer[bounds])
+      // Any runtime function that the compiler can't make any
+      // assumptions about and that takes the address of 'rhs' will
+      // work here.
+      strongRefCount(&rhs)

Test Case Requiring Fix

@inline(never)
func testEscapeWithTripleReference(array: inout ContiguousArray<Int>) -> ArraySlice<Int> {
1:  return copyAndMutate(slice: &array[0..<4])
}

@inline(__always)
func copyAndMutate(slice: inout ArraySlice<Int>) -> ArraySlice<Int> {
  // Why was `slice` passed inout? Maybe it used to be
  // `slice[0..<1].reverse()` which is inlined to nothing.
2:  var sliceCopy = slice
3:  sliceCopy[0..<2].reverse()
4:  return sliceCopy
}

var array = DivideAndConquer.ContiguousArray(0..<5)
let slice = testEscapeWithTripleReference(array: &array)
assert(array[0] != slice[0])

Explanation of the Bug

I was not actually able to break the above test, even after much massaging. However, I believe SIL rules allow the case to break, and I think ARC optimization based on OSSA form after inlining would be sufficient.

Here is the sequence of operations, where 'owner' always refers to the CoW storage object:

ContiguousArray.subscript(bounds:)_modify

a. wasUnique = array.isUnique() = true
b. 'array.owner.isBorrowed' = true
c. retain 'slice.owner'

Copy ArraySlice

d. retain 'sliceCopy.owner'

ArraySlice.subscript(bounds:)_modify

e. isDuallyReferenced() = false (refcount==3)
f. borrowedOwner = nil
g. .reverse() copies storage
h. borrowedOwner == nil

return 'sliceCopy'

i. retain 'returnedSlice.owner'
j. release 'sliceCopy.owner'

Optimization: remove the retain/release of 'sliceCopy' based on proving all of these conditions:

'sliceCopy' is never checked for uniqueness (or mutated in any way)
'array' outlives 'sliceCopy'
'array.owner' is immutable during the lifetime of 'sliceCopy'

Now we have:

a. wasUnique = array.isUnique() = true
b. 'array.owner.isBorrowed' = true
c. retain 'slice.owner'
d. reuse 'sliceCopy.owner'
e. isDuallyReferenced() = true (refcount==2)
f. release 'borrowedOwner' (refcount==1)
g. .reverse() performs in-place mutation
h. retain 'borrowedOwner' (refcount==2)
i. retain 'returnedSlice.owner' (refcount==3)
j. ignore 'sliceCopy.owner'

This clobbers the original array.

Explanation of the Bug Fix

ContiguousArray.subscript(bounds:)_modify now takes the address of the mutable ArraySlice that it yields. This guarantees that it's 'owner' reference is opaque to the compiler. I used a call to strongRefCount(&rhs), but let's generalize this concept to _blackHole(inout AnyObject). We need two guarantees:

The compiler assumes that '_blackHole' may release its inout argument.
The compiler assumes that '_blackHole' cannot be eliminated.

The compiler can now never eliminate this operation:

c. retain 'slice.owner'

..because the retained reference may immediately be released by '_blackHole'. Even after the call to '_blackHole', it cannot share the representation of 'array.owner' and 'slice.owner', because it has no way to prove that they are the same value.

Consequences of the Bug Fix

For ArraySlice rvalues:

Implementation: Array.subscript(bounds:).get with no corresponding writeback
Returned ArraySlice may or may not be retained.
ArraySlice storage is copied on mutation (after copying to an lvalue).
Unaffected by the Divide-And-Conquer.

For ArraySlice lvalues today

Implementation: Array.subscript(bounds:).get + Array.subscript(bounds:).set:
Returned ArraySlice is retained and its storage is copied whenever
it is mutated (after optimization)
Returned ArraySlice may not be retained if it is not mutated
e.g. array[0..<1].reverse() never mutates the slice

For ArraySlice lvalues with (fixed) DivideAndConquer

Implementation: Array.subscript(bounds:)._modify
Returned ArraySlice is retained whenever it is mutated (after optimization)
Storage is never copied unless the slice is copied.
Returned ArraySlice may be retained if it is not mutated
e.g. array[0..<1].reverse() requires a retain and can't be fully eliminated.

DivideAndConquer trades-off an extra retain in a corner case (divide-and-conquer base case) for eliminating a copy of the storage in the common case (ignoring the other overhead of borrowed status checks).

Explanation of Compile-Time Reference Counting

Axiom 1: At a single program point, any two instances of the same value (identical values) can share a representation.

Theorem 1: for any reference-counted type, checking uniqueness requires apparent mutability of the reference. Otherwise, its reference may share its representation with other instances of the same value.

Commentary: The source of confusion around this stems from the variable scopes that appear evident at the source level. But the presence of those scopes do not subvert the rules above.

While it is true that these two forms may have different semantics in some resepects:

Form 1:

let x = foo()
bar(x)
_ = x

Form 2:
bar(foo())

Those semantics have no impact on whether the implementation can share the representations of foo's returned value, the value in variable x, and bar's argument. Simply put, a single reference will suffice. This is fundamental to the compiler's model of the language.

Axiom 2: A reference passed as an argument (whether owned or inout) may be released no more than once without at least one retain for every additional release.

ARC can be reduced to a counting problem where each operation has a net effect on its operand's reference count.

var obj: AnyObject

retain(obj):  +1
release(obj): -1
apply(obj):   -1..<infinite (possible owned convention)
apply(&obj):  -1..<infinite (reassignment is a release)

The only constraint on the compile-time implementation is proving that an object's reference count does not reach zero before its last use. It's the same exercise required for manual reference counting.

Theorem 2: The only fully general way to check refcount==2 is by simultaneously mutating two distinct references to the same object. For example:

Builtin.isDuallyReferenced(_ object1: inout AnyObject, _ object2: inout AnyObject) -> Bool {
  do {
    _precondition(object1 == object2)
  }
  return swift_isDuallyReference(Builtin.addressof(&object1),
                                 Builtin.addressod(&object2))
}

You may be thinking "can't the compiler have special magic operations".

Yes, the compiler can have all the magic it wants, but that does not change any of these facts. Those magic operations exist within APIs, and those APIs must adhere to the calling conventions of the language. The only way out of the rules I've put forth above would be a language level calling convention that provides for a single reference argument to be net released more than once (without corresponding retains). I'm comfortable stating that we will never have such a convention.

Axiom 3: for any CoW type, mutability implies uniquenes.

Theorem 3: From Theorem 1 and Axiom 3. For any CoW type, mutability and uniqueness are mutually implied. These conditions can be used interchangeably. (I find this useful to simply the discussion).

Let's apply these rules to DivideAndConquer.

The ArraySlice does not have access to the Array when checking isDuallyReferenced(). With access to only one reference, isDuallyReferenced() does not prove anything on its own.

To rely on isDuallyReferenced(), we must guarantee that ArraySlice "owns" a distinct reference to storage. That can be guaranteed if any of these statements are shown true:

the ArraySlice is checked for uniqueness (i.e. mutated)--and this
check cannot be removed.
the ArraySlice lifetime is surpasses the Array's lifetime
the Array is checked for uniqueness within the ArraySlice's
lifetime--and this check cannot be removed.

We do know that the Array will be accessed again after ArraySlice's lifetime. That does not help.

We do not otherwise have any control over Array. Therefore, the only option is forcing ArraySlice to be apprently mutated regardless of how much information the compiler has. Note that this is exactly equivalent to forcing the ArraySlice's owner reference to be copied, which is exactly equivalent to forcing a retain. A Builtin.forceRetain(inout AnyObject) would have the same effect.

Conclusion

DivideAndConquer is a working implementation of in-place ArraySlice mutation, given a fairly nonintrusive fix.

isDuallyReferenced(inout AnyObject), as written, should not be exposed as a user-level API.

dabrahams · April 25, 2020, 11:12am

Hey, thanks so much for writing all this down, and so clearly! Most of it makes complete sense to me and I have no questions or even nits to pick for the most part, but there are a couple, and I got lost towards the end…

I had to google OSSA and found this and this, neither of which is obviously final. Wish I had known about that document at the time, but in fairness I probably would have been too busy to absorb it. Is there an official version somewhere?

Theorem 1: for any reference-counted type, checking uniqueness requires apparent mutability of the reference.

Here I'd like to suggest we use the term apparent possible mutation (or just apparent mutation as you use later) of the reference since mutability is a type system concept. IIUC it's not the mutability of the reference in that sense you're talking about, but the inability for the compiler to prove that it isn't actually mutated, right?

Anyway, in light of how I understand(?) that, I got lost here:

Axiom 3: for any CoW type, mutability implies uniqueness.

It took me a long time to connect this “mutability” to the “apparent mutability” above, in which I hope I'm headed down the right path; before that I didn't have a clue. But even after I rephrase it as “apparent mutation of the reference” I can't make sense of it. It only seems axiomatically true if I read it as “mutation of the buffer referred to by the reference,” which is a very different thing and now I'm pretty sure I'm just guessing. Can you help me out with this one?

The only way out of the rules I've put forth above would be a language level calling convention that provides for a single reference argument to be net released more than once (without corresponding retains). I'm comfortable stating that we will never have such a convention.

Agreed, we won't. But it's not obvious to me that we couldn't come up with a new set of rules, provable to preserve the semantics of existing code code, while making some new things possible. I realize it's speculation on the order of “couldn't we solve this with some [magic] language feature?” but I would like to understand how fundamentally you hold your statement to be true.

Regarding your conclusions, they're pretty encouraging. The excess retain/release on a buffer that is mutable but never actually mutated under rather bizarre circumstances does not seem like it could have much impact on real programs. And, not that I'm counting on it, but this seems like one of those places where—with a different set of rules like what I referred to above—it might be possible for the compiler to prove that the the operations could be eliminated.

It's a little disturbing to me that under the current rules, the fix depends on an opaque, possibly-retaining, possibly-releasing, possibly-mutating, possibly-escaping use of the reference, and very disturbing that isUniquelyReferenced depends on the same opacity. It just seems intuitively like we must be robbing the compiler of more important information than necessary around these operations that are so crucial to performance and are in fact pure, and that it would better to give them some kind of special status in the rules. Am I wrong to be concerned?

Final question: when you wrote,

Let's keep those issues in separate posts.

Did you mean “in separate threads,” or would you be happy to deal with those issues in follow-up posts here?

Thanks again!
-Dave

Andrew_Trick · April 26, 2020, 8:40am

Good!

but there are a couple, and I got lost towards the end…

Your questions don't sound like you were lost.

I had to google OSSA and found this and this , neither of which is obviously final. Wish I had known about that document at the time, but in fairness I probably would have been too busy to absorb it. Is there an official version somewhere?

If we had an official spec, it would be referred to from the SIL spec... but it's not there yet. The working draft you found is probably the closest thing.

I think I first introduced the term in this post from 2017.

Michael just gave a talk on OSSA at LLVM Dev. I see a link to the video but no slides.

Anyway, this is the same point I made earlier about the compiler failing to remove nested retains and releases to the same reference. It's a problem with the SIL representation that's gradually being fixed but requires migrating the entire optimizer. It's been an ongoing activity for a while. The timeline for a feature like this has absolutely nothing to do with the complexity of the feature and everything to do with the prior amount of technical debt and maintenance burden of existing code.

Theorem 1: for any reference-counted type, checking uniqueness requires apparent mutability of the reference.

Here I'd like to suggest we use the term apparent possible mutation (or just apparent mutation as you use later) of the reference since mut ability is a type system concept. IIUC it's not the mutability of the reference in that sense you're talking about, but the inability for the compiler to prove that it isn't actually mutated , right?

Correct. I meant mutation.

Anyway, in light of how I understand(?) that, I got lost here:

Axiom 3: for any CoW type, mutability implies uniqueness.

It took me a long time to connect this “mutability” to the “apparent mutability” above, in which I hope I'm headed down the right path; before that I didn't have a clue. But even after I rephrase it as “apparent mutation of the reference” I can't make sense of it. It only seems axiomatically true if I read it as “mutation of the buffer referred to by the reference,” which is a very different thing and now I'm pretty sure I'm just guessing. Can you help me out with this one?

Again, I should have said "mutation" here, not mutability. I'm speaking as if the "CoW value" is represented within its buffer. This is the definition of CoW, nothing more.

The only way out of the rules I've put forth above would be a language level calling convention that provides for a single reference argument to be net released more than once (without corresponding retains). I'm comfortable stating that we will never have such a convention.

Agreed, we won't. But it's not obvious to me that we couldn't come up with a new set of rules, provable to preserve the semantics of existing code code, while making some new things possible. I realize it's speculation on the order of “couldn't we solve this with some [magic] language feature?” but I would like to understand how fundamentally you hold your statement to be true.

This is a problem of communicating any magic up through the layers of the API. We sometimes deal with this by adding "@_semantic" tags to APIs. But...key point...even though we can imbue certain source constructs with extra semantics, all the disparate, shady parts of the compiler that don't know about those special semantics still need to function correctly. A bit like adding a new case to a library enum. So, adding a new ownership passing convention requries about the same level of implementation complexity regardless of how its surfaced.

Regarding your conclusions, they're pretty encouraging. The excess retain/release on a buffer that is mutable but never actually mutated under rather bizarre circumstances does not seem like it could have much impact on real programs. And, not that I'm counting on it, but this seems like one of those places where—with a different set of rules like what I referred to above—it might be possible for the compiler to prove that the the operations could be eliminated.

We're talking about different issues. Above, I'm explaining why isDuallyReferenced is not a suitable general purpose API unless it's rewritten to take two @inout arguments. You are right, it would be possible to invent some compiler magic to allow extra retain of your ArraySlice to be optimized. But, all the code that accesses the copied ArraySlice would need to be visible (e.g. fully inlined). Otherwise, we're stuck at the argument convention problem again.

It's a little disturbing to me that under the current rules, the fix depends on an opaque, possibly-retaining, possibly-releasing, possibly-mutating, possibly-escaping use of the reference, and very disturbing that isUniquelyReferenced depends on the same opacity. It just seems intuitively like we must be robbing the compiler of more important information than necessary around these operations that are so crucial to performance and are in fact pure, and that it would better to give them some kind of special status in the rules. Am I wrong to be concerned?

If the lowest-level compiler builtin is exposed, then yes the compiler can recognize it. That is why I did not simply propose adding a unused uniqueness check instead of the _blackHole (or any other as-of-yet unknown runtime call). Some piece of intelligence exists in the compiler that is smart enough to know that an unused uniqueness check can be removed. And a piece of intelligence says they can be hoisted, and combined, and so on, based on complicated rules. That doesn't come close to covering the complexity. The important optimizations happen at a higher level of semantics, "make-mutable-and-unique", which statically guarantees that subsequent checks can be eliminated...

But, again, there are all the other disparate, shady parts of the compiler, including parts that have not yet been written. If that code sees a value being passed to a call, and knows that the call cannot modify the value, then it can share the value's representation. That's fundamental to how values work.

To fix that problem, you would need to change SIL so that uniqueness checking is somehow part of the argument passing convention that every piece of code in the compiler needs to deal with. But, at the source level, whenever you want to write a wrapper around some uniqueness-checking function, you'll still need to take the reference @inout for correctness--otherwise you've just hidden the fact that the reference can't share its representation. So all the complexity of making SIL more "uniqueness aware" would not really change anything.

Of course, another valid approach to giving a value "apparent/potential mutation" is returning the new value, rather than mutating in-place: (newReference, isUniqe) = isUnique(originalReference). That approach has the disadvantage of potential false negatives because the reference may be copied, but I don't think it changes anything for the purpose of this discussion.

A completely different approach to this problem would be give retains stronger semantics at the SIL level. Currently, the compiler only needs to guarantee that objects are alive up to their last use. We could instead say that retains cannot be eliminated merely because they redundantly retain the same reference. To eliminate any retain, the compiler would need to prove that no code can observe the reference count between that retain and some corresponding "paired" release. We don't currently represent "paired" releases, but OSSA will fix that. This would be a poor model for ARC optimization for at least two reasons

References, like any value, can be arbitrarily "aliased" across many
instances. So there's no way to determine whether something happens
to a referenced object just by looking at the uses of that instance
of the reference.
In general, we don't want to model a critical optimization such that
it depends on proving a negative condition throughout arbitrarily
complex code that is otherwise unrelated to the thing being
optimized.

Final question: when you wrote,

Let's keep those issues in separate posts.
Did you mean “in separate threads,” or would you be happy to deal with those issues in follow-up posts here?

I meant that I wanted to bracket the scope of this post and all of its in-line responses. I feel like there's a threshold of understanding that needs to be reached before in-line responses will converge. It looks like we've reached that point regarding implementation correctness.