Solving the mutating slice CoW problem

Yeah, my (mis-)understanding from talking with @Michael_Gottesman was that the ARC part of that project had been completely implemented.

OK, as to where I got confused, let me apply our corrections, a little more precision, and my own emphasis to what you've established:

• Theorem 1: for any reference-counted type, checking uniqueness requires apparent mutation of the bits of the reference.
• Axiom 3: for any CoW type, mutation of a part of its value accessed through the reference implies uniqueness.
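
For concreteness, here's how Theorem 1 shows up in source, sketched with the stdlib's real `isKnownUniquelyReferenced(_:)` (the container and buffer names are just illustrative):

```swift
final class Buffer {
    var elems: [Int]
    init(_ elems: [Int]) { self.elems = elems }
}

struct CoWArray {
    private var buf: Buffer
    init(_ elems: [Int]) { buf = Buffer(elems) }

    var elems: [Int] { buf.elems }

    mutating func append(_ x: Int) {
        // The uniqueness check takes its argument inout: even a pure
        // *check* looks like mutation of the reference bits (Theorem 1).
        if isKnownUniquelyReferenced(&buf) {
            buf.elems.append(x)            // unique: storage mutates; reference bits don't
        } else {
            buf = Buffer(buf.elems + [x])  // shared: reference bits change; old storage untouched
        }
    }
}
```

Note that the two branches also exhibit the inverse relationship: either the storage mutates or the reference bits do, never both.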

I'm confused because then you go on to

Theorem 3: From Theorem 1 and Axiom 3. For any CoW type, mutability and uniqueness are mutually implied. These conditions can be used interchangeably. (I find this useful to simplify the discussion.)

I don't see how this theorem follows from the foregoing. It appears to be relying on the linguistic conflation of the bits of the reference with the value accessed through the reference to establish some kind of if and only if relationship. In fact, the two have an inverse relationship: if the contents of the buffer are going to be mutated, the bits of the reference won't, and vice-versa.

Another reason I'm having trouble seeing the connection is that it seems to conflate “apparent mutation” with mutability. If you ignore the supposedly-supporting statements T1 and A3, which are about apparent mutation, and take T3's use of “mutability” as correct, T3 makes total sense. I just don't see how it rests on T1/A3 in any way.

It's not obvious to me that we couldn't come up with a new set of rules, provably preserving the semantics of existing code, while making some new things possible. I realize it's speculation on the order of “couldn't we solve this with some [magic] language feature?” but I would like to understand how fundamental you hold your statement to be.

This is a problem of communicating any magic up through the layers of the API. We sometimes deal with this by adding "@_semantic" tags to APIs. But...key point...even though we can imbue certain source constructs with extra semantics, all the disparate, shady parts of the compiler that don't know about those special semantics still need to function correctly. A bit like adding a new case to a library enum. So, adding a new ownership passing convention requires about the same level of implementation complexity regardless of how it's surfaced.

OK, so this is “merely” a matter of implementation complexity and technical debt, rather than fundamental.

Regarding your conclusions, they're pretty encouraging. The excess retain/release on a buffer that is mutable but never actually mutated under rather bizarre circumstances does not seem like it could have much impact on real programs. And, not that I'm counting on it, but this seems like one of those places where—with a different set of rules like what I referred to above—it might be possible for the compiler to prove that the operations could be eliminated.

We're talking about different issues. Above, I'm explaining why `isDuallyReferenced` is not a suitable general purpose API unless it's rewritten to take two `@inout` arguments.

I'm not sure what you mean by “we're talking about different issues.” I was referring to the parts of your conclusions where you say that `DivideAndConquer` can be made correct under the existing SIL rules with only a small change, and where you analyze what would happen with existing and recompiled binaries if we were to ship it as a change to the Array implementation. I do understand what you said about `isDuallyReferenced`, but “we're talking about different issues” seems to imply you think I was confused about something or conflating some things, and I don't see it.

You are right, it would be possible to invent some compiler magic to allow the extra retain of your ArraySlice to be optimized. But, all the code that accesses the copied ArraySlice would need to be visible (e.g. fully inlined). Otherwise, we're stuck at the argument convention problem again.

Yes, since the retain is only extra in this corner case when there's no actual mutation, optimizing it out requires that the compiler can see that there's no actual mutation. I get that.

It's a little disturbing to me that under the current rules, the fix depends on an opaque, possibly-retaining, possibly-releasing, possibly-mutating, possibly-escaping use of the reference, and very disturbing that `isUniquelyReferenced` depends on the same opacity. It just seems intuitively like we must be robbing the compiler of more important information than necessary around these operations that are so crucial to performance and are in fact pure, and that it would better to give them some kind of special status in the rules. Am I wrong to be concerned?

[snip discussion of implementation complexity]

…At the source level, whenever you want to write a wrapper around some uniqueness-checking function, you'll still need to take the reference `@inout` for correctness--otherwise you've just hidden the fact that the reference can't share its representation. So all the complexity of making SIL more "uniqueness aware" would not really change anything.

I ask that we set implementation complexity aside for the moment because IMO we currently have a fairly weak foundation for interesting value semantics, and it seems as though incremental steps like building “uniqueness awareness” on that weak foundation could be far more complex than establishing a model that accounts for deeper semantic relationships.

I'm thinking “CoW and purity-awareness” is where we really want to be. For example, colleagues of mine were wondering why accessing a local let-bound `Array`'s `count` in a closure results in the whole `Array` being captured. The fundamental reason, IIUC, is that the `count` is stored in the `Array`'s buffer, which could be shared, and nothing tells the compiler that it acts just as though it were an independent value stored in the `Array`'s shallow bits. Until those truths are encoded into SIL at a fundamental level, it seems inevitable that we'll have to chase all the “disparate, shady parts of the compiler that don't know about those special semantics.”
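
A minimal reproduction of what they were seeing (the function name is hypothetical):

```swift
// The closure reads only `a.count`, but `count` lives in the Array's
// heap buffer, so the closure must capture (and retain) the whole
// Array to read it safely later, keeping the entire buffer alive.
func makeCounter(of a: [Int]) -> () -> Int {
    return { a.count }
}
```

Nothing in SIL records that `count` behaves like an independent value in the `Array`'s shallow bits, so the capture can't be reduced to a single `Int`.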

Of course, another valid approach to giving a value "apparent/potential mutation" is returning the new value, rather than mutating in-place: `(newReference, isUnique) = isUnique(originalReference)`. That approach has the disadvantage of potential false negatives because the reference may be copied, but I don't think it changes anything for the purpose of this discussion.

Of course. And the potential for reference copying could be largely eliminated by annotations, but I agree it doesn't change anything. If the check could be written this way, where `_isUnique` was known by the compiler to be pure, it might change things a little (though I think it's awful):

```swift
let isUnique = withExtendedLifetime(oldReference) {
    let (newReference, r) = _isUnique(ObjectIdentifier(oldReference))
    oldReference = newReference
    return r
}
```

A completely different approach to this problem would be give retains stronger semantics at the SIL level. Currently, the compiler only needs to guarantee that objects are alive up to their last use. We could instead say that retains cannot be eliminated merely because they redundantly retain the same reference. To eliminate any retain, the compiler would need to prove that no code can observe the reference count between that retain and some corresponding "paired" release.

[snip why it wouldn't work]

Yeah, that seems way too low-level again.

Warning: compiler optimization amateur hour starts now
What about this: if we encode into SIL the idea of (independent) values that include bits accessible through references (i.e. CoW types), then you can say that retains can be eliminated for lifetime-dominated copies of a value when the compiler can prove the copies aren't potentially mutated. Among other things, that could allow the knowledge of immutability of `let`s and function parameters to be leveraged at the SILGen level, before optimization even starts.


I want to clarify up front, I don't think I've ruled out the possibility of making the compiler more aware of CoW semantics. The only thing I wanted to absolutely rule out was exposing `isDuallyReferenced` as an independent API.

It's more that the new ARC work needs to be fully integrated. A whole lot of inlining and other optimizations need to take place before we see new ARC optimization handling the kind of complexity of the Array implementation... and those other optimizations currently destroy OSSA form.

I don't see how this theorem follows from the foregoing. It appears to be relying on the linguistic conflation of the bits of the reference with the value accessed through the reference to establish some kind of if and only if relationship. In fact, the two have an inverse relationship: if the contents of the buffer are going to be mutated, the bits of the reference won't, and vice-versa.

I'm mainly pointing out that checking uniqueness and buffer mutability always look the same at a regular call site.

It is true that apparent mutation of the reference bits does not need to imply mutation of the storage. I'm saying that implication is present without any special information at a call site.

The point that I keep trying to make unsuccessfully is that we can't transitively apply additional semantics upward when those semantics place additional requirements on the caller:

```swift
@_semantics("I don't really mutate any bits!")
func checkUniqueness(a: inout CoWContainer) -> Bool

func abstraction(a: inout CoWContainer) -> Bool {
    return checkUniqueness(&a)
}
```

At the point of calling `abstraction`, the compiler cannot see `checkUniqueness`'s implementation and cannot distinguish between apparent mutation of the reference bits vs. mutation of the storage.

And the inverse, pretending we had a "move":

```swift
@_semantics("I observe a reference count!")
func checkUniqueness(a: AnyObject) -> Bool

func abstraction(a: AnyObject) -> Bool {
    return checkUniqueness(move(a))
}
```

At the point of calling `abstraction`, the compiler cannot see `checkUniqueness`'s implementation and cannot know that `a`'s representation can't be shared with another reference.

Another reason I'm having trouble seeing the connection is that it seems to conflate “apparent mutation” with mutability. If you ignore the supposedly-supporting statements T1 and A3, which are about apparent mutation, and take T3's use of “mutability” as correct, T3 makes total sense. I just don't see how it rests on T1/A3 in any way.

I hope I answered this above. I understand that your mutability has to do with whether we see a local variable or class property declared 'let'. Within the compiler's implementation of CoW, I think of "mutability" as the state of the CoW storage after uniqueness checking and buffer copying, which only happens during the process of mutation. But I'll try not to mix those up anymore.

This is a problem of communicating any magic up through the layers of the API. We sometimes deal with this by adding "@_semantic" tags to APIs. But...key point...even though we can imbue certain source constructs with extra semantics, all the disparate, shady parts of the compiler that don't know about those special semantics still need to function correctly. A bit like adding a new case to a library enum. So, adding a new ownership passing convention requires about the same level of implementation complexity regardless of how it's surfaced.

OK, so this is “merely” a matter of implementation complexity and technical debt, rather than fundamental.

Well, it's also a matter of ABI. And, for the record, I can't think of anything I would change about our argument conventions if we were starting over. I think what we're just dancing around here is that we need the type system to communicate CoW semantics to the compiler. If the compiler knows that the value being passed is or is not "CoWish", it could make extra assumptions about what happens on the other side.

I'm thinking “CoW and purity-awareness” is where we really want to be. For example, colleagues of mine were wondering why accessing a local let-bound Array's count in a closure results in the whole Array being captured. The fundamental reason, IIUC, is that the count is stored in the Array's buffer, which could be shared, and nothing tells the compiler that it acts just as though it were an independent value stored in the Array's shallow bits. Until those truths are encoded into SIL at a fundamental level, it seems inevitable that we'll have to chase all the “disparate, shady parts of the compiler that don't know about those special semantics.”

Interesting problem. It would be possible with the right builtins and optimizer magic to make this problem go away in some cases. But I typically fall on the side of designing for predictable behavior. Whenever CoW behavior is at stake, we should look for solutions that pop out of the type system and don't involve whole-program analysis. This is what @Ben_Cohen meant by a mandatory (-Onone) solution.

Of course. And the potential for reference copying could be largely eliminated by annotations, but I agree it doesn't change anything. If the check could be written this way, where _isUnique was known by the compiler to be pure, it might change things a little (though I think it's awful):

```swift
let isUnique = withExtendedLifetime(oldReference) {
    let (newReference, r) = _isUnique(ObjectIdentifier(oldReference))
    oldReference = newReference
    return r
}
```

Once @Erik_Eckstein fixes this, the optimizer will convert checks to this form behind-the-scenes whenever possible, where the uniqueness check will look like a pure function, even to all the disparate shady compiler parts.

What about this: if we encode into SIL the idea of (independent) values that include bits accessible through references (i.e. CoW types), then you can say that retains can be eliminated for lifetime-dominated copies of a value when the compiler can prove the copies aren't potentially mutated. Among other things, that could allow the knowledge of immutability of lets and function parameters to be leveraged at the SILGen level, before optimization even starts.

I think we should surface more CoW information to enable additional "mandatory" optimizations. I only push back when it looks like we need to change semantics in situations where the compiler doesn't have that CoW information.

I can think of two approaches to add more CoW awareness in SIL.

#1 New compiler builtins within the CoW container implementations and `@semantic` attributes

Here I get hung up on the fact the argument conventions limit any semantics that we want to apply transitively across API levels.

When passing references, the type system and argument conventions alone don't convey what can happen to the referent... whether its refcount is observed or its storage mutated.

We have a similar problem with purity awareness. It would have been nice if some level of function purity was the default language semantic. Instead we rely very heavily on @inlinable attributes and whole-module analysis.

We can still do well with this approach at -O with careful attention to inlining. Using new builtins, @Erik_Eckstein is designing intelligence that allows the optimizer to know that CoW buffers are effectively immutable, when not being locally mutated. It should work as long as the CoW implementation is sufficiently inlined.

But... when CoW storage copies are at stake, we should design for predictability and guaranteed -Onone behavior. This approach isn't good enough for mandatory optimization.

#2 Surface information about CoW data types in the type system

I do think surfacing type information is promising for mandatory optimization as long as it doesn't depend on changing something fundamental about values and argument conventions.

Whenever it has full type information (no properties hidden behind libraries) it would be possible for the compiler to understand CoW semantics at any level of the API. But, we have generic types and library types. So, I get hung up for the same reason.

In both approaches we need to accept that the compiler doesn't always know it's dealing with a CoW type.

We will need to define new argument passing conventions and special value semantics for move-only types. The move-only atomics case is similar in that we'll have something that appears immutable in its declaration, but is pinned to an address, and can apparently mutate at SIL level. But that all hinges on making any containing type conform to a special protocol... and that won't happen for CoW types.

So, I imagine there are lots of things we can improve with CoW semantics. All I know is that relying on a "dual-reference" check requires apparent mutation of the ArraySlice's reference, and that apparent mutation could probably be optimized away with compiler magic when we see there is no actual mutation.


Since you both begin and end with that point, I guess I need to clarify up front: I have absolutely no interest in exposing `isDuallyReferenced` as a public API, or even in using it internally. As near as I can tell, the whole system can be built using just the extra bit in the ABI. `isDuallyReferenced` just happens to be what makes things simple with the current state of ARC, and will be a suboptimal way to handle it when ARC gets smarter. Ideally, we use the extra bit to denote that we can't deallocate the buffer or write outside the slice even when it's uniquely referenced.

A whole lot of inlining and other optimizations need to take place before we see new ARC optimization handling the kind of complexity of the Array implementation... and those other optimizations currently destroy OSSA form.

I'll take your word for it for now. My (fallible) instinct tells me that if we make the information in SIL high-level enough, the only needless CoWs left to eliminate with inlining are the rare cases where the guaranteed calling convention needlessly holds a reference across the mutation of a copy of a function parameter.

I'm mainly pointing out that checking uniqueness and buffer mutability always look the same at a regular call site.

Sorry, I'm still having trouble pinning down what you mean here. There is no check for buffer mutability other than uniqueness, so of course checking for one looks like checking for the other. That seems too obvious to be what you are saying, so I must be missing something.

Of course it can't. The semantic annotation isn't present in the signature of `abstraction`, so the distinction is lost. The same goes for passing information downward in the call chain. It should be clear that I understand these points if you've been reading my arguments about why fixing this problem means encoding some information dynamically. Or are you making some other point that I'm missing?

Another reason I'm having trouble seeing the connection is that it seems to conflate “apparent mutation” with mutability. If you ignore the supposedly-supporting statements T1 and A3, which are about apparent mutation, and take T3's use of “mutability” as correct, T3 makes total sense. I just don't see how it rests on T1/A3 in any way.

I hope I answered this above.

I'm afraid not, but maybe it's not important to see how you get to T3. If all it means is that “the buffer is eligible for mutation if and only if it is unique” then I'm happy to take that as an axiom.

I understand that your mutability has to do with whether we see a local variable or class property declared 'let'. Within the compiler's implementation of CoW, I think of "mutability" as the state of the CoW storage after uniqueness checking and buffer copying, which only happens during the process of mutation. But I'll try not to mix those up anymore.

It's not “my mutability,” it's the language's mutability. When I used the name “makeMutableAndUnique” in the standard library I was writing more casually and I had the benefit of comments to explain what I meant. For this discussion, we need clear terminology, and the language has already defined “mutable,” so let's come up with words for what you mean. Best I can do is above, “eligible for mutation,” or maybe “dynamically mutable.” Any other ideas?

I think what we're just dancing around here is that we need the type system to communicate CoW semantics to the compiler.

Yes please; it's only what I've been asking for since 2013.

Then we're in violent agreement; I've been saying we should represent value semantics in the type system for years. I know what Ben meant and I always agreed we should aim for that.

Once @Erik_Eckstein fixes this, the optimizer will convert checks to this form behind-the-scenes whenever possible, where the uniqueness check will look like a pure function, even to all the disparate shady compiler parts.

I think we should surface more CoW information to enable additional "mandatory" optimizations. I only push back when it looks like we need to change semantics in situations where the compiler doesn't have that CoW information.

I'm afraid I don't know what you mean by “change semantics.”

I can think of two approaches to add more CoW awareness in SIL.

#1 New compiler builtins within the CoW container implementations and `@semantic` attributes

To me, it seems obviously too low level again.

#2 Surface information about CoW data types in the type system

I do think surfacing type information is promising for mandatory optimization as long as it doesn't depend on changing something fundamental about values and argument conventions.

Whenever it has full type information (no properties hidden behind libraries) it would be possible for the compiler to understand CoW semantics at any level of the API. But, we have generic types and library types. So, I get hung up for the same reason.

Sorry to pick a nit, but it's not “CoW semantics,” it's “value semantics.” As @Joe_Groff has made clear, that's just an informal way to talk about purity. I hope we haven't locked ourselves into an ABI corner where we have to write off the ability to represent purity; if we have, I'll need to start thinking about the next version of Swift's ABI and how we're going to get there. But it seems to me that if we can retrofit move-only types, we ought to be able to retrofit purity.

In both approaches we need to accept that the compiler doesn't always know it's dealing with a CoW type.
We will need to define new argument passing conventions and special value semantics for move-only types. The move-only atomics case is similar in that we'll have something that appears immutable in its declaration, but is pinned to an address, and can apparently mutate at SIL level. But that all hinges on making any containing type conform to a special protocol... and that won't happen for CoW types.

…because?

If we're talking ABI changes, one thing that would be of general benefit for value types would be a "copy on escape" calling conventions, where functions that receive value-type arguments would presume them to be allocated on the stack and be required to implicitly copy them to the heap if they want to extend the lifetime of the value for any reason (which would be the equivalent of a retain for a buffer that is already on the heap), similar to how blocks in Objective-C must be `_Block_copy`-ed if they are escaped. This would allow escape analysis and stack promotion to be much more aggressive. However, it might also be something we could apply to the in-place-slicing problem—if we treat the slice value yielded to a mutation as being copy-on-escape, and we consider operations that would force a realloc of the buffer to be escapes, then that seems like it would allow the original slice value to be a plain in-place borrow of the original buffer, since any attempt to escape or grow the slice would trigger a copy as well.


How is this different from what the change from owned to guaranteed calling convention accomplished?

The owned-to-guaranteed convention only affects the inline storage of the type. If we pass an Array value as a guaranteed argument, then the callee has to copy the reference to escape it, but it doesn't currently need to also trigger a copy of the buffer the reference points at. The parallel you raise is insightful, though; copy-on-escape would be in a sense extending the +0/+1 ownership convention from the inline storage transitively to its COWed buffers.
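
A sketch of that behavior in today's model (the names here are just illustrative):

```swift
var stash: [Int] = []

// Under the guaranteed (+0) convention, `a` is borrowed. Escaping it
// copies only the inline representation (a retain of the buffer
// reference); the buffer the reference points at is not copied.
func escape(_ a: [Int]) {
    stash = a // retain; buffer is now shared, not copied
}

var original = [1, 2, 3]
escape(original)
original.append(4) // CoW kicks in: only now is the buffer actually copied
```

Copy-on-escape would move that buffer copy into `escape` itself, letting `original`'s buffer stay uniquely owned (and potentially stack-allocated).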


I guess I'm still confused. If the value escapes, it gets copied. If it contains a reference, the copy implies a retain. Then the buffer isn't uniquely-referenced anymore and can't be deallocated or written on. Why would we resort to eagerly copying the contents of the buffer?


In the case of non-mutating arguments, it's a matter of optimization tradeoffs. It's likely that most value-semantic functions don't escape their arguments in the common case, so it would be nice to be able to stack-allocate buffers in more cases. Forcing the few functions that do make escaping copies of their arguments to copy the buffer would allow other code to perform better. I haven't thought as much about the tradeoffs when it comes to in-place mutating slices, but it's interesting that a similar technique might be able to make in-place slicing work without changing the surface language model.


Okay, I do see a distant relationship to the topic at hand, but it mostly seems entirely different to me. For one thing, it's not related to mutation.

[snip a bunch of my crazy but irrelevant ideas about how this could be wedged into the ABI]

Agreed, it's interesting.

Let me ask you, though: to make this reliable across visibility boundaries, wouldn't it require encoding "this parameter escapes" pervasively in function/method signatures?

My goal for this discussion was to make the compile-time model for reference counting clear enough that we could reach an understanding about why `isDuallyReferenced` is only very optimistically usable by a reference count implementation expert, and in your case, why the `_blackHole` fix is needed for correctness.

I think we're past that, but now side-tracked by my claim that uniqueness checking requires apparent mutation of the reference bits--at least if you travel far enough up the API where arbitrary compiler magic within the standard library no longer reaches. I'm standing by that claim, short of either

• changing the compile-time model of reference counting in a way that's sure to inhibit optimization
• or making it impossible to wrap a CoW type within an arbitrary struct without forcing it to conform to some CoWContainer protocol. That's a language change I'm not ready to debate.

Our energy is now better spent on how to generalize a solution to the in-place slice mutation problem for any Collection. In my mind, the approaches for in-place slice mutation fall into one of these strategies...

1. Statically enforced in-place-only mutation via a new slice type

Such a slice cannot be reassigned, is not RangeReplaceable, and cannot even be copied (normally) without copying the buffer storage.

This would (I think) be sufficient for typical Divide-And-Conquer algorithms. It would cause some confusion when we talk about mutability of a CoW value. The value's reference bits would be immutable, but its storage would be mutable.

I imagine it would be impossible to integrate with existing collections, and I suspect you aren't interested in something like this. I don't really want to discuss it further. I only bring this approach up to clearly distinguish it from the other approaches...

2. Copy-on-escape values

This requires somehow surfacing a distinction, as part of a variable's attribute, between a "locally pinned" CoW object vs. a "persistently addressable" CoW object.

A new argument convention would allow passing locally pinned objects, presumably using a bitwise copy that doesn't retain. Copying a locally pinned object would eagerly copy the buffer storage into persistently addressable memory.

This sounds exciting for optimizing local allocation. The design and implementation complexity is unknown to me. I'm concerned that it falls apart in practice because it requires a viral rewrite of function signatures.

3. Dynamically retained in-place slice mutation

This is what some people are now calling "`@inout` properties". I think it is the generalized form of what you implemented.

As you have correctly argued, we cannot support in place slice mutation and allow the slice's storage to be reallocated/reassigned without some dynamic bit controlling a potential extra release when mutation ends.

I think of this dynamic bit as indicating that a property is in some +0 state. You could say that the property is a "borrowed reference".

If we had such a +0 bit, then all we need are two explicit compiler or stdlib builtins: one that performs the +0 borrow and one that performs a +0 check. And we need type system and runtime support for recognizing +0 values when they are copied or destroyed.

Naturally, copying the borrowed reference needs to produce a new reference without that bit set, and destroying the borrowed reference would not perform a release.

Everything else is handled by existing uniqueness checks. If the buffer storage is replaced, then the +0 bit will naturally no longer be set, and the original storage will be released at the end of the mutation scope after an explicit check for the +0 bit.
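
As a toy model of those rules (all names are hypothetical; real support would live in the runtime's reference representation, not in library code):

```swift
// Models the dynamic "+0 borrowed" bit: a borrow claims no reference,
// copying a borrowed reference produces a normal +1 reference, and
// destroying a borrowed reference performs no release.
final class Storage { var refCount = 1 }

struct Ref {
    var storage: Storage
    var isBorrowedPlusZero = false   // the dynamic +0 bit

    // +0 borrow: share the storage without retaining it,
    // so the storage still appears uniquely referenced.
    func borrow() -> Ref {
        Ref(storage: storage, isBorrowedPlusZero: true)
    }

    // Copying a borrowed reference yields a new reference
    // without the bit set, performing a real retain.
    func copy() -> Ref {
        storage.refCount += 1
        return Ref(storage: storage, isBorrowedPlusZero: false)
    }

    // Destroying a borrowed reference is a no-op release.
    func destroy() {
        if !isBorrowedPlusZero { storage.refCount -= 1 }
    }
}
```

The "+0 check" at the end of the mutation scope then just tests `isBorrowedPlusZero` to decide whether the original storage needs an extra release.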

I think others will be better able to discuss the design in more detail, how to surface these things in the language, and how to handle ABI breaks.


Yes

No, we're not side-tracked by that. Many messages ago I wrote that I accept that's how the system works right now, and I understood at that point why it works that way. It's a natural artifact of the system having been focused on dealing with the issues of ordinary reference-counted class instances rather than the special properties of mutable slicing. And I'm not criticizing that, either, in case it's not obvious. I did say it forms a pretty weak foundation for getting the semantics we want from CoW types, and I still believe that.

That's fine; I think debating it is out of scope here. To be clear for when we come back to this area, I'm not interested in the first bullet. I am interested in a variant of the second one, where I think a workable model is based on propagating “value semantics” (actually purity) rather than “CoW,” and doing so implicitly for `struct`s except where operations are explicitly annotated as impure.

Oh, but I am very interested in something like this. When considering this problem I was lamenting that we had made slices range replaceable until I realized there was no way to suppress assignment. I didn't start here because it's a source-breaking change and because it requires language features (non-assignability) that we don't currently have, so I don't view it as something that can be done in the short term. Are you sure you don't want to discuss it?

Specifically, what I think would be interesting is that lvalues produced via range subscript are just such non-assignable, non-resizable instances (hah, that's ironic; a non-assignable lvalue!), and that you get something like what `Slice` is today otherwise. I think it's important that we can do things like make arrays of slices, and I suspect non-assignability would make that impossible.

Unless you do it dynamically, the cost of which I think could hide in the noise. I started to write this in my response to Joe, but then deleted it:

• Reserve a magic refcount value to mean “I'm allocated on the stack and you can find a pointer to code for moving me to the heap just before me on the stack.”
• Make retains of escaping values recognize the magic refcount and do the copy.
• Obviously requires an ABI change to write back the copied reference.
• Does this extend beyond CoW buffers to regular class instances in the presence of multithreading? I'm not sure.

Glad someone has finally acknowledged that! It's hard to tell if some people are still clinging to the idea that the impossible can be achieved purely through language features. As long as that's the case, there's no hope for a solution.

That said, I don't see what a language feature can add to the story here, or how any language feature produces a generalization of what I implemented.

I'm pretty sure what you're outlining here is equivalent to what I've been saying will work provided the compiler “does its job,” with the advantage of guaranteeing that the job gets done.

IIUC,

• +0 borrow is used to create a +0 copy of the reference for the slice. It sets the bit but doesn't claim a reference, preserving uniqueness.
• Releasing a reference with a refcount of 1 but the +0 bit set is a no-op, preventing slice assignment from deallocating the buffer.
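The two rules above can be modeled in miniature. This is a toy, not the real refcounting machinery: `Storage`, `Ref`, and the stored `isBorrowed` flag are invented stand-ins for wherever the +0 bit actually lives.

```swift
final class Storage {
    var refCount = 1
    var deallocated = false
}

struct Ref {
    var storage: Storage
    var isBorrowed = false  // the +0 bit

    // +0 borrow: copy the reference and set the bit, but don't claim a
    // reference, so the owner still sees the storage as uniquely held.
    func borrow() -> Ref {
        Ref(storage: storage, isBorrowed: true)
    }

    func release() {
        // A borrowed reference over a uniquely-held buffer must not
        // deallocate it: otherwise slice assignment would free the
        // original array's storage out from under it.
        if isBorrowed && storage.refCount == 1 { return }
        storage.refCount -= 1
        if storage.refCount == 0 { storage.deallocated = true }
    }
}
```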

But some things are unclear still:

Wait, are you suggesting the bit is part of the reference representation (e.g. the low address bit) and not stored in the memory it points to (e.g. with the refcount)? Or, is the other way around and you actually meant “copying the buffer pointed to by the borrowed reference” (in which case it makes sense that the two references see a different +0 bit)? I suspect the former, since you didn't mention the need for an "unborrow" primitive, but then your “explicit check for the +0 bit mentioned below” seems like it would only be needed if it was the latter.

If you're talking about storing the bit in the reference itself, I'm not 100% sure about the tradeoffs, since you have to strip the bit to use or copy the pointer, but it does potentially save an atomic operation for unborrow.

I still think this code could be a problem if it's been compiled into clients.


I think a workable model is based on propagating “value semantics” (actually purity) rather than “CoW,” and doing so implicitly for structs except where operations are explicitly annotated as impure.

I'm on board with purity as an opt-in attribute for both functions and types (too late for opt-out!). I expect pure types to be contained within impure types, though. So, when the compiler sees a pure type, it can make additional assumptions, but it can't assume values of that type will always be handled differently from normal values--there will be times the compiler can't see that the value is pure based on its static type.

Specifically, what I think would be interesting is that lvalues produced via range subscript are just such non-assignable, non-resizable instances (hah, that's ironic; a non-assignable lvalue!), and that you get something like what Slice is today otherwise. I think it's important that we can do things like make arrays of slices, and I suspect non-assignability would make that impossible.

Oh, it is interesting that we have the same view on this. Swift will probably need to support an immutable (non-reassignable) pass-by-address argument convention for other situations where copies must be avoided. I've heard people refer to this as "@inout lets". So I don't think reassignability is a showstopper. I think it's just design work that we haven't had time for and largely overlaps with move-only types.

I don't know the feasibility of redesigning the Collection APIs around some new non-RangeReplaceable slice.

Wait, are you suggesting the bit is part of the reference representation (e.g. the low address bit) and not stored in the memory it points to (e.g. with the refcount)? Or, is the other way around and you actually meant “copying the buffer pointed to by the borrowed reference” (in which case it makes sense that the two references see a different +0 bit)? I suspect the former, since you didn't mention the need for an "unborrow" primitive, but then your “explicit check for the +0 bit mentioned below” seems like it would only be needed if it was the latter.

I'm talking about storing a bit alongside the reference, just as you have done, but potentially using the spare bits via an enum.

The "runtime support" is a custom value-witness for copy and destroy. The copy resets the borrowed bit in the copied value. The destroy does not release the borrowed reference.

The +0 check is only needed at the end of Array.subscript._modify to release the original storage in the event that the borrowed reference was replaced. But that's probably equivalent to checking the identity of the reference, so it's nothing new.
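That end-of-`_modify` check can be sketched like so, using a closure-taking method in place of a real `_modify` accessor. `MiniArray`, `MiniSlice`, and `Buffer` are invented stand-ins for the stdlib types, not their actual implementations.

```swift
final class Buffer {
    var elements: [Int]
    init(_ elements: [Int]) { self.elements = elements }
}

struct MiniSlice { var buffer: Buffer }

struct MiniArray {
    var buffer: Buffer

    // Plays the role of subscript(range:)._modify: yield a +0 view of the
    // storage, then check whether the caller replaced the reference.
    mutating func withMutableSlice<R>(_ body: (inout MiniSlice) -> R) -> R {
        var slice = MiniSlice(buffer: buffer)
        let result = body(&slice)
        // The "+0 check," equivalent to a reference-identity check: only
        // if the yielded reference was replaced does the original storage
        // need to be released and the replacement adopted.
        if slice.buffer !== buffer {
            buffer = slice.buffer
        }
        return result
    }
}
```

In the common case (the slice is mutated in place and never replaced), the identity check passes and nothing extra happens.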

@Joe_Groff has talked about introducing a `=>` arrow for functions that explicitly specify any effects (and are therefore pure by default).

I hadn’t heard of this before but definitely have use cases for it. I even have debug code that dynamically validates this contract isn’t violated.


It doesn't seem like you're ruling out other source-breaking changes elsewhere in our discussions. Why rule out opt-out at this point?

Technicalities: as @Joe_Groff has argued and repeatedly convinced me, values and types aren't pure, just operations. Think of `UnsafeMutablePointer`: all operations that don't dereference it are pure. What you can do is provide shortcut annotations for the default purity of methods on a type, for example, and you can speak casually about something “having value semantics,” but nothing can actually ensure that all operations on that type are pure: anyone can create an impure function that takes such a type as a parameter. After all, `Int` “has value semantics” but you can still use it to index an `UnsafeMutablePointer`.

Anyway, unless I'm misunderstanding your concern above, the law of exclusivity should be enough to give the compiler all the information it needs, no? If not, could you please explain your concern a little more fully?

Specifically, what I think would be interesting is that lvalues produced via range subscript are just such non-assignable, non-resizable instances (hah, that's ironic; a non-assignable lvalue!), and that you get something like what Slice is today otherwise. I think it's important that we can do things like make arrays of slices, and I suspect non-assignability would make that impossible.

Oh, it is interesting that we have the same view on this. Swift will probably need to support an immutable (non-reassignable) pass-by-address argument convention for other situations where copies must be avoided. I've heard people refer to this as "@inout lets". So I don't think reassignability is a showstopper. I think it's just design work that we haven't had time for and largely overlaps with move-only types.

And pinned types, which don't even move.

I don't know the feasibility of redesigning the Collection APIs around some new non-RangeReplaceable slice.

I think maybe you have misunderstood me, then. I'm suggesting that we leave the existing slices in place, but introduce a non-reassignable/non-range-replaceable slice (NRRRS) contextually with (effectively) a preferred subscript overload for lvalue contexts.

```swift
f(x[i..<j])               // Slice
let s = x[i..<j]          // Slice
f(&x[i..<j])              // Prefer NRRRS, fall back to Slice
x[i..<j].mutatingMethod() // Prefer NRRRS, fall back to Slice
x[i..<j].otherMethod()    // Slice? (NRRRS is efficient but seems wrong; IDK)
```

You're still confusing me a bit with the following:

This is going to sound like I'm telling you things you already know (sorry), but I'm really just trying to explain why your answer is unclear to me.

I am interpreting “alongside” as meaning “a fixed distance away in memory.” In my implementation the bit is stored inside the referent, not the reference. If it's “alongside” anything it's alongside the strong reference count and/or the instance.

When we talk about “the spare bits” we're usually talking about bits in the pointer representation that can never be 1 for valid addresses, and thus can be used in the reference implementation for other things as long as they are stripped before accessing the referent. We can't newly decide to store anything alongside the reference in most `@frozen` types—which the most important collections certainly are—except “in the spare bits,” so the bit would definitely, not “potentially,” be in the spare bits AFAICT.

So you see, I'm just looking for an answer to this simple question: are you talking about storing the bit

• a fixed offset from the reference (0 for spare bits), or
• a fixed offset from the base address of the instance, or
• are you saying they are equivalent, or
• something else?

I need to know or I can't interpret what you said about copying the buffer.
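For concreteness, here is a toy demonstration of the spare-bits arrangement: heap-allocated class instances are aligned to well more than 2 bytes, so the low bit of a valid reference is always 0 and can carry a flag, provided it's stripped before the reference is used. `TaggedRef` and `Buf` are invented for illustration, and this toy assumes the caller keeps the object alive through some other strong reference.

```swift
final class Buf { var value = 0 }

struct TaggedRef {
    private var bits: UInt

    init(_ object: Buf, borrowed: Bool) {
        let raw = UInt(bitPattern: Unmanaged.passUnretained(object).toOpaque())
        bits = raw | (borrowed ? 1 : 0)  // stash the +0 bit in a spare bit
    }

    var isBorrowed: Bool { bits & 1 != 0 }

    // The flag must be stripped before the pointer can be dereferenced.
    var object: Buf {
        Unmanaged<Buf>.fromOpaque(UnsafeRawPointer(bitPattern: bits & ~1)!)
            .takeUnretainedValue()
    }
}
```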

Are you sure that's threadsafe? We have to watch out for mutating anything from ostensibly nonmutating operations like copies, or we've created something that's less threadsafe than `Int`, which is basically impossible to program with in a concurrent environment... and I think(?) it's too late to introduce a new atomic other than the refcount.

Ideally, I'd like it if struct methods written with `->` could be pure by default, which would make such a terse way to say “this is pure” much less necessary… but I recognize that may not be possible, in which case `=>` is definitely a reasonable answer.

I think it would be great if all members of all value types were pure by default. This includes enums and tuples (if they get extensions) in addition to structs and includes computed properties and initializers which do not have an arrow at all.

I think we could get there using a phased approach that looks something like this:

• introduce a temporary annotation allowing a type to opt in to pure-by-default
• in the same release or a subsequent release start producing a warning for value types that do not opt in
• after a period of time, change the warning to an error
• eventually deprecate the annotation

But while this is possible I suspect the community and core team don't have the appetite for it. In addition to the churn it would cause, it would be extremely harmful in the way it affects all of the Swift educational content, Stack Overflow content, etc that works with value types.

There are some arguments that can be made for using a distinct arrow to indicate purity. When reading an extension it wouldn't be possible to know whether a method is pure or not without knowing whether the extended type is a value type or not. It would also mean the purity of a protocol requirement depends on the conforming type. With a distinct arrow the intended semantics become more clear.

One of the trickiest aspects of purity in a language like Swift is designing it in such a way that generic code like `Collection.map` is understood to be conditionally pure. i.e. its purity depends on the implementation of the collection it is used with as well as the transform that is provided. I believe this will be crucial to designing a language feature that works well and is widely used.


I'm saying any of the above, preferably using the spare bits, likely implemented as a reference-payload-enum with special powers.

I realize your bit is in the array storage header and must have forgotten that distinction in my previous response (sorry). I'm sometimes confused because I think of the "array buffer" as the struct containing the reference (because that's literally what it's named in the source).

I'm not sure if you are proposing some other variant of your implementation that does not use isDuallyReferenced.

I think what I proposed (and others have also proposed) is thread safe because the borrowed bit only changes when the reference is modified. i.e. copying a borrowed reference does not change the original reference, it only creates a new copy of the reference without the borrowed bit set and bumps the refcount.
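That thread-safety argument can be modeled directly: "copy" produces a full +1 reference with the borrowed bit cleared, "destroy" of a borrowed reference releases nothing, and, crucially, copying never mutates the source reference, so no other thread can observe a change through its own copy. All names below are invented stand-ins for the value witnesses being described (and the toy's refcount bump is not itself atomic).

```swift
final class Store { var refCount = 1 }

struct BorrowableRef {
    var store: Store
    var isBorrowed: Bool

    // Value-witness "copy": the new value is +1 and not borrowed; the
    // source reference is left completely untouched.
    func copy() -> BorrowableRef {
        store.refCount += 1
        return BorrowableRef(store: store, isBorrowed: false)
    }

    // Value-witness "destroy": a borrowed reference owns no retain count.
    func destroy() {
        guard !isBorrowed else { return }
        store.refCount -= 1
    }
}
```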


Okay, yeah, the term in the stdlib for the class instance is "storage," not "buffer." Sorry for the confusion.

Anyway, I'm not sure they're entirely interchangeable arrangements. When I think through the implications they certainly lead to different concerns:

• In the reference's spare bits:
  • Not observable from existing copies of the reference
  • Will be observable from new copies of the reference (unless you have a special value witness to clear it when copied—interesting idea. I wonder if thinking of it as a builtin that retains and returns the retained reference helps.)
  • May require distinguishing “true copies” from moves, including moves made by copy+destroy
• In the buffer storage:
  • Observable from new copies of the reference until the inout scope is over (observability from existing copies can optionally be prevented by checking uniqueness first)
  • You have to be careful to maintain thread safety, because other threads can observe the same bit.

I'm not sure if you are proposing some other variant of your implementation that does not use isDuallyReferenced.

I don't know if it's fundamentally different. I think I was a little more specific in my thinking about, you know, where the bit would live, but right now I'm having trouble proving to myself that what I had can be thread-safe while still handling escaping.

That reasoning only works with the bit-in-the-reference arrangement, I think.

If you believe it's the right default, then surely that harm would be temporary, because most value types do in fact have value semantics.

There are some arguments that can be made for using a distinct arrow to indicate purity. When reading an extension it wouldn't be possible to know whether a method is pure or not without knowing whether the extended type is a value type or not. It would also mean the purity of a protocol requirement depends on the conforming type. With a distinct arrow the intended semantics become more clear.

Excellent points!

One of the trickiest aspects of purity in a language like Swift is designing it in such a way that generic code like `Collection.map` is understood to be conditionally pure. i.e. its purity depends on the implementation of the collection it is used with as well as the transform that is provided. I believe this will be crucial to designing a language feature that works well and is widely used.

Just one more argument for a first-class effects system.

While I would very much prefer to have conditional purity, one could handle that problem with overloads, FWIW.

Sure, the harm would be temporary. I'm way more willing to suffer the pain caused by churn for the sake of a better long term design and language than most are. I was mostly anticipating arguments that I think would be made rather loudly. Maybe I'm wrong about that.

Sure, a first-class effects system would be extremely valuable in many ways. Overloads are a workaround but I suspect they would lead to a lot of clunkiness, especially in library code.
