Law of Exclusivity/memory safety question

dabrahams · December 27, 2020, 1:23am

Hi,

I want to use the technique demonstrated here to destructively move out of self, saving its address, just before a yield, and re-initialize self immediately after the yield.

If I understand correctly, the LoE guarantees that no legal code will try to observe self during this mutation, so I think(/hope?) I don't need to worry about that. One person I know has suggested that it might still be unsafe because when &self is used in an UnsafeMutablePointer expression context, it's conceivable that self gets copied or moved into a temporary for the duration of the expression. Personally, I imagine that would cause so many problems for common C/ObjC interop patterns that we could never do it, but I'm not sure it's formally outlawed. Can I get a ruling on this from someone in a position to rule?

Thanks!

/cc @John_McCall @Andrew_Trick

Andrew_Trick · December 27, 2020, 8:31am

Well, it appears what you've done could work in practice. I think the best you can do for now is check that the compiler doesn't violate your assumptions. If the runtime check is too expensive, I could imagine someone adding support for a new compiler diagnostic that does the same check at compile-time: verify that the annotated variable isn't moved or copied.

In theory, the compiler can move values wherever it wants since there's no formal address-pinning beyond the UnsafePointer scope. I don't think that's much of a practical concern, but the compiler could do the following, assuming "intrinsic_move" is its mechanism for relocating storage:

  var selfCopy = intrinsic_move(self)
  var (p, kv) = unsafeMoveMap(destroying: &selfCopy) { $0.tuple }
  self = intrinsic_move(selfCopy)
  defer { p.initialize(to: .init(kv)) }
  yield &kv

In fact, since KeyValuePair is a copyable type, those intrinsic_move's could even be copies--of course the compiler won't do that because it would degrade the quality of CoW.

There was some talk recently about pinning the address of certain class properties--that won't help you, but it's interesting background: Exposing the Memory Locations of Class Instance Variables - #28 by lorentey

With generalized coroutines you could do the following:

yield withUnsafeMutablePointer(&self) {
  var kv = $0.move().tuple
  yield &kv
  $0.initialize(to: .init(kv))
}

Hopefully we're not too far away from supporting that, and it does protect you against anything the compiler could do.

It still seems there should be a safe way to borrow transformed values though; I'm just not sure what that should look like. You need to promise to write back your borrowed variable at the end of some scope.

Implicit borrow scopes don't quite work

{
  borrowed let tempSelf = self
  var kv = tempSelf.tuple
  yield &kv
  // implicit borrow scope ends here, moving tempSelf back to self
}

because tempSelf.tuple still forces a copy while the original tempSelf needs to stay live for the writeback move.

I imagine that explicit 'move's could be enforced by the compiler though:

{
  var kv = builtin_move(self).tuple
  yield &kv
  // Compiler checks that 'self' is reinitialized before returning.
  self = builtin_move(KeyValuePair(kv))
}
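For readers who want to try the "move out, then reinitialize" shape sketched above: the following is a minimal sketch that assumes a toolchain supporting the `consume` operator (Swift 5.9 is assumed here) and uses it in place of the hypothetical builtin_move. The function name `appendZero` is made up for illustration, and this is not the `_modify` accessor from the thread, just an inout parameter.

```swift
// Assumes a compiler with the `consume` operator (assumed Swift 5.9 or later).
// `appendZero` is a hypothetical name used only for this illustration.
func appendZero(_ values: inout [Int]) {
  // Move the value out of the inout parameter; `values` is now uninitialized.
  var local = consume values
  local.append(0)
  // The compiler is expected to reject this function if `values` is not
  // reinitialized on every path before returning.
  values = consume local
}
```

Deleting the final assignment should turn this into a compile-time error rather than a runtime problem, which is the kind of enforcement the post is asking for.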

dabrahams · December 28, 2020, 1:35am

Runtime check? How would you implement that check in a way that doesn't rely on the same assumptions?

In theory, the compiler can move values wherever it wants since there's no formal address-pinning beyond the UnsafePointer scope. I don't think that's much of a practical concern, but the compiler could do the following, assuming "intrinsic_move" is its mechanism for relocating storage:
  var selfCopy = intrinsic_move(self)
  var (p, kv) = unsafeMoveMap(destroying: &selfCopy) { $0.tuple }
  self = intrinsic_move(selfCopy) // <=======
  defer { p.initialize(to: .init(kv)) }
  yield &kv

[for completeness: of course the line I highlighted has to be interpreted as an initialization rather than an assignment.]

Well, yeah, that was what I was concerned about. However, I don't see any reason the compiler would ever have to implicitly move an inout value, so why don't we just make this simple and enshrine that guarantee in the language?

In fact, since KeyValuePair is a copyable type, those intrinsic_move's could even be copies--of course the compiler won't do that because it would degrade the quality of CoW.

I think it also won't do that just because it would break a bunch of working code that's using C interop, not to mention making it needlessly inefficient (these things apply to the moving variant as well as the copying variant):

var buffer = (0, 0, 0, 0, 0, 0) // global
func doSomething() {
 someCFunctionThatStoresABufferAddress(&buffer, 6)
}

I think that would arguably fall out of the guarantee that inout values aren't implicitly moved.

Yes, but there's still nothing to prevent the compiler from copying self into a temporary at the withUnsafeMutablePointer call, which would defeat the purpose of what I'm doing. I think we really need a stronger guarantee for inout values to enable low-level programming. When I bring forward my planned proposal for non-copyable types I'm pretty certain I would want to roll that in.

It still seems there should be a safe way to borrow transformed values though; I'm just not sure what that should look like. You need to promise to write back your borrowed variable at the end of some scope.

Yeah, that's already part of the idiom shown in my example.

I imagine that explicit 'move's could be enforced by the compiler though:
{
  var kv = builtin_move(self).tuple
  yield &kv
  // Compiler checks that 'self' is reinitialized before returning.
  self = builtin_move(KeyValuePair(kv))
}

Yeah… I wasn't planning on proposing—at least initially—to allow you to explicitly move out of an inout value, but with some DI-like lifetime tracking, that could become safe, and that would be really nice.

Andrew_Trick · December 28, 2020, 11:05am

I think you're suggesting that the language should make certain guarantees about address stability with respect to inouts. Essentially, taking the address of an inout variable should always produce an in-place address to the same memory object:

func bar(p: UnsafePointer<X>) { ...}

func foo(x: inout X) {
  bar(p: &x)
  bar(p: &x)
}

Each call to bar should have the same value for p, which should point to the same memory object referred to by x.
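A minimal sketch of how one might observe this with today's compilers. The names `addressBits` and `observeTwice` are hypothetical, introduced only for illustration, and the result reports what a particular build happens to do; it is not a language guarantee.

```swift
// Hypothetical helper: converts the pointer produced by an inout-to-pointer
// conversion into an integer so it can be compared after the call returns.
func addressBits(_ p: UnsafePointer<Int>) -> UInt {
  return UInt(bitPattern: p)
}

// Takes the address of the same inout parameter twice and reports whether
// both conversions produced the same address.
func observeTwice(_ x: inout Int) -> Bool {
  let first = addressBits(&x)
  let second = addressBits(&x)
  return first == second
}

var sample = 42
print("same address both times:", observeTwice(&sample))
```

Only the integer bit pattern leaves each call, so no pointer is used outside the scope in which it is valid.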

That requirement works for me. We more-or-less depend on it already. I'm not sure what C interop problem you're referring to though.

var buffer = (0, 0, 0, 0, 0, 0) // global
func doSomething() {
 someCFunctionThatStoresABufferAddress(&buffer, 6)
}

I don't think there's any guarantee that a global tuple declared in Swift has a stable address across calls to C functions. I wouldn't be too surprised if code relies on that, but I'm not aware of it off hand.
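When a callee really does need to hold on to a buffer address, one approach that does not depend on any address-stability rule is to allocate the storage explicitly. The sketch below assumes a hypothetical `AddressStore` class standing in for the C function (the real someCFunctionThatStoresABufferAddress is not shown in this thread).

```swift
// Hypothetical stand-in for a C API that stores a buffer address for later use.
final class AddressStore {
  private var stored: UnsafeMutablePointer<Int>?
  private var count = 0

  func remember(_ p: UnsafeMutablePointer<Int>, _ n: Int) {
    stored = p
    count = n
  }

  // Writes through the remembered pointer, as a C library might do later.
  func fill(with value: Int) {
    guard let p = stored else { return }
    for i in 0..<count { p[i] = value }
  }
}

func demoStableBuffer() -> [Int] {
  // Explicitly allocated memory keeps its address until deallocate() is called.
  let buffer = UnsafeMutablePointer<Int>.allocate(capacity: 6)
  buffer.initialize(repeating: 0, count: 6)
  defer {
    buffer.deinitialize(count: 6)
    buffer.deallocate()
  }

  let store = AddressStore()
  store.remember(buffer, 6)
  store.fill(with: 7)   // the stored address is still valid here
  return Array(UnsafeBufferPointer(start: buffer, count: 6))
}
```

The cost is manual lifetime management: the caller must keep the allocation alive for as long as the stored pointer may be used.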

Regarding your other questions, I think we're mixing up two problems:

#1 Optimizing CoW storage copies

Although a strong guarantee would be ideal, there's no correctness requirement that copies are avoided. In fact, it looks like you're relying on optimizations to remove source-level copies, using simple idioms that can be reliably optimized.

#2 Address stability

Your solution to #1 relies on address stability, but that introduces a theoretical correctness issue. I agree that formalizing a narrow rule about inout address stability is one way to fix that theoretical correctness issue.

With generalized coroutines, you could take the same UnsafePointer-based approach to problem #1, but completely avoid problem #2.

yield withUnsafeMutablePointer(&self) {
  var kv = $0.move().tuple
  yield &kv
  $0.initialize(to: .init(kv))
}

This is the proper way to guarantee pointer stability, and withUnsafeMutablePointer(&self) is no more likely to lead to a spurious copy than unsafeMoveMap(&self).
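Until a yield can appear inside such a closure, the same scoped pattern can be approximated by inverting control and passing the mutation in as a closure. This is a sketch only: `KeyValuePair` here is a hypothetical stand-in for the type under discussion, and `withMutableTuple` is a made-up name.

```swift
// Hypothetical stand-in for the KeyValuePair type discussed in the thread.
struct KeyValuePair<Key, Value> {
  var key: Key
  var value: Value
  init(_ kv: (key: Key, value: Value)) {
    key = kv.key
    value = kv.value
  }
  var tuple: (key: Key, value: Value) { (key: key, value: value) }
}

// Lends the pair to `body` as a mutable tuple. The pointer never leaves the
// withUnsafeMutablePointer closure, so it stays within its documented scope.
func withMutableTuple<Key, Value>(
  of pair: inout KeyValuePair<Key, Value>,
  _ body: (inout (key: Key, value: Value)) -> Void
) {
  withUnsafeMutablePointer(to: &pair) { p in
    var kv = p.move().tuple             // pointee is now uninitialized
    body(&kv)
    p.initialize(to: KeyValuePair(kv))  // restore before the scope ends
  }
}

var pair = KeyValuePair((key: "a", value: [1, 2]))
withMutableTuple(of: &pair) { $0.value.append(3) }
```

The pointer use is well scoped, but the caller has to hand over a closure instead of receiving a yielded value, and the `.tuple` getter and the `KeyValuePair` initializer are still source-level copies or moves that the optimizer is relied on to clean up.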

A new builtin_move would replace the need for UnsafePointer.move() and would not require generalized coroutines.

  var kv = builtin_move(self).tuple
  yield &kv
  // Compiler checks that 'self' is reinitialized before returning.
  self = builtin_move(KeyValuePair(kv))

But even this approach assumes that the optimizer can remove source-level copies and can avoid introducing spurious copies. At the language level, the .tuple getter itself produces a copy of the key and value, as does assigning that tuple into var kv. Avoiding the source level copies requires even more fastidious use of moves, such as the following (although I'm probably getting this wrong):

public var tuple: (key: Key, value: Value) {
   _modify {
    var kv = builtin_move(builtin_move(self.key), builtin_move(self.value))
    yield &kv
    self = builtin_move(KeyValuePair(builtin_move(kv)))
  }
}

There are certainly other approaches to handling this at the source-level that I'm not thinking of at the moment.

I mentioned that, without any language support, the compiler could support programmer annotations to indicate the need to avoid copies along with guaranteed (-Onone) optimizations and supporting diagnostics...

  @_nocopy // error if the compiler copies any members of self.
  _modify { 
    var (p, kv) = unsafeMoveMap(destroying: &self) { $0.tuple }
    defer { p.initialize(to: .init(kv)) }
    yield &kv
  }

You would still need either UnsafeMutablePointer.move() or some new builtin_move to eliminate the copy between self, represented as KeyValuePair, and the tuple representation. The compiler can only promise to remove temporary copies and to avoid introducing new temporaries. Both self and kv have their addresses taken within the same scope--neither is a "temporary copy". But a @_nocopy annotation would at least ensure address stability (without a new language rule) and ensure no additional copies crop up unexpectedly outside of that language rule. Without that compiler support, a "runtime check" could at least enforce address stability by checking UnsafePointer equality.

withUnsafePointer(&self) { assert(p == $0) }

But you really want the check at -O too.

dabrahams · December 28, 2020, 8:40pm

Correct. Furthermore, I'd like the guarantee that if the variable is again passed along as an inout parameter, the resulting variable will have the same observed address, and that stored properties of that variable will have an observed address within the variable's address range, etc.—all of the things that fall out of inouts never being implicitly copied or moved (where pass-by-value is considered explicit permission to copy). IMO a range of guarantees, that are already consistent with the behavior of the compiler, are going to be needed in order to make low-level programming practical in Swift, and make noncopyable types useful.

That requirement works for me. We more-or-less depend on it already. I'm not sure what C interop problem you're referring to though.

Just what you say later: IMO there's probably a lot of existing code out there that depends on it. I know my example isn't guaranteed to work, but it matches most C programmers' expectations about the model and I'm sure they've done it. Other examples include passing the address of the same inout variable to two different calls and expecting the calls to see the same pointer value.

What source-level copies do you see in my code, exactly? I probably don't know what that term means, so could you please define it?

Regardless, as I said above, I think we're going to need guarantees like that to make Swift good for low-level programming, and I think the compiler doesn't need the flexibility to violate it. IIUC, the violation would come by reassembling an exploded value on the stack so you can pass its address when you've already lost the register value holding the object's address due to register pressure. But then you'll eventually need to reconstruct the object's address to do the writeback. I can't see that ever being a substantial win over simply stashing the address on the stack, can you?

With generalized coroutines, you could take the same UnsafePointer-based approach to problem #1, but completely avoid problem #2.
yield withUnsafeMutablePointer(&self) {
  var kv = $0.move().tuple
  yield &kv
  $0.initialize(to: .init(kv))
}
This is the proper way to guarantee pointer stability, and withUnsafeMutablePointer(&self) is no more likely to lead to a spurious copy than unsafeMoveMap(&self).

And no less likely.

Unless I'm missing some information about what “proper” means to you, this way is also no more proper than a guarantee about inout would be, as neither one exists yet. IMO having subtle rules that require programmers to jump through a mental “coroutine hoop” for low-level memory manipulation (and only sometimes, and it's hard to tell when!) is problematic for successful C and C++ interop on a large scale (my current mission at Google), not to mention general low-level programming in pure Swift. Except where we can find really good reasons to preserve the compiler's flexibility to implicitly copy or move values, it's important to have guarantees that match what the compiler's going to do anyway. That's especially true when the guarantee would otherwise only rarely be broken. We can wish it weren't so, but most programmers won't look at our specifications before deciding what they can write; Hyrum's Law governs all, in the long run.

A new builtin_move would replace the need for UnsafePointer.move() and would not require generalized coroutines.
  var kv = builtin_move(self).tuple
  yield &kv
  // Compiler checks that 'self' is reinitialized before returning.
  self = builtin_move(KeyValuePair(kv))
But even this approach assumes that the optimizer can remove source-level copies and can avoid introducing spurious copies. At the language level, the .tuple getter itself produces a copy of the key and value, as does assigning that tuple into var kv.

But of course you don't write it like that! It's

yield &self.tuple

That creates no copies at the source level (at least to the extent I understand what that term means to you).

Also, I think that operation is just called “move”

You lost me after this. The complexity of the rest is overwhelming, and seemingly unnecessary, unless I'm missing something.

Dante-Broggi · December 28, 2020, 9:45pm

Looking at this example again I believe this:

    _modify {
      // The LOE says nothing else can touch `self` while we're in _modify, so
      // we can safely destroy it as long as we put it back before we're done.
      var (p, kv) = unsafeMoveMap(destroying: &self) { $0.tuple }
      defer { p.initialize(to: .init(kv)) }
      yield &kv
    }

is currently UB, because it is (IIUC) semantically equivalent to this:

    _modify {
      // self may be in registers
      var (p, kv) = withUnsafeMutablePointer(to: &self) { p in
         // self is (moved to be) in memory at address p
         return unsafeMoveMap(destroying: p) { $0.tuple }
      }
      // self may (be moved to) be in registers, p is semantically deallocated.
      defer {
         // writing to deallocated memory.
         p.initialize(to: .init(kv)) 
      }
      yield &kv
    }

In addition I believe the copies/moves @Andrew_Trick sees are those to/from registers.

Unfortunately, the only "Position to rule" I have is having read a lot of other responses to "Is this UB?" questions.

dabrahams · December 28, 2020, 10:42pm

Yes, I have no doubt; and those copies are exactly the ones I mentioned as a concern in the post opening this thread. I'm trying to get a handle on whether we have any reason for the compiler to actually (not theoretically) make those copies, and whether we can reasonably lock down that behavior if not.

In addition I believe the copies/moves @Andrew_Trick sees are those to/from registers.

Of course, I have already stated I'm not sure because I don't see any in the code I posted, but I assume by “source-level” copy, he means a copy that is visible in the source, e.g. as pass-by-value to a call, return-by-value, or as an assignment. That interpretation is supported by this quote:

jrose · December 29, 2020, 1:23am

I do want to point out that we have publicly stated that the inout-to-pointer conversion for a module-scope variable will always produce the same address. I can't remember if we've made the same guarantee about static variables, and we've (Apple Swift people have) explicitly said it's not a safe assumption for class instance variables (even known-stored ones), locals, or struct stored properties (relative to the base value, which will be one of the other things mentioned). I don't actually know if anyone's asked about inout yet!

A trivial case where it would happen for self is if self is passed in registers, which would be a valid implementation strategy for mutating functions (but not one the compiler currently uses, as far as I know). In that case it seems plausible for the compiler to put self on the stack for the inout-to-pointer conversion, then take it back off afterwards, and not necessarily use the same slot next time.

But this thread isn't exactly asking about the current behavior; it's about the "language" making more guarantees. It is a trade-off to insist inout parameters have stable addresses because that's (potentially) something that applies all the way down a call stack—it would rule out pass-in/pass-out (or callee-preserved-register-based) inout ABIs. We may not care, though.

I do want to note that "inout parameters have stable addresses" does not automatically imply that "stored properties of inout parameters or var locals have stable address". Both seem useful and potentially necessary for the guarantees Dave wants. (Is that correct?)

That would also pretty much only leave (known-stored) class instance variables as "storage without a stable address guarantee". Is there a benefit to that exception?

dabrahams · December 29, 2020, 2:41am

OK, fair enough; I keep forgetting that ABI lockdown only applies to Apple platforms, and only outside the OS. It's reasonable to think that an inout Int would be passed by copy-out/copy-back in some future ABI. You don't even need to involve registers for that to happen, although without them it wouldn't be a very good optimization. For those following along, I wasn't worried about registers in the sense that @Dante-Broggi mentioned because IIUC the current ABIs pass all inouts by pointer, and things only get promoted to registers as an optimization where all the code is visible, leaving the compiler in a position to make the guarantee I was asking for.

But this thread isn't exactly asking about the current behavior; it's about the "language" making more guarantees. It is a trade-off to insist inout parameters have stable addresses because that's (potentially) something that applies all the way down a call stack—it would rule out pass-in/pass-out (or callee-preserved-register-based) inout ABIs. We may not care, though.

I do want to note that "inout parameters have stable addresses" does not automatically imply that "stored properties of inout parameters or var locals have stable address". Both seem useful and potentially necessary for the guarantees Dave wants. (Is that correct?)

I think so? I guess I should soften my earlier statement a little—I shouldn't prematurely insist that we need the compiler to act the way people expect C to act. We'll know better exactly what kinds of guarantees are needed once we get more experience with move-only types and C++ interop. I feel a bit more strongly, though, that asking programmers to routinely invert control using a coroutine, just to get a stable address, is probably not a tenable programming model.

Andrew_Trick · December 29, 2020, 5:44am

I think we could guarantee "no user-visible implicit copies" of inouts. I'm worried that we have a different definition of implicit copies. This isn't relevant for correctness, but I see two explicit copies of the key-value pair in one statement from your code:

var (p, kv) = unsafeMoveMap(destroying: &self) { $0.tuple }

Unless I'm mistaken the .tuple getter returns a copy of the newly constructed tuple, and assignment into kv copies that tuple into an lvalue.

dabrahams:

Unless I'm missing some information about what “proper” means to you, this way is also no more proper than a guarantee about inout would be, as neither one exists yet. IMO having subtle rules that require programmers to jump through a mental “coroutine hoop” for low-level memory manipulation (and only sometimes, and it's hard to tell when!) is problematic for successful C and C++ interop on a large scale (my current mission at Google), not to mention general low-level programming in pure Swift. Except where we can find really good reasons to preserve the compiler's flexibility to implicitly copy or move values, it's important to have guarantees that match what the compiler's going to do anyway. That's especially true when the guarantee would otherwise only rarely be broken. We can wish it weren't so, but most programmers won't look at our specifications before deciding what they can write; Hyrum's Law governs all, in the long run.

The "proper" way to acquire a stable pointer in Swift is the way that the language was designed to support from the outset, has simple well-defined rules, has been consistently communicated to developers and can be enforced by the compiler in most cases.

I'm not defending the programming model; improvements to that are welcome. Let's just be clear that adding conditions for address stability complicates the language, and allowing pointers to escape their scope under those specific conditions makes it significantly more difficult to diagnose undefined behavior. That has to be a worthwhile tradeoff.

dabrahams · January 2, 2021, 7:42pm

It's a start. I don't think it's enough, though:

We need to cover moves, or the compiler could expect to implicitly move it from deinitialized memory after I've already deinitialized it by explicitly moving from it.
Even outlawing “user visible” implicit copies and moves isn't enough, because that only requires the compiler to present a temporary copy at the same address each time the address is taken, but again doesn't prevent the compiler from treating that memory as initialized after I've explicitly moved from it.
“User visible” somewhat begs the question because it depends what you're allowed to observe and what you're allowed to assume it means. Is converting &x to an UnsafePointer “observing the address of x?” I don't think we've established that.

I'm worried that we have a different definition of implicit copies. This isn't relevant for correctness, but I see two explicit copies of the key-value pair in one statement from your code:
var (p, kv) = unsafeMoveMap(destroying: &self) { $0.tuple }
Unless I'm mistaken the .tuple getter returns a copy of the newly constructed tuple, and assignment into kv copies that tuple into an lvalue.

I sure hope that those are moves at worst; ideally they'd be eliminated by mandatory RVO. Regardless, if they are moves or even copies, I agree that they are explicit, i.e. visible in the source. I assume that's what you mean by source-level?

The "proper" way to acquire a stable pointer in Swift is the way that the language was designed to support from the outset, has simple well-defined rules, has been consistently communicated to developers and can be enforced by the compiler in most cases.

Sure. Neither of these hypothetical and unimplemented approaches was in the language design from the outset nor have they been communicated to developers. I don't see how one can be considered more “proper.”

I'm not defending the programming model; improvements to that are welcome. Let's just be clear that adding conditions for address stability complicates the language,

It complicates the technical definition of the language (which I'll note we don't have). It simplifies the mental model for users.

and allowing pointers to escape their scope under those specific conditions makes it significantly more difficult to diagnose undefined behavior. That has to be a worthwhile tradeoff.

OK, so nothing prevents pointers from escaping their guaranteed scope of validity today and being stored in globals, which I would think already demands a fairly complex dynamic system to diagnose UB without false positives. How would extending the scope of validity for some pointers make the job harder?

John_McCall · January 2, 2021, 8:06pm

withUnsafePointer doesn't just produce a pointer; it also bounds the time when accesses can be made through that pointer. Making the code you've written work would in general require the implementation to be maximally conservative about memory analysis for any variable whose address might be taken and escaped as a pointer, which essentially means any variable with any abstract uses at all. That is essentially the world that C and C++ compilers have always lived in, but Swift aims to do better.

The fact that withUnsafePointer and other closure-based scoping APIs don't compose reasonably with features like yield is a problem that demands a general solution. We should not sacrifice basic goals of the system to work around it in the short term.

dabrahams · January 2, 2021, 8:23pm

Thanks, John. I understand the principles you're going for here and they make sense.

About the specifics, though, I have some questions. Can you clarify what you mean by “memory analysis?” When you say “the implementation” are you talking about the regular compiler or a special purpose UB detector? Also, what's an “abstract use?”

John_McCall · January 2, 2021, 9:27pm

Any optimization or analysis that tries to reason about what's currently stored in a memory location. For example:

  store %value to %location
  apply %functionWithUnknownSideEffects()
  %load = load %location     // Is this always the same as %value?

or

  %load1 = load %location
  apply %functionWithUnknownSideEffects()
  %load2 = load %location     // Is this always the same as %load1?

I primarily mean the compiler. We do not want the compiler to be forced to treat all abstract uses as "escapes" through which memory can be accessed later.

A use that isn't concrete. :) It's any use that we can't fully reason about. For example, if a variable is passed inout to a call, we can't in general know exactly how it'll be used there — maybe it'll be read, maybe it'll be assigned to, maybe it'll be completely ignored. That does block the optimizer some. But we do know that it won't be "escaped", such that we'd have to worry that arbitrary code long after the call might still be accessing the variable. In C and C++, you do have to worry about that — and since C++ relies heavily on abstraction, and it's common to pass things by reference (especially with this, but also just const & / && arguments), this can be a significant optimization problem for C++ code.
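A Swift-level sketch of the distinction, assuming hypothetical functions `abstractUse` and `unknownSideEffects` that the compiler cannot see into (modeled here with `@inline(never)`). It only illustrates the reasoning described above, not what any particular optimizer pass does.

```swift
// Hypothetical opaque callee: the caller cannot see what it does with `x`.
@inline(never)
func abstractUse(_ x: inout Int) {
  x &+= 1
}

// Hypothetical stand-in for unrelated code with unknown side effects.
@inline(never)
func unknownSideEffects() {}

func loadAfterCall() -> Int {
  var local = 1
  abstractUse(&local)      // abstract use: may read, write, or ignore `local`
  let before = local
  unknownSideEffects()     // has no legal way to reach `local`: the inout access has ended
  let after = local        // so the compiler may assume after == before
  return after - before
}
```

If `abstractUse` were allowed to stash a pointer to `local` and a later call could legally write through it, the second load could no longer be folded into the first.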

Andrew_Trick · January 3, 2021, 7:17am

dabrahams:

I'm worried that we have a different definition of implicit copies. This isn't relevant for correctness, but I see two explicit copies of the key-value pair in one statement from your code:
var (p, kv) = unsafeMoveMap(destroying: &self) { $0.tuple }
Unless I'm mistaken the .tuple getter returns a copy of the newly constructed tuple, and assignment into kv copies that tuple into an lvalue.
I sure hope that those are moves at worst; ideally they'd be eliminated by mandatory RVO. Regardless, if they are moves or even copies, I agree that they are explicit, i.e. visible in the source. I assume that's what you mean by source-level?

I was using "source-level copy" and "explicit copy" interchangeably.

Yes, I think it's reasonable to consider expressions source-level moves, not copies, as long as the expression only produces and consumes rvalues, but I'm not the expert. In particular, passing an rvalue to an owned call argument could always be considered a move.

So, we can say there are no source-level copies here, only one move:

func bar(x: __owned AnyObject)

bar(foo()) // move

While there is one source-level copy here:

var x = foo() // move
bar(x)        // copy

And one source-level copy here:

func bar(x: __owned AnyObject) {
  bar(x)      // copy
}

I'm sure we'll also want some guaranteed level of copy elimination by the optimizer under certain conditions, which we haven't specified yet. Then we could rely on elimination of lvalue copies in the two cases above:

var x = foo() // move
bar(x)        // optimized move
// x never escapes or has its address taken

and

func bar(x: __owned AnyObject) {
  bar(x)      // optimized move
}

These optimizations should even be mandatory (-Onone), but it's a bit tricky because we currently preserve debug markers after the last use.

In addition to source-level moves and copies, the compiler is allowed to implicitly move or copy values. Those implicit move/copies may be user visible in at least two ways:

address stability (via pointer operations)
CoW storage copies (performance)

We should eventually specify conditions under which those effects cannot be observed. For example, we might decide that an inout argument's address is stable for the duration of the argument scope. That would then limit the compiler's ability to optimize inout argument passing via registers for non-ABI methods, as @jrose mentioned. The compiler would first need to prove that the inout argument's address is never observed.

As a counter-example of something we are not likely to restrict, taking the address of the same variable at different program points will not be guaranteed to produce the same address:

var t = ...
modifiesT(&t) // may observe an object at address A
modifiesT(&t) // may observe an object at address B

or

var t = ...
withUnsafePointer(to: &t) { ... } // may observe address A
withUnsafePointer(to: &t) { ... } // may observe address B

@jrose pointed out the exception for module-scoped variables. It's unlikely we would make an exception for generically typed local variables. This is where special type restrictions or programmer annotations could be a useful tool.

What about CoW storage copies? I mentioned above that we could eventually guarantee no compiler-generated (implicit) copies and guarantee some forms of copy optimization. But this won't be something we do in general for arbitrary variables that have generic copyable types (the default unconstrained generic type). When a variable neither escapes, nor has its address taken, the compiler only models the values that the variable refers to--it immediately throws away the lvalue information:

var t: T = foo()
if (z) {
  t = b      // source-level copy
}
use(t)

If z is true, then t = b is a source-level copy. But if z is false, we still can't guarantee that t won't be copied. The compiler's representation looks more like this:

use(z ? foo() : b)

The compiler will eventually need to generate a temporary, requiring an implicit copy. The compiler has lost information about where the source-level copies were, so it is difficult to make an absolute guarantee about implicit copies of copyable types.

For non-copyable types, the compiler's internal representation will be able to guarantee no copies through all stages of compilation. As before, we could add other type restrictions or programmer annotations to either prevent or diagnose unwanted copies.
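For context on why an extra copy of a CoW value is user visible at all, here is a small sketch that watches an array's storage address across a source-level copy and a later mutation. The helper name `storageAddress` is hypothetical, and the "shared before mutation" result describes typical behavior of the standard library's Array rather than a documented guarantee.

```swift
// Hypothetical helper: returns the address of an array's element storage
// as an integer so two arrays can be compared.
func storageAddress(_ array: [Int]) -> UInt {
  return array.withUnsafeBufferPointer { UInt(bitPattern: $0.baseAddress) }
}

func demoCopyOnWrite() -> (sharedBefore: Bool, sharedAfter: Bool) {
  let original = [1, 2, 3]
  var copy = original                 // source-level copy; storage is typically shared
  let sharedBefore = storageAddress(original) == storageAddress(copy)
  copy.append(4)                      // storage is not uniquely referenced, so it is copied
  let sharedAfter = storageAddress(original) == storageAddress(copy)
  return (sharedBefore, sharedAfter)
}
```

An unexpected implicit copy has the same effect as the explicit `var copy = original` here: the next mutation pays for duplicating the storage.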

You are using coroutines as a mechanism to avoid CoW copies, running into the fact that they're only partially designed, working around that by escaping a pointer from its well-defined scope, then deciding that you need some language guarantees to legitimize your horrible hack. The link between address stability and your original problem is tenuous.

There are two uncontroversial language features that directly solve your problems and have nothing to do with address stability:

generalized coroutines
the move operator

That said, I agree it would be helpful to specify the conditions under which implicit moves/copies cannot be observed, in addition to specifying the conditions for guaranteed copy optimization. I just don't know that it's an easier problem than adding the above language features. First, there's no going back from any restrictions we put in place. And it's not good enough to declare some rules for the compiler without a robust representation, underlying mechanisms, and verification to support those guarantees--no one knows everything that goes on in the compiler, and it changes all the time. We could declare the intention to follow some rules and gradually work toward that.

dabrahams · January 3, 2021, 8:28pm

Of course we agree on that! I guess I thought that bounding the valid lifetime of the pointee to the lifetime of the root inout access to it would be limited enough.

So that just means “a use the details of which aren't visible to the compiler at the use site?” I'm sure you've been using “abstract” that way since the beginning but I guess it never sank in for me

dabrahams · January 3, 2021, 8:59pm

Starting from the middle…

Whew! Unsparing, but I have to admit that's totally fair. Okay, I'll stop pursuing those guarantees and start thinking about how to integrate coroutines with the rest of the language; it's a problem I'm going to need to solve anyway…

Check.

For completeness: I guess that passing any value to a guaranteed call parameter is a copy iff the value is a [stored property of] a class instance or a global?

So, we can say there are no source-level copies here, only one move:
func bar(x: __owned AnyObject)

bar(foo()) // move
While there is one source-level copy here:
var x = foo() // move
bar(x)        // copy
And one source-level copy here:
func bar(x: __owned AnyObject) {
  bar(x)      // copy
}

I'd only expect those last two to copy if there's another use of x after the call to bar. I see you're treating that as an optimization, below.

I'm sure we'll also want some guaranteed level of copy elimination by the optimizer under certain conditions, which we haven't specified yet. Then we could rely on elimination of lvalue copies in the two cases above:
var x = foo() // move
bar(x)        // optimized move
// x never escapes or has its address taken

I'm pretty sure any use of x after the call to bar, regardless of whether x escapes, guarantees that there's a copy of x for the call to bar, no?

These optimizations should even be mandatory (-Onone)

Yesplease. Whether to think of them as optimizations or just as “how codegen works” as I did is admittedly above my pay grade, but it occurred to me that because of DI we already do analysis somewhat like this before the optimizer kicks in and it might make a big difference to what I think of as “real optimization” passes to have it sorted out.

but it's a bit tricky because we currently preserve debug markers after the last use.

Forgive my utter ignorance, but what's a debug marker?

Yep. Just so long as they're not the conditions I was asking for

That… was exactly what I was asking for I think?

As a counter-example of something we are not likely to restrict, taking the address of the same variable at different program points will not be guaranteed to produce the same address:

I… think I wasn't asking for that.
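
For readers following along, a small sketch of the counter-example being quoted, assuming nothing beyond the standard library's withUnsafeMutablePointer. The function names are hypothetical. The first function takes the address of the same variable at two program points; the two results may well be equal in practice, but nothing promises it. The second shows the pattern that stays inside what is specified: do all the pointer work within a single scope.

```swift
// Hypothetical: takes the address of one variable at two program points.
// The pointers are only valid inside their closures; the integers
// returned here are for comparison only.
func addressAtTwoPoints() -> (first: UInt, second: UInt) {
    var value = 42
    let first = withUnsafeMutablePointer(to: &value) { UInt(bitPattern: $0) }
    let second = withUnsafeMutablePointer(to: &value) { UInt(bitPattern: $0) }
    return (first, second)   // equality of these is not guaranteed
}

// Hypothetical: keeps every use of the pointer inside one scope.
func incrementTwiceInOneScope(_ value: inout Int) {
    withUnsafeMutablePointer(to: &value) { pointer in
        pointer.pointee += 1
        pointer.pointee += 1
    }
}
```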

What about CoW storage copies? I mentioned above that we could eventually guarantee no compiler-generated (implicit) copies and guarantee some forms of copy optimization. But this won't be something we do in general for arbitrary variables that have generic copyable types (the default unconstrained generic type). When a variable neither escapes, nor has its address taken, the compiler only models the values that the variable refers to--it immediately throws away the lvalue information:
var t: T = foo()
if (z) {
  t = b      // source-level copy
}
use(t)
If z is true, then t = b is a source-level copy.

I'd like that to depend on whether b might be accessed after the last line. There's not enough context to tell from this code, but I'll assume it's a global for the purposes of this discussion; that would force it to be a copy.

But if z is false, we still can't guarantee that t won't be copied.

Again, context is missing, so I can't tell, but it seems obvious that t will be copied if it is used after the last line and use takes its parameter as __owned.

Sorry, I must be missing something. Why can't it be a move?

The compiler has lost information about where the source-level copies were, so it is difficult to make an absolute guarantee about implicit copies of copyable types.

I don't think we should make guarantees in terms of a relationship to source-level copies; all we need to do is expose the rules by which we determine that a copy may be needed. I guess this is part of why I don't want to think of moving from the point of last use as an optimization.

Remember that I've let go of address stability in the general case; this means the compiler can implicitly move all it wants.

For non-copyable types, the compiler's internal representation will be able to guarantee no copies through all stages of compilation. As before, we could add other type restrictions or programmer annotations to either prevent or diagnose unwanted copies.

Sure, we could. The programmer will already have lots of explicit control: they can make the type move-only and add a copy() method. IMO it would be a terrible shame if we can't guarantee minimal copying behaviors for ordinary copyable types like Array.

Well, I hope they solve my problems, and in the case of move, I'm in a position to make sure it does. I can't say I know enough about what “generalized coroutines” means yet to see that it definitely provides a good answer.

I agree it would be helpful to specify the conditions under which implicit moves/copies cannot be observed, in addition to specifying the conditions for guaranteed copy optimization. I just don't know that it's an easier problem than adding the above language features.

I think observability of move is much more of an expert-level concern than observability of copy, and while it's obviously going to be important to have a way to achieve address stability across a yield, I'm not nearly as worried about making that sort of thing ergonomic.

First, there's no going back from any restrictions we put in place. And it's not good enough to declare some rules for the compiler without a robust representation, underlying mechanisms, and verification to support those guarantees--no one knows everything that goes on in the compiler, and it changes all the time. We could declare the intention to follow some rules and gradually work toward that.

Super. I realize none of this stuff can be achieved in a day. Thanks for taking the time to discuss it.

John_McCall · January 4, 2021, 8:43am

The idea I've been noodling with for how to make these scoped operations compose better is to introduce a feature targeted for them, which I would call a using func, designed to work in tandem with a using statement at the use site.

So you could declare something like this:

extension Array {
  // This is a yield-once coroutine, like read/modify.
  // Given the opportunity, I think we'd use a slightly different ABI for it, though.
  mutating using func unsafeMutableBuffer() -> UnsafeMutableBufferPointer<Element> {
    yield ...
  }
}

and then use it like this:

extension Array {
  mutating func fidgetInPlace() {
    using buffer = self.unsafeMutableBuffer()

    // The coroutine call to unsafeMutableBuffer() doesn't complete
    // until `buffer` goes out of scope (which also means that `self`
    // is being accessed exclusively for all that time)
  }
}

Issues that come to mind:

It would make more sense for unsafeMutableBuffer to be a property. That would mean we'd have to have using vars and using subscripts, though. Would they have to be read-only? using statements as described above don't communicate whether the access is supposed to be a mutation or not.
It would be nice to use the results of using funcs directly in expressions. There's a natural "scope" to something like a call argument, where the using func would complete immediately after the call. Does that generalize acceptably? Is it too prone to creating dangling uses of the value in practice?
Should using statements force (or allow) the access to be explicitly scoped, like
```
using x = foo {
  body(x)
}
```
In the braceless approach, to explicitly scope the using you'd have to wrap it in a do {} block.

For what it's worth, the ownership manifesto suggests having some sort of endScope directive that can end the scope of any declaration. (In the manifesto, this is used to immediately end local "ephemeral" bindings like borrow and inout; using would be another, similar need.)
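
For comparison, this is roughly what the same scoped access looks like with the closure-taking API that exists today, Array.withUnsafeMutableBufferPointer. It is only a sketch: the body of fidgetInPlace (reversing the elements) is an assumption made up for illustration, since the original example leaves it empty. The buffer is valid only inside the closure, and self is accessed exclusively for that whole time, which is the scoping a using statement would express without the nesting.

```swift
extension Array {
    // Hypothetical body: reverses the elements in place through the buffer.
    mutating func fidgetInPlace() {
        withUnsafeMutableBufferPointer { buffer in
            // `buffer` must not escape this closure.
            for i in 0..<(buffer.count / 2) {
                buffer.swapAt(i, buffer.count - 1 - i)
            }
        }
    }
}
```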

Michael_Gottesman · January 4, 2021, 7:14pm

Have you thought about using with? Consider:

with inout x = valueWithAccessor {
  ... do stuff ...
} // inout ends here.

Just reads nicely to me. I think about it like an if statement without a condition check, just a bind.

Also, one thing that I do not understand is why, in your example, you need to mark the function with using as well. In my mind, with 'inout', 'borrow', 'let' and a using/with, we would never need to mark the function since we could bind /any/ value, no?
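
As a rough approximation of that binding in today's Swift, a generic helper that takes an inout argument and a closure gets a similar shape. This is a sketch: the with function below is hypothetical (it is not in the standard library), and appendTwice exists only to exercise it. The inout access to the dictionary entry stays active for the whole closure and ends when the closure returns.

```swift
// Hypothetical helper, not part of the standard library.
func with<T, R>(_ value: inout T, _ body: (inout T) throws -> R) rethrows -> R {
    return try body(&value)
}

// Hypothetical usage: one inout access to table["k"] spans both appends.
func appendTwice() -> [String: [Int]] {
    var table: [String: [Int]] = [:]
    with(&table["k", default: []]) { x in
        x.append(1)
        x.append(2)
    } // inout access ends here
    return table
}
```

The closure must not touch table itself while the access is active, or it would violate exclusivity.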

dabrahams · January 4, 2021, 7:25pm

How is that not just a regular var/subscript?

seems like, syntactically,

let buffer: inout = self.unsafeMutableBuffer
...

Makes more sense as a way to keep an inout access "active" beyond a single expression.