SE-0366: Move Function + "Use After Move" Diagnostic

My point is that there’s a difference between this developer experience:

    myString.utf16.count

and this one:

    UTF16View(myString).count

… even though expression 1 boils down to the equivalent of expression 2 at runtime.

Swift currently allows the designer of String (and of UTF16View) to decide that clients who need the number of UTF-16 code units in a string get it via the first formulation.

@lukasa said:

…and my response is that this doesn’t entirely describe the problem. ReadableBytesView is the SwiftNIO analogue of UTF16View. Part of designing move semantics is figuring out how to give SwiftNIO the option of vending a ReadableBytesView via the existing getter syntax while avoiding the COW.

This will require some form of annotation on the property getter, and since I suspect there is a design for move semantics that is expressed entirely by annotations on function arguments and return values, I am trying to play that out in this thread before the language workgroup commits to a move expression.
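The COW cost being described can be sketched in plain Swift. The types below (`ByteStorage`, `Buffer`, `BytesView`) are hypothetical stand-ins, not SwiftNIO's actual API. Because the getter only copies the storage reference, the buffer and the view share it, so the first mutation through the view has to copy:

```swift
/// Hypothetical COW storage shared by a buffer and its view.
final class ByteStorage {
  var bytes: [UInt8]
  init(_ bytes: [UInt8]) { self.bytes = bytes }
}

/// Hypothetical view type, playing the role of UTF16View / ReadableBytesView.
struct BytesView {
  var storage: ByteStorage

  mutating func set(_ index: Int, to byte: UInt8) {
    // Standard COW check: copy the storage if anyone else holds a reference.
    if !isKnownUniquelyReferenced(&storage) {
      storage = ByteStorage(storage.bytes)
    }
    storage.bytes[index] = byte
  }
}

struct Buffer {
  var storage: ByteStorage
  /// Vended through an ordinary getter, like `utf16` or `readableBytes`.
  var bytesView: BytesView { BytesView(storage: storage) }
}

let buffer = Buffer(storage: ByteStorage([1, 2, 3]))
var view = buffer.bytesView   // `buffer` still references the storage...
view.set(0, to: 9)            // ...so this mutation has to copy it first
```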

As others noted, the main difference is that borrow makes explicit a shared borrow (which is for most normal types an immutable borrow), whereas inout indicates an exclusive borrow (a mutable borrow). I like the idea of keeping & to mean "this call may mutate this argument", and I'd also like borrow/take parameters to be usable as optimization tweaks without obligating a client-side code change.

Thanks. I can appreciate the logic in keeping & as a sigil for a mutable borrow. As the proposal mentions, and as the long digression above concludes, properties can’t be moved from without additional self-consuming semantics. But they can be borrowed from, which would make the applicability of & potentially confusing if it were also required to pass a move parameter.

There's a precedent of && for a similar thing in another language we can ... steal borrow :rofl:

Aesthetics aside, there will be a need for a syntax to denote explicit copying, and there’s a limit to the number of & variants people will want to remember.

Unlike move and borrow, which place extra constraints on the typical compiler behavior, copy's behavior is captured pretty much by just a regular function:

func copy<T>(_ x: borrow T) -> T {
  return x
}

This touches on something I’m not quite sure about. You mentioned wanting to allow take (or move) to be adopted by the callee without requiring a code change. It’s still an ABI change, though, right? If the argument is being populated from a variable, the caller has to move it out.

What’s a practical situation when a callee might do this? Other than as an implementation detail of move-as-function?

In typical callers, the compiler is free to copy or not as much as it needs to, so there's generally no noticeable difference, besides the net number of retains and releases, between calling a function with a borrowed parameter vs. a taken parameter. If you have:

class Foo {}
class Bar {
  var foo: Foo
  init(foo: Foo) { self.foo = foo }
}

var global: Foo?

func callee(_ foo: Foo) {
  global = foo
}

func caller(_ bar: Bar) {
  callee(bar.foo)
}

then with the usual default receive-by-borrow convention, we'd end up with retain-release calls like this:

func callee(_ foo: Foo) {
  // Retain `foo` so that we can stash a copy of it in global
  _retain(foo)
  // (assignment dance elided for readability)
  global = foo
}

func caller(_ bar: Bar) {
  // Retain `bar.foo` around the call, in case another reference to `bar`
  // might mutate it during the call
  let foo = bar.foo
  _retain(foo)
  callee(foo)
  _release(foo)
}

Since callee always consumes its foo parameter by assigning it into global, it's an optimization to make it take the parameter, so that it doesn't need to retain inside the callee, and the caller doesn't have to release it after the call:

func callee(_ foo: take Foo) {
  // Don't need to retain `foo` since we're taking it
  global = foo
}

func caller(_ bar: Bar) {
  // Caller needs to retain a copy of foo for `callee` to consume
  // but doesn't need to release it after
  let foo = bar.foo
  _retain(foo)
  callee(foo)
}

so it is useful to be able to adjust the calling convention without requiring explicit callee- or caller-side annotations of the uses of the parameter or argument.
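A minimal sketch of the same example in today's Swift, assuming the underscored, unofficial `__owned` modifier as a stand-in for the proposed `take`. It is not a supported public feature, and retain/release traffic is not observable from Swift code, so this only shows that neither the callee's body nor the caller has to change when the convention does:

```swift
final class Foo {}

final class Bar {
  var foo: Foo
  init(foo: Foo) { self.foo = foo }
}

var global: Foo?

// `__owned` (underscored, assumed here as a stand-in for `take`) asks for the
// argument at +1, so storing it into `global` needs no extra retain in the callee.
func callee(_ foo: __owned Foo) {
  global = foo
}

func caller(_ bar: Bar) {
  // Unchanged call site: the compiler supplies whatever copy the convention needs.
  callee(bar.foo)
}
```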

So are we effectively talking about a formalization of __owned, in which the caller implicitly generates a copy if necessary? Or are we talking about a new behavior where the value is moved out of the argument and the caller must explicitly copy the value?

As I said upthread, the latter definition is useful for spelling initializers of move-only types. But I get the sense you’re talking about the former?

In my mind, these are ultimately the same thing. For a move-only type, the parameter passing behavior at the call site isn't different. The lack of copyability ensures that the argument is borrowed or consumed directly. Explicit borrows or moves could still be useful as documentation or emphasis when working with move-only types, but wouldn't be necessary like they are for locally altering the behavior of normally copyable types.

Hm. The big difference in my mind is whether this code compiles:

func callee(_ arg: take Bar) {
  // ...
}

let bar = Bar()
callee(bar)
print(bar) // does this line compile?

If take is just a new spelling for __owned, this code compiles. If take behaves the way that I proposed move should behave, it doesn’t compile, because the lifetime of bar ended when the value was moved out of it and into arg.
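For reference, this proposal eventually shipped in Swift 5.9 with the operator spelled `consume`. Assuming such a toolchain, the "moved out" reading can be sketched at the call site; `Bar` and `callee` here are illustrative. Uncommenting the `print` line makes the compiler reject the function, because `bar` is used after its lifetime has ended. The example lives inside a function because `consume` applies to local bindings and parameters, not globals:

```swift
final class Bar {
  let name: String
  init(name: String) { self.name = name }
}

func callee(_ arg: Bar) -> String {
  return arg.name
}

func caller() -> String {
  let bar = Bar(name: "bar")
  // `consume` ends `bar`'s lifetime and hands its value to `callee`.
  let result = callee(consume bar)
  // print(bar)  // rejected at compile time: `bar` is used after being consumed
  return result
}
```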

Bringing this back to the proposal that we are ostensibly reviewing :smile:, if the latter is at all useful to have in the language, then I reiterate my stance that the move operator should be replaced with func move<T>(_: move T) -> T. And I think your example does a great job explaining how func copy() can be defined similarly.

If there isn’t a spot for something like move parameters, then I don’t have an argument against the move operator.

Either way, I believe assigning to _ should be required to drop a binding.

Agreed. Having the move operator be implicitly discardable seems a little too magical for me.

I tracked the revision history of the proposal and was not convinced by the change of move from a function-like form into such an operator. I would refer to this line in apple/swift-evolution's commonly_proposed.md:

not somePredicate() visually binds too loosely compared to !somePredicate().

Similarly, move x visually binds too loosely compared to move(x), which could confuse its users. It is also not aligned with custom consuming functions, making move too specialized.

Personally, I’m not so satisfied with the name move, but it seems to be something of a term of art and there’s no better alternative. I would be happy if we could use a prefix operator here, which would sidestep the naming question and make the binding clearer.

A late review; I've been taking a bit of an extended holiday and am still catching up on things.

+1 to having this operation in the language in some form.

I like having this as a keyword, I think. It's a fundamental operation. I would ask that it be extended to support multiple parameters:

var x = ...
var y = ...

move x, y

The result should be a tuple of all the values. Since it is a keyword rather than a function, I assume this would not require variadic generics (?).

It's also fine to add that later, but it's the kind of small quality-of-life thing that I doubt anybody's going to bother to write a separate proposal for.

I'd rather we accept multiple parameters (if we agree that it's a thing we want), and leave that part of the implementation until later if needed.

Naming Things

Whilst I acknowledge the ARC predictable performance thread, I wish there was a fresher, fuller manifesto about how we want to introduce ownership at the language level. Because, as I understand it, that's what the operations in that thread really amount to, and I think we would benefit from a more complete picture which allows us to use a consistent terminology.

I see two styles which I think would work - either one which evokes the idea of lifetimes, or one based on ownership.

To explain, I'll start with "copy". It's a highly overloaded term - even in Swift, we use the term "copy-on-write", which talks about a whole different kind of copying than the copy we're talking about avoiding here, or what NSCopying/NSMutableCopying mean by a copy.

I feel the ownership manifesto could be clearer about what it means, but I was wondering about it recently and figured a more obvious definition might be:

'copy' takes a value as input, and returns the same value but with an unconstrained lifetime. The caller has control over when it wants to end the lifetime of the returned value.

Or, as @Joe_Groff expressed in code:

func copy<T>(_ x: borrow T) -> T {
  return x
}

It's quite an amazing operation - to arbitrarily extend the lifetime of any value; to turn a borrow into a non-borrow. We've just acquired an ownership stake over something we didn't own before, because we now have a coequal say over its lifetime.

Does this amount to a "copy"? For reference types, extending the lifetime/acquiring that ownership stake amounts to calling retain, and we have it for as long as we like - until we call the balancing release. I'm not sure that everybody would understand that operation as being a "copy", and I think it could use a less ambiguous name.
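To make the "copy is a retain" point concrete, here is a sketch that uses a plain generic parameter in place of the proposed `borrow T`. `extendLifetime` is the name proposed in the next paragraph, not an existing API, and `Box` is a hypothetical helper class. `isKnownUniquelyReferenced` shows that while the returned value is alive, the original is no longer the sole owner:

```swift
final class Box {
  var value: Int
  init(_ value: Int) { self.value = value }
}

// Hypothetical name; a plain parameter stands in for the proposed `borrow T`.
// Returning the parameter gives the caller an owned value (for a class: a retain).
func extendLifetime<T>(_ x: T) -> T {
  return x
}

func uniquenessAroundCopy() -> (beforeCopy: Bool, whileCopyAlive: Bool) {
  var original = Box(1)
  let beforeCopy = isKnownUniquelyReferenced(&original)
  let duplicate = extendLifetime(original)   // acquire a second ownership stake
  let whileCopyAlive = isKnownUniquelyReferenced(&original)
  withExtendedLifetime(duplicate) {}         // keep `duplicate` alive past the check
  return (beforeCopy, whileCopyAlive)
}
```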

If we're going for lifetime terminology, extendLifetime seems fitting for this operation. If we're going for ownership, perhaps acquireOwnership? Given that this function is an opt-in for people who want this stuff to be explicit, I don't think an explicit name is too bad. Better than an ambiguous name.

Instead of @noImplicitCopy, I quite like @explicitLifetime/@explicitOwnership.

That brings us to move-only types - and I find it awkward because the interesting thing about a move-only type is not that it can be moved, but that it disallows copies. You kind of need to look at the negative space to understand the name and why it's an interesting concept.

If by "copy" we mean "obtain a version of the value which we own/whose lifetime we control", then what we have been calling a "move-only" type is describing data which Swift cannot share or duplicate the ownership of. It is uniquely owned - such that there can only be one variable which is known to control the lifetime of any "instance" of a struct, a concept which arises once we have this new notion of uniqueness (but because it has unique ownership, we don't need reference counting for it).

And since we disallow sharing/duplicating ownership, the only other way you have to pass their contents around is by move (i.e. by ending the lifetime of the existing variable, hence transferring ownership), or with a borrow (a non-owning reference). So I don't think "move-only" is an exactly accurate description of these types; it's really "move-and-borrow-only". Alternatively, UniqueOwnership/UnshareableLifetime.

Fiiiiinally, we get to move itself. What this proposal is actually about. While copy is all about acquiring ownership, move allows us to relinquish ownership - to another variable, or not. They are counterparts - move is the release to copy's retain. That's why you had to read all of this, I'm afraid - I don't think we can address move until we address copy. We have to start there, so it can all be part of a consistent terminology (and I don't think the terms we have been using to date, or in the manifesto, necessarily give the best model for developers).

Anyway, I'm going to suggest endLifetime to mirror extendLifetime, and either giveOwnership or relinquishOwnership to mirror acquireOwnership.
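The counterpart can be sketched the same way, assuming a Swift 5.9 or later toolchain, where this proposal's operator is spelled `consume`. Ending the second binding's lifetime gives up its ownership stake (a release), and the original becomes uniquely referenced again. `Box` is again a hypothetical helper class:

```swift
final class Box {
  var value: Int
  init(_ value: Int) { self.value = value }
}

func uniquenessAroundMove() -> (whileCopyAlive: Bool, afterConsume: Bool) {
  var original = Box(1)
  let extra = original                 // acquire: a second owner (a retain)
  let whileCopyAlive = isKnownUniquelyReferenced(&original)
  _ = consume extra                    // relinquish: `extra`'s lifetime ends here (a release)
  let afterConsume = isKnownUniquelyReferenced(&original)
  return (whileCopyAlive, afterConsume)
}
```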

Let's please not use the word 'transfer' for this. I still get flashbacks of __bridge_transfer.

In summary, the suggestions go something like this:

Ownership terminology:

  • copy = acquireOwnership
  • move = giveOwnership (heh, give or take? Depends on your perspective...)
  • @noImplicitCopy = @explicitOwnership (acquires must be explicit)
  • Copyable = ShareableOwnership
  • MoveOnly = UniqueOwnership

Lifetime terminology:

  • copy = extendLifetime
  • move = endLifetime
  • @noImplicitCopy = @explicitLifetime (extensions must be explicit)
  • Copyable = ShareableLifetime
  • MoveOnly = UnshareableLifetime

Just to paint the bike shed a bit, it would be aesthetically preferable for the parameter-decorating keyword to follow the precedent laid by the other ownership keywords and grammatically be a past participle verb-as-an-adjective describing what the function did to the passed-in parameter.

Thus:
moved (or taken since it seems we are leaning that way) to match with __owned and __shared (and the eventual borrow keyword should then be borrowed).

I was originally against the move-pretending-to-be-a-function as proposed in the original, but if it is implemented as an actual function using the moved (or taken) decorated parameter then I’m on board.

I anticipate that a common use case of move will be inside function arguments, so I think it should bind more tightly than the comma, so that foo(move a, b) only moves a. You can of course write move a, move b to move multiple values.

Oh, that's a good point.

Anyway, I wanted to give two examples of code I've written recently that uses moves ("poor man's moves" - reassigning a value to a dummy object so I could transfer ownership), and how it actually shaped the design of those APIs.

It seems there is a bit of confusion about how this works with _modify and wrapper views and stuff, so maybe it'll help clear things up, or give people some ideas about what could be made possible with this feature.

API One: Write-through views with _modify

I love this pattern. The idea is that you have a storage object, and you can pass it between multiple "views", which are cheap ("zero-cost") wrapper structs providing specialised protocol conformances and other APIs. String is the original example in Swift, with the utf8, utf16, and unicodeScalars views, and since we gained _modify in the language (:neutral_face:) we can temporarily transfer ownership to these views, so they have unique ownership of the storage and can modify it in place. WebURL and FilePath are types which I know do this, and there may be others.

The basic idea is this:

// The COW storage type
class MyStorage { ... }

// Two views of that same storage object.
// Implemented like any other Swift COW type,
// using isKnownUniquelyReferenced(&storage).
struct RegularView {
    var storage: MyStorage

    mutating func foo() {
        if !isKnownUniquelyReferenced(&storage) { ... }
    }
}

struct SpecialView {
    var storage: MyStorage

    mutating func bar() {
        if !isKnownUniquelyReferenced(&storage) { ... }
    }
}

Now the magic happens:

extension RegularView {

    var specialView: SpecialView {
        get { SpecialView(storage: storage) }
        _modify {
            // 1. Transfer ownership of the storage to the new view.
            var wrapper = SpecialView(storage: move(self).storage)
            // 2. On exit, transfer ownership back to us.
            defer { self = RegularView(storage: wrapper.storage) }
            // 3. Let the caller use 'SpecialView', with unique ownership.
            yield &wrapper
        }
    }
}

// We can now have scoped, mutating operations, without COW overheads.
// This opens up a new universe of API design possibilities.

var value: RegularView = ...
value.specialView.bar()

But how can we transfer out of self? Doesn't that leave self uninitialised and unsafe?

Yes, but we know self cannot be accessed during that time, because of the Law of Exclusivity. There can only be one mutating operation on a struct at a time, and the access through _modify lasts for the duration of the yield. Hence, the language guarantees this is safe. Each of these views is still a normal, safe Swift type - you can grab a value from the getter and let it escape the function and nothing will be left dangling, etc.

Even if we referred to the same value again (captured it in a closure, or passed it as a parameter), the compiler would just make an owned copy first, and then our mutating operation would find that isKnownUniquelyReferenced returns false, and do the usual COW thing.

value.specialView.bar(value /* Copy made here */)

So in case it wasn't clear to everybody, that's how that works. It has been transformative to WebURL's API, allowing me to include a very comprehensive, expressive API, packed away in namespaces, that is also extremely efficient. I think it's the way to write these kinds of data types in Swift.
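A runnable approximation of the pattern in current Swift, assuming the underscored `_modify` accessor and substituting a "poor man's move" (swapping a freshly allocated dummy into `self`) for `move(self)`. The type and method names mirror the ones above, but their bodies are hypothetical; the dummy allocation is exactly the overhead a real move would remove:

```swift
final class MyStorage {
  var items: [Int]
  init(_ items: [Int]) { self.items = items }
}

struct SpecialView {
  var storage: MyStorage

  mutating func append(_ item: Int) {
    // Usual COW check: copy only if another owner can observe the storage.
    if !isKnownUniquelyReferenced(&storage) {
      storage = MyStorage(storage.items)
    }
    storage.items.append(item)
  }
}

struct RegularView {
  var storage: MyStorage

  var specialView: SpecialView {
    get { SpecialView(storage: storage) }
    _modify {
      // 1. Poor man's move: park a dummy in `self` and hand the real storage
      //    to the wrapper, so `self` no longer holds a reference to it.
      var wrapper = SpecialView(storage: MyStorage([]))
      swap(&wrapper.storage, &storage)
      // 2. On exit, move the (possibly replaced) storage back into `self`.
      defer { swap(&wrapper.storage, &storage) }
      // 3. Let the caller mutate the wrapper in place.
      yield &wrapper
    }
  }
}

var value = RegularView(storage: MyStorage([1, 2, 3]))
let originalID = ObjectIdentifier(value.storage)
value.specialView.append(4)   // storage is unique here: mutated in place

let snapshot = value          // a second owner of the same storage
value.specialView.append(5)   // shared with `snapshot`: copy-on-write kicks in
```

The first `append` finds the storage uniquely referenced and mutates it in place; the second finds it shared with `snapshot`, so the usual COW copy happens and `snapshot` is left untouched.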

Also, it isn't unique to _modify, or getters or setters. You could get the same thing with an initializer.

extension SpecialView {
    init(_ other: RegularView) {
        self.storage = other.storage
    }
}

var uniqueSpecial = SpecialView(move(value))

So as for the discussion about ByteBufferView and the ReadableBytesView, I fail to see a problem. Both syntaxes are able to express the desired ownership semantics. If you want to transfer ownership in to the result of the getter, this will work:

var byteBuffer: ByteBuffer = ...

var view = move(byteBuffer).readableBytes

Copy-on-write is a dynamic (runtime) thing. As long as you ensure the reference in the view is unique - somehow - you can avoid COW. It's pretty flexible.

--

API Two: Buffer reuse

I recently released WebURL 0.4.0, and one of the features is "domain rendering" (as in, internet domain names). It has quite an interesting design that is relevant to this discussion.

I won't go into the details, but the gist is that the parts of domain names have two forms - a raw, ASCII form, and a decoded, Unicode form. Calculating the Unicode form is expensive, and requires multiple allocations, but many algorithms can use fast-paths to avoid that or can more efficiently use an intermediate stage (an Array<Unicode.Scalar> rather than a String).

So the idea that I came up with is to pass a helper object inout to the callbacks. We create the buffers outside of the loop, then move them into the helper object on each iteration, let the caller use the helper object, then transfer ownership back out again.

var scalarBuffer: [Unicode.Scalar] = []
var scalarBufferIsReserved = false
var utf8Buffer = ""
var utf8BufferIsReserved = false

processLabels: while /* ... */ {

  var renderLabel = DomainRendererLabel(
    /* ... */,
    scalarBuffer: scalarBuffer,
    scalarBufferIsReserved: scalarBufferIsReserved,
    utf8Buffer: utf8Buffer,
    utf8BufferIsReserved: utf8BufferIsReserved
  )

  // Set scalarBuffer to [] and utf8Buffer to "", so DomainRendererLabel has unique ownership of the buffers,
  // and can use them for decoding Punycoded labels. We take ownership back after 'processLabel',
  // so later iterations of the loop can reuse the allocation. This is like a poor man's "move".
  scalarBuffer = []
  utf8Buffer = ""

  renderer.processLabel(&renderLabel, isEnd: isEnd)

  swap(&renderLabel._scalarBuffer, &scalarBuffer)
  swap(&renderLabel._utf8Buffer, &utf8Buffer)
  if renderLabel._scalarBufferState != .unreserved { scalarBufferIsReserved = true }
  if renderLabel._utf8BufferState != .unreserved { utf8BufferIsReserved = true }
}

It turns out, this is very effective - the DomainRendererLabel has mutating get properties (hence the inout), so the unicode scalar array is allocated lazily from the singleton, its contents are computed lazily, and that allocation is reused across iterations of the loop.
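The shape of that loop, reduced to a self-contained sketch: `ScratchLabel` and `renderLabels(of:)` are hypothetical, and reversing each label's scalars stands in for the real Punycode decoding. The buffer is reserved once outside the loop, handed to the label with our own reference dropped so the label can mutate it without copying, and swapped back afterwards so the next iteration reuses the allocation:

```swift
/// Hypothetical stand-in for DomainRendererLabel: it owns the scratch
/// buffer while a single label is being processed.
struct ScratchLabel {
  var text: Substring
  var scalarBuffer: [Unicode.Scalar]

  /// Placeholder for the expensive decoding step: reverses the label's
  /// scalars, using the scratch buffer as working space.
  mutating func render() -> String {
    scalarBuffer.removeAll(keepingCapacity: true)
    scalarBuffer.append(contentsOf: text.unicodeScalars.reversed())
    var result = ""
    result.unicodeScalars.append(contentsOf: scalarBuffer)
    return result
  }
}

func renderLabels(of domain: String) -> (labels: [String], finalCapacity: Int) {
  var scalarBuffer: [Unicode.Scalar] = []
  scalarBuffer.reserveCapacity(64)   // allocate once, outside the loop
  var labels: [String] = []

  for part in domain.split(separator: ".") {
    var label = ScratchLabel(text: part, scalarBuffer: scalarBuffer)
    // Poor man's move: drop our reference so the label's buffer is unique
    // and can be mutated without a copy-on-write.
    scalarBuffer = []
    labels.append(label.render())
    // Take ownership back so the next iteration reuses the allocation.
    swap(&label.scalarBuffer, &scalarBuffer)
  }
  return (labels, scalarBuffer.capacity)
}
```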

You can even compose renderers, and because this is all state stored in the label, it works. The Array does not get reallocated (as long as the capacity is okay; we purposefully over-reserve a bit). Unfortunately, this isn't quite as reliable for String as it is for Array, and although it is only computed once and cached, it does reallocate each time it is repopulated, even after a reserveCapacity. Perhaps it's just the operation we're doing.

When designing this API, the alternative was a Collection conformance, where the Element would be a struct with computed properties for the ASCII/Unicode forms.

This not only has much simpler semantics (it's a function with callbacks. That's it), it's a lot, lot faster, and still really easy to use. In the following example, both the unicodeScalars and unicode properties are computed lazily, so this simple code actually takes advantage of a lot of fast-paths to avoid redundant work, and you can even embed it in another renderer with its own calculations and all those fast-paths still work when they can.

struct NoMath: WebURL.Domain.Renderer {
  var result = ""
  mutating func processLabel(_ label: inout Label, isEnd: Bool) {
    if label.isIDN == false || label.unicodeScalars.contains(where: \.properties.isMath) {
      result.insert(contentsOf: label.ascii, at: result.startIndex)
    } else {
      result.insert(contentsOf: label.unicode, at: result.startIndex)
    }
    if !isEnd { result.insert(".", at: result.startIndex) }
  }
}

let domain = WebURL.Domain("hello.xn--e28h.xn--6dh.com")!
domain.render(.uncheckedUnicodeString)
// "hello.😀.⊈.com"
//           ^ OH NO - A MATH SYMBOL!

domain.render(NoMath())
// "hello.😀.xn--6dh.com"
//           ^^^^^^^

I agree. If the tuple case proves important, it would be a simple extension to allow move (x,y), and that seems quite readable. I doubt this will be worthwhile in practice, though.

This also falls out of variadic generics if func move() is chosen.

To be clear, do you mean a move(_:) that’s compiler magic, or an actual generic function with a moved parameter? Based on your use of func I assume the latter, but perhaps I’m confused.