SE-0366: Move Function + "Use After Move" Diagnostic

A late review; I've been taking a bit of an extended holiday and am still catching up on things.

+1 to having this operation in the language in some form.

I like having this as a keyword, I think. It's a fundamental operation. I would ask that it be extended to support multiple parameters:

var x = ...
var y = ...

move x, y

The result should be a tuple of all the values. Since it is a keyword rather than a function, I assume this would not require variadic generics (?).

It's also fine to add that later, but it's the kind of small quality-of-life thing that I doubt anybody's going to bother to write a separate proposal for.

I'd rather we accept multiple parameters (if we agree that it's a thing we want), and leave that part of the implementation until later if needed.

Naming Things

Whilst I acknowledge the ARC predictable performance thread, I wish there was a fresher, fuller manifesto about how we want to introduce ownership at the language level. Because, as I understand it, that's what the operations in that thread really amount to, and I think we would benefit from a more complete picture which allows us to use a consistent terminology.

I see two styles which I think would work - either one which evokes the idea of lifetimes, or one based on ownership.

To explain, I'll start with "copy". It's a highly overloaded term - even in Swift, we use the term "copy-on-write", which talks about a whole different kind of copying than the copy we're talking about avoiding here, or what NSCopying/NSMutableCopying mean by a copy.

I feel the ownership manifesto could be clearer about what it means, but I was wondering about it recently and figured a more obvious definition might be:

'copy' takes a value as input, and returns the same value but with an unconstrained lifetime. The caller has control over when it wants to end the lifetime of the returned value.

Or, as @Joe_Groff expressed in code:

func copy<T>(_ x: borrow T) -> T {
  return x
}

It's quite an amazing operation - to arbitrarily extend the lifetime of any value; to turn a borrow into a non-borrow. We've just acquired an ownership stake over something we didn't own before, because we now have a coequal say over its lifetime.

Does this amount to a "copy"? For reference types, extending the lifetime/acquiring that ownership stake amounts to calling retain, and we have it for as long as we like - until we call the balancing release. I'm not sure that everybody would understand that operation as being a "copy", and I think it could use a less ambiguous name.
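To make the retain/release reading concrete, here is a minimal sketch in today's Swift, with no proposed syntax involved. The names `Resource`, `original`, `extended`, and `observer` are made up for illustration; the weak reference is only there to observe whether the object is still alive.

```swift
// Hypothetical type for illustration; not from the proposal.
final class Resource {}

var original: Resource? = Resource()
weak var observer = original          // owns nothing; only observes liveness

// The "copy": for a class reference this is a retain. We now hold a
// coequal ownership stake in the same object.
var extended: Resource? = original

original = nil                        // the first owner relinquishes its stake
let aliveWhileCoOwned = (observer != nil)

extended = nil                        // the balancing release
let aliveAfterLastRelease = (observer != nil)

print(aliveWhileCoOwned, aliveAfterLastRelease)
```

Nothing was duplicated here in the NSCopying sense; the only thing that changed hands is who gets a say over the object's lifetime.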

If we're going for lifetime terminology, extendLifetime seems fitting for this operation. If we're going for ownership, perhaps acquireOwnership? Given that this function is an opt-in for people who want this stuff to be explicit, I don't think an explicit name is too bad. Better than an ambiguous name.

Instead of @noImplicitCopy, I quite like @explicitLifetime/@explicitOwnership.

That brings us to move-only types - and I find it awkward because the interesting thing about a move-only type is not that it can be moved, but that it disallows copies. You kind of need to look at the negative space to understand the name and why it's an interesting concept.

If by "copy" we mean "obtain a version of the value which we own/whose lifetime we control", then what we have been calling a "move-only" type is describing data which Swift cannot share or duplicate the ownership of. It is uniquely owned - there can only be one variable which is known to control the lifetime of any "instance" of a struct, which is a concept that arises once we have this new notion of uniqueness (and because it has unique ownership, we don't need reference counting for it).

And since we disallow sharing/duplicating ownership, the only other way you have to pass their contents around is by move (i.e. by ending the lifetime of the existing variable, hence transferring ownership), or with a borrow (a non-owning reference). So I don't think "move-only" is an exactly accurate description of these types; it's really "move-and-borrow-only". Alternatively, UniqueOwnership/UnshareableLifetime.

Fiiiiinally, we get to move itself. What this proposal is actually about. While copy is all about acquiring ownership, move allows us to relinquish ownership - to another variable, or not. They are counterparts - move is the release to copy's retain. That's why you had to read all of this, I'm afraid - I don't think we can address move until we address copy. We have to start there, so it can all be part of a consistent terminology (and I don't think the terms we have been using to date, or in the manifesto, necessarily give the best model for developers).

Anyway, I'm going to suggest endLifetime to mirror extendLifetime, and either giveOwnership or relinquishOwnership to mirror acquireOwnership.

Let's please not use the word 'transfer' for this. I still get flashbacks of __bridge_transfer.

In summary, the suggestions go something like this:

Ownership terminology:

  • copy = acquireOwnership
  • move = giveOwnership (heh, give or take? Depends on your perspective...)
  • @noImplicitCopy = @explicitOwnership (acquires must be explicit)
  • Copyable = ShareableOwnership
  • MoveOnly = UniqueOwnership

Lifetime terminology:

  • copy = extendLifetime
  • move = endLifetime
  • @noImplicitCopy = @explicitLifetime (extensions must be explicit)
  • Copyable = ShareableLifetime
  • MoveOnly = UnshareableLifetime

Just to paint the bike shed a bit, it would be aesthetically preferable for the parameter-decorating keyword to follow the precedent laid by the other ownership keywords and grammatically be a past participle verb-as-an-adjective describing what the function did to the passed-in parameter.

Thus:
moved (or taken since it seems we are leaning that way) to match with __owned and __shared (and the eventual borrow keyword should then be borrowed).

I was originally against the move-pretending-to-be-a-function as proposed in the original, but if it is implemented as an actual function using the moved (or taken) decorated parameter then I'm on board.

I anticipate that a common use case of move will be inside function arguments, so I think it should bind more tightly than , so that foo(move a, b) only moves a. You can of course write move a, move b to move multiple values.

Oh, that's a good point.

Anyway, I wanted to give two examples of code I've written recently that uses moves ("poor man's moves" - reassigning a value to a dummy object so I could transfer ownership), and how it actually shaped the design of those APIs.

It seems there is a bit of confusion about how this works with _modify and wrapper views and stuff, so maybe it'll help clear things up, or give people some ideas about what could be made possible with this feature.

API One: Write-through views with _modify

I love this pattern. The idea is that you have a storage object, and you can pass it between multiple "views", which are cheap ("zero-cost") wrapper structs providing specialised protocol conformances and other APIs. String is the original example in Swift, with the utf8, utf16, and unicodeScalars views, and since we gained _modify in the language (:neutral_face:) we can temporarily transfer ownership to these views, so they have a unique storage and can perform in-place modification. WebURL and FilePath are types which I know do this, and there may be others.

The basic idea is this:

// The COW storage type
class MyStorage { ... }

// Two views of that same storage object.
// Implemented like any other Swift COW type,
// using isKnownUniquelyReferenced(&storage).
struct RegularView {
    var storage: MyStorage

    mutating func foo() {
      if !isKnownUniquelyReferenced(&storage) { ... }
    }
}

struct SpecialView {
    var storage: MyStorage

    mutating func bar() {
      if !isKnownUniquelyReferenced(&storage) { ... }
    }
}

Now the magic happens:

extension RegularView {

    var specialView: SpecialView {
        get { SpecialView(storage: storage) }
        _modify {
            // 1. Transfer ownership of the storage to the new view.
            var wrapper = SpecialView(storage: move(self).storage)
            // 2. On exit, transfer ownership back to us.
            defer { self = RegularView(storage: wrapper.storage) }
            // 3. Let the caller use 'SpecialView', with unique ownership.
            yield &wrapper
        }
    }
}

// We can now have scoped, mutating operations, without COW overheads.
// This opens up a new universe of API design possibilities.

var value: RegularView = ...
value.specialView.bar()

But how can we transfer out of self? Doesn't that leave self uninitialised and unsafe?

Yes, but we know self cannot be accessed during that time, because of the Law of Exclusivity. There can only be one mutating operation on a struct at a time, and the access on _modify lasts for the duration of the yield. Hence, the language guarantees this is safe. Each of these views is still a normal, safe Swift type - you can grab a value from the getter and let it escape the function and nothing will be left dangling, etc.

Even if we referred to the same value again (captured it in a closure, or passed it as a parameter), the compiler would just make an owned copy first, and then our mutating operation would find that isKnownUniquelyReferenced returns false, and do the usual COW thing.

value.specialView.bar(value /* Copy made here */)

So in case it wasn't clear to everybody, that's how that works. It has been transformative to WebURL's API, allowing me to include a very comprehensive, expressive API, packed away in namespaces, that is also extremely efficient. I think it is the way to write these kinds of data types in Swift.
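For anyone who wants to try the pattern without the proposed move, here is a rough sketch that should compile with a current toolchain, using the "poor man's move" (swap in a placeholder) in its place. `Storage`, `append(_:)`, and its `Bool` result are invented for the example, and `_modify` is still an underscored, unofficial feature. The placeholder costs an allocation on every access, which is the kind of overhead a real move would avoid.

```swift
// Hypothetical COW storage; all names here are illustrative only.
final class Storage {
    var bytes: [UInt8]
    init(bytes: [UInt8]) { self.bytes = bytes }
}

struct SpecialView {
    var storage: Storage

    /// Appends a byte. Returns true if the storage was shared and had to
    /// be copied first (the usual COW slow path).
    mutating func append(_ byte: UInt8) -> Bool {
        var didCopy = false
        if !isKnownUniquelyReferenced(&storage) {
            storage = Storage(bytes: storage.bytes)
            didCopy = true
        }
        storage.bytes.append(byte)
        return didCopy
    }
}

struct RegularView {
    var storage: Storage

    var specialView: SpecialView {
        get { SpecialView(storage: storage) }
        _modify {
            // 1. Hand our reference to the wrapper...
            var wrapper = SpecialView(storage: storage)
            // ...and drop our own, so the wrapper holds the only one.
            storage = Storage(bytes: [])
            // 2. Take the storage back however the access ends.
            defer { storage = wrapper.storage }
            // 3. Let the caller mutate through the wrapper.
            yield &wrapper
        }
    }
}

var value = RegularView(storage: Storage(bytes: [1]))

// In-place access through _modify: the wrapper's reference is unique.
let copiedInPlace = value.specialView.append(2)

// A value taken from the getter shares the storage with 'value'.
var detached = value.specialView
let copiedDetached = detached.append(3)

print(copiedInPlace, copiedDetached)
```

The in-place access should take the fast path, while the detached view falls back to a copy and leaves `value` untouched.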

Also, it isn't unique to modify, or getters or setters. You could get the same thing with an initializer.

extension SpecialView {
    init(_ other: RegularView) {
        self.storage = other.storage
    }
}

var uniqueSpecial = SpecialView(move(value))

So as for the discussion about ByteBufferView and the ReadableBytesView, I fail to see a problem. Both syntaxes are able to express the desired ownership semantics. If you want to transfer ownership in to the result of the getter, this will work:

var byteBuffer: ByteBuffer = ...

var view = move(byteBuffer).readableBytes

Copy-on-write is a dynamic (runtime) thing. As long as you ensure the reference in the view is unique - somehow - you can avoid COW. It's pretty flexible.

--

API Two: Buffer reuse

I recently released WebURL 0.4.0, and one of the features is "domain rendering" (as in, internet domain names). It has quite an interesting design that is relevant to this discussion.

I won't go in to the details, but the gist is that the parts of domain names have two forms - a raw, ASCII form, and a decoded, Unicode form. Calculating the Unicode form is expensive, and requires multiple allocations, but many algorithms can use fast-paths to avoid that or can more efficiently use an intermediate stage (an Array<Unicode.Scalar> rather than a String).

So the idea that I came up with is to pass a helper object inout to the callbacks. We create the buffers outside of the loop, then move them in to the helper object on each iteration, let the caller use the helper object, then transfer ownership back out again.

var scalarBuffer: [Unicode.Scalar] = []
var scalarBufferIsReserved = false
var utf8Buffer = ""
var utf8BufferIsReserved = false

processLabels: while /* ... */ {

  var renderLabel = DomainRendererLabel(
    /* ... */,
    scalarBuffer: scalarBuffer,
    scalarBufferIsReserved: scalarBufferIsReserved,
    utf8Buffer: utf8Buffer,
    utf8BufferIsReserved: utf8BufferIsReserved
  )

  // Set scalarBuffer to [] and utf8Buffer to "", so DomainRendererLabel has unique ownership of the buffers,
  // and can use them for decoding Punycoded labels. We take ownership back after 'processLabel',
  // so later iterations of the loop can reuse the allocation. This is like a poor man's "move".
  scalarBuffer = []
  utf8Buffer = ""

  renderer.processLabel(&renderLabel, isEnd: isEnd)

  swap(&renderLabel._scalarBuffer, &scalarBuffer)
  swap(&renderLabel._utf8Buffer, &utf8Buffer)
  if renderLabel._scalarBufferState != .unreserved { scalarBufferIsReserved = true }
  if renderLabel._utf8BufferState != .unreserved { utf8BufferIsReserved = true }
}

It turns out, this is very effective - the DomainRendererLabel has mutating get properties (hence the inout), so the unicode scalar array is allocated lazily from the singleton, its contents are computed lazily, and that allocation is reused across iterations of the loop.

You can even compose renderers, and because this is all state stored in the label, it works. The Array does not get reallocated (as long as the capacity is okay; we purposefully over-reserve a bit). Unfortunately, this isn't quite as reliable for String as it is for Array, and although it is only computed once and cached, it does reallocate each time it is repopulated, even after a reserveCapacity. Perhaps it's just the operation we're doing.
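Stripped of the domain-rendering details, the Array half of this can be sketched as follows. `Scratch` and `fill(count:)` are hypothetical stand-ins for DomainRendererLabel and its lazy decoding, and the base addresses are recorded only to compare allocations, never to read through them.

```swift
// Hypothetical helper standing in for DomainRendererLabel.
struct Scratch {
    var buffer: [Int]

    mutating func fill(count: Int) {
        // This only happens in place if 'buffer' is uniquely referenced.
        buffer.removeAll(keepingCapacity: true)
        buffer.append(contentsOf: 0..<count)
    }
}

var buffer: [Int] = []
buffer.reserveCapacity(64)            // over-reserve once, outside the loop

// Base addresses, kept only for identity comparison.
var allocations: [UnsafePointer<Int>?] = []

for iteration in 1...3 {
    var scratch = Scratch(buffer: buffer)
    buffer = []                       // poor man's move: 'scratch' is now the sole owner
    scratch.fill(count: iteration * 10)
    swap(&scratch.buffer, &buffer)    // take the allocation back for the next pass
    allocations.append(buffer.withUnsafeBufferPointer { $0.baseAddress })
}

let firstAllocation = allocations[0]
let reusedOneAllocation = allocations.allSatisfy { $0 == firstAllocation }
print(reusedOneAllocation)
```

Without the `buffer = []` line, `scratch` would share the storage with the outer variable and `fill` would have to allocate a fresh buffer on each pass.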

When designing this API, the alternative was a Collection conformance, where the Element would be a struct with computed properties for the ASCII/Unicode forms.

The callback design not only has much simpler semantics (it's a function with callbacks. That's it), it's a lot, lot faster, and still really easy to use. In the following example, both the unicodeScalars and unicode properties are computed lazily, so this simple code actually takes advantage of a lot of fast-paths to avoid redundant work, and you can even embed it in another renderer with its own calculations and all those fast-paths still work when they can.

struct NoMath: WebURL.Domain.Renderer {
  var result = ""
  mutating func processLabel(_ label: inout Label, isEnd: Bool) {
    if label.isIDN == false || label.unicodeScalars.contains(where: \.properties.isMath) {
      result.insert(contentsOf: label.ascii, at: result.startIndex)
    } else {
      result.insert(contentsOf: label.unicode, at: result.startIndex)
    }
    if !isEnd { result.insert(".", at: result.startIndex) }
  }
}

let domain = WebURL.Domain("hello.xn--e28h.xn--6dh.com")!
domain.render(.uncheckedUnicodeString)
// "hello.😀.⊈.com"
//           ^ OH NO - A MATH SYMBOL!

domain.render(NoMath())
// "hello.😀.xn--6dh.com"
//           ^^^^^^^
//           ^^^^^^^

I agree. If the tuple case proves important, it would be a simple extension to allow move (x,y), and that seems quite readable. I doubt this will be worthwhile in practice, though.

This also falls out of variadic generics if func move() is chosen.

To be clear, do you mean a move(_:) that's compiler magic, or an actual function that's generic with a moved parameter? Based on your use of func I assume the latter but perhaps I'm confused.

Actual function. Reading between the lines I don't think John or Joe are keen on adopting that idea, but being a real function doesn't preclude let z, w = move(x, y) syntax from ever working. It just makes it dependent on variadic generics.

i tried to replicate your example on a recent nightly, but i could not get it to compile:

struct Base 
{
    struct View 
    {
        var storage:[Int]
    }

    var storage:[Int]

    var view:View 
    {
        _read 
        {
            yield .init(storage: self.storage) 
        }
        _modify 
        {
            var view:View = .init(storage: _move(self).storage)
            yield &view 
            self = .init(storage: view.storage)
        }
    }
}
error: 'self' used after being moved
        _modify 
        ^
note: move here
            var view:View = .init(storage: _move(self).storage)
                                           ^
note: use here
            yield &view 

is this just a gap in the current implementation?

It is correct! This is one of the sharp edges when using _modify.

The yield can also be a return point - for example if you call a throwing method on the view, the access is aborted and the line which reinitialises self would not be invoked. You must use defer to guarantee self is reinitialised, and then it works.
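A sketch of that fix, written against a current toolchain: the placeholder assignment stands in for the proposed _move, and `NegativeValue` and `append(nonNegative:)` are invented so that there is something to throw during the access.

```swift
struct NegativeValue: Error {}   // hypothetical error for the example

struct Base {
    struct View {
        var storage: [Int]

        mutating func append(nonNegative value: Int) throws {
            guard value >= 0 else { throw NegativeValue() }
            storage.append(value)
        }
    }

    var storage: [Int]

    var view: View {
        get { View(storage: storage) }
        _modify {
            var wrapper = View(storage: storage)
            storage = []   // stand-in for moving the storage out of 'self'
            // Runs on normal exit and when the access is aborted by a throw.
            defer { storage = wrapper.storage }
            yield &wrapper
            // Code placed here would be skipped if the caller throws.
        }
    }
}

var base = Base(storage: [1, 2, 3])

var didThrow = false
do {
    try base.view.append(nonNegative: 4)    // completes normally
    try base.view.append(nonNegative: -1)   // throws while the access is open
} catch {
    didThrow = true
}
let storageAfterThrow = base.storage
print(didThrow, storageAfterThrow)
```

Because the reinitialisation lives in the defer, `base.storage` is handed back on both paths; with the assignment written after the yield instead, the throwing call would leave the placeholder behind.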

learn something new every day :slight_smile:

off topic:

this feels like a footgun in the yield keyword. we should really be treating it as if it has an implicit try, the way we work with rethrows elsewhere in the language.

[Slightly off topic]: The error messages you got with that example:

error: 'self' used after being moved
        _modify 
        ^
note: move here
            var view:View = .init(storage: _move(self).storage)
                                           ^
note: use here
            yield &view

...are just terrible.

tbh, yield should require a try, just like with any normal rethrows coroutine:

        _modify rethrows
        {
            var view:View = .init(storage: _move(self).storage)
            try yield &view 
            self = .init(storage: view.storage)
        }

the error message could then just refer to the error-throwing path, which would be a lot easier to understand

That is a great idea.

Perhaps it's time to revisit the old thread Modify Accessors.

I have a feeling we're sailing a bit far away from the move topic.

sink, take, consume have a directionality to them that move, transfer do not have.

It might make sense not to use the term move in a world where we have moveonly types. Not sure about transfer.

I would like to suggest confer instead of move/take for both param and operator

Thanks everyone for participating in this review! The review has concluded and SE-0366 has been returned for revision.

Holly Borla
Review Manager
