[Pitch] Noncopyable (or "move-only") structs and enums

If the compiler always inserts a call to deinit for move-only resilient types, doesn’t that make the question of programmer intuition moot? They never have to think twice about adding a deinit, because the ABI defined away any consequences.

How will key paths work with noncopyable types? Won't `subscript<Member>(keyPath keyPath: KeyPath<Self, Member>) -> Member { get }` be a problem with members that are marked as noncopyable?

1 Like

They won't, at least initially, because noncopyable types will not be usable as type arguments to begin with. We would need to design what it means to generalize KeyPath's type parameters to allow for noncopyable types. It should be implementable, but will likely require using read/modify coroutines to project most properties without copying or consuming them.
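To illustrate what "read/modify coroutines" means here, this sketch uses the underscored `_read`/`_modify` accessor spellings that exist in today's compiler but are not official language features; `Resource` is a placeholder type:

```swift
struct Resource {              // placeholder; imagine this were noncopyable
    var value = 0
}

struct Wrapper {
    private var storage = Resource()

    var resource: Resource {
        _read { yield storage }      // lends the value without copying it
        _modify { yield &storage }   // grants exclusive access without consuming it
    }
}
```

Generalized key paths would need something like these coroutine accessors so that projecting a component of a noncopyable value never requires making a copy of it.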

2 Likes

So to be clear, you are saying that developers won't be able to construct key paths to properties whose types are noncopyable?

1 Like

Yes. Even when we get to the point of supporting noncopyable types interacting with the generics system, existing APIs generally wouldn't "just work" with them without some effort to intentionally allow them, since all generic type arguments today have an implicit "copyable" requirement.

4 Likes

For a function that's not generic, the type of the parameter (whether it's @noncopyable) determines whether the function can copy that particular parameter. The borrow/inout/consume markers indicate whether the parameter is passed by value (consume) or by reference (borrow/inout).

That's why generics make this much more complex. Generics don't know the concrete type of their argument a priori, so they need some kind of type constraint to indicate that they are willing to accept noncopyable types. With such a constraint, the compiler can verify that no copies are needed and that the function is therefore safe to use with either copyable or noncopyable types.
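In code, such a constraint might look something like the sketch below. The `~Copyable` spelling is hypothetical here; the pitch deliberately leaves generics out of scope and does not define this syntax:

```swift
// Hypothetical sketch; not part of the pitch. '~Copyable' would
// suspend the implicit "copyable" requirement on T, so the compiler
// must verify that this body never copies 'value'.
func with<T: ~Copyable>(_ value: borrowing T, _ body: (borrowing T) -> Void) {
    body(value)   // borrowing is fine; an implicit copy would be rejected
}
```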

Tim

2 Likes

i was imagining that when the value would be dropped it would be as if you wrote an await on calling the deinit like:

do
{
    let x:X = .init()
    
    doStuffHere()
    
    await x.deinit()

    doMoreStuffHere()
}

and you wouldn’t want doMoreStuffHere to get blocked on x.deinit. and i thought we could not kick this off to a structured background task like

do
{
    let x:X = .init()
    
    doStuffHere()
    
    async let deinit:Void = x.deinit()

    doMoreStuffHere()

    await deinit
}

because x might not be Sendable but then i remembered that move-only means Sendable (at least that is my impression of it) so i think this is not actually a problem now.

The compiler will just make it work, of course. But people have an intuition for how attributes like @frozen affect code generation and optimization and therefore performance, and I think they would be surprised if marking a type @frozen was not enough to make it behave across modules roughly like a type from their own module.

Maybe @frozen move-only types should have a mandatory, if empty, deinit, the same way that public types must have a public init.

3 Likes

I confirmed with JoeG offline that when he spoke about scoped lifetimes he meant strict C++-like lifetimes. I think there is one additional option missing here: lexical lifetimes. For those unfamiliar, a lexical lifetime model means that in most cases we have "eager drop"-like semantics, but in cases where the value escapes we do not shrink the lifetime. We also appropriately do not shrink lifetimes past deinit boundaries. So it avoids the shortfalls of "eager drop" (hard-to-reason-about behavior around escaping unsafe inner pointers) without adding the complexity to the compiler/language caused by strict C++ lifetimes.

If I remember correctly, when ARC was designed we purposely avoided the strict C++ scoped lifetime model since we wanted to avoid the complexity that has plagued clang. So to my mind this lexical lifetime model provides the benefits of eager drop around memory usage/performance and compiler simplicity, while also avoiding the main pitfall of "eager drop" (hard-to-reason-about behavior around escaping inner pointers). If we think that lexical lifetime semantics work well for copyable types to avoid these problems, why wouldn't we do the same thing with move-only types?

Just to elaborate on this, I think this implicit copyability is really saying that if I have a noncopyable field then the containing type must be noncopyable and doesn't have to be annotated as such. I just find that to be a bit clearer. More of a side comment on verbiage.
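Concretely, using the pitch's `@noncopyable` attribute (a sketch, not compilable today), the rule under discussion reads like this:

```swift
@noncopyable
struct FileDescriptor { let fd: Int32 }

// Because 'descriptor' is noncopyable, the containing struct must
// itself be noncopyable; the verbiage question is only whether the
// annotation is inferred or must be written out explicitly.
@noncopyable
struct LogFile {
    var descriptor: FileDescriptor
    var path: String   // copyable members are still allowed
}
```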

2 Likes

The problem with this model is that init() can have side effects. This is especially true when the initializer is written in Objective-C. From the Swift compiler’s point of view, the object never escaped, but its initializer could have escaped a reference to the object.

What does this mean? That objects are always deinitialized in the same order, even if their lifetimes are shortened?

When using Rust (fwiw in production), I pretty quickly just elevate the type to Clone or wrap it in an Rc/Arc if I really need to work around this restriction. I haven't personally found it to be any inconvenience. From your doc, it sounds like going from @noncopyable to implicitly copyable is not ABI- or API-breaking, so that covers the Clone case I think. Since the equivalent of the lifting to [A]Rc case is class, I have a feeling users will make some Arc property wrapper for convenience; I'm not entirely sure how such a change to source would affect ABI/API though.

say that structs and enums are implicitly copyable when all of their members are, and they don't define a deinit

^ I like this pattern.

On a separate note, somewhat confused by this:

A @noncopyable struct or enum may declare a deinit

Which I think also implies a Copyable struct or enum cannot declare a deinit.

However, a noncopyable type can be made copyable without breaking its ABI.

and

if frozen, then a deinit cannot be added or removed

If the type is frozen noncopyable with a deinit, and it becomes copyable by (2), doesn't it lose its deinit by (1) yet it cannot by (3)?

This proposal comes with an admittedly severe restriction that noncopyable types cannot conform to protocols or be used at all as type arguments to generic functions or types, including common standard library types like Optional and Array .

It's also worth mentioning that noncopyable types cannot witness associated type requirements either.

1 Like

Doesn’t this imply keeping track of a form of definite deinitialization? Is the above a future direction?

The dup2 examples are inscrutable unless you’re someone who dabbles in POSIX. I had to `man dup2` and read two pages of background to understand what the examples were attempting to convey. Even then, this example doesn’t make sense except in a handwavey way:


// Redirect a file descriptor
// Require exclusive access to the FileDescriptor to replace it
func redirect(_ file: inout FileDescriptor, to otherFile: FileDescriptor) {
  dup2(otherFile.fd, file.fd)
}

dup2 mutates the meaning of the second argument such that it means the same thing as the first argument. I think the point of this example is to show how the designer of the redirect function would ensure that clients use it correctly. However, the prose implies that the implementor would need to take file inout in order to call dup2. I don’t think that’s the case, but then I’m already confused by the example.
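For concreteness, a caller would use it roughly like this (the `FileDescriptor` memberwise initializer and descriptor numbers are assumed; `dup2` itself needs only the raw descriptors, so the `inout` is purely an API-design statement):

```swift
var output = FileDescriptor(fd: 1)    // stdout
let logFile = FileDescriptor(fd: 5)   // some already-open descriptor

// '&' makes the exclusivity requirement visible at the call site:
// the caller must hold exclusive access to 'output' to replace it.
redirect(&output, to: logFile)
```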

Yes. Whenever the compiler knows the fixed layout, we should allow partial consumes:

func swap(x: consuming (AnyObject, AnyObject)) -> (AnyObject, AnyObject) {
  return (consuming x.1, consuming x.0)
}

There's no reason not to do this. It's just data flow. The only caveat is that if the struct has a deinit, or if you haven't consumed all values, then you'll need to write forget, which will deinitialize all remaining members. Swift won't have an unsafe forget.
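A sketch of that caveat, in the pitch's syntax (not compilable today):

```swift
@noncopyable
struct Pair {
    var first: AnyObject
    var second: AnyObject
    deinit { print("Pair destroyed") }

    consuming func takeFirst() -> AnyObject {
        let result = consuming self.first  // partial consume
        forget self  // required because 'Pair' has a deinit: suppresses
                     // the deinit and deinitializes the remaining
                     // member ('second') normally
        return result
    }
}
```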

2 Likes

[EDIT] This is still a strawman. If you have any counter-examples please share!

Summary

consuming values are eager-drop.

borrowing values are associated with a lexical scope.

Noncopyable values have strict lifetimes.

Copyable values have optimized lifetimes.

Strict Lifetimes

Strict eager-drop values give us a last-source-level-use rule.

Strict borrowing values give us a lexical lifetime rule.

Optimized Lifetimes

With optimized lifetimes, deinitialization is unordered.

An optimized consuming lifetime is "eager-drop", but is not a "last-source-level-use" rule. The optimizer must be able to freely substitute and delete copyable values. `consuming` does not keep a value alive if it is copyable.

Optimized borrowing lifetimes follow our current default rules for let variables. An optimized borrowing lifetime is not a "lexical lifetime". The optimizer does track each variable's scope, but that scope is only restricted by deinitialization barriers.

Motivation

Generic lifetimes must be at least as strict as their specialized counterpart.

Copyable generic types cannot have strict lifetimes. ARC would not be optimizable if releases all had the semantics of a virtual C++ destructor defined in another module.

Noncopyable types don't require ARC optimization because retains and releases are only a materialization of copies.

Non-goal: An explicit borrowing or consuming modifier should dictate the ownership and lifetime rules independent of the type. While this is highly desirable, it directly conflicts with other goals.

consuming should eager-drop

We need a lightweight way to tell the compiler it can aggressively drop variables. Otherwise, the compiler is burdened by supporting certain patterns of weak references, unsafe pointers, and deinit side effects. That's a substantial burden for optimizing ownership because it's often impossible to prove that those patterns aren't present. These specific caveats aren't relevant to this thread. What's relevant is that sprinkling another layer of annotations around the code to control lifetimes, in addition to consuming and borrowing, is a bad model.

This has all been shown by experiment.

Example:

struct Container {
  func append(other: consuming Container) {
    push(other) // Do not copy 'other' to keep it alive across 'push'
  }
}

borrowing should not eager-drop

First, here's why we can't eager-drop:

@noncopyable
struct FileWrapper {
  let handle: Handle

  [borrowing] func access() -> Data {
    handle.read() // self *cannot* be destroyed after evaluating 'handle', but before calling 'read'
  }
  consuming func close() {
    handle.close()
    forget self
  }
  deinit {
    close()
  }
}

Borrows need some relationship with their lexical scope.

We still have three viable options:
Option 1: optimized borrow lifetimes (just like `let`)
Option 2: strict borrow lifetimes
Option 3: optimized copyable and strict noncopyable borrow lifetimes

TL;DR: Only option 3 meets our goals.

The FileWrapper example above is currently safe if borrowing variables inherit optimized let lifetimes. The reason is that external calls to read or close a file handle are considered synchronization points. This does not mean that borrows need strict lexical lifetimes. We still have a choice. The question is whether we want to optimize copies or make struct-deinitialization immune to optimization.

The conundrum is this:

  • For copyable types, we need to optimize copies.
  • For noncopyable types, we want predictable deinitialization points
  • If borrowing is explicit, then lifetime behavior should (ideally) not depend on copyability

One of these needs to give.

let lifetime semantics allow optimization of copies, but do not specify "well-defined" deinitialization points. By today's rules, we will optimize the extra copy in this example:

struct Value {
  static var globalCount = 0

  borrowing func borrowMe() -> Value {
    let value = copy self
    globalCount += 1
    return value
  }

  deinit {
    globalCount = 0 // strange I know
  }
}

let value = Value().borrowMe()
//... 'Value.globalCount' might be 0 or 1

Option 1: optimized borrow lifetimes

Ask users to either call a consuming method, or use withExtendedLifetime or deinitBarrier, if they expect deinitialization to occur at a specific point.

To fix the unusual case of Value.globalCount above, the programmer would need to "synchronize" their deinitializer by adding a barrier to the method that accesses the Value:

struct Value {
  static var i = 0

  borrowing func borrowMe() -> Value {
    let value = copy self
    i += 1
    deinitBarrier() // withExtendedLifetime() also works
    return value
  }

  deinit {
    i = 0
  }
}

let value = Value().borrowMe()
//... 'Value.i' might be 0 or 1

Deinit barriers are required for class deinits regardless of how borrowing behaves. Any access to an external function or variable acts as a barrier. As does any concurrency primitive like await. We can trivially provide a deinitBarrier API in the standard library if it's useful. Or we can provide a "keep alive" keyword which behaves like consuming but doesn't actually consume.
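For reference, `withExtendedLifetime` already exists in the standard library. A minimal runnable sketch of using it as a barrier, with a class since struct deinits don't exist yet:

```swift
final class Resource {
    var onDeinit: () -> Void = {}
    deinit { onDeinit() }
}

var released = false

func work() {
    let r = Resource()
    r.onDeinit = { released = true }
    // Without this, the optimizer may release 'r' any time after its
    // last use; withExtendedLifetime keeps it alive through the closure.
    withExtendedLifetime(r) {
        assert(!released)   // 'r' is guaranteed alive in here
    }
}

work()
```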

This programming burden is, however, worse for struct deinits. Unlike with automatically managed objects, the programmer naturally anticipates the point at which struct deinitialization occurs. It's also an unnecessary burden because structs with deinits are noncopyable types. Optimizing their lifetime does not eliminate copies.

Option 2: strict borrow lifetimes

With this option, borrow becomes a "keep alive" keyword for any variable.

This option will significantly harm optimization as we migrate toward borrowing as standard practice. This is especially unwelcome because programmers are using borrowing precisely to optimize copies.

There will be quality of implementation issues. For example, we'll need to prevent the optimizer from deleting dead values and substituting equivalent values with different lifetime constraints. I suspect the compiler will never completely get this right.

Option 3: optimized copyable and strict noncopyable borrow lifetimes

With this option, copyable borrows can be optimized just like let variables today. Migrating to borrowing won't prevent ARC optimization.

With noncopyable borrows, there's no serious performance concern. Objects may be freed later than otherwise, but that matches expectations.

This requires optimizer support, but it mostly falls out naturally from our representation of noncopyable values. The problematic optimizations will largely be disabled for noncopyable values.

It might confuse programmers that borrowing is optimized more aggressively for copyable types. But it's natural that struct-deinits have somewhat special lifetime rules. And the optimization impact is only noticeable when unwanted copies are present.

2 Likes

I will revise the example. The intent is as you said: the author of this API is stating that callers of redirect need to have exclusive ownership of file in order to redirect it. (I'm sure POSIX allows you to dup2 over a file descriptor while other parts of the code are writing to or reading from it, and the API doesn't prevent it, but that doesn't mean it's a good idea.)

2 Likes

I’ve seen code like this in production before:

deinit {
    print("in deinit for \(self)")
}

In the case of non-copyable types I guess you couldn’t pass it to a string interpolation (at least not yet) since that’s compiled down to a generic function call. However, the idea still stands: simply logging when deinit happens, especially if used for debugging, should not change when or whether it happens.
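A sketch of the workaround, in the pitch's syntax (not compilable today): interpolating `self` is out, but interpolating a copyable stored property is fine:

```swift
@noncopyable
struct Connection {
    let id: Int
    deinit {
        // '\(self)' would pass 'self' to a generic string-interpolation
        // API, which noncopyable types can't do yet. A copyable stored
        // property can still be interpolated:
        print("in deinit for Connection #\(id)")
    }
}
```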

5 Likes

One question that came to mind: what about a non-copyable class, one where the reference is guaranteed to always be unique, and therefore no reference counting is needed? That seems like a logical thing to want too.

My guess is that often this would be better served by an indirect struct type.

On the other hand, subclassing could be supported, and, unlike a struct, mutating would work even when the reference to the object is a let, which is probably desirable in some cases. The semantics of self in a class would need to be altered so that it can be consumed, borrowed, and noncopyable.

I don't know if this is material for an alternative considered / future direction section.