Drop running on scope exit is kinda just a sacrosanct truth in Rust that exists as a concession to making programs easier to reason about. It's most important for unsafe code, where you may have untracked raw pointers into a buffer and Really Don't Want That Buffer To Go Away Early. I'm honestly a bit surprised you're still looking at this, given you were previously finding issues with this approach with nice and safe ARC code.
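For concreteness, here's a minimal Rust sketch of that hazard (the names are just for illustration): the raw pointer isn't tracked by the borrow checker, so the only thing keeping the later read sound is the guarantee that the buffer lives to the end of its scope.

```rust
fn main() {
    let buf = vec![1u8, 2, 3];
    let p: *const u8 = buf.as_ptr();

    // `buf` is never mentioned again, so an eager-drop scheme could free it
    // here. Today's scoped drop keeps it alive until the end of main, which
    // is what makes the unsafe read below sound.
    let first = unsafe { *p };
    println!("{first}");
}
```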
The Rust team looked into what you're proposing a bit a couple years ago, calling it "eager drop", although I wasn't around for it, so I'm just linking things:
The Dream is still that there is a well-behaved-enough subset of types for which eager-drop semantics are actually safe and sound; the two biggest issues with ARC we tried to mitigate come up from the unknowable nature of dense reference-counted object graphs, and (as you noted) the interactions with unsafe constructs.
Lexical lifetimes prevent the buffer-going-away-too-early issue from happening. At the SIL level we are able to represent the notion of what counts as an escape, and in such a case we just leave the lifetimes alone.
Ok so there's a LOT of weird little fiddly things going on with drops and borrows in Rust. A completely random braindump:
There's a weird specific rule/guideline in Rust that certain aspects of borrows/drops are purely syntactic. I can never remember the details, but the gist of it is that you don't want improvements to the borrow checker or lifetimes to change what code visibly executes. The borrow checker checks for errors and then goes away forever. Lifetimes Do Not Affect Codegen. Syntactic Drops Good.
I do have one salient article on Designing Deinitialization In Programming Languages. Starting in sections 3.3 and 3.4, I discuss how Rust and Swift both expose Definite Initialization to the end user and allow a variable to have delayed initialization. In sufficiently dynamic situations this necessitates a flag on the stack to track the current initialization state of a variable (basically making it an implicit Optional, except you get compile errors if you ever read it when the Option might be None). This necessarily affects whether Drop/deinit runs.
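A minimal Rust sketch of the dynamic case described there (Noisy is just an illustrative type that prints from its Drop):

```rust
struct Noisy(&'static str);

impl Drop for Noisy {
    fn drop(&mut self) {
        println!("dropping {}", self.0);
    }
}

fn main() {
    let x; // declared but not yet initialized; reads are rejected until it is
    if std::env::args().count() > 1 {
        x = Noisy("conditionally initialized");
        println!("using {}", x.0); // OK: definitely initialized on this path
    }
    // Whether `x` holds a value here isn't statically known, so the compiler
    // keeps a hidden drop flag and only runs Noisy's Drop if the branch above
    // actually ran.
}
```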
Once you introduce noncopyable types you also get dynamic deinitialization based on whether a value was moved out or not. We messed around with the concept of "static drop" and decided it was too spooky (see section 4.5 of the article in the previous bullet, I'm being rate-limited on links...). The TL;DR is that if you move a variable out in an if, the Drop would get hoisted up from where the variable went out of scope to the (implicit) else block of that if. In this way it would have guaranteed that variable initialization status is always statically known. But again, spooky as hell.
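For concreteness, a sketch of the shape being described (consume is a stand-in function): under today's rules the compiler guards the end-of-scope drop with a flag, whereas "static drop" would have hoisted it into an implicit else.

```rust
fn consume(_s: String) {}

fn maybe_consume(cond: bool) {
    let v = String::from("hello");
    if cond {
        consume(v); // moves `v` out on this path
    }
    // Today: `v` is dropped here, but only if `cond` was false; the compiler
    // tracks that with a hidden drop flag. Under "static drop", the drop
    // would instead have been hoisted into an implicit `else { drop(v) }`.
}

fn main() {
    maybe_consume(false);
    maybe_consume(true);
}
```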
What immediately jumps out to me as the kind of thing that would break with "eager drop" is "unwind guard" types which exist to emulate finally-blocks. These often are otherwise unused, and strictly exist to run when the scope ends. I can't remember exactly but I'm 100% certain that Swift, the language with every feature, has some kind of finally/defer so this is less of a problem for y'all.
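The pattern looks roughly like this (Guard is a hypothetical type): the binding is never touched after it's created, so an eager-drop rule keyed to "last use" would run the cleanup immediately instead of at scope exit or during unwinding.

```rust
struct Guard;

impl Drop for Guard {
    fn drop(&mut self) {
        // Runs on normal exit *and* during panic-driven unwinding.
        println!("cleanup, finally-style");
    }
}

fn do_work() {
    let _guard = Guard; // otherwise unused: it exists only for its Drop
    // ... work that might return early or panic ...
    println!("working");
}

fn main() {
    do_work();
}
```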
Another thing that's really sketchy is people reasoning about stable addresses under moves. I know this is a thing Swift has largely told people they're not allowed to do (see withUnsafeMutablePointer), but hey, I'm dumping Rust stuff, not Swift stuff. As an example, you might want to pass a Box<T> and a *mut T that points into it to a function, and "know" that's fine because the Box's contents have a stable address and won't get messed up by moves... except that we want to mark Box as noalias, and so llvm might get the wrong idea and believe that the raw pointer can't point in there! It's a whole fucking thing and I hate it. Under eager drop this is definitely also a busted pattern, since the compiler has no idea one borrows the other.
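A sketch of exactly the pattern being called busted here (bump is a made-up function; don't actually write this, since with Box treated as noalias it may well be undefined behaviour):

```rust
// Takes ownership of the Box plus a raw pointer the caller promises points
// into it. The compiler can't see that relationship, and with Box treated as
// noalias it may assume `raw` does NOT alias the Box's contents.
fn bump(boxed: Box<i32>, raw: *mut i32) -> i32 {
    unsafe { *raw += 1 };
    *boxed
}

fn main() {
    let mut b = Box::new(41);
    let p: *mut i32 = &mut *b;
    println!("{}", bump(b, p)); // "should" print 42, but don't rely on it
}
```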
The scoping of ~temporaries sometimes surprises people, although the failure mode is generally only observed with types like Mutex, where keeping a value alive for too long is a correctness error (it causes a deadlock). In particular, how long locks get held when the matched-upon expression involves a lock. This situation would be improved by trying to more aggressively drop the MutexGuard and release the lock. There's a clippy lint that tries to catch this, but last I checked it's too aggressive (it complains about using Drain idiomatically and correctly, because a for loop is basically a match and Drain has a destructor): Clippy Lints
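A sketch of the deadlock-prone shape (report and the Mutex<Option<i32>> are just for illustration): the temporary MutexGuard created for the scrutinee lives until the end of the whole match, so the lock is still held inside the arms.

```rust
use std::sync::Mutex;

fn report(m: &Mutex<Option<i32>>) {
    match *m.lock().unwrap() {
        Some(n) => {
            // The temporary guard from `m.lock()` is still alive here, so
            // calling `m.lock()` again in this arm would deadlock.
            println!("got {n}");
        }
        None => println!("empty"),
    }
    // The guard is finally dropped here, at the end of the match.
}

fn main() {
    report(&Mutex::new(Some(7)));
    report(&Mutex::new(None));
}
```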
In a similar vein, people get surprised by the fact that let _ = x; and let _y = x; have different destructor behaviour. The former isn't a variable binding, but rather a pattern that captures nothing, so if the right-hand side is a temporary (say, a freshly acquired lock guard) it's dropped at the end of that statement. The latter actually captures it in a normal variable (the prefix underscore just tells the compiler it's fine that it's seemingly unused) and drops it when the variable goes out of scope.
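To make the difference observable, here's a small sketch using a lock guard as the value in question:

```rust
use std::sync::Mutex;

fn main() {
    let m = Mutex::new(0);

    // `_` is not a binding: the guard is a temporary that gets dropped at the
    // end of this statement, so the lock is released immediately.
    let _ = m.lock().unwrap();

    // `_guard` is an ordinary variable (the underscore prefix only silences
    // the unused-variable warning), so the lock is held until end of scope.
    // If the first guard were still alive, this call would deadlock.
    let _guard = m.lock().unwrap();
}
```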
In another similar vein, people get surprised by the fact that let (x, y) = z and let x_and_y = z have different drop behaviour (iirc). This is because of the other surprising fact that Rust drops fields in declaration order, while it drops variables in reverse declaration order (arguably a bug, but it is simply truth now). In practice, neither this issue nor the one in the previous bullet is a problem, because in 99.9% of the cases where it would matter you just get compiler errors from the borrow checker or the definite-initialization checker.
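A sketch that makes both orders visible, assuming the drop-order rules are as described above (Noisy just prints from its Drop):

```rust
struct Noisy(&'static str);

impl Drop for Noisy {
    fn drop(&mut self) {
        println!("dropping {}", self.0);
    }
}

fn main() {
    {
        // One value: tuple fields drop in declaration order (pair.0, pair.1).
        let _pair = (Noisy("pair.0"), Noisy("pair.1"));
    }
    {
        // Two locals: variables drop in reverse declaration order (b, then a).
        let (_a, _b) = (Noisy("a"), Noisy("b"));
    }
}
```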
Actually as far as the compiler is concerned, the two cases in the previous bullet are dropped at the EXACT SAME time. Specifically if y borrowed x, the "lifetime" of y and x are "equal". In conjunction with the extremely-sketchy-and-long-storied dropck eyepatch, this allows a destructor to run while a type contains dangling pointers! Safely! Correctly! here's a demo with more details, but this is useful/important for supporting Arenas which end up being very intrusive and a mess for drop order.
Oh also two things where I'm moreso talking out of my ass but know things are spooky:
There is some messiness in formalizing Rust around the fact that drop takes &mut self, which is supposed to mean that self is valid for the entire body of the function, but drop is explicitly doing things that invalidate the value. I don't know if there's been a real resolution to this; last I checked we were in "try not to think about it" territory. I think this issue might have been the one I recall reading?
There is also some messiness around Drop and Pinning, which is why there's several sections in the docs on Pin<T> dedicated to precise interactions with Drop. I think there's generally a desire to have some kind of notion of "async drop" but I'm just gonna be bluntly honest and say that Pin/async is stuff I simply don't understand properly and is clearly pushing up against semantic limits of the language.
I recall that in Rust, the possibility of panics is what generally prevents values behind &mut borrows from being temporarily invalidated, and it makes sense to me that that could also be a hazard for drop implementations: you probably don't want every drop to have to do the reverse of definite initialization and maintain a dynamic bitmap of the value's state in case it panics, so that you can destroy the currently-valid components as you unwind. We're initially taking the "self is inout in deinit" tack because we also don't want to hold up putting noncopyable types in developers' hands on implementing that partial-invalidation analysis right away, but I think we want it in the fullness of time.
I've incorporated some feedback from the discussion so far into the proposal; thanks everyone!
I'd like to continue the design discussion, and I have a few particular open questions I'd like to hear more feedback on:
Should noncopyable values have scoped lifetimes, or an "eager drop" lifetime that ends after their last use if they are not consumed?
Should noncopyable types be able to add a deinit without breaking ABI and/or API? In what circumstances? The potential existence of a deinit on a type imposes some interesting constraints on how the value can be used. Since there needs to be a complete value to be consumed at any point a deinit can run, this generally means that client code shouldn't be able to consume any part of the value, since doing so would invalidate the value without going through deinit:
```swift
@noncopyable
struct Foo {}

@noncopyable
struct Bar { var x, y, z: Foo }

let bar = Bar()
let foo = bar.x // Error, not allowed to take `x` away from `bar`

@noncopyable
enum Bas { case x(Foo), y(Foo), z(Foo) }

let bas = Bas.x(Foo())
switch bas {
case .x(let foo): // ERROR: can't steal bas's payload, that might bypass deinit!
  ...
}
```
The restriction makes absolute sense for resource-managing types with meaningful deinits, but is inconvenient for types that really are intended to be simple aggregates, and it's particularly limiting for enums to not be able to pattern-match and consume their payloads. On the flip side, I think anyone who's worked in an object-oriented language with some kind of destructors in it has had reason to retroactively add a deinit to their classes at some point in their careers. So it stands to reason that some amount of resilience to adding deinits would be a good thing, but there are also types that will never have deinits which should allow for flexible destructuring of their members. Is an existing control like @frozen sufficient, or do we need finer-grained controls?
To reduce annotation burden, should we treat copyability like Sendable, and say that structs and enums are implicitly copyable when all of their members are, and they don't define a deinit (and they don't do other things we might add to the language in the future that require noncopyability)? That could significantly reduce how often developers need to explicitly tag noncopyable types with a @noncopyable attribute, Self: ?Copyable generic anti-constraint, or other annotation.
I think this is an orthogonal issue to it being non-copyable. I assume some non-copyable structs may prefer scoped lifetimes but others would prefer an eager drop.
Does this count the implicit deinit that could be present because one member has a deinit of its own? If yes, then it'll be very easy to have a deinit without noticing, and also to break the ABI/API inadvertently because of a change far away.
Today it is a major pain point for me that we do not have deinit async, and I hope one day we will be able to express something like it. I feel like such a thing would not be compatible with eager drop.
I assume the same way it would compose if you were writing it explicitly with _ = consume x immediately after the last use of x. Why wouldn't it work with async?
The restrictions wouldn't include the deinits of members themselves. For a value type without a deinit of its own, destruction only involves the elementwise destruction of its constituent members, so it would still be fine in principle to partially consume an aggregate containing elements with deinits, as long as the aggregate itself doesn't also have a deinit.
It seems to me that if a deinit needs to be added, the type could always be refactored so as not to break ABI/API, by wrapping its properties behind consuming accessors that forget self:
```swift
// old, leaky implementation
@noncopyable
struct FileHandle {
  var handle: Int
}

// refactored, fixed the leak
@noncopyable
struct FileHandle {
  private var _handle: Int

  var handle: Int {
    consuming get { let h = _handle; forget self; return h }
  }

  deinit { close(_handle) }
}
```
Unfortunately this trick will not work with enums.
Partial consumption of a struct value is a kind of struct decomposition, which is inherently something that can only be done within the resilience domain of the struct. Adding deinit to a move-only struct is a structural change to the type’s ABI, i.e. it cannot be done to a frozen type without an ABI break. So we don’t actually have to worry about someone adding deinit to a type that we can partially consume out of in the first place.
Now, it is often true that features which present hard lines for ABI compatibility also present soft lines for source compatibility, and that’s true here. We should try to avoid the mistake we made with exhaustive enum switching where it’s too easy to rely on details of libraries you use. I think that means formalizing some concept of a “source resilience domain” that encompasses only your current module and the modules you’ve explicitly chosen to be revision-locked to (which can never include something you have an ABI boundary with).
We can certainly make that call, but "adding deinit is not a resilient change" doesn't seem like it has to be fundamentally true—external code for a non-@frozen type would still call through the type's value witnesses to destroy it, and the destroy value witness can include deinit code.
Ah, what you wrote makes sense now. Practically speaking it makes sense that external code would only really have a hope of partially destructuring frozen values, anyway, since otherwise we'd need to provide some combination of consuming get accessors and/or a new destructure value witness method to allow external code to resiliently pick apart the value. So maybe that's the right call—nonfrozen types act as if they always might have a deinit, but frozen types have the presence or lack of a deinit baked in. It's a bit unfortunate that that puts us in a C++-like situation, where having an empty destructor unintuitively has API impact compared to having no declared destructor at all, but maybe that's the best design.
Yeah, I guess I have that internalized from C++, but I can certainly see why it might be unintuitive. On the other hand, the alternatives seem even less intuitive: either we say that we always have to call a deinit function for types outside the resilience domain, even if they're frozen, or we say that the deinit of a frozen type is effectively implicitly inlinable, allowing callers to see that an empty deinit is equivalent to a missing one.
FWIW, in my experience whether a type has an explicit destructor or not is extremely fundamental to the type. I can't think of any counter-example I've seen in the wild (though I'm sure there is one).
The only situation I can think of is someone relying on the destructors of members to do the work initially, then realizing that's too sloppy and implementing it manually (an easy example would be a singly-linked stack with Box<Node>, which can blow the stack with recursive dtors if the LinkedList doesn't manually hoist the boxes out).
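A sketch of that exact evolution (the LinkedList/Node names just mirror the example above): the derived, recursive destructor can blow the stack on a long list, so you retroactively add an explicit Drop that unlinks nodes iteratively.

```rust
struct Node {
    value: i32,
    next: Option<Box<Node>>,
}

struct LinkedList {
    head: Option<Box<Node>>,
}

impl Drop for LinkedList {
    fn drop(&mut self) {
        // Hoist each Box out so every Node is dropped with an empty `next`,
        // keeping destruction iterative instead of recursive.
        let mut cur = self.head.take();
        while let Some(mut node) = cur {
            cur = node.next.take();
        }
    }
}

fn main() {
    // Deep enough that the derived (recursive) destructor would risk
    // overflowing the stack without the manual Drop above.
    let mut list = LinkedList { head: None };
    for value in 0..100_000 {
        list.head = Some(Box::new(Node { value, next: list.head.take() }));
    }
}
```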
(That said all of Rust's defaults are tuned to make it very unlikely that such a change would be noticed, since Copy is opt-in and a type that is std::mem::needs_drop cannot opt-in to Copy ever. This is obviously not the case with Swift.)