SE-0390: @noncopyable structs and enums

kavon · March 1, 2023, 10:43pm

I'd like to discuss the semantics of forget self. It's essentially implied that writing forget self consumes the value of self, and then it does a memberwise destruction of self's stored properties, bypassing its deinit. From an ownership perspective, the forget statement would be modeled like the following function:

func forget<T>(_ t: consuming T) { .. do memberwise destruction .. }

Consumption of a var doesn't prevent you from reinitializing that var with a new value.
Since self is var bound in a number of contexts, like consuming, mutating, and even init functions of value types, should we allow reinitialization of self after a forget? Should we allow forget in an init?

One big use case I see for this is avoiding the deinit while still inside an init, where it can otherwise run once self has been fully initialized within that init. For example, just like for a class today, the init in this example can end up invoking the deinit:

@noncopyable struct Data {
  var data: [Datum] = []
  init(_ handle: NetworkHandle) throws {
    while let raw = handle.next() {
      self.data.append(raw)
    }
    if handle.incompleteMessage {
      throw E.someError
    }
  }

  deinit {
    // Some code assuming we have a complete message.
    assert(isCompleteMessage(data))
  }
}

The reason is subtle: once self has been fully initialized (e.g., because data has a default value), throwing an Error out of an init (or returning nil in a failable init) will trigger self's deinit. But if you hadn't yet fully initialized self's properties, then you'll get memberwise destruction of only the initialized fields in that failure-exit scenario.
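
The class behavior this mirrors can be checked in Swift today. The following is only a minimal sketch: Message, EventLog and IncompleteMessage are hypothetical names, and a class stands in for the noncopyable struct. Because data has a default value and log is assigned first, self is fully initialized before the throw, so the deinit is expected to run on the failure path.

```swift
// Hypothetical names throughout; a class stands in for the noncopyable struct.
final class EventLog { var events: [String] = [] }
struct IncompleteMessage: Error {}

final class Message {
    let log: EventLog
    var data: [Int] = []   // default value, as in the Data example

    init(chunks: [Int], complete: Bool, log: EventLog) throws {
        self.log = log     // all stored properties are initialized from here on
        data.append(contentsOf: chunks)
        if !complete {
            // self is fully initialized, so throwing here triggers deinit.
            throw IncompleteMessage()
        }
    }

    deinit { log.events.append("deinit with \(data.count) item(s)") }
}

// Attempts a failing init and returns whatever the deinit recorded.
func eventsAfterFailedInit() -> [String] {
    let log = EventLog()
    let message = try? Message(chunks: [1, 2], complete: false, log: log)
    precondition(message == nil)
    return log.events
}
```

Moving the append into a local array and assigning self.data only at the end is the kind of rewrite described next as a workaround.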

There are a number of workarounds to the above that require rewriting the init, or the deinit. For example, you could write the data results into a local var and only initialize self.data once we have a complete message. This is one situation where forget can be useful in a non-consuming function, which is outside of what's proposed:

init(_ handle: NetworkHandle) throws {
  // ...
  if handle.incompleteMessage {
    forget self        // just add this to avoid the deinit!
    throw E.someError
  }
}

But, unlike a class, structs and enums have a var-bound self. That means you can reassign self in the init. So if instead of throwing an error, one just wants to default initialize self, you're allowed to do that:

init(_ handle: NetworkHandle) {
  while let raw = handle.next() {
    self.data.append(raw)
  }
  if handle.incompleteMessage {
    self = .init()
  }
}

The problem in this example is that overwriting self after it's been initialized is also going to invoke the deinit on the old value. Again, we can fix this by writing forget self before the reinitialization:

if handle.incompleteMessage {
  forget self
  self = .init()
}

Allowing a forget self before reinitialization can also be useful for a mutating method:

@noncopyable enum MessageBox {
  case empty
  case holding(Message)

  mutating func overwrite(_ newMessage: Message) {
    forget self  // don't trigger deinit when we're just replacing the message.
    self = .holding(newMessage)
  }

  deinit {
    assert(isEmpty(self), "message box held message during destruction!")
  }
}

So, it seems like forget self can be very useful for noncopyable types in mutating and init methods, in addition to consuming ones, and that reinitialization of self after forget is not terribly bizarre. One important thing to note is that methods like overwrite here do not consume self on all paths, which is what's currently proposed for consuming methods.

Thoughts?

John_McCall · March 1, 2023, 11:39pm

I’m not sure I think forget is a good idea as a generally-available operator. The semantics are pretty subtle, and I can easily imagine someone thinking incorrectly that this was some closely-related operation, like an operator to end a scope early or an operator to leak a value completely.

It’s a necessary operation to have in the implementation of a type with a deinit, but I don’t think anyone else should be able to do it; it should be presented in a more restricted form, like maybe an attribute you can put on a consuming method (which you could call forget and make public if you really want).

Joe_Groff · March 1, 2023, 11:48pm

Note that the proposal already restricts forget to consuming methods defined on the type in its defining module, so it's not generally available. I also personally think it's good to err on the side of keeping its use restricted. The use case Kavon outlines for initializers does seem interesting, though, since there is an interesting discontinuity in behavior between erroring out of an initializer when self is partially initialized, versus when it's fully initialized and the deinit would normally run. Allowing forget in an initializer would let you get back to the other behavior if that's what you need.

John_McCall · March 1, 2023, 11:50pm

Okay, so why give it such a general name? This makes it a peer of try, consume and borrow.

Joe_Groff · March 1, 2023, 11:58pm

For want of a better alternative, mostly. I didn't like the idea of a declaration-level attribute, since that leads to a fairly subtle behavior change that seems easy to miss at the actual point of forgetfulness, and an attribute wouldn't completely eliminate the need for a special case syntax, since you'd instead need a way to say "actually I do want to run the deinit in this case" in conditional situations. I'm open to other alternatives.

John_McCall · March 2, 2023, 12:22am

Hmm. I think it would eliminate the need for a special-case syntax — an attributed function would be unconditionally “destructuring”, and if that wasn’t good enough, you could just make one that’s unconditional and call it conditionally. That also encourages a code pattern where you extract out an unconditional destructuring operation that returns the components that clients need to do something with, which seems much easier for programmers to reason about than some kind of complex intertwining where we conditionally suppress deinit and then separately use some subset of the properties.

But if you just want an alternate spelling, I’d just go long and loud about it. Nothing about this operation makes me think that suppressDeinit(self) would be a hardship. It’s probably not going to be used more than, what, twice in a type definition at the worst?

Joe_Groff · March 2, 2023, 12:34am

Well, there are the destructuring cases, but then there are also cases where you're implementing an alternative teardown for the type, and particularly if the teardown sequence can fail with an error, you still might want to leave the default deinit armed in case you have to give up and throw during your attempted alternate teardown. If there is complex intertwining, I think there are benefits to keeping the complexity localized instead of making people factor their code in a way that might not be natural.

Recent reviews (particularly thinking of SE-0366 and the consume operator here) have had the community lead us away from using function syntax for non-function-like operators, so it seems appropriate to continue the precedent there, but I'd be fine with a longer name.

tbkka · March 2, 2023, 5:07pm

As Joe said, not all consuming methods are going to be destructuring. A database handle, for example, may want what are in essence just alternate deinits: One that flushes pending operations before closing and one that closes without flushing.

(On the other hand, there are good reasons to discourage consuming close operations and maybe that's something we should analyze more closely.)

John_McCall · March 2, 2023, 5:44pm

I'm not suggesting that all consuming methods should be destructuring, just that allowing them to be annotated as destructuring might be a better way of achieving the language goals. Your forget operator is perfectly isomorphic to defining a destructuring forget() method. The difference is just that the latter encourages better code patterns: to me, if you have a complex consuming operation that sometimes destructures the value and sometimes forwards it, it is much better to write that in terms of smaller, unambiguous consumptions, splitting the destructuring paths out into tight "critical sections", than to have the destructuring be flow-sensitive and expect readers to keep track of it. And if anything that is more true in cases like init where programmers may already be somewhat confused about when exactly the value is fully assembled.

ksluder · March 2, 2023, 5:53pm

Perhaps there should be a rule that a noncopyable struct (I refuse to type the preceding @) with a deinit must include an explicit deinit self on all codepaths that follow complete initialization. Then there’s no ambiguity about when deinit occurs:

noncopyable struct DatabaseHandle {
    var dbConn: DatabaseConnection
    init(hostname: String, port: Int, username: String, password: String) throws {
        guard let dbConn = DatabaseConnection.open(to: hostname, port: port)
        else {
            throw ConnectionError()
        }
        self.dbConn = dbConn

        // self is fully initialized at this point
        guard dbConn.authenticate(as: username, password: password)
        else {
            deinit self
            throw AuthenticationError()
        }
    }

    deinit {
      dbConn.close()
    }
}

Joe_Groff · March 2, 2023, 6:07pm

This is the rule that the proposal already imposes if you use forget self anywhere in a consuming method, that deinit invocations must also become explicit (although it uses _ = consume self, since that already effectively means "run the deinit"). It might be reasonable to require it to be explicit in initializers as well.

I'm having trouble thinking of how one would factor the "attempt an alternative destruction, but use the default destructor if that fails" pattern under this rule. Under the proposal as written, this would be written as:

consuming func attemptAlternateDeinit() throws {
  do {
    try library_attempt_alternate_deinit(self.handle)
    // We no longer need to deinit if it succeeds
    forget self
  } catch {
    // Explicitly consume self using the default deinit if it fails
    _ = consume self
    throw error
  }
}

If the rule is that a method's otherwise non-consumed paths must either all end in deinit, or all end without deinit, then it seems like the best you can do is:

@disableDefaultDeinit
consuming func attemptAlternateDeinit() throws {
  do {
    try library_attempt_alternate_deinit(self.handle)
    // OK to drop the value at this point, as indicated by the attribute
  } catch {
    // Explicitly consume self by calling another method that
    // defaults to doing so
    self.runTheDefaultDeinit()
    throw error
  }
}

consuming func runTheDefaultDeinit() {}

which doesn't strike me as an improvement:

- you have to write a method that has no purpose other than to get back into "deinit runs by default again" mode (though, to be fair, we could introduce a special deinit self syntax for that);
- the fact that the deinit doesn't run isn't locally evident at the point where it matters, after the successful call to library_attempt_alternate_deinit;
- the developer might not remember to deal with the error-thrown case, since the easiest thing to write is:
```
@disableDefaultDeinit
consuming func attemptAlternateDeinit() throws {
  try library_attempt_alternate_deinit(self.handle)
}
```
which lets the value leak on the error case. The proposal tries to avoid this by requiring whether the value is forgotten or not to be explicit on all code paths. Since even defaulting back to deinit might not be appropriate if you're doing something that requires suppressing it—you might want to throw ownership back to the caller to let them try something else again, or you're trying to close(2) on an unknown system and leaking the fd is really the only thing you can do—requiring the author to make an explicit choice struck us as appropriate.
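
For reference, here is a sketch of the "alternate teardown with fallback" pattern in the spelling that later shipped in Swift 5.9, where the type is declared with ~Copyable and the operator is spelled discard self rather than forget self. Resource, attemptAlternateTeardown, TeardownFailed and the counter are hypothetical stand-ins for illustration only. The counter is a raw pointer because the shipped discard only accepts types whose stored properties are trivially destroyed.

```swift
struct TeardownFailed: Error {}

// Hypothetical stand-in for library_attempt_alternate_deinit.
func attemptAlternateTeardown(_ handle: Int32, succeeds: Bool) throws {
    if !succeeds { throw TeardownFailed() }
}

struct Resource: ~Copyable {
    let handle: Int32
    let deinitCount: UnsafeMutablePointer<Int>   // counts default deinit runs

    init(handle: Int32, deinitCount: UnsafeMutablePointer<Int>) {
        self.handle = handle
        self.deinitCount = deinitCount
    }

    consuming func attemptAlternateDeinit(succeeds: Bool) throws {
        do {
            try attemptAlternateTeardown(handle, succeeds: succeeds)
            discard self          // success: suppress the default deinit
        } catch {
            _ = consume self      // failure: run the default deinit explicitly
            throw error
        }
    }

    deinit { deinitCount.pointee += 1 }
}

// Returns how many times the default deinit ran for one Resource.
func defaultDeinitRuns(teardownSucceeds: Bool) -> Int {
    let counter = UnsafeMutablePointer<Int>.allocate(capacity: 1)
    counter.initialize(to: 0)
    defer { counter.deallocate() }
    let resource = Resource(handle: 3, deinitCount: counter)
    _ = try? resource.attemptAlternateDeinit(succeeds: teardownSucceeds)
    return counter.pointee
}
```

Both exits from the method state explicitly what happens to self, which is the property the proposal is trying to enforce.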

John_McCall · March 2, 2023, 6:28pm

Again, you can get your operator back by writing @disableDefaultDeinit consuming func forget() {}, so if you want to write this function using implicit deinit on some paths, you can do it:

consuming func attemptAlternateDeinit() throws {
  try library_attempt_alternate_deinit(self.handle)

  // If we got here, the call succeeded and we should suppress deinit.
  self.forget()
}

But I think it's better to not mix implicit and explicit deinit and just drill directly through the abstraction, which presumably looks like this:

@disableDefaultDeinit consuming func attemptAlternateDeinit() throws {
  do {
    try library_attempt_alternate_deinit(self.handle)
  } catch {
    library_normal_deinit(self.handle)
    throw error
  }
}

You can of course leak the value, but I don't think the attribute is any more prone to that than the flow-sensitive operator; you can certainly use the operator carelessly by just putting forget self at the top of the method.

ksluder · March 2, 2023, 6:37pm

I think a major issue with forget self and _ = consume self is that they read as synonyms.

Joe_Groff:

consuming func attemptAlternateDeinit() throws {
  do {
    try library_attempt_alternate_deinit(self.handle)
    // We no longer need to deinit if it succeeds
    forget self
  } catch {
    // Explicitly consume self using the default deinit if it fails
    _ = consume self
    throw error
  }
}

What does the declaration of self.handle look like? Does this dilemma exist if the developer is explicit about the states their type may inhabit?

struct MyStruct {
    var handle: Handle!

    consuming func attemptAlternateDeinit() throws {
      guard let handle = consumeAndReplace(&handle, with: nil) else { preconditionFailure("handle already destroyed") }
      try library_attempt_alternate_deinit(handle)
      _ = consume self
    }

    deinit {
      if let handle = handle.moveOut(replacingWith: nil) {
        // Try to destroy the handle a different way?
      }
    }
}

Joe_Groff · March 2, 2023, 7:04pm

ksluder:

What does the declaration of self.handle look like? Does this dilemma exist if the developer is explicit about the states their type may inhabit?

struct MyStruct {
    var handle: Handle!

    consuming func attemptAlternateDeinit() throws {
      guard let handle = consumeAndReplace(&handle, with: nil) else { preconditionFailure("handle already destroyed") }
      try library_attempt_alternate_deinit(handle)
      _ = consume self
    }

    deinit {
      if let handle = handle.moveOut(replacingWith: nil) {
        // Try to destroy the handle a different way?
      }
    }
}

This creates a different dilemma, where handle being nil is now a potential state that the value can be in at any time, which introduces runtime overhead to check for nil and conceptual overhead for people working on the type who have to be mindful of that invalid state. That approach would indeed avoid the need for forget, but I think it's worth a bit more conceptual complexity to be able to reach the "make invalid states unrepresentable" goal we generally strive for in Swift.

ksluder · March 2, 2023, 7:08pm

This sounds like a different design goal that doesn't need to be conflated with the design of noncopyable types. Everything about deinitialization of partially-constructed noncopyable structs applies equally to classes.

In the meantime, developers can adopt noncopyable and model partial construction as complete construction, perhaps isolating partial construction to nested types such that their outer type is always either fully constructed or destructed.

Joe_Groff · March 2, 2023, 7:11pm

The fact that classes have shared reference-counted ownership means it's not even an option to attempt static decomposition of them (outside of the deinit itself). And I would say that "zero-runtime-overhead abstractions" are indeed a general goal of this feature, and that arguably includes the overhead of checking avoidable invalid states.

fclout · March 4, 2023, 3:54pm

I wanted to pitch in favour of something like that because I feel it avoids growing the “language surface”. Forget as an operator duplicates access control rules and it comes with slightly different consequences for consuming than every other consuming operation. I find that making it a method is easier to hold in my head.

One pattern I’d probably use a lot, at least privately, would be to have a forgetting method that returns the held resource and consumes self, like unique_ptr’s release(). I could do that with either the operator or an attribute that makes the method forgetting. However, in the operator case, this pattern would not get any of forget’s benefits. It’s just one data point, but being able to find realistic cases where there isn’t much of a difference says to me that a new operator might not be worth its mental overhead.
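
A rough sketch of that release()-style method, written in the spelling that later shipped in Swift 5.9 (~Copyable and discard self, the renamed forget). UniqueHandle, its fields and closeCount(releasing:) are hypothetical names chosen for illustration, and the close counter only simulates closing a resource.

```swift
struct UniqueHandle: ~Copyable {
    let raw: Int32
    let closeCount: UnsafeMutablePointer<Int>   // counts simulated closes

    init(raw: Int32, closeCount: UnsafeMutablePointer<Int>) {
        self.raw = raw
        self.closeCount = closeCount
    }

    // Hands the raw resource back to the caller without running deinit.
    consuming func release() -> Int32 {
        let raw = self.raw
        discard self
        return raw
    }

    deinit { closeCount.pointee += 1 }   // stands in for closing the resource
}

// Returns how many simulated closes happened for one handle.
func closeCount(releasing: Bool) -> Int {
    let counter = UnsafeMutablePointer<Int>.allocate(capacity: 1)
    counter.initialize(to: 0)
    defer { counter.deallocate() }
    let handle = UniqueHandle(raw: 7, closeCount: counter)
    if releasing {
        _ = handle.release()     // ownership of the raw value moves to the caller
    } else {
        _ = consume handle       // normal destruction, deinit runs
    }
    return counter.pointee
}
```

The whole body of release() is unconditional, so an operator and a method-level attribute would express it equally well here.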

fclout · March 4, 2023, 10:28pm

Other than this: completely in favor of non-copyable types, I think it's the right direction for Swift, etc. Only minor remarks are that given a prefix keyword/attribute to mark non-copyable types, I don't know what syntax we'll use for generic types that are copyable when their type arguments are. If the end result is a prefix keyword, I agree with other commenters that it's an important enough concept that it deserves to be outside of the @ syntactic namespace (which, until the introduction of result builders and property wrappers, I considered to be the namespace for keywords that had minor influence).

Joe_Groff · March 6, 2023, 5:38pm

Although the review period is winding down, one topic I'd like to hear the community's feedback on is the lifetime behavior for noncopyable variables that don't get consumed. For copyable types, we've been imprecise about how long values live for, since we want to reserve optimization opportunities for ourselves with ARC, and the nature of shared ownership already makes the precise end of objects' lifetimes hard to predict, since you never definitely know who the last owner of an object is. However, we don't have that issue with single-ownership, noncopyable values, so we can choose to be more precise. The proposal currently states that a local value gets destroyed after its last use if it isn't otherwise consumed, so in this example:

func borrow(_: borrowing Foo)

func use(x: consuming Foo) {
  print("a")
  borrow(x)
  // x's lifetime ends after `borrow` returns
  print("b")
}

the deinit, if any, for x would run before print("b"). Another option that might be more in line with developers' expectations might be to guarantee that x's lifetime covers its lexical scope (up to the point it's consumed), which would be in line with C++ and Rust.
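
Whichever rule is chosen, an explicit consume pins the end of the lifetime. A minimal sketch, written with the ~Copyable spelling that later shipped in Swift 5.9; Trace, the log parameter and eventOrder are hypothetical additions used only to record ordering.

```swift
final class Trace { var events: [String] = [] }

struct Foo: ~Copyable {
    let trace: Trace
    init(trace: Trace) { self.trace = trace }
    deinit { trace.events.append("deinit") }
}

func borrow(_ foo: borrowing Foo) {
    foo.trace.events.append("borrow")
}

func use(x: consuming Foo, trace: Trace) {
    trace.events.append("a")
    borrow(x)
    _ = consume x                // x's lifetime ends here under either rule
    trace.events.append("b")
}

// Runs `use` once and returns the recorded order of events.
func eventOrder() -> [String] {
    let trace = Trace()
    use(x: Foo(trace: trace), trace: trace)
    return trace.events
}
```

Without the explicit consume, the position of "deinit" relative to "b" is exactly what the two candidate rules disagree on.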

Jon_Shier · March 6, 2023, 5:44pm

Can you speak to why Rust chose that behavior? Easier reasoning?