`discard self` for types with non-`BitwiseCopyable` members

When we initially introduced non-Copyable types, we gave them the ability to carry deinits, as well as the ability for consuming methods to clean up the value without going through the deinit using the special discard self form. Because of time constraints, we had originally restricted discard self only to types whose contents were completely BitwiseCopyable, but I would like to talk about removing that constraint, and allowing discard self to be used on any non-Copyable type with a deinit.

I've put a proposal up in the following pull request:

To start the discussion, I've pasted the initial draft of the proposal below:


discard self for types with non-BitwiseCopyable members

Introduction

This proposal extends the discard self special form to allow for its use in all non-Copyable types with user-defined deinits.

Motivation

When [SE-0390] introduced non-Copyable types, we gave these types the ability to declare a deinit that runs at the end of their lifetime, as well as the ability for consuming methods to use discard self to clean up a value without invoking the standard deinit logic:

struct Phial {
  var phialDescriptor: Int = 0
  
  deinit {
    print("automatically closed") }
  }

  consuming func close() {
    print("manually closed")

    discard self
  }
}

do {
  let p1 = Phial()
  // only prints "automatically closed"
}

do {
  let p2 = Phial()
  p2.close()
  // only prints "manually closed"
}

At the time, we restricted discard self to only support types whose members are all BitwiseCopyable in order to limit the scope of the initial proposal. However, discard self is also useful for types with non-BitwiseCopyable members.

Proposed solution

We propose allowing discard self to be used inside of consuming methods which are defined inside of the original type declaration of any non-Copyable type with a deinit. discard self immediately ends the lifetime of self, destroying any components of self that have not yet been consumed at the point of discard self's execution.

Detailed design

Remaining restrictions on discard self

This proposal preserves the other restrictions on the use of discard self:

  • discard self can only appear in methods on a non-Copyable type that has
    a deinit.
  • discard self can only be used in consuming methods declared in that type's
    original definition (not in any extensions).
  • If discard self is used on any code path within a method, then every code
    path must either use discard self or explicitly destroy the value normally
    using _ = consume self.

As noted in SE-0390, these restrictions ensure that discard self cannot be used to violate an API author's control over a type's cleanup behavior, and reduce the likelihood of oversights where implicit exits out of a method unintentionally implicitly invoke the default deinit again, so we think they should remain in place.

Partial consumption and discard self

A type with a deinit cannot typically be left in a partially consumed state. However, partial consumption of self is now allowed when the remainder of self is discard-ed. At the point discard self executes, any remaining parts of self which have not yet been consumed immediately get destroyed. This allows for a consuming method to process and transfer ownership of some components of self while leaving the rest to be cleaned up normally.

struct NC: ~Copyable {}

struct FooBar: ~Copyable {
  var foo: NC, bar: NC

  deinit { ... }

  consuming func takeFooOrBar(_ which: Bool) -> NC {
    var taken: NC
    if which {
      taken = self.foo
      discard self // destroys self.bar
    } else {
      taken = self.bar
      discard self // destroys self.foo
    }

    return taken
  }
}

Source compatibility

This proposal generalizes discard self without changing its behavior in places where it was already allowed, so should be fully compatible with existing code.

ABI compatibility

This proposal has no effect on ABI.

Implications on adoption

This proposal should make it easier to use noncopyable types as a resource management tool by composing them from existing noncopyable types, since consuming methods on the aggregate will now be able to release ownership of the noncopyable components in a controlled way.

Alternatives considered

discard only disables deinit but doesn't immediately

An alternative design of discard is possible, in which discard self by itself only disables the implicit deinit, but otherwise leaves the properties of self intact, allowing them to be used or consumed individually after the deinit has been disabled. This could allow for more compact code in branch-heavy cleanup code, where different branches want to use different parts of the value. For example, the FooBar.takeFooOrBar example from above could be written more compactly under this model:

struct FooBar: ~Copyable {
  var foo: NC, bar: NC

  deinit { ... }

  consuming func takeFooOrBar(_ which: Bool) -> NC {
    discard self // disable self's deinit

    // Since self now has no deinit, but its fields are still initialized,
    // we can ad-hoc return them below:
    if which {
      return self.foo    
    } else {
      return self.bar
    }
  }
}

However, we believe that it is more intuitive for discard to immediately end the life of self, even if this leads to more verbosity. Being able to place discard self at the beginning of a method and forgetting about it could also make code harder to read and make it easy to forget that the default deinit has been disabled when tracing through it.

discard recursively leaks properties without cleaning them up

In Rust, mem::forget completely forgets a value, bypassing not only the value's own drop() implementation (the equivalent of our deinit) but also the cleanup of any of its component fields. We could in theory define discard the same way, having it discard the value without even cleaning up its remaining properties. However, this would make it very easy to accidentally leak memory.

Recursively leaking fields also creates a backdoor for breaking the ordering of deinit cleanups with lifetime dependencies, which the original discard self design. Despite the prohibition on using discard self outside of a type's original definition, someone could wrap the type in their own type and use discard self to leak the value:

struct Foo: ~Copyable {
  deinit { ... }
}

struct Bar: ~Copyable {
  var foo: Foo

  consuming func leakFoo() {
    // If this discarded `self.foo` without running its deinit,
    // it would be as if we'd externally added a method to
    // `Foo` that discards self
    discard self
  }
}

Particularly with ~Escapable types, many safe interfaces rely on having a guarantee that the deinit for a lifetime-dependent type ends inside of the access in order to maintain their integrity. Having this capability could be necessary in some situations, but it should not be the default behavior, and if we add it in the future, it should be an unsafe operation.

Acknowledgments

Kavon Favardin wrote the initial implementation of discard self, and helped inform the design direction taken by this proposal.

12 Likes

If you can already write this as

_ = consume self.a
_ = consume self.b
// …

which I believe you can, then there’s ultimately no change in capability, and this is merely removing a restriction on syntax that has already been added to the language. I do think it’s reasonable to document it as such, so that people don’t mistake it for mem::forget.

(That said, the lesson of Rust’s Leakpocalypse was that mem::forget must be considered safe on any movable value because you can build it out of retain cycles. ~Escapable fortunately provides a bound on that, forcing you to infinite loop or block the task to avoid destruction, but even so, figuring out where the line of “unsafety” lies is tricky!)

1 Like

Thanks Joe. Three questions:

The SE-390 language was:

For the extent of this proposal, we also propose that discard self can only be applied in types whose components include no reference-counted, generic, or existential fields, nor do they include any types that transitively include any fields of those types or that have deinits defined of their own. (Such a type might be called "POD" or "trivial" following C++ terminology). We explore lifting this restriction as a future direction.

Explicitly, the proposal is about types that aren’t BitwiseCopyable. Does this effectively extend to all categories mentioned here (reference-counted types, non-copyable types, generics, existentials, and types that contain any of those)? Or are there still field-type reasons that discard self would be disallowed?

Do we specify the order in which non-consumed non-trivial fields are destroyed?

When a field is conditionally consumed, does using discard self later work with dynamic tracking, or does it diagnose? Modifying from your example:

struct NC: ~Copyable {}

struct FooBar: ~Copyable {
  var foo: NC, bar: NC

  deinit { ... }

  consuming func takeFooOrBar(_ which: Bool) -> NC {
    var taken: NC
    if which {
      taken = self.foo
    } else {
      taken = self.bar
    }
    discard self // dynamically tracks what to destroy, or diagnoses?

    return taken
  }
}
2 Likes

The intent here is to define discard self for any noncopyable type with a deinit, irrespective of its contents.

I'm not sure if we actually guarantee this in the general case. If we do, it seems reasonable to say that we follow the same order as if there were no deinit and the value was destroyed in its "natural" way.

The way partial consumption should work today is that we statically join the consumption paths when the control flow paths merge together. So the "then" branch of if which will implicitly destroy self.bar and the "else" branch will implicitly destroy self.foo to put self in a consistent state before reaching discard self.

1 Like

There's a benefit of this alternative that I think is worth mentioning: it would be closer to the behavior of deinits. The body of a deinit has access to the stored properties of a value, but suppresses the implicit call to the value's deinitializer. The behavior of deinit could be explained as: a deinit behaves as if it had an implicit discard self at the beginning. And the behavior of discard self could be explained as: after a discard self statement is executed, the method behaves as if it were a deinit.

It would have the wrinkle that with Mutation and consumption in non-`Copyable` type `deinit`s, we want to allow a deinit to transfer ownership of self in order to delegate its destruction. In order to do that, we need a way to "cancel" the implicit discard self in a deinit. But having an analogous feature for "cancelling" an explicit discard self within a method, such that ownership of self could be transferred after an explicit discard self, would add additional complexity and unintuitiveness.


With this pitch as-is, the behavior of deinit could be explained in another way: a deinit behaves as if it had an implicit discard self at the end[1] (unless ownership of self was transferred beforehand). That works, but it creates an inconsistency: while deinits implicitly call discard self as late as possible, we would likely encourage consuming methods to call discard self as early as possible in order to minimize control flow issues (for example, SE-0390's FileDescriptor.close() example).

This inconsistency would likely manifest in consuming methods following a specific pattern that sets them apart from deinits: first, read/consume any stored properties that will needed later (such as a file descriptor); second, call discard self; third, release the manually-managed resources owned by self (such as closing the file descriptor).

I suppose there's another way to solve said inconsistency while keeping the behavior of this pitch as-is: allowing people to write defer { discard self }. Then, a deinit would behave as if it had an implicit defer { discard self } at the beginning; and after a defer { discard self } is executed, a method behaves as if it were a deinit. This would allow people to write consuming methods that "act like" deinits without needing to follow the elaborate pattern described above.


  1. and before any early exits ↩︎

1 Like

There's another difference in behavior between discard self and the implicit cleanup in deinit (at least as currently proposed in that other thread). discard self ought to be mutually exclusive with a whole-value consuming operation on self, so that you can't discard self after consuming the whole value (or before, in the alternative model):

consuming func foo() {...}

consuming func bar() {
  self.foo()

  discard self // error: self has already been consumed
}

I think that makes sense, and as you noted, it encourages the "discard early" model where branchy logic (and particularly implicitly-branchy logic that arises from error handling) explicitly considers the cleanup behavior on all paths.

On the other hand, under the proposal as written, deinit allows for either partial or whole consumption of self. So you could have:

var part1: P1, part2: P2
deinit {
  if condition {
    self.part1.consume()
    // implicitly destroy self.part2
  } else {
    self.consume()
    // implicitly do nothing, self was given away
  }
}

Maybe it would be more consistent to make deinit use some sort of undiscard self marker prior to any whole-value consuming or mutating operation.

1 Like

I agree. In my opinion, it should always be unambiguous whether the current context is responsible for releasing manually-managed resources owned by self, or whether that responsibility lies somewhere else. If the developer gets it wrong, they will likely introduce a leak or double-free bug.

In the alternate model, the spelling undiscard self makes perfect sense: self is implicitly discarded at the beginning, and needs to be un-discarded. In the current model though, undiscard self is a little unintuitive because what it really means is "prevent self from being implicitly discarded in the future, because it will have already been consumed." Maybe it could be called nodiscard self, and/or nodiscard self could be written after self is consumed.

In the other thread, I suggested that the spelling should just be that consume self needs to be written explicitly. There could be another rule that, if consume self is written in any control flow path, then the implicit discard is suppressed in all control flow paths. Then the developer only needs to check if a deinit contains a consume self expression, in order to figure out whether or not it has the implicit-discarding behavior.

Another possibility is that we could use the spelling consuming deinit. It would intuitively mean that the deinit "acts like" a consuming method.[1] A downside is that it looks like it changes the external signature, but actually only changes the implementation. But there are other constructs that look like they change the external signature while only changing the implementation, such as isolated deinits and borrowing copyable parameters.


I think it would be nice if additionally, people could write discarding methods in the same "style" as they would write a deinit, and vice versa. But that brings into question what "styles" we really want to encourage. Let's say there are three "styles":

  1. Invalidate self as late as possible. This is what deinit would do by default (and already does), what defer { discard self } would do, and what a late discard self would do.
  2. Invalidate self as early as possible, and deinitialize its stored properties. This is what an early discard self would do in the current model.
  3. Invalidate self as early as possible, but preserve its stored properties. This is what an early discard self would do in the alternate model.

A benefit of (1) is that it's the most convenient and least strict. A downside is that it's potentially dangerous. It creates a period of time where self is still initialized, but is "dangling" because its manually-managed resources have been released. It becomes possible to accidentally use self while it's dangling, which could cause a use-after-free bug.[2]

A benefit of (2) is that it's intuitive. Another benefit is that it minimizes the risk of using self while it's dangling. A downside is that it often requires the elaborate pattern I described earlier. As a consequence, developers might just reach for (1) as the "natural" option, and incur its risks.

A downside of (3) is that it's unintuitive. A benefit is that, like (2), it minimizes the risk of using self while it's dangling. Another benefit is that doesn't require the elaborate pattern, so developers wouldn't be pushed to (1) as easily.


  1. With the exception that we should probably still prevent the deinit from implicitly calling itself. ↩︎

  2. I think in Swift this is hypothetical for now, but if a type has language-level memory safety or aliasing invariants, it's possible for a dangling value of that type to cause undefined behavior in very subtle ways. For example, in Rust, it's undefined behavior to copy or move a dangling reference or box, or a struct containing a dangling reference or box, even without dereferencing it. There's a MaybeDangling RFC for Rust intended to make working with dangling values less unsafe, which I think contains good examples of how dangling values with language-level invariants could cause subtle problems. ↩︎

discard self can only be used in consuming methods declared in that type's
original definition (not in any extensions).

Do extensions defined in the same source file as the type definition have special treatment?`

swift-system’s MachPort type has discard self functions in extensions, and I hope they’re actually intended to work.

2 Likes

It looks like the final SE-390 text does indeed say "same file".

2 Likes