SE-0390: noncopyable type `deinit`s, mutation, and accidental recursion

We're looking into a potential usability issue with deinits on non-Copyable types. It's natural to expect a value to be mutable and consumable inside of deinit, so that parts of the value can be forwarded to other owners, or so that deinit can reuse existing methods on the value to perform its logic. However, inside of other mutating or consuming methods, it's easy to inadvertently trigger implicit destruction of the value and reenter deinit again:

struct Foo: ~Copyable {
  init() { ... }

  consuming func consumingHelper() {
    // If a consuming method does nothing else, it will run `deinit`
  }

  mutating func mutatingHelper() {
    // A mutating method may consume and reassign self, indirectly triggering
    // an implicit deinit
    consumingHelper()
    self = .init()
  }

  deinit {
    // mutatingHelper calls consumingHelper, which calls deinit again, leading to an infinite loop
    mutatingHelper() 
  }
}

Since this is an easy trap to fall into, it's worth considering whether there are any constraints we could impose on deinits to make it less likely to get into an infinite deinit loop situation like this. Some possibilities include:

  • We could say that the value remains immutable during deinit. Many types don't need to modify their internal state for cleanup, especially if they only store a pointer or handle to some resource. This seems overly restrictive for other kinds of types that have direct ownership of resources, though.
  • We could say that individual fields of the value inside of deinit are mutable and consumable, but that the value as a whole is not (sketched after this list). This would allow deinit to individually mutate and/or forward ownership of elements of the value, but not pass off the entire value to be mutated or consumed (and potentially re-deinited). Deinits could then implement logic that modifies or consumes part of the value, but they wouldn't be allowed to use any methods of the type, other than maybe borrowing methods, to share implementation logic with other members of the type.
  • Since deinit must be declared as part of the original type declaration, any nongeneric methods that it can possibly call on the type must be defined in the same module as the deinit, so we could potentially do some local analysis of those methods. We could raise a warning or error if a method called from the deinit either visibly contains any implicit deinit calls itself, or cannot be analyzed because it's generic, from a protocol extension, etc.
  • We could do nothing and leave it in developers' hands to understand why deinit loops happen when they do.
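
To make the second option above concrete, here's a rough sketch of what a deinit might be able to do under that rule. The Connection type and the returnToPool/finishAndClose helpers are made up for illustration, and none of this reflects how the compiler behaves today:

struct Connection: ~Copyable {
  var buffer: [UInt8] = []
  var socket: SocketHandle   // some other hypothetical ~Copyable resource type

  deinit {
    buffer.removeAll()       // mutating an individual field: allowed
    returnToPool(socket)     // forwarding ownership of an individual field: allowed
    // finishAndClose(self)  // consuming self as a whole: rejected, since it
    //                       // could implicitly run this deinit again
  }
}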

We wanted to get some feedback from the community about what people think is the right approach here, and whether there are other possibilities we've overlooked. Thank you for reading, and for any thoughts you all are able to share!

Along those lines, the local analysis of consuming methods would allow the situation in which the deepest consuming method contains discard self. All other consuming methods would eventually reenter the deinit. We could allow forwarding self through the return value, but we don't yet have a way of discarding that return value.
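
Roughly, the shape being described, sketched with a made-up Resource type (the call from the deinit is left commented out, since it isn't accepted today):

struct Resource: ~Copyable {
  let fd: Int32

  consuming func tearDown() {
    // Forwards ownership down the chain. A consuming method that neither
    // forwards nor discards self would implicitly rerun deinit when it returns.
    release()
  }

  consuming func release() {
    // ... free the underlying descriptor ...
    discard self   // ends the value's lifetime without running deinit
  }

  deinit {
    // Under the proposed local analysis, calling tearDown() here would be
    // recognized as safe, because the deepest consuming method discards self.
    // tearDown()
  }
}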

This kind of sounds like the current state of classes: they can still mutate properties but can't replace self in deinit. Classes do this by not allowing calls to mutating methods, e.g.:

protocol Bar {
    init()
}

extension Bar {
    mutating func replaceSelf() {
        self = .init()
    }
}

final class Foo: Bar {
    init() {}
    deinit {
        self.replaceSelf() // error: Cannot use mutating member on immutable value: 'self' is immutable
    }
}

Non-copyable types (and non-reference types for that matter) don't differentiate between partially mutating the value and completely replacing it with a new instance. I think if we want to support something like this across generic function calls we would need to start differentiating this through some special keyword, e.g. replacing.

This is actually a useful tool outside of non-copyable deinits. I have run into this while trying to implement a wrapper that logs all accesses to a value, e.g.:

@dynamicMemberLookup
public struct ReadProxy<Value> {
    private let value: Value
    private var accessLog: ...
    
    public init(value: Value) { ... }
    public subscript<Property>(dynamicMember keyPath: KeyPath<Value, Property>) -> Property {
        mutating get {
            // log access to `keyPath`
            return value[keyPath: keyPath]
        }
    }
}

struct UserDefinedPoint {
    var x: Int
    var y: Int
}

func deriveValue(value: inout ReadProxy<UserDefinedPoint>) -> Int {
    // users can read value through mutating subscript getter
    let area = value.x * value.y
    // users can later also replace the instance with a new instance and
    // therefore erase the access log
    value = ReadProxy(value: UserDefinedPoint(x: 0, y: 0))
    
    return area
}

Because inout also allows replacing the whole value, we can't be sure users won't accidentally overwrite the ReadProxy with a new instance. Today we need to resort to classes for this (without inout), but then we risk the ReadProxy outliving the call to deriveValue and pay for an extra allocation. Without a distinction between replacing the whole value and partially mutating it, non-copyable types have the same problem as copyable types do today.

Would it make sense to delegate the responsibility of calling deinit to the caller instead of the callee (in reference to consuming functions)? This way, deinits would know not to call themselves again, and any other execution context would know to call deinit.

deinit {
  consumingMethod()
  // Doesn’t call deinit again
}

func myFunc() {
  MoveOnlyType().consumingMethod()
  // Automatically synthesizes deinit call…
}

This has the benefit of being flexible (allowing generics and calling code from different files/modules). However, I'm not sure whether consuming methods are already baked into the ABI and thus can't be changed to stop calling deinit themselves.

I'd go with option 2: "individual fields of the value inside of deinit are mutable and consumable, but that the value as a whole is not."

And also consider adding mutating(fields) and inout(fields) modifiers to model that restriction for any function parameter. You could then call such functions and methods from a deinit. But there are more situations where such a modifier could be useful: it guarantees that let members are going to stay the same, and if allowed on protocols it would also guarantee that a mutating(fields) method implemented in a protocol extension can't rebind an object reference, allowing such methods to be called on a let of a class type (because let on a class type never prevents mutating the fields).
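
A hypothetical sketch of what that could look like (the syntax and the Buffer type are made up, so none of this compiles today):

struct Buffer: ~Copyable {
  var storage: UnsafeMutableRawPointer?

  // 'mutating(fields)' would permit mutating or consuming individual stored
  // properties, but not rebinding self as a whole.
  mutating(fields) func releaseStorage() {
    storage?.deallocate()
    storage = nil        // mutating a field: allowed
    // self = Buffer()   // reassigning self: rejected, so no deinit can be triggered
  }

  deinit {
    releaseStorage()     // safe to call here: it can never retrigger this deinit
  }
}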

I'm generally in favor of preventing reentrancy in some way. I view the main question here being how we want to do it.

While this post discusses some other trade-offs, I want to bring up one idea for completely statically checked and reentrancy-safe code, where the rules about it aren't invisible. What I have in mind turns discard into an inter-procedural concept:

We can achieve complete, static reentrancy prevention by extending the type system with a new ownership kind called discarding (or whatever name you want for this). It has the same ABI as consuming and is otherwise a synonym of consuming, but with a few important differences:

  • When a discarding T value's lifetime ends, its deinit is not called. It's as though someone wrote discard X at those lifetime-ending points.
  • discarding T can only be used as the ownership specifier within functions that have full storage visibility into the type, i.e., it can be used for the same set of functions that support discard self today, but also within borrowing and mutating methods of the same type, etc.
  • Only consuming T --> discarding T conversions are permitted. You can then only convert discarding T into borrowing T.
    • Corollary: a discarding method can only call borrowing or discarding methods.
  • Changing an ownership specifier between consuming and discarding preserves ABI, but in changing discarding to consuming, you'll see an API/source break.

Then, to fix the reentrancy problem, deinit is implicitly a discarding method. You don't need to know about that or write it, but once you run into a problem calling a consuming method, you're told about this new concept and can play around with it, or utilize it, in other scenarios:

struct S1: ~Copyable {
  private var stor: Int

  mutating func absorb(_ other: discarding S1) {
      self.stor = other.stor
  }

  deinit {
    s1Global.absorb(self) // guaranteed to not reenter the deinit!
  }
}

struct S2: ~Copyable {
  private var stor: Int

  // without 'discarding' ownership, we can try to workaround it
  // with this consuming method:
  consuming func giveAway(_ other: inout S2) {
    other.stor = self.stor
    
    if something {
      // can't statically guarantee nonreentrancy due to
      // limitations of static control-flow analysis, so this
      // call either is banned because `giveAway` was used in 
      // the deinit, or we have to emit a warning.
      someNonAnalyzableConsumingFunction(self)
    } else {
      discard self
    }
  }

  deinit {
    giveAway(&s2Global)
  }
}

Generally speaking, I think the inter-procedural analysis to ensure looping doesn't happen is going to lead to odd and difficult-to-understand rules in the language about what's allowed in a consuming method. Having the existence of a certain kind of caller (a deinit) influencing what you're allowed to do inside the function is really hard to keep straight. By baking this into the language and type system formally, it becomes easier to understand why I couldn't call someNonAnalyzableConsumingFunction in absorb.

Having some sort of static distinction could work, though introducing a whole new ownership kind for a distinction that disappears outside of the implementation feels a bit heavy to me. Since the property "doesn't ever call deinit" is interesting for both mutating and consuming methods, maybe it could be an attribute @noDeinit mutating / @noDeinit consuming?

And while this makes it harder to accidentally reenter deinit, it seems like there's still a fussy rule needed in order to deal with assignments, because if you allow:

  mutating func absorb(_ other: discarding S1) {
      self.stor = other.stor
  }

that suggests that you would also allow:

  mutating func absorb(_ other: discarding S1) {
      let stor = other.stor
      // oops, now stor gets destroyed here
  }

or, even in your example formulation, the deinit could do:

deinit {
  var s1Local = S1()
  s1.absorb(self)
}

and then end up reentering deinit again. It's not necessarily a strict local/nonlocal split because you don't necessarily know the lifetime of nonlocal variables you're assigning into. If you were trying to implement a deinit that does something like "if I'm not already in a certain global variable/on a certain executor, then move the value over there, and destroy it later when that global gets destroyed/that executor runs again", it seems to me like that is generally going to require some coordinated conditional logic in the deinit, to detect whether or not the value has ready-to-be-destroyed ownership already, and if it doesn't, to ensure that it moves only to that ready-to-be-destroyed owner, which would generally be hard to statically analyze.

(I assumed you meant s1Local.absorb so I changed that)

So in this situation, self got consumed when it was passed as discarding when calling s1Local.absorb, so even though we reenter the deinit by returning from the call to absorb, you wouldn't have access to self anymore and the implicit discard at the end of the deinit doesn't happen. So I think it's fine to reenter the deinit by returning from this call.

I believe this destruction of stor should be OK too, because it's like a field got either copied or moved out of the instance we're in the process of destroying (which is what this first-class discarding value represents).

To help illustrate what I was imagining would happen in my example, here's an annotated version:

struct S1: ~Copyable {
  private var stor: Int

  mutating func absorb(_ other: discarding S1) {
    let localStor = other.stor
    // destroy localStor
    // discard other
  }

  deinit {
    var s1Local = S1()
    s1Local.absorb(self)
    print("hi")
    /*
    s1Local.deinit()
    if self was not consumed {
      discard self
    }
    */
  }
}

This conditional test to see if self was consumed in the deinit is a new dynamic one; some flag needs to be tracked in the deinit (and in all functions taking a discarding value) to signal whether destruction has happened yet. That's because this sort of "deinit delegation" mechanism means you can't statically determine who's responsible for implicit destruction of the struct:

deinit {
  if cond {
    self.close()   // a discarding method, so it's guaranteed to destroy `self`
    // (set a local "self was discard" flag here)
  }
  /*
  if not discarded {
    discard self
  }
  */
}

Or maybe I missed the point you were making.


Absolutely. It would allow mutating methods to be invoked from deinits if we carried over an equivalent notion, maybe with an attribute. I believe all we'd need is to say that overwriting self in a @noDeinit mutating func, or overwriting a @noDeinit inout parameter, would not trigger the value's deinit and would just perform a discard.

The issue is that destroying s1Local is equivalent to destroying self; when we destroy this s1Local, it will move itself into another s1Local and create a loop that way.

Another way to potentially prevent recursive deinit would be to make it so that inout bindings for types with deinits don't allow for consumption and replacement, only incremental mutation, similar to &mut references in Rust. Then if self acts as a mutating binding inside of a deinit, it would be possible to invoke mutating methods on it, but not consuming methods, and those mutating methods we call also would not be able to accidentally trigger recursive deinit. That would be unfortunate, though, since it would prevent elementwise consumption inside of deinit, and being able to leave values temporarily invalidated during an inout access is a nice ability we otherwise have over Rust.
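
A sketch of what that restriction would mean in practice, with a made-up Node type (illustrative only):

struct Node: ~Copyable {
  var payload: Int

  mutating func bump() { payload += 1 }
  consuming func finish() { /* ... */ }

  deinit {
    bump()                      // incremental mutation through the mutating binding: OK
    // finish()                 // consuming self: rejected under this rule
    // self = Node(payload: 0)  // wholesale replacement: also rejected
  }
}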

Ah, I see. So I feel like that's a different situation. The argument to the deinit is a different instance of S1; the s1Local is a different S1 than self within each invocation of the deinit.

The kind of deinit looping I was thinking of preventing for noncopyable types is calling deinit again on the same value. That problem is prevented automatically for ref-counted objects because decrementing the ref count of something that already reached 0 doesn't trigger deinit on the object again. Only the first time you decrement from 1 to 0 do you enter the deinit and mark a bit in the object to say it's been triggered (link).
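
In pseudo-Swift terms, the mechanism is roughly the following (a conceptual restatement only; the actual runtime does this with atomic operations in C++):

final class HeapObjectStub {
  var refCount = 1
  var deinitTriggered = false
}

func release(_ object: HeapObjectStub) {
  object.refCount -= 1
  // Only the 1 -> 0 transition enters deinit, and the bit records that it has
  // already been triggered, so further releases can never reenter it.
  if object.refCount == 0, !object.deinitTriggered {
    object.deinitTriggered = true
    // ... run the object's deinit here ...
  }
}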

The kind of infinitely recursive deinit I think you're getting at is something that's a problem for classes too:

class C {
  deinit {
    _ = C() // will overflow stack
  }
}
_ = C()

It might be worth considering if we can improve this situation too. We already have warnings about basic infinite self-recursion for functions, so we ought to warn about the obvious infinite recursion in the deinit above too (should be simple to adapt that logic). But in general we can't catch everything here since you can have all sorts of recursive patterns that are hard to analyze. For example:

class D {
  public var x: Int = 0
  deinit {
    if x == 0 {  // can't determine x in general
      _ = C()
    }
  }
}
class C {
  deinit {
    _ = D()
  }
}
_ = C() // boom

This is not the case in Rust; std::mem::replace works on types with non-trivial Drop.

Ah, that's fair. The fact that you have to call replace and can't just do let old = value; value = new does make it a bit more apparent that you're doing something Weird inside of a drop() implementation, maybe, and makes the potential for a deinit loop more visible.

It's a bit of a philosophical question what constitutes a "different" instance. In a sense, we're really just moving the same instance around, though for move-only value types that's more or less equivalent to discard-ing the old value in the old place and constructing a new value with the same state in the new location.

That said, I also think that a not-entirely-airtight solution here is fine too. We want to make sure that it isn't easy to cause a deinit loop and not understand why.

As a potential user of the feature, I'm perfectly fine with the "do nothing" option. Maybe a warning if the compiler can prove that there is an unconditional recursion. But I would not like to be forced to provide the compiler with a proof that there is no unconditional recursion myself.

Noncopyable types in general are a somewhat expert feature, so I'm ready to approach them with extra alertness.

Functions that mutate self by assigning to it, or by passing self as an inout parameter, are pretty exotic for copyable types. The assigned value is usually some kind of modified copy of self. And since it becomes harder to make a copy with noncopyable types, I expect such functions to be even rarer.

Recursion in deinit has nothing to do with mutability; it happens when a new instance is constructed inside the deinit. And that can be done with a completely immutable self and individual fields.

Not necessarily; especially for numeric types like Int, wholesale reassignment of an inout/mutating value is far from unheard of.
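
For example, this kind of thing is entirely ordinary (just an illustration, not from the thread above):

extension Int {
  // Wholesale reassignment of self is the natural way to write this.
  mutating func roundUp(toMultipleOf m: Int) {
    self = ((self + m - 1) / m) * m
  }
}

var n = 10
n.roundUp(toMultipleOf: 4)   // n is now 12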

In my opinion, there are legitimate use cases for a deinit to call an arbitrary consuming function, which would be analogous to object resurrection in garbage-collected programming languages. For example, a noncopyable value could add itself to a queue on deinit to be processed later or periodically.

Maybe we should allow these types of patterns, but require the user to do so explicitly in order to avoid accidental infinite recursion. Perhaps in a deinit, we should require the consume operator to be used explicitly before moving self to any consuming function or method, and require any mutating method to be called with some explicit syntax, such as &self.
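
A hypothetical sketch of such a "resurrection" pattern using the explicit spellings suggested above (pendingJobs is an imaginary global that can own noncopyable values; none of this is accepted syntax today):

struct Job: ~Copyable {
  var id: Int

  deinit {
    // Explicitly handing self off to be processed later instead of letting it
    // be destroyed implicitly; the 'consume' makes the move visible.
    pendingJobs.append(consume self)
  }
}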

Some post-launch feedback:

I'm not familiar with the compiler internals, so I apologize if this is a steep hill to climb, but to me it seems that most of this conversation revolves around allowing self reassignments: I don't see any reason why deinit mutating fixed properties (i.e. without a setter, a.k.a. option 2) was ever in question. It's not even establishing a precedent: deinit does not contain nor permit the mutating modifier, which would otherwise somewhat imply full access.

It feels like this should work:

struct Mutex: ~Copyable {
    private var handle = pthread_mutex_t()
    init() { pthread_mutex_init(&handle, nil) }
    deinit { pthread_mutex_destroy(&handle) } // Cannot pass immutable value as inout argument: 'self' is immutable
    mutating func callAsFunction<T>(_ continuation: () throws -> T) rethrows -> T {
        pthread_mutex_lock(&handle)
        defer { pthread_mutex_unlock(&handle) }
        return try continuation()
    }
}

Not that the fix is too difficult, just that ...Unsafe... proliferation probably goes against the overall long-term goals of the language.

The fact that you can assign to self from a mutating func is honestly news to me after 2 years. I can see it for enums but not for a lot of other use cases (this may be a failure of imagination on my part). I guess that I tend to distinguish between mutating (which I previously assumed was *struct) and inout (which I assumed was **struct).

They’re both *struct, but you can reassign through a pointer.
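
Roughly, in pointer terms, with a made-up Point type:

struct Point { var x = 0, y = 0 }

func reset(_ p: UnsafeMutablePointer<Point>) {
  p.pointee.x += 1      // mutating a field through the pointer
  p.pointee = Point()   // ...but the same pointer also lets you replace the whole value
}

var pt = Point(x: 3, y: 4)
withUnsafeMutablePointer(to: &pt) { reset($0) }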

I guess you are referring to the distinction between binding and value? I figure it might be a tough nut to crack; I just wanted to provide some user feedback in this regard.