SE-0390: @noncopyable structs and enums

I'd like to experiment with noncopyable types for SQLite statements. Those are objects that should not have several users: for example, if the database rows fetched from a statement are iterated by two consumers, neither will get the rows it expects. I currently put advice against statement sharing in the documentation; move-only types look like a good fit.

But I can't start experimenting. The linked toolchain macOS #583 won't run with the following dialog (developer not verified):

/Library/Developer/Toolchains/swift-PR-63783-583.xctoolchain/usr/bin/swift package build

And GRDB version 6.8.0 won't build with the latest available toolchain swift-DEVELOPMENT-SNAPSHOT-2023-02-23-a.xctoolchain aka org.swift.57202302231a:

$ /Library/Developer/Toolchains/swift-DEVELOPMENT-SNAPSHOT-2023-02-23-a.xctoolchain/usr/bin/swift build
...
error: compile command failed due to signal 6 (use -v to see invocation)
Failed to reconstruct type for $sScs12ContinuationVyxq__GD
Original type:
(struct_type decl=_Concurrency.(file).AsyncThrowingStream.Continuation
  (parent=bound_generic_struct_type decl=_Concurrency.(file).AsyncThrowingStream
    (generic_type_param_type depth=0 index=0 decl=_Concurrency.(file).AsyncThrowingStream.Element)
    (generic_type_param_type depth=0 index=1 decl=_Concurrency.(file).AsyncThrowingStream.Failure)))
...
4.	While evaluating request IRGenRequest(IR Generation for file "/Users/groue/Documents/git/groue/GRDB.swift/GRDB/ValueObservation/SharedValueObservation.swift")
5.	While emitting IR SIL function "@$s4GRDB22SharedValueObservationC6values15bufferingPolicyAA05AsynccD0VyxGScs12ContinuationV09BufferingG0Oyxs5Error_p__G_tFfA_".
 for expression at [/Users/groue/Documents/git/groue/GRDB.swift/GRDB/ValueObservation/SharedValueObservation.swift:372:90 - line:372:91] RangeText="."
...

I think the toolchain should show up under Privacy & Security system settings pane for you to "allow anyway" after that happens. The blunt hammer of xattr -rc swift*.xctoolchain might do the trick if that doesn't.

I'll ask to see if that's a known issue. Thanks for giving it a try!

You gave me the energy to try again. Right-click + Open on the .xctoolchain file was enough for swift build to ask me if I want to open anyway, and... build #583 successfully builds GRDB 6.8.0 :tada: OK I can start toying now :-)

:sweat_smile: I was ready to deal with MaybeStatement as a temp replacement for optionals (kudos for the Working around the generics restrictions section :+1:), but I don't quite know how to deal with sequences! Iterators return optionals, so iterators can't produce noncopyable types. All right, be tough, Gwendal, you'll crawl your way :muscle:

It's tough indeed.

A simplified version of the proposal's FileDescriptor won't build when embedded as a new file in my package:

import Foundation

@_moveOnly
struct FileDescriptor {
  private var fd: Int32

  init(fd: Int32) { self.fd = fd }

  func write(buffer: Data) {
    print(buffer)
  }

  deinit {
    Darwin.close(fd)
  }
}

$ /Library/Developer/Toolchains/swift-PR-63783-583.xctoolchain/usr/bin/swift build
Building for debugging...
error: compile command failed due to signal 6 (use -v to see invocation)
Begin Error in Function: '$s4GRDB14FileDescriptorVfD'
Owned function parameter without life ending uses!
Value: %0 = argument of bb0 : $FileDescriptor            // users: %1, %3

End Error in Function: '$s4GRDB14FileDescriptorVfD'
Found ownership error?!
triggering standard assertion failure routine
UNREACHABLE executed at /Users/ec2-user/jenkins/workspace/swift-PR-toolchain-macos/branch-main/swift/lib/SIL/Verifier/LinearLifetimeCheckerPrivate.h:211!
[...]
---
4.	While evaluating request ExecuteSILPipelineRequest(Run pipelines { Non-Diagnostic Mandatory Optimizations, Serialization, Rest of Onone } on SIL for GRDB)
5.	While running pass #46 SILFunctionTransform "OwnershipModelEliminator" on SILFunction "@$s4GRDB14FileDescriptorVfD".
 for 'deinit' (at /Users/groue/Documents/git/groue/GRDB.swift/GRDB/Core/FileDescriptor.swift:13:3)
6.	Found verification error when verifying before lowering ownership. Please re-run with -sil-verify-all to identify the actual pass that introduced the verification error.
7.	While verifying SIL function "@$s4GRDB14FileDescriptorVfD".
 for 'deinit' (at /Users/groue/Documents/git/groue/GRDB.swift/GRDB/Core/FileDescriptor.swift:13:3)
[...]

I'm not sure how I can help. The review ends on March 7, but I'm not sure the compiler team will be able to ship a toolchain able to deal with GRDB by this date. That's fine, I'll play later with the result of the proposal.

I'm not against exploring must-explicitly-consume types as a new language feature, but it's definitely out of scope for this proposal. Going beyond "Rust can't do it", I get the impression from conversations with Rust developers that Rust tried to do it, and rejected the results. Gankra had posted some helpful links about the history here in the pitch thread:

One of the points she makes is that ensuring something is explicitly consumed is a hard thing to prove through any level of abstraction or indirection. Even Optional<T> becomes difficult to work with, since you get pushed to explicitly consume the value even on paths you know it will always be nil. And that's in a language where static lifetime values are relatively pervasive; as you noted, for most Swift code, the norm will likely remain to work with copyable and shared-ownership objects, making static enforcement even harder to maintain. As soon as a value ends up owned by a class, an escaping closure, global variable, or any other shared-ownership location, then the only place it can be consumed is in that containing object's destructor, which could again run at any time in a context you don't control, defeating many of the proposed use cases.

The current design of Sequence will definitely need some rethinking in how it will work with noncopyable types, since it is currently designed in such a way that the Iterator basically always has to own a copy of the Sequence, which is of course impossible when you can't copy something. The Collection model is likely to work a little bit better with a noncopyable type, since an index is always passed alongside the original collection, allowing you to implement a subscript that _reads or _modifys elements of the type in-place.

There's a bug we found with codegen of deinits in types that are otherwise "trivial". One way to work around this is to add an extra field of class type that doesn't do anything:

@_moveOnly
struct FileDescriptor {
  private var fd: Int32
  private let _workaround: AnyObject? = nil
}

Note also that some of our optimization passes may still lead to invalid codegen with noncopyable types, so it's best to stick to -Onone for testing purposes.

All right. If it's of any use, GRDB does not use sequences. It uses "cursors" - Cursor is like IteratorProtocol, but its next method can throw. There are cursors of database rows, but also cursors of statements built from an SQL string ("SELECT ...; SELECT ...; etc").

Usage:

while let value = try cursor.next() {
  // use value
}
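
A minimal sketch of the cursor idea described above, with ordinary copyable types. The names ThrowingCursor, IntCursor, ParseError and sum are hypothetical and chosen for illustration; this is not GRDB's actual Cursor declaration, only a protocol whose next method can throw.

```swift
// Hypothetical protocol: like IteratorProtocol, but next() can throw.
protocol ThrowingCursor {
    associatedtype Element
    mutating func next() throws -> Element?
}

// Hypothetical error thrown when an element cannot be decoded.
struct ParseError: Error {
    let text: String
}

// A cursor that decodes integers from strings, failing on bad input.
struct IntCursor: ThrowingCursor {
    private var remaining: ArraySlice<String>

    init(_ texts: [String]) {
        remaining = texts[...]
    }

    mutating func next() throws -> Int? {
        // nil signals the end of the cursor, as with iterators.
        guard let text = remaining.popFirst() else { return nil }
        guard let value = Int(text) else { throw ParseError(text: text) }
        return value
    }
}

// Same consumption pattern as the `while let` loop above.
func sum(_ texts: [String]) throws -> Int {
    var cursor = IntCursor(texts)
    var total = 0
    while let value = try cursor.next() {
        total += value
    }
    return total
}
```

Because next returns an optional, this shape runs into the same restriction as iterators once Element is a noncopyable type.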

I'll be greatly interested in the adaptations to sequences and iterators for move-only types, because I'll need to apply them to cursors as well.

You might have to wait until some of the other features we're working on, like borrow bindings, also come into place to fully integrate a move-only cursor with the language. Without them, the only way to really produce a lifetime-dependent value would be with a higher-order function, like:

extension Statement {
  borrowing func withCursor(_ body: (borrowing Cursor) -> ()) { ... }
}

extension Cursor {
  borrowing func forEachRow(_ body: (borrowing Row) throws -> ()) rethrows { ... }
}
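
A rough, self-contained sketch of that higher-order-function shape. It assumes the spellings that later shipped with this proposal in Swift 5.9 (~Copyable instead of @noncopyable), and Row, Statement and total(of:) are hypothetical stand-ins, not GRDB's types.

```swift
// Hypothetical noncopyable row: it can only be borrowed by the closure.
struct Row: ~Copyable {
    let values: [Int]
}

// Hypothetical noncopyable statement holding its result rows in memory.
struct Statement: ~Copyable {
    let rows: [[Int]]

    // Lends each row to `body` for the duration of one call only.
    borrowing func forEachRow(_ body: (borrowing Row) throws -> Void) rethrows {
        for values in rows {
            let row = Row(values: values)
            try body(row)
        }
    }
}

// The caller never owns a Row, so it cannot store or share one.
func total(of statement: borrowing Statement) -> Int {
    var sum = 0
    statement.forEachRow { (row: borrowing Row) in
        sum += row.values.reduce(0, +)
    }
    return sum
}
```

The closure parameter is the only place a Row is visible, which is what makes its lifetime depend on the statement without borrow bindings.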

It sounds like there's a fix on the way for this issue. Toolchains from 02-23 on should have a fix for the deinit codegen issue you ran into earlier as well.

I'd like to protect the current ergonomics of the library, which GRDB users appear to be fond of. I'm reluctant to break existing user code without a good reason. Some of the best patterns of the library are seven years old: that's something to take care of.

I only aim at API compatibility, not ABI. Ideally, I could replace the Statement class with a noncopyable Statement struct with only very few users noticing the change, and perhaps even zero if they're already good citizens and don't share statements, as documented. The lib is already in a good shape for this goal: users usually do not see Statement instances. And if they do see those instances, they are actively encouraged not to store them.

A typical usage of SQLite statement is when users aim at the best performance:

// Very close to the raw SQLite speed.
// Users who aim at sheer performance are really happy they can do this.
// Ideally, this would still compile once Statement is a move-only type.
let statement = try db.cachedStatement(sql: "INSERT INTO player (name, score) VALUES (?, ?)")
for player in players {
    try statement.execute(arguments: [player.name, player.score])
}

Instead of:

// Convenient, hidden raw statement,
// but frequently Codable-based,
// and always full of string-based accesses to column values.
// Can't achieve light speed.
for player in players {
    try player.insert(db)
}

Acute readers will notice let statement = try db.cachedStatement(...) in the above snippet. Yeah, SQLite statement compilation takes time (parsing + query plan), so caching can be important. I assume that the move-only Statement.deinit will be able to move the raw sqlite3_stmt* pointer back into the cache after use. I said "assume", but I meant that I need this to be possible.

to add another data point if it helps, swift-mongodb has a type Mongo.Batches that is very similar to @gwendal.roue ’s cursor as he describes it. Mongo.Batches is an ARC type because it holds a reference to a Mongo.Connection, and Mongo.Batches is able to release that reference early when the user exhausts the database cursor, instead of waiting for user code to exit the iteration scope.

try await session.run(command: query, against: "my-database")
{
    for try await _:[Element] in $0
    {
    }
    // should be able to re-use the cursor’s connection here ...
}
// ... instead of waiting until here.

i don’t see this use case as being blocked on noncopyable, because i can achieve this behavior with ARC. but if we had @noncopyable, the Mongo.Batches type could become stack-allocated and non-refcounted, since there is never any reason to escape it from the iteration scope, it is only an ARC type right now to prevent lingering references to its wrapped Mongo.Connection.

These are great examples! Thank you so much for sharing them!

And in fact making your Statement type a noncopyable struct would help people who have been misusing it to find and correct those potential bugs.

Another interesting point about let statement = db.cachedStatement(...) is how it ties into some related work we're eyeing about supporting constrained lifetimes. In this case, it would be nice to be able to guarantee that statement could never outlive the db that provided it. (In practice, I don't think this is critical for your use case since db objects are typically very long-lived. A noncopyable form would discourage people from storing statements, which is likely sufficient.)

Yeah, a key goal of noncopyable is to eliminate runtime ARC for cases where an object naturally has a limited lifecycle.

Tim

Yes, and:

(In practice, I don't think this is critical for your use case since db objects are typically very long-lived. A noncopyable form would discourage people from storing statements, which is likely sufficient.)

Correct! No one has ever reported a crash due to a use of the unowned reference to the database connection held by the current Statement class. I agree that it's better when the mere possibility of programming errors can be discarded, but I may also keep the code simple and stick to YAGNI. We have years of experience of relying on API design as the poor man's borrow checker.

As someone that inhabits userland I'm always somewhat reluctant to weigh in on evolution proposals, but a couple of thoughts.

The only reason I actually read the proposal was I happened to skim the initial responses and I had a visceral allergic reaction to reading that ?Copyable was proposed to mean maybe Copyable. I therefore read the full proposal and the full thread here.

I would like to strongly encourage the future direction to settle on syntax other than ?Copyable. The ? operator already has two extremely important uses in Swift, obviously in terms of optionals and also the ternary operator. While ?Copyable would only make my eyes bleed as an experienced user, more importantly I feel it has the potential to add unnecessary confusion for new learners of Swift. It may be hard for experienced programmers to recall that for new programmers, Optionals take quite a bit of work to get your head around. Perhaps many people who contribute to Swift Evolution never experienced this problem. It remains real in userland. Any additional use of the ? character in syntax for another unrelated language feature seems like an unnecessary loss of clarity.

My immediate reaction was that ~Copyable could be a great syntax for this, and I then read above that @michelf suggested a similar thing. I would describe this slightly differently from Michel. In my view, this syntax should be considered to indicate something is "agnostic" to whether a type is Copyable. I'm aware this is in the future directions part of the proposal, but it was the part that mattered to me where it didn't feel quite right yet.

In terms of the overall proposal, it seems a valuable addition. I'd have thought the most natural fit would be noncopyable struct... without the @ in the same way that we use final class..., (it's a struct but is more tightly constrained in this one way, just like a final class is a class but it is constrained in that it can't be subclassed). But I read the arguments above for why @noncopyable makes more sense to the authors than noncopyable, and I accept they seem well formed.

From a broader perspective, this is a rather unusual situation: since every type has been implicitly conforming to an unstated Copyable, we're slightly painted into a corner. We can't just use the expected standard Swift form, which would be to simply leave this declaration out when the type doesn't conform to it. OK, fair enough. As a broader point, it would be good if someone knowledgeable is casting an eye around to think about whether there are any other analogous situations that may need to be addressed in future, so that a pattern is settled on that will work for both.

The opposite is true here: Rust has a Clone trait defining an explicit clone mechanism ((&Self) -> Self) and a Copy trait inheriting from it that enables implicit copying.

Hence, if you wanted to provide a function on your @noncopyable type that creates an explicit copy, clone() would probably be a good choice based on prior art. In fact, all 4 of the terms you suggested read to me like explicit rather than implicit copies, e.g. “recreating” the value from scratch rather than just copying bits around.

From what I've read, "copy" in Rust means to create a new instance from the bits of another instance, while "clone" means to create a new instance equivalent to another instance with no constraints on how it's done. Therefore, "clone" in Rust is equivalent to "copy" in Swift, and "copy" in Rust is equivalent to "bitwise copy" in Swift (even though Rust does not allow implicit clones).

That being said, I can see how that statement was confusing.

Rust is really the outlier in terminology here. The idea of copying a value is common to every programming language. In most languages, of course, it’s an implementation-level concept; it’s only surfaced in the source language in “systems-y” languages that try to give the programmer control over basic representations. But it’s still a core concept that you sometimes need to create independent copies of a value, and that operation isn’t always going to be as simple as copying bits. “Copy” in this sense has a very long history, much older than Rust or probably any language still in major use.

That Rust chooses to tie the implicitness of the copy to its bitwise-ness makes perfect sense for its design goals. And I understand how they ended up picking the names here, because the result of that choice is that they don’t have a built-in notion of non-bitwise copying at all, other than the default memberwise derivation of the clone operation. But I do think they’ve done the broader PL world a bit of a disservice, in that people who’ve started with Rust sometimes struggle to realize that “copy” elsewhere is not inherently constrained the way it is in Rust.

I'd like to discuss the semantics of forget self. It's essentially implied that writing forget self consumes the value of self, and then it does a memberwise destruction of self's stored properties, bypassing its deinit. From an ownership perspective, the forget statement would be modeled like the following function:

func forget<T>(_ t: consuming T) { .. do memberwise destruction .. }
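
As an illustration of "consume self and bypass the deinit", here is a small sketch written with the spellings that later shipped with this proposal in Swift 5.9 and later, where the type is declared with ~Copyable and the operation is written discard self rather than forget self. Handle, release, closes and demo are hypothetical names, and the counter is passed as a raw pointer only so that the deinit can be observed while every stored property stays trivially destroyed.

```swift
// Hypothetical noncopyable handle whose deinit counts how often it ran.
struct Handle: ~Copyable {
    let raw: Int32
    let closes: UnsafeMutablePointer<Int>

    init(raw: Int32, closes: UnsafeMutablePointer<Int>) {
        self.raw = raw
        self.closes = closes
    }

    // Gives the raw value back to the caller without running deinit.
    consuming func release() -> Int32 {
        let raw = self.raw
        discard self   // memberwise destruction only, deinit is skipped
        return raw
    }

    deinit {
        closes.pointee += 1   // stands in for closing the resource
    }
}

func demo() -> (closes: Int, raw: Int32) {
    let closes = UnsafeMutablePointer<Int>.allocate(capacity: 1)
    closes.initialize(to: 0)
    defer { closes.deallocate() }

    do {
        let dropped = Handle(raw: 3, closes: closes)
        _ = dropped.raw
    }   // `dropped` is destroyed normally here: deinit runs once

    let kept = Handle(raw: 4, closes: closes)
    let raw = kept.release()   // consumed through discard: no deinit

    return (closes.pointee, raw)
}
```

Only the first handle runs its deinit, so demo() reports a single close and hands back the raw value 4 from the second one.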

Consumption of a var doesn't prevent you from reinitializing that var with a new value.
Since self is var bound in a number of contexts, like consuming, mutating, and even init functions of value types, should we allow reinitialization of self after a forget? Should we allow forget in an init?

One big use case I see for this reinitialization is allowing you to avoid invoking the deinit while in an init, which can happen once self has been fully initialized within that init. For example, just as with a class today, the deinit can be invoked from the init in this example:

@noncopyable struct Data {
  var data: [Datum] = []
  init(_ handle: NetworkHandle) throws {
    while let raw = handle.next() {
      self.data.append(raw)
    }
    if handle.incompleteMessage {
      throw E.someError
    }
  }

  deinit {
    // Some code assuming we have a complete message.
    assert(isCompleteMessage(data))
  }
}

The reason is subtle: after self has been fully initialized (e.g., because data has a default value), if you throw an Error out of an init (or return nil in a failable init) you will trigger self's deinit. But if you hadn't yet fully initialized self's properties, then you'll get memberwise destruction of only the initialized fields in that failure-exit scenario.

There are a number of workarounds to the above that require rewriting the init, or the deinit. For example, you could write the data results into a local var and only initialize self.data once we have a complete message. This is one situation where forget can be useful in a non-consuming function, which is outside of what's proposed:

init(_ handle: NetworkHandle) throws {
  // ...
  if handle.incompleteMessage {
    forget self        // just add this to avoid the deinit!
    throw E.someError
  }
}

But, unlike a class, structs and enums have a var-bound self. That means you can reassign self in the init. So if, instead of throwing an error, one just wants to default-initialize self, you're allowed to do that:

init(_ handle: NetworkHandle) {
  while let raw = handle.next() {
    self.data.append(raw)
  }
  if handle.incompleteMessage {
    self = .init()
  }
}

The problem in this example is that overwriting self after it's been initialized is also going to invoke the deinit on the old value. Again, we can fix this by writing forget self before the reinitialization:

if handle.incompleteMessage {
  forget self
  self = .init()
}

Allowing a forget self before reinitialization can also be useful for a mutating method:

@noncopyable enum MessageBox {
  case empty
  case holding(Message)

  mutating func overwrite(_ newMessage: Message) {
    forget self  // don't trigger deinit when we're just replacing the message.
    self = .holding(newMessage)
  }

  deinit {
    assert(isEmpty(self), "message box held message during destruction!")
  }
}

So, it seems like forget self can be very useful for noncopyable types in mutating and init methods, in addition to consuming ones. And that reinitialization of self after forget is not terribly bizarre. One important thing to note is that methods like overwrite here do not consume self on all paths, which is what's currently proposed for consuming methods.

Thoughts?

I’m not sure I think forget is a good idea as a generally-available operator. The semantics are pretty subtle, and I can easily imagine someone thinking incorrectly that this was some closely-related operation, like an operator to end a scope early or an operator to leak a value completely.

It’s a necessary operation to have in the implementation of a type with a deinit, but I don’t think anyone else should be able to do it; it should be presented in a more restricted form, like maybe an attribute you can put on a consuming method (which you could call forget and make public if you really want).
