Pitch: Genericizing over annotations like throws

TellowKrinkle · February 12, 2019, 3:07am

Introduction

Right now, you can make a function foo take a function that optionally throws, and only throw if the given function also throws, like this:

func foo(_ body: () throws -> ()) rethrows {}
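For reference, here is a minimal sketch of how rethrows behaves at call sites today. The names runTwice, failing, and DemoError are made up for illustration and are not part of the pitch.

```swift
// Hypothetical names, for illustration only.
struct DemoError: Error {}

// Throws only if `body` throws.
func runTwice(_ body: () throws -> Int) rethrows -> Int {
    return try body() + body()
}

// Non-throwing closure: no `try` is needed at the call site.
let plain = runTwice { 21 }

// Throwing function: the call must be marked with `try`.
func failing() throws -> Int { throw DemoError() }

var caught = false
do {
    _ = try runTwice(failing)
} catch {
    caught = true
}
```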

However, you cannot make a function take an object with a method that optionally throws, and throw only if that method throws:

// Not currently supported
func foo<T: OptionallyThrowingProtocol>(_ body: T) throws follows T {}

This pitch would allow you to do that.

Motivation

Things like directories and files seem like perfect classes to conform to Sequence (a directory yields directory entries, a file is a UInt8 sequence) but if you actually try to write the conformance, you run into an issue: iterating over a directory or reading from a file can throw, and an IteratorProtocol's next method cannot.

Currently, the only solutions are:

- Make your class do something other than throwing on errors
  - Ending iteration early hides errors (not good)
  - fatalError is also a bad idea
- Change Sequence to require its iterators to throw
  - Makes everyone try when iterating sequences, even if they're just iterating a Range
- Make a second version of Sequence that allows its iterators to throw
  - Lose out on all the stuff Sequence magically gets you
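To make the first option concrete, here is a small sketch of a conformance that ends iteration early when an error occurs. FlakyByteSource, LossyBytes, and ReadError are hypothetical stand-ins for a real file reader; the point is only that the caller cannot tell a failure from the end of the data.

```swift
// Hypothetical stand-in for a file reader; names are illustrative only.
struct ReadError: Error {}

struct FlakyByteSource {
    var remaining: [UInt8] = [1, 2, 3, 4]

    // Fails once only two bytes are left, simulating an I/O error mid-read.
    mutating func read() throws -> UInt8? {
        if remaining.count == 2 { throw ReadError() }
        return remaining.isEmpty ? nil : remaining.removeFirst()
    }
}

// `next()` cannot throw, so this conformance swallows the error and
// reports it as the end of the sequence.
struct LossyBytes: Sequence, IteratorProtocol {
    var source = FlakyByteSource()

    mutating func next() -> UInt8? {
        do {
            return try source.read()
        } catch {
            return nil
        }
    }
}

// Only the bytes read before the failure are collected; the error is lost.
let bytes = Array(LossyBytes())
```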

This would also be useful if you wanted to make a struct that represents a number inputted by a user, whose operators throw instead of fatalError-ing on overflow (but still conforms to Numeric).
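A rough sketch of such a number type, assuming a hypothetical UserNumber wrapper around Int8 and a made-up OverflowError. It compiles today, but because its + throws it cannot be used to satisfy the non-throwing + that Numeric requires, which is the gap this pitch is about.

```swift
// Hypothetical type; not part of the pitch or the standard library.
struct OverflowError: Error {}

struct UserNumber: Equatable {
    var value: Int8

    // Throws instead of trapping when the sum does not fit in Int8.
    static func + (lhs: UserNumber, rhs: UserNumber) throws -> UserNumber {
        let (sum, overflow) = lhs.value.addingReportingOverflow(rhs.value)
        if overflow { throw OverflowError() }
        return UserNumber(value: sum)
    }
}

// 100 + 20 fits in Int8; 100 + 100 does not.
let ok = try? UserNumber(value: 100) + UserNumber(value: 20)
let bad = try? UserNumber(value: 100) + UserNumber(value: 100)
```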

In addition, while I think the need for this on throws isn't quite as high, two other function annotations (async and pure) that have been proposed recently would benefit even more from this being an option (especially pure).

Proposed Solution

Note: There are a lot of places where I'm not sure what the right balance in number of options is, so I'll present one option and add possible alternatives below. In addition the syntax is just there to demonstrate things you should be able to do, so if you can think of a better one please mention it.

All examples are shown with throws but should be extendable to async in the future. pure would have to work slightly differently, since it works the opposite way from throws or async: a function throws if anything it relies on throws, but a function is pure only if all of the things it relies on are pure. An impure annotation would match how throws and async work, but that would be horribly source breaking.

Allow protocols to declare themselves as optionally throwing

A protocol should be able to declare itself as containing optionally throwing methods.

protocol Sequence throws {}

(Maybe optionally throws would make it more obvious as to what this means?)

Allow protocols to define methods that throw optionally

An optionally-throwing protocol should be able to define optionally-throwing methods

protocol Sequence throws {
	// makeIterator can throw if the `Sequence` conformance is throwing
	func makeIterator() throws follows Self -> Iterator

	// underestimatedCount can never throw
	var underestimatedCount: Int { get }

	// throwyMethod always throws
	func throwyMethod() throws
}

Other options:

All methods on an optionally-throwing protocol can optionally throw (no annotation required)
There is no such thing as an always throws method, optionally throwing protocols can only have optionally throwing methods

Allow structs/classes to conform to the protocol

A struct should be able to express that it conforms to the optionally-throwing protocol either in a throwing or non-throwing way

// Conforms to Sequence and OtherProtocol with methods that don't throw
struct Array: Sequence, OtherProtocol {}

// Conforms to Sequence with methods that throw
// and OtherProtocol with methods that don't throw
struct File: Sequence throws, OtherProtocol {}

// Requires conformers to conform to Sequence, which they can
// choose to conform to throwingly or not
protocol MyProtocol: Sequence throws {}

// Requires conformers to conform to Sequence,
// only allows that conformance to be throwing if MyProtocol2 conformance
// is throwing
protocol MyProtocol2 throws: Sequence throws follows Self {}

For the two protocol examples, we may want to only allow the second variant (and make it just written protocol MyProtocol2 throws: Sequence throws {})

Express a type or function whose `throws` is conditional on another

A type or function should be able to express that whether or not it is the throwing variant depends on some set of other types

// randomFunction takes a sequence and throws if
// that sequence is a throwing variant
func randomFunction<S: Sequence throws>(_ seq: S) throws follows S {}
// Usage
randomFunction([2, 3])
try randomFunction(File(path: "test.txt"))
// The use of `Sequence throws` is to keep source compatibility
// for existing `Sequence` functions that don't expect their sequence
// methods to throw.  I don't especially like it, so if anyone else can 
// think of a better solution that would be great

protocol Sequence throws {
	// Iterator can (optionally) be the throwing variant if Self is the throwing variant
	// For `pure`, this would be non-optional
	associatedtype Iterator throws follows Self
}

extension Sequence {
	// Throws if `Self` is a throwing variant
	func randomMethod() throws follows Self {}
	// Throws if `Self.Iterator` is a throwing variant
	func otherRandomMethod() throws follows Iterator {}
	// Throws if `Self` is a throwing variant or if `Other`
	// is a throwing variant (for `pure` this would be "and")
	func combine<Other: Sequence throws>(with other: Other)
		throws follows Self, Other -> [Element] where Element == Other.Element {}
}

// Zip2Sequence conforms to Sequence throwingly if either of its two base sequences throws
// but conforms non-throwingly otherwise (for `pure` this would be "both of")
struct Zip2Sequence<A: Sequence throws, B: Sequence throws>: Sequence throws follows A, B {
	struct Iterator: IteratorProtocol throws follows A.Iterator, B.Iterator {
		func next() throws follows A.Iterator, B.Iterator -> Element? {}
	}
}

// Not sure if this is something we want to implement,
// but a way to do this could also be nice:
protocol MyProtocol: A throws, B throws {
	// throws if the conformance to A throws, but doesn't care if the conformance to B throws
	func a() throws follows Self.A {}
	// I realize that you can't normally reference required conformances with `Self.A`,
	// maybe `Self as A` would be better?
}

I think(?) this covers all the ways you could use a conditionally throwing protocol.
One possible extension would be to change the syntax of rethrows to be included within whatever syntax ends up being used for this.

Also, I realize that the above syntax gets confusing when you have multiple of these (especially the multiple protocol situation, at least throws and async are keywords):

struct Zip2Sequence<A: Sequence throws, B: Sequence throws>
	: Sequence throws follows A, B async follows A, B, OtherProtocol throws follows A, B async follows A, B

One solution would be to require parentheses when combining multiple:

struct Zip2Sequence<A: Sequence throws, B: Sequence throws>
	: Sequence (throws follows A, B) (async follows A, B), OtherProtocol (throws follows A, B) (async follows A, B)

Another would be to use a different character to combine multiple, for example &

struct Zip2Sequence<A: Sequence throws, B: Sequence throws>
	: Sequence throws follows A & B async follows A & B, OtherProtocol throws follows A & B async follows A & B

Another would be to use a different syntax altogether (please suggest one)

Usage

Using types that conformed to protocols non-throwingly would be the same as it is now:

let a = [1, 2, 3, 4, 5]
for x in a {}

Using optionally throwing methods on types that conformed to the protocol would require try as expected. A Sequence conformance would require the addition of a throwing for loop:

let dir = Directory(path: "/")
for try file in dir {}

Same goes for async

let website = RemoteFile(url: "https://swift.org")
for try await byte in website {}
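The pitch does not spell out what a throwing for loop would expand to. One plausible reading, sketched with current Swift, is a while let loop around a throwing next(). ThrowingEntries and EntryError are hypothetical; Directory from the example above is not modeled here.

```swift
// Hypothetical throwing iterator; names are illustrative only.
struct EntryError: Error {}

struct ThrowingEntries {
    var names = ["a.txt", "b.txt", "broken", "c.txt"]

    mutating func next() throws -> String? {
        guard !names.isEmpty else { return nil }
        let name = names.removeFirst()
        if name == "broken" { throw EntryError() }
        return name
    }
}

var seen: [String] = []
var failed = false
var entries = ThrowingEntries()

do {
    // Assumed expansion of `for try name in entries { ... }`.
    while let name = try entries.next() {
        seen.append(name)
    }
} catch {
    // The first error ends the loop; "c.txt" is never visited.
    failed = true
}
```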

A File could have a single

func write<S: Sequence throws async>(contentsOf seq: S) throws async follows S where S.Element == UInt8

method that could be used to do any of these things:

let file = File(path: "/tmp/test.txt")
try file.write(contentsOf: [1, 2, 3, 4])
try file.write(contentsOf: File(path: "/tmp/test2.txt"))
try await file.write(contentsOf: RemoteFile(url: "https://swift.org"))

Effect on ABI Stability

While adding the ability to have throwing protocols wouldn't affect ABI stability, changing stdlib protocols to be optionally throwing almost definitely would.

One possible way to get around this would be to have an annotation that caused the compiler to also generate a non-throwing version of the protocol runtime information along with dummy functions that forwarded to the optionally throwing variants.

Another way would be to create new protocols for the throwing versions and move functions over to them, though I feel like this would be kind of awkward to use.

Future direction

As mentioned above, as things like async and pure come into the language I would love to see this allowed for them too (being able to zip two async iterators without having to make a separate async version of zip, for example, is something that would be very nice to be able to do)

One other consideration is whether or not we'd want to allow something like this with mutating. Outside of replacing the need for MutableCollection, I can't think of any uses for this but maybe someone else has run into issues where they wished a protocol optionally allowed methods to be mutating.

anandabits · February 12, 2019, 3:29am

@Joe_Groff has suggested in the past something along the lines of an associatedeffect feature that would allow us to abstract over effects. It would look something like this:

protocol Foo {
    associatedeffect barEffects
    func bar() barEffects -> Int
}

This is just off the top of my head, but usage in generic functions might look something like this:

func caller<F: Foo>(f: F) async & F.barEffects -> Int {
    let x = await someAsyncCall()
    // this call probably needs some kind of usage site annotation along the lines of try and await
    // but it isn’t clear what syntax to use since the concrete effects are unknown
    return f.bar() + x
}

TellowKrinkle · February 12, 2019, 3:31am

That makes sense, that should cover all the things that I want

John_McCall · February 12, 2019, 5:12am

As a general matter, that seems reasonable. As a specific matter, while async might logically be an effect like any other, I don't know how possible or sensible it's going to be to abstract over async-ness; it's just way too important to the implementation/semantic model.

TellowKrinkle · February 12, 2019, 8:34am

I don't know what the plan for how async will be implemented is, but here's how I would make a generically async function if async was some compiler magic that converted code to our current callback-passing system (pretty much, an optionally async function would either go async and return nil, or not go async and return its result plus (for supporting move only types) the unused callback):

// @once added for supporting move only types
typealias Callback = @once (Output) -> ()
// Assume each function's types could be different but I'm lazy

// (current) always async function
func asyncFunction(args: Args, callback: Callback) {
	let output = doStuff1(args)
	callback(output)
}

// (current) calling an async function from an async function
func outerAsyncFunction(args: Args, callback: Callback) {
	let calculatedStuff = doStuff2(args)
	asyncFunction(args: calculatedStuff) { partial in
		let output = doStuff3(partial)
		callback(output)
	}
}

// (new) optionally async function
func optionallyAsyncFunction(args: Args, callback: Callback) -> (Output, Callback)? {
	let (output, shouldAsync) = doStuff1(args)
	if shouldAsync {
		callback(output)
		return nil
	}
	return (output, callback)
}

// (new) calling an optionally async function from an async function
func outerAsyncFunction(args: Args, callback: Callback) {
	let calculatedStuff = doStuff2(args)
	let result = optionallyAsyncFunction(args: calculatedStuff) { partial in
		let output = doStuff3(partial)
		callback(output)
	}
	if let (partial, passedCallback) = result {
		passedCallback(partial)
	}
}

// (new) calling an optionally async function from an optionally async function
func outerOptionallyAsyncFunction(args: Args, callback: Callback) -> (Output, Callback)? {
	let calculatedStuff = doStuff2(args)
	let result = optionallyAsyncFunction(args: calculatedStuff) { partial in
		// If you async at least once, treat yourself like an async function from now on
		let output = doStuff3(partial)
		callback(output)
	}
	if let (partial, passedCallback) = result {
		// Make sure passedCallback is the same one you passed in, else fatalError
		// Take ownership of stack variables back from passedCallback
		let output = doStuff3(partial)
		return (output, callback)
	}
	else {
		return nil
	}
}

// (new) calling an optionally async function from a non-async function
func outerNonAsyncFunction(args: Args) -> Output {
	let calculatedStuff = doStuff2(args)
	guard let (partial, _) = optionallyAsyncFunction(args: calculatedStuff, callback: { _ in }) else {
		fatalError("optionallyAsyncFunction went async even though it shouldn't have!") 
	}
	let output = doStuff3(partial)
	return output
}

Obviously I don't know what we'll end up using as our async system, but I do hope that ability to abstract over async-ness is one of the considerations made when choosing.

Edit: As a side note, an optionally async function might also perform better on something like a buffered stream iterator that only actually goes async once every few thousand calls

John_McCall · February 12, 2019, 5:16pm

Okay, so basically two separate ABIs for async functions and potentially-async functions, and everything downstream of a potentially-async function call has to be emitted twice, once for if it returns normally and once in a callback. That is a lot of complexity. It also ends up having all the overhead of both conventions, since you still need to heap-allocate the frame whenever a potentially-async function makes a potentially-async call. This can be avoided if the function is statically known to never make such a call, but ordinary async functions can do that, too.

A much simpler way of doing this is to use a single ABI for async functions based around tail calls and callbacks. You can still efficiently implement calls to async-ABI functions that are statically known not to go async by just making a non-tail call and providing a callback that just stashes the return value on the stack.
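As a source-level analogy only (the point above is about the calling convention, which ordinary Swift code cannot express), the "callback that just stashes the return value on the stack" idea might look like the sketch below. addAsyncStyle and addSynchronously are hypothetical names.

```swift
// Hypothetical function written in callback-passing style.
func addAsyncStyle(_ a: Int, _ b: Int, completion: (Int) -> Void) {
    // This implementation never suspends, so it calls back before returning.
    completion(a + b)
}

// A caller that statically expects the callee not to go async:
// make an ordinary (non-tail) call and let the callback stash the
// result in a local variable.
func addSynchronously(_ a: Int, _ b: Int) -> Int {
    var stashed: Int? = nil
    addAsyncStyle(a, b) { stashed = $0 }
    guard let result = stashed else {
        fatalError("callee went async even though it was expected not to")
    }
    return result
}

let total = addSynchronously(2, 3)
```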

In either implementation, a function which abstracts over async-ness is going to be significantly less efficient when dynamically non-async than a non-async function. I think that's unavoidable; anything you do to be lazier about heap-allocating the frame is going to be paid for in other ways.

Karl · February 12, 2019, 6:43pm

I'm not sure the abstraction over async and throws is all that useful. I can't think of many examples besides Sequence where you might want to blanket over such different behaviours.

I remember a similar discussion a while ago about files and Sequence, and the consensus was that you generally don't want to use generic Sequence/Collection algorithms on things like file-handles or remote directory iterators. Essentially any I/O operation could fail, and it's not worth burdening all generic Sequence code with handling that; especially since most algorithms (e.g. sort) wouldn't perform nearly as efficiently on those kinds of things as they would if you just copied it all to an Array first.

Instead, it's better to define your own protocol - perhaps inspired by Sequence or Collection - which is tailored for your situation.

Nevin · February 12, 2019, 9:36pm

I recently encountered a scenario where I wanted this sort of abstraction over the throwing-ness of stored properties. I ended up having to entirely duplicate the implementation with one hierarchy of classes that throw and another that don’t. It would be nice to use something like rethrows there.

TellowKrinkle · February 12, 2019, 11:24pm

Sort already copies the entire contents of a Sequence to an array first anyways. Admittedly, RandomAccessCollection stuff makes a lot less sense, but most Sequence things would work just as well on a throwing or async collection. A Zip2Sequence of two buffered file readers would be more efficient than if you read the entire files into memory just to iterate over them. A lazy map would work just as well on a throwing or async sequence. So would the Set initializer, which would keep memory usage down in the case where there were a lot of repeats.

As for other protocols, I'm pretty sure any protocol that you would make for a file today would be extendable to a remote file (async) later. TextOutputStream might also want this, since you could output to either a String or a file.

Joe_Groff · February 12, 2019, 11:41pm

Another interesting future direction we could take to address polymorphism over throws specifically would be to adopt typed throws, and make the error type an independent type argument of all function types. This would mean that a type that doesn't throw effectively throws Never, and one that throws without a type by default throws Error:

(X, Y, Z) -> W        === (X, Y, Z) throws Never -> W
(X, Y, Z) throws -> W === (X, Y, Z) throws Error -> W

This would allow protocols to be generic over throwing and nonthrowing implementations by making the error a separate associated type:

protocol MyIterator {
  associatedtype Element
  associatedtype Error: Swift.Error

  mutating func next() throws Error -> Element
}

struct NonthrowingImplementation: MyIterator {
  mutating func next() -> Int // conforms with Error == Never
}
struct ThrowingImplementation: MyIterator {
  mutating func next() throws -> Int // conforms with Error == Swift.Error
}
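The throws Error spelling above is hypothetical in this post. The same idea, an error type carried as an associated type with Never meaning "cannot fail", can be approximated with Result. This is an emulation under that assumption, not the proposed feature; ResultIterator, Counter, Digits, and the two sum functions are made-up names.

```swift
// Emulation with Result; `Failure` plays the role of the error associated type.
protocol ResultIterator {
    associatedtype Element
    associatedtype Failure: Error
    mutating func next() -> Result<Element, Failure>?
}

struct ParseError: Error {}

// "Non-throwing" conformance: Failure is inferred as Never.
struct Counter: ResultIterator {
    var current = 0
    mutating func next() -> Result<Int, Never>? {
        guard current < 3 else { return nil }
        current += 1
        return .success(current)
    }
}

// "Throwing" conformance: Failure is inferred as ParseError.
struct Digits: ResultIterator {
    var input: [Character]
    mutating func next() -> Result<Int, ParseError>? {
        guard !input.isEmpty else { return nil }
        guard let digit = input.removeFirst().wholeNumberValue else {
            return .failure(ParseError())
        }
        return .success(digit)
    }
}

// Works for any conformance, so it has to be prepared for failure.
func sumThrowing<I: ResultIterator>(_ iterator: inout I) throws -> Int
    where I.Element == Int {
    var total = 0
    while let step = iterator.next() { total += try step.get() }
    return total
}

// When Failure is Never, the generic code needs no error handling.
func sumInfallible<I: ResultIterator>(_ iterator: inout I) -> Int
    where I.Element == Int, I.Failure == Never {
    var total = 0
    while let step = iterator.next() {
        switch step {
        case .success(let value): total += value
        case .failure(let impossible): switch impossible {}
        }
    }
    return total
}

var counter = Counter()
let plainTotal = sumInfallible(&counter)

var goodDigits = Digits(input: Array("123"))
let goodTotal = try? sumThrowing(&goodDigits)

var badDigits = Digits(input: Array("12x"))
let badTotal = try? sumThrowing(&badDigits)
```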

anandabits · February 13, 2019, 2:02am

I’m very much in favor of a design in this direction for typed throws. Further, I think it’s important that throwing any uninhabited type is treated equivalently to throwing Never in terms of needing to use try when invoked and needing to handle errors, etc. I wrote about why in [Discussion] Analysis of the design of typed throws - #3 by Anton_Zhilin.

This design for typed throws would support abstracting over error types, including whether one throws at all or not, which is great! That said, I think it would sit nicely alongside a more general effect abstraction feature. They address related (even overlapping) but distinct use cases.

DevAndArtist · February 13, 2019, 5:40pm

That is exactly what my conclusion was after all the debate around typed throws and Result. That would be just perfect if you’d ask me.

benrimmington · February 13, 2019, 5:42pm

Another workaround is to use Result as the element of a sequence.

extension FileStream: IteratorProtocol, Sequence {

  public typealias Element = Swift.Result<UInt8, Error>

  public func next() -> Element? {
    guard let byte = UInt8(exactly: fgetc(_stream)) else {
      if feof(_stream) != 0 {
        return nil
      } else {
        return Element.failure(POSIXError())
      }
    }
    return Element.success(byte)
  }
}

gwendal.roue · February 13, 2019, 6:01pm

This is true. However, this implements one possible failure mode of failing sequences.

Another interesting failure mode is that the sequence ends on the first iteration failure.

Compare the two snippets below, which show how differently those two kinds of sequences are consumed:

// Mode A: Sequence of Result (eventual error is per-element):
for result in sequence {
    do {
        let element = try result.get()
        // Handle element.
    } catch {
        // Handle element failure.
        // It is possible that iterator produces
        // a success element on next iteration step.
    }
}

// Mode B: Sequence ends on first error:
do {
    while let element = try iterator.next() {
        // Handle element
    }
} catch {
    // Handle sequence failure.
    // The consequences of calling iterator.next() after this failure should
    // be precisely defined (programmer error and trap, or the guarantee
    // that some error would be thrown).
}

Those are important semantic differences. And we like to make semantics very clear, if I interpret SE-0052 correctly.

benrimmington · February 13, 2019, 6:54pm

@gwendal.roue It might be necessary to retry only after some errors (e.g. EAGAIN, EINTR, ETIMEDOUT).

do {
  for result in fileStream {
    do {
      let byte = try result.get()
      // TODO: Handle success...
    } catch POSIXError.EINTR {
      continue // Retry an interrupted system call.
    } catch {
      throw error // Rethrow to outer `catch` clause.
    }
  }
} catch {
  // TODO: Handle failure...
}

The post-nil guarantee in SE-0052 should be possible if FileStream doesn't conform itself to IteratorProtocol.

gwendal.roue · February 13, 2019, 8:31pm

In your specific case, some errors are fatal, some other errors are not.

When you iterate a random number generator, all entropy errors are transient.

And when you iterate an SQLite statement, any error is fatal.

What I was trying to say is that a generalization of throwing sequences may well be delicate to define in a way that is precise, and yet does not leave entire classes of problems abandoned on the roadside.

When you mix iteration and errors, you get multiple possible outcomes.

I'm no functional developer, but... Aren't we trying to combine monads, only to discover that there is no general solution?

Karl · February 13, 2019, 11:11pm

To be fair, your example shows a protocol which already allows its next() method to throw and simply allows non-throwing methods to witness the requirement. It's convenient, but you don't need typed-throws for that; in fact, I think it already works.

I don't think the solution proposed by OP is all that useful - I think we actually have a better thing, right now (Swift 4.2), but I don't think everybody is aware of it, so let me explain:

You can create a parent protocol, whose requirements can throw, and also a non-throwing refinement. The compiler will already recognise these as being the same, so you don't even need to write a default implementation with the throwing signature (see example). However, generic code will be able to require a non-throwing witness if your algorithms are not fault-tolerant. I think it's a cleaner solution.

We could do something like this in the standard library, except that ABI stability bans re-parenting protocols, and Sequence/IteratorProtocol kind of have knives dangling above their heads already.

protocol MightThrowIterator {
  mutating func next() throws -> Int
}
protocol NoThrowIterator: MightThrowIterator {
  mutating func next() -> Int
}

struct A: MightThrowIterator {
  enum Err: Error { case anError }
  mutating func next() throws -> Int {
    throw Err.anError
  }
}

struct B: NoThrowIterator {
  mutating func next() -> Int {
    return 42
  }
}

func tryIterate<T: MightThrowIterator>(_ val: inout T) {
  do {
    let element = try val.next()
    print(element)
  } catch {
    print(error)
  }
}

func definitelyIterate<T: NoThrowIterator>(_ val: inout T) {
  let element = val.next()
  print(element)
}

func test() {
  var testObjA = A()
  var testObjB = B()
  tryIterate(&testObjA) // prints: 'anError'
  tryIterate(&testObjB) // prints: 42
  // definitelyIterate(&testObjA) // Compile error: 'A' does not conform to expected type 'NoThrowIterator'
  definitelyIterate(&testObjB) // prints: 42
}

Joe_Groff · February 13, 2019, 11:15pm

The thing that's added that you can't do now is the ability to abstract over the throw-iness of the conforming type using the associated type. You could think of it as analogous to the difference between an (Any) -> Any function and a (T) -> T function; both take the same set of inputs, but the latter also allows you to preserve the type information coming out of the function instead of losing it going in. The closest thing you can do now is build parallel overloaded protocol hierarchies, as you noted.
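The (Any) -> Any versus (T) -> T analogy can be seen directly in a tiny sketch. The function names passThroughErased and passThrough are made up for illustration.

```swift
// Erased: the caller gets back `Any` and must cast to recover the type.
func passThroughErased(_ value: Any) -> Any { return value }

// Generic: the result type is tied to the argument type.
func passThrough<T>(_ value: T) -> T { return value }

let erased = passThroughErased(42)   // static type is Any
let recovered = erased as? Int       // a cast is needed to use it as Int

let preserved = passThrough(42)      // static type is Int
let doubled = preserved * 2          // usable as Int with no cast
```

An error type carried as an associated type would play the role of T here: generic code gets back the specific throwing behaviour of the conforming type instead of the erased "might throw".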

Karl · February 13, 2019, 11:26pm

I'm not sure what you mean - I don't think throw-iness is something you can really abstract over, because throws is a superset of non-throws and the reverse isn't true. At the most abstract level, you will have to assume something might throw. Which is what we have.

I don't actually think typed-throws is that useful in practice. There are a few cases where you can exhaustively know all the errors you will throw, and they will all happen to share a common type which also is narrow enough to be useful, but as soon as things get bigger you'll have to erase to Error. IMO, the only macroscopically-meaningful distinction in such a system is Error vs. Never, which we already have with throws.

What's more, because Swift encourages type-safety and people love exhaustive switching and the feeling of absolute determinism, I think a lot of people would reach for typed-throws when they'd probably be better off without it.

Joe_Groff · February 13, 2019, 11:30pm

For many (though not all) places where rethrows is used with with* { ... }-style scoping operations, it might be worth considering whether coroutines would overall be a better model. A yield-once coroutine, like what we now use for property accesses, can express similar patterns where some setup and teardown needs to happen around a scope, but because the coroutine is a separate context that sits alongside the main context, it doesn't need to pass through the effects of the inner scope like a higher-order function does when calling a closure. If coroutines were exposed as a user-facing feature and adopted in place of higher-order functions, the coroutine forms would be usable in any effect context without needing to complicate the type system to represent the effect propagation.
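For context, this is the kind of with*-style scoping function the paragraph refers to, written with rethrows as it works today. Journal, withSession, failingBody, and ScopeError are hypothetical names; the coroutine alternative is not shown because it is not described in enough detail here to sketch.

```swift
// Hypothetical resource that records setup and teardown; names are illustrative.
final class Journal {
    var events: [String] = []
}

struct ScopeError: Error {}

// Setup and teardown wrap the caller's closure. `rethrows` is what
// passes the closure's throwing effect through to the caller.
func withSession<T>(_ journal: Journal, _ body: () throws -> T) rethrows -> T {
    journal.events.append("open")
    defer { journal.events.append("close") }
    return try body()
}

let journal = Journal()

// Non-throwing body: no `try` is required.
let value = withSession(journal) { 7 }

// Throwing body: `try` is required, and teardown still runs.
func failingBody() throws -> Int { throw ScopeError() }

var threw = false
do {
    _ = try withSession(journal, failingBody)
} catch {
    threw = true
}
```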