typed throws

anandabits · August 19, 2017, 1:28pm

Sent from my iPad

Joe Groff wrote:

An alternative approach that embraces the open nature of errors could be to represent domains as independent protocols, and extend the error types that are relevant to that domain to conform to the protocol. That way, you don't obscure the structure of the underlying error value with wrappers. If you expect to exhaustively handle all errors in a domain, well, you're almost certainly going to need to have a fallback case in your wrapper type for miscellaneous errors, but you could represent that instead without wrapping via a catch-all, and as?-casting to your domain protocol with a ??-default for errors that don't conform to the protocol. For example, instead of attempting something like this:

enum DatabaseError: Error {
  case queryError(QueryError)
  case ioError(IOError)
  case other(Error)

  var errorKind: String {
    switch self {
      case .queryError(let q): return "query error: \(q.query)"
      case .ioError(let i): return "io error: \(i.filename)"
      case .other(let e): return "\(e)"
    }
  }
}

func queryDatabase(_ query: String) throws /*DatabaseError*/ -> Table

do {
  try queryDatabase("delete * from users")
} catch let d as DatabaseError {
  os_log(d.errorKind)
} catch {
  fatalError("unexpected non-database error")
}

You could do this:

protocol DatabaseError {
  var errorKind: String { get }
}

extension QueryError: DatabaseError {
  var errorKind: String { return "query error: \(query)" }
}
extension IOError: DatabaseError {
  var errorKind: String { return "io error: \(filename)" }
}

extension Error {
  var databaseErrorKind: String {
    return (self as? DatabaseError)?.errorKind ?? "unexpected non-database error"
  }
}

func queryDatabase(_ query: String) throws -> Table

do {
  try queryDatabase("delete * from users")
} catch {
  os_log(error.databaseErrorKind)
}

This approach isn't sufficient for several reasons. Notably, it requires the underlying errors to already have a distinct type for every category we wish to place them in. If all network errors have the same type and I want to categorize them based on network availability, authentication, dropped connection, etc., I am not able to do that.

Sorry, how does the presence or absence of typed throws play into this?

It provides a convenient way to drive an error conversion mechanism during propagation, whether in a library function used to wrap the throwing expression or ideally with language support. If I call a function that throws FooError and my function throws BarError and we have a way to go from FooError to BarError we can invoke that conversion without needing to catch and rethrow the wrapped error.
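To make the conversion idea concrete, here is a minimal sketch of the library-function variant, written against today's untyped `throws`. `FooError`, `BarError`, `categorizing`, `lowLevel`, and `highLevel` are hypothetical names invented for illustration, not an existing API, and the 404 rule is an arbitrary assumption.

```swift
// Hypothetical upstream error type.
struct FooError: Error {
    let code: Int
}

// Hypothetical error type for the calling layer.
enum BarError: Error {
    case notFound
    case other(underlying: Error)

    // Conversion initializer: decides which category an incoming error belongs to.
    init(_ error: Error) {
        if let bar = error as? BarError {
            self = bar
        } else if let foo = error as? FooError, foo.code == 404 {
            self = .notFound
        } else {
            self = .other(underlying: error)
        }
    }
}

// Runs `body` and rethrows any failure as a BarError.
func categorizing<T>(_ body: () throws -> T) throws -> T {
    do {
        return try body()
    } catch {
        throw BarError(error)
    }
}

func lowLevel() throws -> Int {
    throw FooError(code: 404)
}

// Still declared with untyped `throws`; only by convention does BarError escape.
func highLevel() throws -> Int {
    return try categorizing { try lowLevel() }
}
```

The single catch-and-rethrow lives inside `categorizing`, so call sites keep a plain `try`. With typed throws the compiler could, in principle, pick the conversion from the two declared error types instead of relying on the wrapper call.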

It also provides convenient documentation of the categorization along with a straightforward way to match the cases (with code completion as Chris pointed out). IMO, making this information immediately clear and with easy matching at call sites is crucial to improving how people handle errors in practice.

Error handling is an afterthought all too often. The value of making it immediately clear how to match important categories of errors should not be understated. I really believe language support of some kind is warranted and would have an impact on the quality of software. Maybe types aren't the right solution, but we do need one.

Deciding what categories are important is obviously subjective, but I do believe that libraries focused on a specific domain can often make reasonable guesses that are pretty close in the majority of use cases. This is especially true for internal libraries where part of the purpose of the library may be to establish conventions for the app that are intended to be used (almost) everywhere.

On Aug 18, 2017, at 9:19 PM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:

On Fri, Aug 18, 2017 at 8:11 PM, Matthew Johnson <matthew@anandabits.com> wrote:

On Aug 18, 2017, at 6:56 PM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:

The kind of categorization I want to be able to do requires a custom algorithm. The specific algorithm used to categorize errors depends on the dynamic context (i.e. the function that is propagating it). The way I usually think about this categorization is as a conversion initializer as I showed in the example, but it certainly wouldn't need to be accomplished that way. The most important thing IMO is the ability to categorize during error propagation and make information about that categorization easy for callers to discover.

The output of the algorithm could use various mechanisms for categorization - an enum is one mechanism, distinct types conforming to appropriate categorization protocols is another. Attaching some kind of category value to the original error or propagating the category along with it might also work (although might be rather clunky).

It is trivial to make the original error immediately available via an `underlyingError` property so I really don't understand the resistance to wrapping errors. The categorization can easily be ignored at the catch site if desired. That said, if we figure out some other mechanism for categorizing errors, including placing different error values of the same type into different categories, and matching them based on this categorization I think I would be ok with that. Using wrapper types is not essential to solving the problem.
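As a rough illustration of a wrapper that keeps the original error reachable, here is a sketch in which values of a single upstream type land in different categories. `TransportError`, `NetworkCategory`, and `NetworkFailure` are hypothetical names; only the `underlyingError` property name comes from the discussion.

```swift
// Hypothetical upstream error: one type for every kind of network failure.
struct TransportError: Error {
    enum Code { case offline, unauthorized, connectionLost }
    let code: Code
}

// Hypothetical categories a caller might want to match on.
enum NetworkCategory { case unavailable, authentication, dropped, unknown }

// Wrapper that adds a category without hiding the original error.
struct NetworkFailure: Error {
    let category: NetworkCategory
    let underlyingError: Error

    init(_ error: Error) {
        underlyingError = error
        guard let transport = error as? TransportError else {
            category = .unknown
            return
        }
        // Same type, different categories, decided by inspecting the value.
        switch transport.code {
        case .offline: category = .unavailable
        case .unauthorized: category = .authentication
        case .connectionLost: category = .dropped
        }
    }
}

let failure = NetworkFailure(TransportError(code: .unauthorized))
```

A catch site can match on `failure.category`, or ignore it entirely and cast `failure.underlyingError` back to `TransportError`.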

Setting all of this aside, surely you had your own reasons for supporting typed errors in the past. What were those and why do you no longer consider them important?

My memory is certainly spotty, but as far as I can recall, I had no distinct reasons; it just seemed like a reasonable and "natural" next step that other people wanted for which I had no use case of my own in mind. Having seen the argumentation that there aren't very many use cases in general, I'm warming to the view that it's probably not such a great next step.

On Fri, Aug 18, 2017 at 6:46 PM, Matthew Johnson <matthew@anandabits.com> wrote:

On Aug 18, 2017, at 6:29 PM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:

On Fri, Aug 18, 2017 at 6:19 PM, Matthew Johnson <matthew@anandabits.com> wrote:

On Aug 18, 2017, at 6:15 PM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:

On Fri, Aug 18, 2017 at 09:20 Matthew Johnson via swift-evolution <swift-evolution@swift.org> wrote:

On Aug 18, 2017, at 1:27 AM, John McCall <rjmccall@apple.com> wrote:

>> On Aug 18, 2017, at 12:58 AM, Chris Lattner via swift-evolution <swift-evolution@swift.org> wrote:
>> Splitting this off into its own thread:
>>
>>> On Aug 17, 2017, at 7:39 PM, Matthew Johnson <matthew@anandabits.com> wrote:
>>> One related topic that isn’t discussed is typed errors. Many third party libraries use a Result type with typed errors. Moving to an async / await model without also introducing typed errors into Swift would require giving up something that is highly valued by many Swift developers. Maybe Swift 5 is the right time to tackle typed errors as well. I would be happy to help with design and drafting a proposal but would need collaborators on the implementation side.
>>
>> Typed throws is something we need to settle one way or the other, and I agree it would be nice to do that in the Swift 5 cycle.
>>
>> For the purposes of this sub-discussion, I think there are three kinds of code to think about:
>> 1) large scale API like Cocoa which evolve (adding significant functionality) over the course of many years and can’t break clients.
>> 2) the public API of shared swiftpm packages, whose lifecycle may rise and fall - being obsoleted and replaced by better packages if they encounter a design problem.
>> 3) internal APIs and applications, which are easy to change because the implementations and clients of the APIs are owned by the same people.
>>
>> These each have different sorts of concerns, and we hope that something can start out as #3 but work its way up the stack gracefully.
>>
>> Here is where I think things stand on it:
>> - There is consensus that untyped throws is the right thing for a large scale API like Cocoa. NSError is effectively proven here. Even if typed throws is introduced, Apple is unlikely to adopt it in their APIs for this reason.
>> - There is consensus that untyped throws is the right default for people to reach for in public packages (#2).
>> - There is consensus that Java and other systems that encourage lists of throws error types lead to problematic APIs for a variety of reasons.
>> - There is disagreement about whether internal APIs (#3) should use it. It seems perfect to be able to write exhaustive catches in this situation, since everything is knowable. OTOH, this could encourage abuse of error handling in cases where you really should return an enum instead of using throws.
>> - Some people are concerned that introducing typed throws would cause people to reach for it instead of using untyped throws for public package APIs.
>
> Even for non-public code. The only practical merit of typed throws I have ever seen someone demonstrate is that it would let them use contextual lookup in a throw or catch. People always say "I'll be able to exhaustively switch over my errors", and then I ask them to show me where they want to do that, and they show me something that just logs the error, which of course does not require typed throws. Every. Single. Time.

I agree that exhaustive switching over errors is something that people are extremely unlikely to actually want to do. I also think it's a bit of a red herring. The value of typed errors is *not* in exhaustive switching. It is in categorization and verified documentation.

Here is a concrete example that applies to almost every app. When you make a network request there are many things that could go wrong to which you may want to respond differently:
* There might be no network available. You might recover by updating the UI to indicate that and start monitoring for a reachability change.
* There might have been a server error that should eventually be resolved (500). You might update the UI and provide the user the ability to retry.
* There might have been an unrecoverable server error (404). You will update the UI.
* There might have been a low level parsing error (bad JSON, etc). Recovery is perhaps similar in nature to #2, but the problem is less likely to be resolved quickly so you may not provide a retry option. You might also want to do something to notify your dev team that the server is returning JSON that can't be parsed.
* There might have been a higher-level parsing error (converting JSON to model types). This might be treated the same as bad JSON. On the other hand, depending on the specifics of the app, you might take an alternate path that only parses the most essential model data in hopes that the problem was somewhere else and this parse will succeed.

All of this can obviously be accomplished with untyped errors. That said, using types to categorize errors would significantly improve the clarity of such code. More importantly, I believe that by categorizing errors in ways that are most relevant to a specific domain a library (perhaps internal to an app) can encourage developers to think carefully about how to respond.
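The five cases above could be written down as recovery-oriented categories along these lines. This is only a sketch: `RequestFailure`, `Recovery`, and `recovery(for:)` are hypothetical names, and the mapping is one possible reading of the list rather than a recommendation.

```swift
// Hypothetical categories mirroring the list above.
enum RequestFailure: Error {
    case noNetwork
    case transientServerError(status: Int)   // e.g. 500
    case permanentServerError(status: Int)   // e.g. 404
    case malformedPayload                    // bad JSON
    case modelDecodingFailed                 // JSON to model conversion failed
}

// Hypothetical recovery strategies an app might choose between.
enum Recovery {
    case waitForReachability
    case offerRetry
    case showError
    case reportToDevTeam
    case parseEssentialDataOnly
}

// Maps each category to the response described in the list.
func recovery(for failure: RequestFailure) -> Recovery {
    switch failure {
    case .noNetwork: return .waitForReachability
    case .transientServerError: return .offerRetry
    case .permanentServerError: return .showError
    case .malformedPayload: return .reportToDevTeam
    case .modelDecodingFailed: return .parseEssentialDataOnly
    }
}
```

The point of the sketch is the vocabulary of categories, which tells a caller what kinds of response to consider, rather than the exhaustive switch itself.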

I used to be rather in favor of adding typed errors, thinking that it can only benefit and seemed reasonable. However, given the very interesting discussion here, I'm inclined to think that what you articulate above is actually a very good argument _against_ adding typed errors.

If I may simplify, the gist of the argument advanced by Tino, Charlie, and you is that the primary goal is documentation, and that documentation in the form of prose is insufficient because it can be unreliable. Therefore, you want a way for the compiler to enforce said documentation. (The categorization use case, I think, is well addressed by the protocol-based design discussed already in this thread.)

Actually documentation is only one of the goals I have and it is the least important. Please see my subsequent reply to John where I articulate the four primary goals I have for improved error handling, whether it be typed errors or some other mechanism. I am curious to see what you think of the goals, as well as what mechanism might best address those goals.

Your other three goals have to do with what you term categorization, unless I misunderstand. Are those not adequately addressed by Joe Groff's protocol-based design?

Can you elaborate on what you mean by Joe Groff’s protocol-based design? I certainly haven’t seen anything that I believe addresses those goals well.

However, the compiler itself cannot reward, only punish in the form of errors or warnings; if exhaustive switching is a red herring and the payoff for typed errors is correct documentation, the effectiveness of this kind of compiler enforcement must be directly proportional to the degree of extrinsic punishment inflicted by the compiler (since the intrinsic reward of correct documentation is the same whether it's spelled using doc comments or the type system). This seems like a heavy-handed way to enforce documentation of only one specific aspect of a throwing function; moreover, if this use case were to be sufficiently compelling, then it's certainly a better argument for SourceKit (or some other builtin tool) to automatically generate information on all errors thrown than for the compiler to require that users declare it themselves--even if opt-in.

Bad error handling is pervasive. The fact that everyone shows you code that just logs the error is a prime example of this. It should be considered a symptom of a problem, not an acceptable status quo to be maintained. We need all the tools at our disposal to encourage better thinking about and handling of errors. Most importantly, I think we need a middle ground between completely untyped errors and an exhaustive list of every possible error that might happen. I believe a well designed mechanism for categorizing errors in a compiler-verified way can do exactly this.

In many respects, there are similarities to this in the design of `NSError` which provides categorization via the error domain. This categorization is a bit more broad than I think is useful in many cases, but it is the best example I'm aware of.

The primary difference between error domains and the kind of categorization I am proposing is that error domains categorize based on the source of an error whereas I am proposing categorization driven by likely recovery strategies. Recovery is obviously application dependent, but I think the example above demonstrates that there are some useful generalizations that can be made (especially in an app-specific library), even if they don't apply everywhere.

> Sometimes we then go on to have a conversation about wrapping errors in other error types, and that can be interesting, but now we're talking about adding a big, messy feature just to get "safety" guarantees for a fairly minor need.

I think you're right that wrapping errors is tightly related to an effective use of typed errors. You can do a reasonable job without language support (as has been discussed on the list in the past). On the other hand, if we're going to introduce typed errors we should do it in a way that *encourages* effective use of them. My opinion is that encouraging effective use means categorizing (wrapping) errors without requiring any additional syntax beyond the simple `try` used by untyped errors. In practice, this means we should not need to catch and rethrow an error if all we want to do is categorize it. Rust provides good prior art in this area.

>
> Programmers often have an instinct to obsess over error taxonomies that is very rarely directed at solving any real problem; it is just self-imposed busy-work.

I agree that obsessing over intricate taxonomies is counter-productive and should be discouraged. On the other hand, I hope the example I provided above can help to focus the discussion on a practical use of types to categorize errors in a way that helps guide *thinking* and therefore improves error handling in practice.

>
>> - Some people think that while it might be useful in some narrow cases, the utility isn’t high enough to justify making the language more complex (complexity that would intrude on the APIs of result types, futures, etc)
>>
>> I’m sure there are other points in the discussion that I’m forgetting.
>>
>> One thing that I’m personally very concerned about is in the systems programming domain. Systems code is sort of the classic example of code that is low-level enough and finely specified enough that there are lots of knowable things, including the failure modes.
>
> Here we are using "systems" to mean "embedded systems and kernels". And frankly even a kernel is a large enough system that they don't want to exhaustively switch over failures; they just want the static guarantees that go along with a constrained error type.
>
>> Beyond expressivity though, our current model involves boxing thrown values into an Error existential, something that forces an implicit memory allocation when the value is large. Unless this is fixed, I’m very concerned that we’ll end up with a situation where certain kinds of systems code (i.e., that which cares about real time guarantees) will not be able to use error handling at all.
>>
>> JohnMC has some ideas on how to change code generation for ‘throws’ to avoid this problem, but I don’t understand his ideas enough to know if they are practical and likely to happen or not.
>
> Essentially, you give Error a tagged-pointer representation to allow payload-less errors on non-generic error types to be allocated globally, and then you can (1) tell people to not throw errors that require allocation if it's vital to avoid allocation (just like we would tell them today not to construct classes or indirect enum cases) and (2) allow a special global payload-less error to be substituted if error allocation fails.
>
> Of course, we could also say that systems code is required to use a typed-throws feature that we add down the line for their purposes. Or just tell them to not use payloads. Or force them to constrain their error types to fit within some given size. (Note that obsessive error taxonomies tend to end up with a bunch of indirect enum cases anyway, because they get recursive, so the allocation problem is very real whatever we do.)
>
> John.

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

xwu · August 19, 2017, 5:43pm

Joe Groff wrote:

An alternative approach that embraces the open nature of errors could be
to represent domains as independent protocols, and extend the error types
that are relevant to that domain to conform to the protocol. That way, you
don't obscure the structure of the underlying error value with wrappers. If
you expect to exhaustively handle all errors in a domain, well, you're
almost certainly going to need to have a fallback case in your wrapper type
for miscellaneous errors, but you could represent that instead without
wrapping via a catch-all, and as?-casting to your domain protocol with a
??-default for errors that don't conform to the protocol. For example,
instead of attempting something like this:

enum DatabaseError: Error {
  case queryError(QueryError)
  case ioError(IOError)
  case other(Error)

  var errorKind: String {
    switch self {
      case .queryError(let q): return "query error: \(q.query)"
      case .ioError(let i): return "io error: \(i.filename)"
      case .other(let e): return "\(e)"
    }
  }
}

func queryDatabase(_ query: String) throws /*DatabaseError*/ -> Table

do {
  try queryDatabase("delete * from users")
} catch let d as DatabaseError {
  os_log(d.errorKind)
} catch {
  fatalError("unexpected non-database error")
}

You could do this:

protocol DatabaseError {
  var errorKind: String { get }
}

extension QueryError: DatabaseError {
  var errorKind: String { return "query error: \(query)" }
}
extension IOError: DatabaseError {
  var errorKind: String { return "io error: \(filename)" }
}

extension Error {
  var databaseErrorKind: String {
    return (self as? DatabaseError)?.errorKind ?? "unexpected non-database error"
  }
}

func queryDatabase(_ query: String) throws -> Table

do {
  try queryDatabase("delete * from users")
} catch {
  os_log(error.databaseErrorKind)
}

This approach isn't sufficient for several reasons. Notably, it requires
the underlying errors to already have a distinct type for every category we
wish to place them in. If all network errors have the same type and I want
to categorize them based on network availability, authentication, dropped
connection, etc., I am not able to do that.

Sorry, how does the presence or absence of typed throws play into this?

It provides a convenient way to drive an error conversion mechanism during
propagation, whether in a library function used to wrap the throwing
expression or ideally with language support. If I call a function that
throws FooError and my function throws BarError and we have a way to go
from FooError to BarError we can invoke that conversion without needing to
catch and rethrow the wrapped error.

But isn't that an argument *against* typed errors? You need this
language-level support to automatically convert FooErrors to BarErrors
*because* you've restricted yourself to throwing BarErrors and the function
you call is restricted to throwing FooErrors. Currently, without typed
errors, there is no need to convert a FooError to a BarError.

As mentioned above, it's difficult even internally to design a single
ontology of errors that works throughout a library, so compiler support for
typed errors would be tantamount to a compiler-enforced facility that
pervasively requires this laborious classification and re-classification of
errors whenever a function rethrows, much of which may be ultimately
unnecessary. In other words, if you are a library vendor and wrap every
FooError from an upstream dependency into a BarError, your user is still
likely to have their own classification of errors and decide to handle
different groups of BarError cases differently anyway, so what was the
point of your laborious conversion of FooErrors to BarErrors?
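For comparison, a minimal sketch of the untyped status quo described here, in which an upstream error crosses a library boundary with no conversion step. `FooError`, `upstream`, `library`, and `describeFailure` are hypothetical names used only for illustration.

```swift
// Hypothetical upstream error type.
struct FooError: Error {
    let code: Int
}

func upstream() throws -> Int {
    throw FooError(code: 7)
}

// No conversion step: whatever `upstream` throws passes straight through.
func library() throws -> Int {
    return try upstream()
}

// The eventual caller applies its own classification to the original error.
func describeFailure() -> String {
    do {
        let value = try library()
        return "ok: \(value)"
    } catch let foo as FooError {
        return "foo error \(foo.code)"
    } catch {
        return "other: \(error)"
    }
}
```

Nothing in `library` has to know about `FooError`. The trade-off is that its signature also says nothing about which errors a caller should expect.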

It also provides convenient documentation of the categorization along with
a straightforward way to match the cases (with code completion as Chris
pointed out). IMO, making this information immediately clear and with easy
matching at call sites is crucial to improving how people handle errors in
practice.

Again, I don't see documentation as a sufficient argument for this feature;
there is no reason why the Swift compiler could not extract comprehensive
information about what errors are thrown at compile time without typed
errors--and with more granularity than can be documented via types (since
only specific enum cases may ever be thrown in a particular function).

Error handling is an afterthought all too often. The value of making it
immediately clear how to match important categories of errors should not be
understated.

See, this is probably where I'm failing to understand you. Every library
that has its own Error-conforming types offers an ontology of errors that,
at least to its authors, makes some sort of sense. At the call site, you can
`catch` specific categories of errors or `switch` over specific errors.
Yes, this can become a little annoying if your own classification of errors
differs from the library authors' classification. However, I fail to see
how typed errors makes this any better, other than that you'd `catch` only
one type of error but have to `switch` over cases and then `switch` over
the underlying error. Only now, you've introduced this issue where, for the
library authors, FooErrors have to be reclassified into BarErrors, and then
into BazErrors, and then into BooErrors--to what end? It seems only to
accomplish the goal of making error handling not an afterthought by causing
the compiler to make it more of a nuisance.
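The nested matching described above might look roughly like this when an upstream error has been wrapped. It is a sketch under assumed names (`FooError`, `BarError`, `handle`), not code from any real library.

```swift
// Hypothetical upstream error type.
struct FooError: Error {
    let code: Int
}

// Hypothetical wrapper type that a library rethrows.
enum BarError: Error {
    case upstream(Error)
    case invalidInput
}

func handle(_ error: Error) -> String {
    switch error {
    case let bar as BarError:
        // First level: the wrapper's own cases.
        switch bar {
        case .invalidInput:
            return "invalid input"
        case .upstream(let inner):
            // Second level: dig the original error back out.
            if let foo = inner as? FooError {
                return "foo \(foo.code)"
            }
            return "unknown upstream"
        }
    default:
        return "unexpected"
    }
}
```

Each additional wrapping layer (`BazError` around `BarError`, and so on) would add another level to this matching.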

I really believe language support of some kind is warranted and would have
an impact on the quality of software. Maybe types aren't the right
solution, but we do need one.

On Sat, Aug 19, 2017 at 08:29 Matthew Johnson <matthew@anandabits.com> wrote:

On Aug 18, 2017, at 9:19 PM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:
On Fri, Aug 18, 2017 at 8:11 PM, Matthew Johnson <matthew@anandabits.com> wrote:

On Aug 18, 2017, at 6:56 PM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:

Deciding what categories are important is obviously subjective, but I do
believe that libraries focused on a specific domain can often make
reasonable guesses that are pretty close in the majority of use cases.
This is especially true for internal libraries where part of the purpose of
the library may be to establish conventions for the app that are intended
to be used (almost) everywhere.

The kind of categorization I want to be able to do requires a custom
algorithm. The specific algorithm used to categorize errors depends on
the dynamic context (i.e. the function that is propagating it). The way I
usually think about this categorization is as a conversion initializer as I
showed in the example, but it certainly wouldn't need to be accomplished
that way. The most important thing IMO is the ability to categorize during
error propagation and make information about that categorization easy for
callers to discover.

The output of the algorithm could use various mechanisms for
categorization - an enum is one mechanism, distinct types conforming to
appropriate categorization protocols is another. Attaching some kind of
category value to the original error or propagating the category along with
it might also work (although might be rather clunky).

It is trivial to make the original error immediately available via an
`underlyingError` property so I really don't understand the resistance to
wrapping errors. The categorization can easily be ignored at the catch
site if desired. That said, if we figure out some other mechanism for
categorizing errors, including placing different error values of the same
type into different categories, and matching them based on this
categorization I think I would be ok with that. Using wrapper types is not
essential to solving the problem.

Setting all of this aside, surely you had your own reasons for
supporting typed errors in the past. What were those and why do you no
longer consider them important?

My memory is certainly spotty, but as far as I can recall, I had no
distinct reasons; it just seemed like a reasonable and "natural" next step
that other people wanted for which I had no use case of my own in mind.
Having seen the argumentation that there aren't very many use cases in
general, I'm warming to the view that it's probably not such a great next
step.

On Fri, Aug 18, 2017 at 6:46 PM, Matthew Johnson <matthew@anandabits.com> wrote:

On Aug 18, 2017, at 6:29 PM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:

On Fri, Aug 18, 2017 at 6:19 PM, Matthew Johnson <matthew@anandabits.com> wrote:

On Aug 18, 2017, at 6:15 PM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:

On Fri, Aug 18, 2017 at 09:20 Matthew Johnson via swift-evolution <swift-evolution@swift.org> wrote:

On Aug 18, 2017, at 1:27 AM, John McCall <rjmccall@apple.com> wrote:

>> On Aug 18, 2017, at 12:58 AM, Chris Lattner via swift-evolution <swift-evolution@swift.org> wrote:
>> Splitting this off into its own thread:
>>
>>> On Aug 17, 2017, at 7:39 PM, Matthew Johnson <matthew@anandabits.com> wrote:
>>> One related topic that isn’t discussed is typed errors. Many third
party libraries use a Result type with typed errors. Moving to an async /
await model without also introducing typed errors into Swift would require
giving up something that is highly valued by many Swift developers. Maybe
Swift 5 is the right time to tackle typed errors as well. I would be happy
to help with design and drafting a proposal but would need collaborators on
the implementation side.
>>
>> Typed throws is something we need to settle one way or the other,
and I agree it would be nice to do that in the Swift 5 cycle.
>>
>> For the purposes of this sub-discussion, I think there are three
kinds of code to think about:
>> 1) large scale API like Cocoa which evolve (adding significant
functionality) over the course of many years and can’t break clients.
>> 2) the public API of shared swiftpm packages, whose lifecycle may
rise and fall - being obsoleted and replaced by better packages if they
encounter a design problem.
>> 3) internal APIs and applications, which are easy to change because
the implementations and clients of the APIs are owned by the same people.
>>
>> These each have different sorts of concerns, and we hope that
something can start out as #3 but work its way up the stack gracefully.
>>
>> Here is where I think things stand on it:
>> - There is consensus that untyped throws is the right thing for a
large scale API like Cocoa. NSError is effectively proven here. Even if
typed throws is introduced, Apple is unlikely to adopt it in their APIs for
this reason.
>> - There is consensus that untyped throws is the right default for
people to reach for in public packages (#2).
>> - There is consensus that Java and other systems that encourage
lists of throws error types lead to problematic APIs for a variety of
reasons.
>> - There is disagreement about whether internal APIs (#3) should use
it. It seems perfect to be able to write exhaustive catches in this
situation, since everything is knowable. OTOH, this could encourage abuse
of error handling in cases where you really should return an enum instead
of using throws.
>> - Some people are concerned that introducing typed throws would
cause people to reach for it instead of using untyped throws for public
package APIs.
>
> Even for non-public code. The only practical merit of typed throws
I have ever seen someone demonstrate is that it would let them use
contextual lookup in a throw or catch. People always say "I'll be able to
exhaustively switch over my errors", and then I ask them to show me where
they want to do that, and they show me something that just logs the error,
which of course does not require typed throws. Every. Single. Time.

I agree that exhaustive switching over errors is something that people
are extremely unlikely to actually want to do. I also think it's a bit of a
red herring. The value of typed errors is *not* in exhaustive switching.
It is in categorization and verified documentation.

Here is a concrete example that applies to almost every app. When you
make a network request there are many things that could go wrong to which
you may want to respond differently:
* There might be no network available. You might recover by updating
the UI to indicate that and start monitoring for a reachability change.
* There might have been a server error that should eventually be
resolved (500). You might update the UI and provide the user the ability
to retry.
* There might have been an unrecoverable server error (404). You will
update the UI.
* There might have been a low level parsing error (bad JSON, etc).
Recovery is perhaps similar in nature to #2, but the problem is less likely
to be resolved quickly so you may not provide a retry option. You might
also want to do something to notify your dev team that the server is
returning JSON that can't be parsed.
* There might have been a higher-level parsing error (converting JSON
to model types). This might be treated the same as bad JSON. On the other
hand, depending on the specifics of the app, you might take an alternate
path that only parses the most essential model data in hopes that the
problem was somewhere else and this parse will succeed.

All of this can obviously be accomplished with untyped errors. That
said, using types to categorize errors would significantly improve the
clarity of such code. More importantly, I believe that by categorizing
errors in ways that are most relevant to a specific domain a library
(perhaps internal to an app) can encourage developers to think carefully
about how to respond.
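
As a rough sketch of the list above, the five cases could be modeled as
an enum organized by recovery strategy rather than by the source of the
failure. Every name here (`RequestFailure`, `Recovery`, `recovery(for:)`)
is hypothetical and only illustrates the idea; none of it comes from a
real library or a concrete proposal.

```swift
// Hypothetical categories, one per bullet in the list above.
enum RequestFailure: Error {
    case noNetwork            // wait for a reachability change
    case transientServer      // e.g. 500: offer a retry
    case permanentServer      // e.g. 404: just update the UI
    case malformedPayload     // bad JSON: no retry, notify the dev team
    case modelMismatch        // JSON parsed, but not into model types
}

// Hypothetical description of what the caller should do next.
enum Recovery: Equatable {
    case monitorReachability
    case offerRetry
    case showFailure(notifyDevelopers: Bool)
    case retryWithEssentialFieldsOnly
}

// Maps each category to one possible response. The mapping is app
// specific; this is only one plausible choice.
func recovery(for failure: RequestFailure) -> Recovery {
    switch failure {
    case .noNetwork:        return .monitorReachability
    case .transientServer:  return .offerRetry
    case .permanentServer:  return .showFailure(notifyDevelopers: false)
    case .malformedPayload: return .showFailure(notifyDevelopers: true)
    case .modelMismatch:    return .retryWithEssentialFieldsOnly
    }
}
```

The switch is exhaustive only because the category type is small; the
underlying errors that feed into each category stay open-ended.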

I used to be rather in favor of adding typed errors, thinking that it
can only benefit and seemed reasonable. However, given the very interesting
discussion here, I'm inclined to think that what you articulate above is
actually a very good argument _against_ adding typed errors.

If I may simplify, the gist of the argument advanced by Tino, Charlie,
and you is that the primary goal is documentation, and that documentation
in the form of prose is insufficient because it can be unreliable.
Therefore, you want a way for the compiler to enforce said documentation.
(The categorization use case, I think, is well addressed by the
protocol-based design discussed already in this thread.)

Actually documentation is only one of the goals I have and it is the
least important. Please see my subsequent reply to John where I articulate
the four primary goals I have for improved error handling, whether it be
typed errors or some other mechanism. I am curious to see what you think
of the goals, as well as what mechanism might best address those goals.

Your other three goals have to do with what you term categorization,
unless I misunderstand. Are those not adequately addressed by Joe Groff's
protocol-based design?

Can you elaborate on what you mean by Joe Groff’s protocol-based
design? I certainly haven’t seen anything that I believe addresses those
goals well.

However, the compiler itself cannot reward, only punish in the form of
errors or warnings; if exhaustive switching is a red herring and the payoff
for typed errors is correct documentation, the effectiveness of this kind
of compiler enforcement must be directly proportional to the degree of
extrinsic punishment inflicted by the compiler (since the intrinsic reward
of correct documentation is the same whether it's spelled using doc
comments or the type system). This seems like a heavy-handed way to enforce
documentation of only one specific aspect of a throwing function; moreover,
if this use case were to be sufficiently compelling, then it's certainly a
better argument for SourceKit (or some other builtin tool) to automatically
generate information on all errors thrown than for the compiler to require
that users declare it themselves--even if opt-in.

Bad error handling is pervasive. The fact that everyone shows you code
that just logs the error is a prime example of this. It should be
considered a symptom of a problem, not an acceptable status quo to be
maintained. We need all the tools at our disposal to encourage better
thinking about and handling of errors. Most importantly, I think we need a
middle ground between completely untyped errors and an exhaustive list of
every possible error that might happen. I believe a well designed
mechanism for categorizing errors in a compiler-verified way can do exactly
this.

In many respects, there are similarities to this in the design of
`NSError` which provides categorization via the error domain. This
categorization is a bit more broad than I think is useful in many cases,
but it is the best example I'm aware of.

The primary difference between error domains and the kind of
categorization I am proposing is that error domains categorize based on the
source of an error whereas I am proposing categorization driven by likely
recovery strategies. Recovery is obviously application dependent, but I
think the example above demonstrates that there are some useful
generalizations that can be made (especially in an app-specific library),
even if they don't apply everywhere.

> Sometimes we then go on to have a conversation about wrapping errors
in other error types, and that can be interesting, but now we're talking
about adding a big, messy feature just to get "safety" guarantees for a
fairly minor need.

I think you're right that wrapping errors is tightly related to an
effective use of typed errors. You can do a reasonable job without
language support (as has been discussed on the list in the past). On the
other hand, if we're going to introduce typed errors we should do it in a
way that *encourages* effective use of them. My opinion is that
encouraging effective use means categorizing (wrapping) errors without
requiring any additional syntax beyond the simple `try` used by untyped
errors. In practice, this means we should not need to catch and rethrow an
error if all we want to do is categorize it. Rust provides good prior art
in this area.

>
> Programmers often have an instinct to obsess over error taxonomies
that is very rarely directed at solving any real problem; it is just
self-imposed busy-work.

I agree that obsessing over intricate taxonomies is counter-productive
and should be discouraged. On the other hand, I hope the example I
provided above can help to focus the discussion on a practical use of types
to categorize errors in a way that helps guide *thinking* and therefore
improves error handling in practice.

>
>> - Some people think that while it might be useful in some narrow
cases, the utility isn’t high enough to justify making the language more
complex (complexity that would intrude on the APIs of result types,
futures, etc)
>>
>> I’m sure there are other points in the discussion that I’m
forgetting.
>>
>> One thing that I’m personally very concerned about is in the
systems programming domain. Systems code is sort of the classic example of
code that is low-level enough and finely specified enough that there are
lots of knowable things, including the failure modes.
>
> Here we are using "systems" to mean "embedded systems and kernels".
And frankly even a kernel is a large enough system that they don't want to
exhaustively switch over failures; they just want the static guarantees
that go along with a constrained error type.
>
>> Beyond expressivity though, our current model involves boxing
thrown values into an Error existential, something that forces an implicit
memory allocation when the value is large. Unless this is fixed, I’m very
concerned that we’ll end up with a situation where certain kinds of systems
code (i.e., that which cares about real time guarantees) will not be able
to use error handling at all.
>>
>> JohnMC has some ideas on how to change code generation for ‘throws’
to avoid this problem, but I don’t understand his ideas enough to know if
they are practical and likely to happen or not.
>
> Essentially, you give Error a tagged-pointer representation to allow
payload-less errors on non-generic error types to be allocated globally,
and then you can (1) tell people to not throw errors that require
allocation if it's vital to avoid allocation (just like we would tell them
today not to construct classes or indirect enum cases) and (2) allow a
special global payload-less error to be substituted if error allocation
fails.
>
> Of course, we could also say that systems code is required to use a
typed-throws feature that we add down the line for their purposes. Or just
tell them to not use payloads. Or force them to constrain their error
types to fit within some given size. (Note that obsessive error taxonomies
tend to end up with a bunch of indirect enum cases anyway, because they get
recursive, so the allocation problem is very real whatever we do.)
>
> John.

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

anandabits · August 19, 2017, 7:04pm

Sent from my iPad

Joe Groff wrote:

An alternative approach that embraces the open nature of errors could be to represent domains as independent protocols, and extend the error types that are relevant to that domain to conform to the protocol. That way, you don't obscure the structure of the underlying error value with wrappers. If you expect to exhaustively handle all errors in a domain, well, you're almost certainly going to need to have a fallback case in your wrapper type for miscellaneous errors, but you could represent that instead without wrapping via a catch-all, and as?-casting to your domain protocol with a ??-default for errors that don't conform to the protocol. For example, instead of attempting something like this:

enum DatabaseError: Error {
  case queryError(QueryError)
  case ioError(IOError)
  case other(Error)

  var errorKind: String {
    switch self {
      case .queryError(let q): return "query error: \(q.query)"
      case .ioError(let i): return "io error: \(i.filename)"
      case .other(let e): return "\(e)"
    }
  }
}

func queryDatabase(_ query: String) throws /*DatabaseError*/ -> Table

do {
  try queryDatabase("delete * from users")
} catch let d as DatabaseError {
  os_log(d.errorKind)
} catch {
  fatalError("unexpected non-database error")
}

You could do this:

protocol DatabaseError {
  var errorKind: String { get }
}

extension QueryError: DatabaseError {
  var errorKind: String { return "query error: \(query)" }
}
extension IOError: DatabaseError {
  var errorKind: String { return "io error: \(filename)" }
}

extension Error {
  var databaseErrorKind: String {
    return (self as? DatabaseError)?.errorKind ?? "unexpected non-database error"
  }
}

func queryDatabase(_ query: String) throws -> Table

do {
  try queryDatabase("delete * from users")
} catch {
  os_log(error.databaseErrorKind)
}

This approach isn't sufficient for several reasons. Notably, it requires the underlying errors to already have a distinct type for every category we wish to place them in. If all network errors have the same type and I want to categorize them based on network availability, authentication, dropped connection, etc., I am not able to do that.

Sorry, how does the presence or absence of typed throws play into this?

It provides a convenient way to drive an error conversion mechanism during propagation, whether in a library function used to wrap the throwing expression or ideally with language support. If I call a function that throws FooError and my function throws BarError and we have a way to go from FooError to BarError we can invoke that conversion without needing to catch and rethrow the wrapped error.
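
A minimal sketch of the library-function variant mentioned above, assuming untyped `throws` and no language support. `FooError`, `BarError`, `convertingFooErrors`, `lowLevel`, and `highLevel` are all hypothetical names used only for illustration.

```swift
// Hypothetical error thrown by the callee.
struct FooError: Error { let code: Int }

// Hypothetical error the caller wants to expose instead.
struct BarError: Error {
    let underlying: FooError
    init(_ foo: FooError) { self.underlying = foo }
}

// Wraps a throwing expression and converts FooError to BarError while the
// error propagates. Any other error passes through unchanged.
func convertingFooErrors<T>(_ body: () throws -> T) throws -> T {
    do {
        return try body()
    } catch let foo as FooError {
        throw BarError(foo)
    }
}

func lowLevel() throws -> Int { throw FooError(code: 7) }

// The call site needs no catch-and-rethrow of its own.
func highLevel() throws -> Int {
    return try convertingFooErrors { try lowLevel() }
}
```

The catch-and-rethrow still exists, but it lives in one helper instead of at every call site. Language support would remove even the wrapping closure.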

But isn't that an argument *against* typed errors? You need this language-level support to automatically convert FooErrors to BarErrors *because* you've restricted yourself to throwing BarErrors and the function you call is restricted to throwing FooErrors. Currently, without typed errors, there is no need to convert a FooError to a BarError.

No, this is a misunderstanding of the point of the conversion. In that example, the point of performing a conversion is not because the types are arbitrarily chosen. It is because the initializer of BarError includes an algorithm that categorizes errors. It may place different values of FooError into different cases. What I am after is language-level support for categorizing errors during propagation and making the categories easily visible to anyone who looks at the signature of a function that chooses to categorize its errors. Using types and the initializer is one way (the most obvious way) to do this.
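
To illustrate the point about the initializer carrying the categorization algorithm, here is a small sketch. `TransportError` and `CategorizedRequestError` are hypothetical types, and the rules inside `init` are only one plausible way to sort values into categories.

```swift
// Hypothetical low-level error: a single type for every transport failure.
struct TransportError: Error {
    let isOffline: Bool
    let statusCode: Int?   // nil when no HTTP response arrived
}

// Hypothetical category type. Different values of the same TransportError
// type land in different cases.
enum CategorizedRequestError: Error {
    case noNetwork(TransportError)
    case retryable(TransportError)
    case permanent(TransportError)

    // The initializer is the categorization algorithm.
    init(_ error: TransportError) {
        if error.isOffline {
            self = .noNetwork(error)
        } else if let code = error.statusCode, (500..<600).contains(code) {
            self = .retryable(error)
        } else {
            self = .permanent(error)
        }
    }
}
```

Because the decision is made per value, this covers the case where all network errors share one type, which a per-type protocol conformance cannot distinguish.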

As mentioned above, it's difficult even internally to design a single ontology of errors that works throughout a library, so compiler support for typed errors would be tantamount to a compiler-enforced facility that pervasively requires this laborious classification and re-classification of errors whenever a function rethrows, much of which may be ultimately unnecessary. In other words, if you are a library vendor and wrap every FooError from an upstream dependency into a BarError, your user is still likely to have their own classification of errors and decide to handle different groups of BarError cases differently anyway, so what was the point of your laborious conversion of FooErrors to BarErrors?

I am not suggesting that categorization be performed everywhere. What I am suggesting is that there are many cases where it can be done in a meaningful way. If you're writing a library and anticipate that there is no way to categorize errors that is meaningful to all users of your library you should not perform categorization. On the other hand, continuing with the line of examples we've already seen, if you're writing an internal networking library and intend to have similar recovery paths throughout your app categorization can be extremely useful. This is especially true as a team grows and you are trying to encourage common practices throughout the app.

It also provides convenient documentation of the categorization along with a straightforward way to match the cases (with code completion as Chris pointed out). IMO, making this information immediately clear and with easy matching at call sites is crucial to improving how people handle errors in practice.

Again, I don't see documentation as a sufficient argument for this feature; there is no reason why the Swift compiler could not extract comprehensive information about what errors are thrown at compile time without typed errors--and with more granularity than can be documented via types (since only specific enum cases may ever be thrown in a particular function).

I am not arguing that documentation alone is a sufficient argument. That said, while the compiler *could* do such things I don't think it would be a priority any time soon (core team, please correct me if I'm wrong here).

Error handling is an afterthought all too often. The value of making it immediately clear how to match important categories of errors should not be understated.

See, this is probably where I'm failing to understand you. Every library that has its own Error-conforming types offers an ontology of errors that, at least to its authors, make some sort of sense. At the call site, you can `catch` specific categories of errors or `switch` over specific errors. Yes, this can become a little annoying if your own classification of errors differs from the library authors' classification. However, I fail to see how typed errors makes this any better, other than that you'd `catch` only one type of error but have to `switch` over cases and then `switch` over the underlying error. Only now, you've introduced this issue where, for the library authors, FooErrors have to be reclassified into BarErrors, and then into BazErrors, and then into BooErrors--to what end? It seems only to accomplish the goal of making error handling not an afterthought by causing the compiler to make it more of a nuisance.

If the only use of this was to move from throwing independent types to throwing a single type with a bunch of cases you are right, there wouldn't be much value. However, the contract for individual functions in a library may vary and there are also errors from dependencies of the library that may also be thrown.

Let's assume for a moment that the error types of a library are carefully designed to offer meaningful categorization. Even then it is still very useful to know which categories may occur at a given call site. Yes, this is only documentation so we'll set it aside and not give it too much weight.

More importantly, if I depend on library X it may depend on library Y as an implementation detail which is not intended to be part of its API contract. As a user of X I should not need to be aware that Y even exists. I certainly am not going to import it and have access to its error types - I don't want a direct dependency on Y. Untyped errors are likely to lead to errors from Y leaking through the interface of X. They may be errors for which I would have a meaningful recovery strategy if my code were able to properly understand them, but it won't because my code doesn't know anything about Y. Yes, these are bugs in X but they are bugs that the language could help to prevent.
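
A sketch of what X has to do by hand today to keep Y's errors from leaking, assuming untyped `throws`. `YDecodingError` and `yDecode` stand in for library Y, `XError` and `xLoadCount(from:)` stand in for library X's public surface; all four are hypothetical.

```swift
// Stand-ins for library Y, an implementation detail of X.
struct YDecodingError: Error { let offset: Int }

func yDecode(_ text: String) throws -> Int {
    guard let value = Int(text) else { throw YDecodingError(offset: 0) }
    return value
}

// The only error type X intends to expose to its users.
enum XError: Error {
    case invalidInput
    case internalFailure
}

// X's entry point translates every error from Y into X's own vocabulary.
func xLoadCount(from text: String) throws -> Int {
    do {
        return try yDecode(text)
    } catch is YDecodingError {
        throw XError.invalidInput
    } catch {
        throw XError.internalFailure
    }
}
```

Nothing in the signature of `xLoadCount(from:)` tells the compiler or the caller that only `XError` escapes, so forgetting the translation in one function goes unnoticed until a caller hits an error it cannot interpret.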

As I have said before, none of this *requires* typed errors, they are just a natural way to approach the problem. Any solution that allows for categorization during propagation that can be matched at call sites would be acceptable.

On Aug 19, 2017, at 12:43 PM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:

On Sat, Aug 19, 2017 at 08:29 Matthew Johnson <matthew@anandabits.com> wrote:

On Aug 18, 2017, at 9:19 PM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:

On Fri, Aug 18, 2017 at 8:11 PM, Matthew Johnson <matthew@anandabits.com> wrote:

On Aug 18, 2017, at 6:56 PM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:

I really believe language support of some kind is warranted and would have an impact on the quality of software. Maybe types aren't the right solution, but we do need one.

Deciding what categories are important is obviously subjective, but I do believe that libraries focused on a specific domain can often make reasonable guesses that are pretty close in the majority of use cases. This is especially true for internal libraries where part of the purpose of the library may be to establish conventions for the app that are intended to be used (almost) everywhere.

The kind of categorization I want to be able to do requires a custom algorithm. The specific algorithm used to categorize errors depends on the dynamic context (i.e. the function that is propagating it). The way I usually think about this categorization is as a conversion initializer as I showed in the example, but it certainly wouldn't need to be accomplished that way. The most important thing IMO is the ability to categorize during error propagation and make information about that categorization easy for callers to discover.

The output of the algorithm could use various mechanisms for categorization - an enum is one mechanism, distinct types conforming to appropriate categorization protocols is another. Attaching some kind of category value to the original error or propagating the category along with it might also work (although might be rather clunky).

It is trivial to make the original error immediately available via an `underlyingError` property so I really don't understand the resistance to wrapping errors. The categorization can easily be ignored at the catch site if desired. That said, if we figure out some other mechanism for categorizing errors, including placing different error values of the same type into different categories, and matching them based on this categorization I think I would be ok with that. Using wrapper types is not essential to solving the problem.
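
A minimal sketch of a wrapper that keeps the original error reachable through an `underlyingError` property. `TimeoutError`, `RecoveryCategory`, `WrappedError`, and the three functions are hypothetical and exist only to show the shape of the idea.

```swift
// Hypothetical original error.
struct TimeoutError: Error { let seconds: Double }

// Hypothetical categories and wrapper.
enum RecoveryCategory { case retry, giveUp }

struct WrappedError: Error {
    let category: RecoveryCategory
    let underlyingError: Error   // the original error, untouched
}

func rawFetch() throws -> String { throw TimeoutError(seconds: 30) }

// Categorizes during propagation.
func fetch() throws -> String {
    do {
        return try rawFetch()
    } catch let timeout as TimeoutError {
        throw WrappedError(category: .retry, underlyingError: timeout)
    }
}

// A catch site that ignores the category and inspects the original error.
func describeFailure() -> String {
    do {
        return try fetch()
    } catch let wrapped as WrappedError {
        if let timeout = wrapped.underlyingError as? TimeoutError {
            return "timed out after \(timeout.seconds)s"
        }
        return "failed: \(wrapped.category)"
    } catch {
        return "unexpected: \(error)"
    }
}
```

Callers that care about the category can switch on `wrapped.category` instead; the wrapper does not force either style.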

Setting all of this aside, surely you had your own reasons for supporting typed errors in the past. What were those and why do you no longer consider them important?

My memory is certainly spotty, but as far as I can recall, I had no distinct reasons; it just seemed like a reasonable and "natural" next step that other people wanted for which I had no use case of my own in mind. Having seen the argumentation that there aren't very many use cases in general, I'm warming to the view that it's probably not such a great next step.

On Fri, Aug 18, 2017 at 6:46 PM, Matthew Johnson <matthew@anandabits.com> wrote:

On Aug 18, 2017, at 6:29 PM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:

On Fri, Aug 18, 2017 at 6:19 PM, Matthew Johnson <matthew@anandabits.com> wrote:

On Aug 18, 2017, at 6:15 PM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:

On Fri, Aug 18, 2017 at 09:20 Matthew Johnson via swift-evolution <swift-evolution@swift.org> wrote:

Sent from my iPad

On Aug 18, 2017, at 1:27 AM, John McCall <rjmccall@apple.com> wrote:

>> On Aug 18, 2017, at 12:58 AM, Chris Lattner via swift-evolution <swift-evolution@swift.org> wrote:
>> Splitting this off into its own thread:
>>
>>> On Aug 17, 2017, at 7:39 PM, Matthew Johnson <matthew@anandabits.com> wrote:
>>> One related topic that isn’t discussed is typed errors. Many third party libraries use a Result type with typed errors. Moving to an async / await model without also introducing typed errors into Swift would require giving up something that is highly valued by many Swift developers. Maybe Swift 5 is the right time to tackle typed errors as well. I would be happy to help with design and drafting a proposal but would need collaborators on the implementation side.
>>
>> Typed throws is something we need to settle one way or the other, and I agree it would be nice to do that in the Swift 5 cycle.
>>
>> For the purposes of this sub-discussion, I think there are three kinds of code to think about:
>> 1) large scale API like Cocoa which evolve (adding significant functionality) over the course of many years and can’t break clients.
>> 2) the public API of shared swiftpm packages, whose lifecycle may rise and fall - being obsoleted and replaced by better packages if they encounter a design problem.
>> 3) internal APIs and applications, which are easy to change because the implementations and clients of the APIs are owned by the same people.
>>
>> These each have different sorts of concerns, and we hope that something can start out as #3 but work its way up the stack gracefully.
>>
>> Here is where I think things stand on it:
>> - There is consensus that untyped throws is the right thing for a large scale API like Cocoa. NSError is effectively proven here. Even if typed throws is introduced, Apple is unlikely to adopt it in their APIs for this reason.
>> - There is consensus that untyped throws is the right default for people to reach for in public packages (#2).
>> - There is consensus that Java and other systems that encourage lists of throws error types lead to problematic APIs for a variety of reasons.
>> - There is disagreement about whether internal APIs (#3) should use it. It seems perfect to be able to write exhaustive catches in this situation, since everything is knowable. OTOH, this could encourage abuse of error handling in cases where you really should return an enum instead of using throws.
>> - Some people are concerned that introducing typed throws would cause people to reach for it instead of using untyped throws for public package APIs.
>
> Even for non-public code. The only practical merit of typed throws I have ever seen someone demonstrate is that it would let them use contextual lookup in a throw or catch. People always say "I'll be able to exhaustively switch over my errors", and then I ask them to show me where they want to do that, and they show me something that just logs the error, which of course does not require typed throws. Every. Single. Time.

I agree that exhaustive switching over errors is something that people are extremely unlikely to actually want to do. I also think it's a bit of a red herring. The value of typed errors is *not* in exhaustive switching. It is in categorization and verified documentation.

Here is a concrete example that applies to almost every app. When you make a network request there are many things that could go wrong to which you may want to respond differently:
* There might be no network available. You might recover by updating the UI to indicate that and start monitoring for a reachability change.
* There might have been a server error that should eventually be resolved (500). You might update the UI and provide the user the ability to retry.
* There might have been an unrecoverable server error (404). You will update the UI.
* There might have been a low level parsing error (bad JSON, etc). Recovery is perhaps similar in nature to #2, but the problem is less likely to be resolved quickly so you may not provide a retry option. You might also want to do something to notify your dev team that the server is returning JSON that can't be parsed.
* There might have been a higher-level parsing error (converting JSON to model types). This might be treated the same as bad JSON. On the other hand, depending on the specifics of the app, you might take an alternate path that only parses the most essential model data in hopes that the problem was somewhere else and this parse will succeed.

All of this can obviously be accomplished with untyped errors. That said, using types to categorize errors would significantly improve the clarity of such code. More importantly, I believe that by categorizing errors in ways that are most relevant to a specific domain a library (perhaps internal to an app) can encourage developers to think carefully about how to respond.

I used to be rather in favor of adding typed errors, thinking that it can only benefit and seemed reasonable. However, given the very interesting discussion here, I'm inclined to think that what you articulate above is actually a very good argument _against_ adding typed errors.

If I may simplify, the gist of the argument advanced by Tino, Charlie, and you is that the primary goal is documentation, and that documentation in the form of prose is insufficient because it can be unreliable. Therefore, you want a way for the compiler to enforce said documentation. (The categorization use case, I think, is well addressed by the protocol-based design discussed already in this thread.)

Actually documentation is only one of the goals I have and it is the least important. Please see my subsequent reply to John where I articulate the four primary goals I have for improved error handling, whether it be typed errors or some other mechanism. I am curious to see what you think of the goals, as well as what mechanism might best address those goals.

Your other three goals have to do with what you term categorization, unless I misunderstand. Are those not adequately addressed by Joe Groff's protocol-based design?

Can you elaborate on what you mean by Joe Groff’s protocol-based design? I certainly haven’t seen anything that I believe addresses those goals well.

However, the compiler itself cannot reward, only punish in the form of errors or warnings; if exhaustive switching is a red herring and the payoff for typed errors is correct documentation, the effectiveness of this kind of compiler enforcement must be directly proportional to the degree of extrinsic punishment inflicted by the compiler (since the intrinsic reward of correct documentation is the same whether it's spelled using doc comments or the type system). This seems like a heavy-handed way to enforce documentation of only one specific aspect of a throwing function; moreover, if this use case were to be sufficiently compelling, then it's certainly a better argument for SourceKit (or some other builtin tool) to automatically generate information on all errors thrown than for the compiler to require that users declare it themselves--even if opt-in.

Bad error handling is pervasive. The fact that everyone shows you code that just logs the error is a prime example of this. It should be considered a symptom of a problem, not an acceptable status quo to be maintained. We need all the tools at our disposal to encourage better thinking about and handling of errors. Most importantly, I think we need a middle ground between completely untyped errors and an exhaustive list of every possible error that might happen. I believe a well designed mechanism for categorizing errors in a compiler-verified way can do exactly this.

In many respects, there are similarities to this in the design of `NSError` which provides categorization via the error domain. This categorization is a bit more broad than I think is useful in many cases, but it is the best example I'm aware of.

The primary difference between error domains and the kind of categorization I am proposing is that error domains categorize based on the source of an error whereas I am proposing categorization driven by likely recovery strategies. Recovery is obviously application dependent, but I think the example above demonstrates that there are some useful generalizations that can be made (especially in an app-specific library), even if they don't apply everywhere.

> Sometimes we then go on to have a conversation about wrapping errors in other error types, and that can be interesting, but now we're talking about adding a big, messy feature just to get "safety" guarantees for a fairly minor need.

I think you're right that wrapping errors is tightly related to an effective use of typed errors. You can do a reasonable job without language support (as has been discussed on the list in the past). On the other hand, if we're going to introduce typed errors we should do it in a way that *encourages* effective use of them. My opinion is that encouraging effective use means categorizing (wrapping) errors without requiring any additional syntax beyond the simple `try` used by untyped errors. In practice, this means we should not need to catch and rethrow an error if all we want to do is categorize it. Rust provides good prior art in this area.

>
> Programmers often have an instinct to obsess over error taxonomies that is very rarely directed at solving any real problem; it is just self-imposed busy-work.

I agree that obsessing over intricate taxonomies is counter-productive and should be discouraged. On the other hand, I hope the example I provided above can help to focus the discussion on a practical use of types to categorize errors in a way that helps guide *thinking* and therefore improves error handling in practice.

>
>> - Some people think that while it might be useful in some narrow cases, the utility isn’t high enough to justify making the language more complex (complexity that would intrude on the APIs of result types, futures, etc)
>>
>> I’m sure there are other points in the discussion that I’m forgetting.
>>
>> One thing that I’m personally very concerned about is in the systems programming domain. Systems code is sort of the classic example of code that is low-level enough and finely specified enough that there are lots of knowable things, including the failure modes.
>
> Here we are using "systems" to mean "embedded systems and kernels". And frankly even a kernel is a large enough system that they don't want to exhaustively switch over failures; they just want the static guarantees that go along with a constrained error type.
>
>> Beyond expressivity though, our current model involves boxing thrown values into an Error existential, something that forces an implicit memory allocation when the value is large. Unless this is fixed, I’m very concerned that we’ll end up with a situation where certain kinds of systems code (i.e., that which cares about real time guarantees) will not be able to use error handling at all.
>>
>> JohnMC has some ideas on how to change code generation for ‘throws’ to avoid this problem, but I don’t understand his ideas enough to know if they are practical and likely to happen or not.
>
> Essentially, you give Error a tagged-pointer representation to allow payload-less errors on non-generic error types to be allocated globally, and then you can (1) tell people to not throw errors that require allocation if it's vital to avoid allocation (just like we would tell them today not to construct classes or indirect enum cases) and (2) allow a special global payload-less error to be substituted if error allocation fails.
>
> Of course, we could also say that systems code is required to use a typed-throws feature that we add down the line for their purposes. Or just tell them to not use payloads. Or force them to constrain their error types to fit within some given size. (Note that obsessive error taxonomies tend to end up with a bunch of indirect enum cases anyway, because they get recursive, so the allocation problem is very real whatever we do.)
>
> John.


xwu · August 19, 2017, 7:16pm

Sent from my iPad

Joe Groff wrote:

An alternative approach that embraces the open nature of errors could be
to represent domains as independent protocols, and extend the error types
that are relevant to that domain to conform to the protocol. That way, you
don't obscure the structure of the underlying error value with wrappers. If
you expect to exhaustively handle all errors in a domain, well, you're
almost certainly going to need to have a fallback case in your wrapper type
for miscellaneous errors, but you could represent that instead without
wrapping via a catch-all, and as?-casting to your domain protocol with a
??-default for errors that don't conform to the protocol. For example,
instead of attempting something like this:

enum DatabaseError: Error {
  case queryError(QueryError)
  case ioError(IOError)
  case other(Error)

  var errorKind: String {
    switch self {
      case .queryError(let q): return "query error: \(q.query)"
      case .ioError(let i): return "io error: \(i.filename)"
      case .other(let e): return "\(e)"
    }
  }
}

func queryDatabase(_ query: String) throws /*DatabaseError*/ -> Table

do {
  try queryDatabase("delete * from users")
} catch let d as DatabaseError {
  os_log(d.errorKind)
} catch {
  fatalError("unexpected non-database error")
}

You could do this:

protocol DatabaseError {
  var errorKind: String { get }
}

extension QueryError: DatabaseError {
  var errorKind: String { return "query error: \(query)" }
}
extension IOError: DatabaseError {
  var errorKind: String { return "io error: \(filename)" }
}

extension Error {
  var databaseErrorKind: String {
    return (self as? DatabaseError)?.errorKind ?? "unexpected non-database error"
  }
}

func queryDatabase(_ query: String) throws -> Table

do {
  try queryDatabase("delete * from users")
} catch {
  os_log(error.databaseErrorKind)
}

This approach isn't sufficient for several reasons. Notably, it
requires the underlying errors to already have a distinct type for every
category we wish to place them in. If all network errors have the same
type and I want to categorize them based on network availability,
authentication, dropped connection, etc I am not able to do that.
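
To make the objection concrete, here is a minimal sketch assuming a single hypothetical `NetworkError` type whose values differ only in a `code` field. A protocol conformance attaches to the type as a whole, so separating its values into categories needs a value-level function. Every name below is invented for illustration.

```swift
// Hypothetical error type: every network failure shares this one type.
struct NetworkError: Error {
    enum Code { case offline, unauthorized, connectionDropped, timedOut }
    let code: Code
}

// Hypothetical categories, chosen by how a caller would recover.
enum NetworkCategory: Equatable {
    case unavailable      // wait for reachability to change
    case authentication   // ask the user to sign in again
    case transient        // retrying is reasonable
}

// Conforming NetworkError to a "domain" protocol would put all of its
// values in the same domain; telling them apart requires inspecting the value.
func categorize(_ error: NetworkError) -> NetworkCategory {
    switch error.code {
    case .offline:
        return .unavailable
    case .unauthorized:
        return .authentication
    case .connectionDropped, .timedOut:
        return .transient
    }
}
```

The sketch shows only that categorization here is a function of the value, not of the type.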

Sorry, how does the presence or absence of typed throws play into this?

It provides a convenient way to drive an error conversion mechanism
during propagation, whether in a library function used to wrap the throwing
expression or ideally with language support. If I call a function that
throws FooError and my function throws BarError and we have a way to go
from FooError to BarError we can invoke that conversion without needing to
catch and rethrow the wrapped error.

But isn't that an argument *against* typed errors? You need this
language-level support to automatically convert FooErrors to BarErrors
*because* you've restricted yourself to throwing BarErrors and the function
you call is restricted to throwing FooErrors. Currently, without typed
errors, there is no need to convert a FooError to a BarError.

No, this is a misunderstanding of the point of the conversion. In that
example, the point of performing a conversion is not because the types are
arbitrarily chosen. It is because the initializer of BarError includes an
algorithm that categorizes errors. It may place different values of
FooError into different cases. What I am after is language-level support
for categorizing errors during propagation and making the categories easily
visible to anyone who looks at the signature of a function that chooses to
categorize its errors. Using types and the initializer is one way (the
most obvious way) to do this.
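
A rough sketch of the "library function used to wrap the throwing expression" variant, using only untyped `throws`. `FooError`, `BarError`, `categorizing`, `loadFoo` and `loadBar` are placeholder names, and the categorization rules are arbitrary assumptions made for the example.

```swift
// Hypothetical upstream error type.
enum FooError: Error {
    case notFound
    case timeout
    case corrupted(String)
}

// Hypothetical categorized error type. The initializer is the
// "algorithm that categorizes errors" described above.
enum BarError: Error, Equatable {
    case retryable
    case permanent(reason: String)
    case unknown

    init(_ error: Error) {
        // Already categorized: pass through unchanged.
        if let bar = error as? BarError { self = bar; return }
        guard let foo = error as? FooError else { self = .unknown; return }
        switch foo {
        case .timeout:
            self = .retryable
        case .notFound:
            self = .permanent(reason: "not found")
        case .corrupted(let detail):
            self = .permanent(reason: detail)
        }
    }
}

// Helper that runs a throwing expression and categorizes whatever it throws,
// so call sites do not need their own do/catch just to convert.
func categorizing<T>(_ body: () throws -> T) throws -> T {
    do {
        return try body()
    } catch {
        throw BarError(error)
    }
}

func loadFoo() throws -> Int { throw FooError.timeout }

// Only ever throws BarError, although the signature cannot say so here.
func loadBar() throws -> Int {
    return try categorizing { try loadFoo() }
}
```

With language support, the helper call would presumably be implicit and the error type would appear in the signature of `loadBar`. That part is the proposal under discussion, not something this sketch demonstrates.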

As mentioned above, it's difficult even internally to design a single
ontology of errors that works throughout a library, so compiler support for
typed errors would be tantamount to a compiler-enforced facility that
pervasively requires this laborious classification and re-classification of
errors whenever a function rethrows, much of which may be ultimately
unnecessary. In other words, if you are a library vendor and wrap every
FooError from an upstream dependency into a BarError, your user is still
likely to have their own classification of errors and decide to handle
different groups of BarError cases differently anyway, so what was the
point of your laborious conversion of FooErrors to BarErrors?

I am not suggesting that categorization be performed everywhere. What I
am suggesting is that there are many cases where it can be done in a
meaningful way. If you're writing a library and anticipate that there is
no way to categorize errors that is meaningful to all users of your library
you should not perform categorization. On the other hand, continuing with
the line of examples we've already seen, if you're writing an internal
networking library and intend to have similar recovery paths throughout
your app categorization can be extremely useful. This is especially true
as a team grows and you are trying to encourage common practices throughout
the app.

It also provides convenient documentation of the categorization along with
a straightforward way to match the cases (with code completion as Chris
pointed out). IMO, making this information immediately clear and with easy
matching at call sites is crucial to improving how people handle errors in
practice.

Again, I don't see documentation as a sufficient argument for this
feature; there is no reason why the Swift compiler could not extract
comprehensive information about what errors are thrown at compile time
without typed errors--and with more granularity than can be documented via
types (since only specific enum cases may ever be thrown in a particular
function).

I am not arguing that documentation alone is a sufficient argument. That
said, while the compiler *could* do such things I don't think it would be a
priority any time soon (core team, please correct me if I'm wrong here).

If the compiler is to support typed errors, then certainly it must check
what errors are thrown. I'm just saying it doesn't require the additional
human labor of typed error annotations to reap the desired documentation
benefit; the compiler need only expose that information.

Error handling is an afterthought all too often. The value of making it
immediately clear how to match important categories of errors should not be
understated.

See, this is probably where I'm failing to understand you. Every library
that has its own Error-conforming types offers an ontology of errors that,
at least to its authors, make some sort of sense. At the call site, you
can `catch` specific categories of errors or `switch` over specific errors.
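
For reference, this is roughly what matching looks like today with untyped `throws`. It is a sketch with hypothetical `QueryError` and `IOError` types, mirroring the names in the earlier example rather than any real library.

```swift
// Hypothetical library error types.
enum QueryError: Error {
    case syntax(String)
    case timeout
}
struct IOError: Error {
    let filename: String
}

// Runs `body` and reports how the thrown error (if any) was matched.
func describe(_ body: () throws -> Void) -> String {
    do {
        try body()
        return "ok"
    } catch QueryError.timeout {
        // Match one specific case directly.
        return "retry"
    } catch let q as QueryError {
        // Match a whole type, then switch over its remaining cases.
        switch q {
        case .syntax(let text): return "bad query: \(text)"
        case .timeout: return "retry"
        }
    } catch let io as IOError {
        return "io: \(io.filename)"
    } catch {
        // The catch-all is always required with untyped throws.
        return "other: \(error)"
    }
}
```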
Yes, this can become a little annoying if your own classification of errors
differs from the library authors' classification. However, I fail to see
how typed errors makes this any better, other than that you'd `catch` only
one type of error but have to `switch` over cases and then `switch` over
the underlying error. Only now, you've introduced this issue where, for the
library authors, FooErrors have to be reclassified into BarErrors, and then
into BazErrors, and then into BooErrors--to what end? It seems only to
accomplish the goal of making error handling not an afterthought by causing
the compiler to make it more of a nuisance.

If the only use of this was to move from throwing independent types to
throwing a single type with a bunch of cases you are right, there wouldn't
be much value. However, the contract for individual functions in a library
may vary and there are also errors from dependencies of the library that
may also be thrown.

Let's assume for a moment that the error types of a library are carefully
designed to offer meaningful categorization. Even then it is still very
useful to know which categories may occur at a given call site. Yes, this
is only documentation so we'll set it aside and not give it too much weight.

More importantly, if I depend on library X it may depend on library Y as
an implementation detail which is not intended to be part of its API
contract. As a user of X I should not need to be aware that Y even
exists. I certainly am not going to import it and have access to its error
types - I don't want a direct dependency on Y. Untyped errors are likely
to lead to errors from Y leaking through the interface of X. They may be
errors for which I would have a meaningful recovery strategy if my code were
able to properly understand them, but it won't because my code doesn't
know anything about Y. Yes, these are bugs in X but they are bugs that the
language could help to prevent.
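
A small sketch of the boundary being described, with both "libraries" collapsed into one file for brevity. `yParse`, `YParseError`, `xLoad` and `XError` are hypothetical stand-ins for Y's and X's APIs.

```swift
// --- Stand-in for dependency Y (an implementation detail of X) ---
enum YParseError: Error {
    case unexpectedToken(Int)
}
func yParse(_ text: String) throws -> [String] {
    if text.isEmpty { throw YParseError.unexpectedToken(0) }
    return text.split(separator: ",").map { String($0) }
}

// --- Public surface of library X ---
public enum XError: Error, Equatable {
    case malformedInput
    case internalFailure
}

// X translates Y's errors at its boundary so that callers never need to
// import Y to understand what went wrong. Nothing in the language enforces
// this today; forgetting the do/catch would silently leak YParseError.
public func xLoad(_ text: String) throws -> [String] {
    do {
        return try yParse(text)
    } catch is YParseError {
        throw XError.malformedInput
    } catch {
        throw XError.internalFailure
    }
}
```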

This is a good point which I hadn't considered. The compiler should warn if
the public API of one library leaks non-stdlib third-party errors without
re-exporting the third-party library. But again, this doesn't require typed
errors.

On Sat, Aug 19, 2017 at 2:04 PM, Matthew Johnson <matthew@anandabits.com> wrote:

On Aug 19, 2017, at 12:43 PM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:
On Sat, Aug 19, 2017 at 08:29 Matthew Johnson <matthew@anandabits.com> wrote:

On Aug 18, 2017, at 9:19 PM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:
On Fri, Aug 18, 2017 at 8:11 PM, Matthew Johnson <matthew@anandabits.com> wrote:

On Aug 18, 2017, at 6:56 PM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:

As I have said before, none of this *requires* typed errors; they are just
a natural way to approach the problem. Any solution that allows for
categorization during propagation that can be matched at call sites would
be acceptable.

I really believe language support of some kind is warranted and would have
an impact on the quality of software. Maybe types aren't the right
solution, but we do need one.

Deciding what categories are important is obviously subjective, but I do
believe that libraries focused on a specific domain can often make
reasonable guesses that are pretty close in the majority of use cases.
This is especially true for internal libraries where part of the purpose of
the library may be to establish conventions for the app that are intended
to be used (almost) everywhere.

The kind of categorization I want to be able to do requires a custom
algorithm. The specific algorithm used to categorize errors depends on
the dynamic context (i.e. the function that is propagating it). The way I
usually think about this categorization is as a conversion initializer as I
showed in the example, but it certainly wouldn't need to be accomplished
that way. The most important thing IMO is the ability to categorize during
error propagation and make information about that categorization easy for
callers to discover.

The output of the algorithm could use various mechanisms for
categorization - an enum is one mechanism, distinct types conforming to
appropriate categorization protocols is another. Attaching some kind of
category value to the original error or propagating the category along with
it might also work (although might be rather clunky).

It is trivial to make the original error immediately available via an
`underlyingError` property so I really don't understand the resistance to
wrapping errors. The categorization can easily be ignored at the catch
site if desired. That said, if we figure out some other mechanism for
categorizing errors, including placing different error values of the same
type into different categories, and matching them based on this
categorization I think I would be ok with that. Using wrapper types is not
essential to solving the problem.
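
A minimal sketch of a wrapper that keeps the original error reachable. `CategorizedError`, its `Category` cases, `DiskFull`, `Timeout` and `advice(for:)` are all hypothetical names chosen for the example.

```swift
// Hypothetical wrapper: a category plus the untouched original error.
struct CategorizedError: Error {
    enum Category { case retryable, needsUserAction, fatal }
    let category: Category
    let underlyingError: Error
}

// Hypothetical low-level errors.
struct DiskFull: Error {}
struct Timeout: Error {}

func advice(for error: Error) -> String {
    guard let categorized = error as? CategorizedError else {
        return "uncategorized: \(error)"
    }
    // A caller with special knowledge can ignore the category and
    // inspect the original error directly.
    if categorized.underlyingError is DiskFull {
        return "free some disk space"
    }
    // Everyone else can rely on the category alone.
    switch categorized.category {
    case .retryable: return "retry"
    case .needsUserAction: return "ask the user"
    case .fatal: return "give up"
    }
}
```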

Setting all of this aside, surely you had your own reasons for
supporting typed errors in the past. What were those and why do you no
longer consider them important?

My memory is certainly spotty, but as far as I can recall, I had no
distinct reasons; it just seemed like a reasonable and "natural" next step
that other people wanted for which I had no use case of my own in mind.
Having seen the argumentation that there aren't very many use cases in
general, I'm warming to the view that it's probably not such a great next
step.

On Fri, Aug 18, 2017 at 6:46 PM, Matthew Johnson <matthew@anandabits.com> wrote:

On Aug 18, 2017, at 6:29 PM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:

On Fri, Aug 18, 2017 at 6:19 PM, Matthew Johnson <matthew@anandabits. >>>> > wrote:

On Aug 18, 2017, at 6:15 PM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:

On Fri, Aug 18, 2017 at 09:20 Matthew Johnson via swift-evolution <swift-evolution@swift.org> wrote:

Sent from my iPad

On Aug 18, 2017, at 1:27 AM, John McCall <rjmccall@apple.com> wrote:

>> On Aug 18, 2017, at 12:58 AM, Chris Lattner via swift-evolution <swift-evolution@swift.org> wrote:
>> Splitting this off into its own thread:
>>
>>> On Aug 17, 2017, at 7:39 PM, Matthew Johnson <matthew@anandabits.com> wrote:
>>> One related topic that isn’t discussed is typed errors. Many
third party libraries use a Result type with typed errors. Moving to an
async / await model without also introducing typed errors into Swift would
require giving up something that is highly valued by many Swift
developers. Maybe Swift 5 is the right time to tackle typed errors as
well. I would be happy to help with design and drafting a proposal but
would need collaborators on the implementation side.
>>
>> Typed throws is something we need to settle one way or the other,
and I agree it would be nice to do that in the Swift 5 cycle.
>>
>> For the purposes of this sub-discussion, I think there are three
kinds of code to think about:
>> 1) large scale API like Cocoa which evolve (adding significant
functionality) over the course of many years and can’t break clients.
>> 2) the public API of shared swiftpm packages, whose lifecycle may
rise and fall - being obsoleted and replaced by better packages if they
encounter a design problem.
>> 3) internal APIs and applications, which are easy to change
because the implementations and clients of the APIs are owned by the same
people.
>>
>> These each have different sorts of concerns, and we hope that
something can start out as #3 but work its way up the stack gracefully.
>>
>> Here is where I think things stand on it:
>> - There is consensus that untyped throws is the right thing for a
large scale API like Cocoa. NSError is effectively proven here. Even if
typed throws is introduced, Apple is unlikely to adopt it in their APIs for
this reason.
>> - There is consensus that untyped throws is the right default for
people to reach for in public packages (#2).
>> - There is consensus that Java and other systems that encourage
lists of throws error types lead to problematic APIs for a variety of
reasons.
>> - There is disagreement about whether internal APIs (#3) should
use it. It seems perfect to be able to write exhaustive catches in this
situation, since everything is knowable. OTOH, this could encourage abuse
of error handling in cases where you really should return an enum instead
of using throws.
>> - Some people are concerned that introducing typed throws would
cause people to reach for it instead of using untyped throws for public
package APIs.
>
> Even for non-public code. The only practical merit of typed throws
I have ever seen someone demonstrate is that it would let them use
contextual lookup in a throw or catch. People always say "I'll be able to
exhaustively switch over my errors", and then I ask them to show me where
they want to do that, and they show me something that just logs the error,
which of course does not require typed throws. Every. Single. Time.

I agree that exhaustive switching over errors is something that
people are extremely likely to actually want to do. I also think it's a
bit of a red herring. The value of typed errors is *not* in exhaustive
switching. It is in categorization and verified documentation.

Here is a concrete example that applies to almost every app. When
you make a network request there are many things that could go wrong to
which you may want to respond differently:
* There might be no network available. You might recover by updating
the UI to indicate that and start monitoring for a reachability change.
* There might have been a server error that should eventually be
resolved (500). You might update the UI and provide the user the ability
to retry.
* There might have been an unrecoverable server error (404). You
will update the UI.
* There might have been a low level parsing error (bad JSON, etc).
Recovery is perhaps similar in nature to #2, but the problem is less likely
to be resolved quickly so you may not provide a retry option. You might
also want to do something to notify your dev team that the server is
returning JSON that can't be parsed.
* There might have been a higher-level parsing error (converting JSON
to model types). This might be treated the same as bad JSON. On the other
hand, depending on the specifics of the app, you might take an alternate
path that only parses the most essential model data in hopes that the
problem was somewhere else and this parse will succeed.
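
The list above could be written down roughly as follows. This is a sketch, not a recommendation: the case names, the `RecoveryAction` type and the status-code ranges are assumptions, and a real app would draw the lines differently (not every 4xx response is unrecoverable, for example).

```swift
// Hypothetical categories, one per bullet in the list above.
enum RequestFailure: Error, Equatable {
    case noNetwork
    case temporaryServerError(status: Int)   // e.g. 500
    case permanentServerError(status: Int)   // e.g. 404
    case malformedPayload                    // bad JSON
    case modelMismatch                       // JSON fine, model conversion failed
}

// Hypothetical recovery strategies.
enum RecoveryAction: Equatable {
    case monitorReachability
    case offerRetry
    case showError
    case showErrorAndNotifyTeam
    case retryWithEssentialFieldsOnly
}

// Simplifying assumption: 5xx is temporary, 4xx is permanent.
func classify(status: Int) -> RequestFailure? {
    switch status {
    case 500...599: return .temporaryServerError(status: status)
    case 400...499: return .permanentServerError(status: status)
    default: return nil
    }
}

// Because the categories follow recovery strategy rather than error
// source, this switch can be exhaustive without a catch-all.
func recovery(for failure: RequestFailure) -> RecoveryAction {
    switch failure {
    case .noNetwork: return .monitorReachability
    case .temporaryServerError: return .offerRetry
    case .permanentServerError: return .showError
    case .malformedPayload: return .showErrorAndNotifyTeam
    case .modelMismatch: return .retryWithEssentialFieldsOnly
    }
}
```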

All of this can obviously be accomplished with untyped errors. That
said, using types to categorize errors would significantly improve the
clarity of such code. More importantly, I believe that by categorizing
errors in ways that are most relevant to a specific domain a library
(perhaps internal to an app) can encourage developers to think carefully
about how to respond.

I used to be rather in favor of adding typed errors, thinking that it
can only benefit and seemed reasonable. However, given the very interesting
discussion here, I'm inclined to think that what you articulate above is
actually a very good argument _against_ adding typed errors.

If I may simplify, the gist of the argument advanced by Tino, Charlie,
and you is that the primary goal is documentation, and that documentation
in the form of prose is insufficient because it can be unreliable.
Therefore, you want a way for the compiler to enforce said documentation.
(The categorization use case, I think, is well addressed by the
protocol-based design discussed already in this thread.)

Actually documentation is only one of the goals I have and it is the
least important. Please see my subsequent reply to John where I articulate
the four primary goals I have for improved error handling, whether it be
typed errors or some other mechanism. I am curious to see what you think
of the goals, as well as what mechanism might best address those goals.

Your other three goals have to do with what you term categorization,
unless I misunderstand. Are those not adequately addressed by Joe Groff's
protocol-based design?

Can you elaborate on what you mean by Joe Groff’s protocol-based
design? I certainly haven’t seen anything that I believe addresses those
goals well.

However, the compiler itself cannot reward, only punish in the form of
errors or warnings; if exhaustive switching is a red herring and the payoff
for typed errors is correct documentation, the effectiveness of this kind
of compiler enforcement must be directly proportional to the degree of
extrinsic punishment inflicted by the compiler (since the intrinsic reward
of correct documentation is the same whether it's spelled using doc
comments or the type system). This seems like a heavy-handed way to enforce
documentation of only one specific aspect of a throwing function; moreover,
if this use case were to be sufficiently compelling, then it's certainly a
better argument for SourceKit (or some other builtin tool) to automatically
generate information on all errors thrown than for the compiler to require
that users declare it themselves--even if opt-in.

Bad error handling is pervasive. The fact that everyone shows you
code that just logs the error is a prime example of this. It should be
considered a symptom of a problem, not an acceptable status quo to be
maintained. We need all the tools at our disposal to encourage better
thinking about and handling of errors. Most importantly, I think we need a
middle ground between completely untyped errors and an exhaustive list of
every possible error that might happen. I believe a well designed
mechanism for categorizing errors in a compiler-verified way can do exactly
this.

In many respects, there are similarities to this in the design of
`NSError` which provides categorization via the error domain. This
categorization is a bit more broad than I think is useful in many cases,
but it is the best example I'm aware of.

The primary difference between error domains and the kind of
categorization I am proposing is that error domains categorize based on the
source of an error whereas I am proposing categorization driven by likely
recovery strategies. Recovery is obviously application dependent, but I
think the example above demonstrates that there are some useful
generalizations that can be made (especially in an app-specific library),
even if they don't apply everywhere.

> Sometimes we then go on to have a conversation about wrapping
errors in other error types, and that can be interesting, but now we're
talking about adding a big, messy feature just to get "safety" guarantees
for a fairly minor need.

I think you're right that wrapping errors is tightly related to an
effective use of typed errors. You can do a reasonable job without
language support (as has been discussed on the list in the past). On the
other hand, if we're going to introduce typed errors we should do it in a
way that *encourages* effective use of them. My opinion is that
encouraging effective use means categorizing (wrapping) errors without
requiring any additional syntax beyond the simple `try` used by untyped
errors. In practice, this means we should not need to catch and rethrow an
error if all we want to do is categorize it. Rust provides good prior art
in this area.

>
> Programmers often have an instinct to obsess over error taxonomies
that is very rarely directed at solving any real problem; it is just
self-imposed busy-work.

I agree that obsessing over intricate taxonomies is
counter-productive and should be discouraged. On the other hand, I hope
the example I provided above can help to focus the discussion on a
practical use of types to categorize errors in a way that helps guide
*thinking* and therefore improves error handling in practice.

>
>> - Some people think that while it might be useful in some narrow
cases, the utility isn’t high enough to justify making the language more
complex (complexity that would intrude on the APIs of result types,
futures, etc)
>>
>> I’m sure there are other points in the discussion that I’m
forgetting.
>>
>> One thing that I’m personally very concerned about is in the
systems programming domain. Systems code is sort of the classic example of
code that is low-level enough and finely specified enough that there are
lots of knowable things, including the failure modes.
>
> Here we are using "systems" to mean "embedded systems and
kernels". And frankly even a kernel is a large enough system that they
don't want to exhaustively switch over failures; they just want the static
guarantees that go along with a constrained error type.
>
>> Beyond expressivity though, our current model involves boxing
thrown values into an Error existential, something that forces an implicit
memory allocation when the value is large. Unless this is fixed, I’m very
concerned that we’ll end up with a situation where certain kinds of systems
code (i.e., that which cares about real time guarantees) will not be able
to use error handling at all.
>>
>> JohnMC has some ideas on how to change code generation for
‘throws’ to avoid this problem, but I don’t understand his ideas enough to
know if they are practical and likely to happen or not.
>
> Essentially, you give Error a tagged-pointer representation to
allow payload-less errors on non-generic error types to be allocated
globally, and then you can (1) tell people to not throw errors that require
allocation if it's vital to avoid allocation (just like we would tell them
today not to construct classes or indirect enum cases) and (2) allow a
special global payload-less error to be substituted if error allocation
fails.
>
> Of course, we could also say that systems code is required to use a
typed-throws feature that we add down the line for their purposes. Or just
tell them to not use payloads. Or force them to constrain their error
types to fit within some given size. (Note that obsessive error taxonomies
tend to end up with a bunch of indirect enum cases anyway, because they get
recursive, so the allocation problem is very real whatever we do.)
>
> John.


anandabits · August 19, 2017, 7:25pm

Sent from my iPad

Joe Groff wrote:

An alternative approach that embraces the open nature of errors could be to represent domains as independent protocols, and extend the error types that are relevant to that domain to conform to the protocol. That way, you don't obscure the structure of the underlying error value with wrappers. If you expect to exhaustively handle all errors in a domain, well, you're almost certainly going to need to have a fallback case in your wrapper type for miscellaneous errors, but you could represent that instead without wrapping via a catch-all, and as?-casting to your domain protocol with a ??-default for errors that don't conform to the protocol. For example, instead of attempting something like this:

enum DatabaseError: Error {
  case queryError(QueryError)
  case ioError(IOError)
  case other(Error)

  var errorKind: String {
    switch self {
      case .queryError(let q): return "query error: \(q.query)"
      case .ioError(let i): return "io error: \(i.filename)"
      case .other(let e): return "\(e)"
    }
  }
}

func queryDatabase(_ query: String) throws /*DatabaseError*/ -> Table

do {
  try queryDatabase("delete * from users")
} catch let d as DatabaseError {
  os_log(d.errorKind)
} catch {
  fatalError("unexpected non-database error")
}

You could do this:

protocol DatabaseError {
  var errorKind: String { get }
}

extension QueryError: DatabaseError {
  var errorKind: String { return "query error: \(query)" }
}
extension IOError: DatabaseError {
  var errorKind: String { return "io error: \(filename)" }
}

extension Error {
  var databaseErrorKind: String {
    return (self as? DatabaseError)?.errorKind ?? "unexpected non-database error"
  }
}

func queryDatabase(_ query: String) throws -> Table

do {
  try queryDatabase("delete * from users")
} catch {
  os_log(error.databaseErrorKind)
}

This approach isn't sufficient for several reasons. Notably, it requires the underlying errors to already have a distinct type for every category we wish to place them in. If all network errors have the same type and I want to categorize them based on network availability, authentication, dropped connection, etc I am not able to do that.

Sorry, how does the presence or absence of typed throws play into this?

It provides a convenient way to drive an error conversion mechanism during propagation, whether in a library function used to wrap the throwing expression or ideally with language support. If I call a function that throws FooError and my function throws BarError and we have a way to go from FooError to BarError we can invoke that conversion without needing to catch and rethrow the wrapped error.

But isn't that an argument *against* typed errors? You need this language-level support to automatically convert FooErrors to BarErrors *because* you've restricted yourself to throwing BarErrors and the function you call is restricted to throwing FooErrors. Currently, without typed errors, there is no need to convert a FooError to a BarError.

No, this is a misunderstanding of the point of the conversion. In that example, the point of performing a conversion is not because the types are arbitrarily chosen. It is because the initializer of BarError includes an algorithm that categorizes errors. It may place different values of FooError into different cases. What I am after is language-level support for categorizing errors during propagation and making the categories easily visible to anyone who looks at the signature of a function that chooses to categorize its errors. Using types and the initializer is one way (the most obvious way) to do this.

As mentioned above, it's difficult even internally to design a single ontology of errors that works throughout a library, so compiler support for typed errors would be tantamount to a compiler-enforced facility that pervasively requires this laborious classification and re-classification of errors whenever a function rethrows, much of which may be ultimately unnecessary. In other words, if you are a library vendor and wrap every FooError from an upstream dependency into a BarError, your user is still likely to have their own classification of errors and decide to handle different groups of BarError cases differently anyway, so what was the point of your laborious conversion of FooErrors to BarErrors?

I am not suggesting that categorization be performed everywhere. What I am suggesting is that there are many cases where it can be done in a meaningful way. If you're writing a library and anticipate that there is no way to categorize errors that is meaningful to all users of your library you should not perform categorization. On the other hand, continuing with the line of examples we've already seen, if you're writing an internal networking library and intend to have similar recovery paths throughout your app categorization can be extremely useful. This is especially true as a team grows and you are trying to encourage common practices throughout the app.

It also provides convenient documentation of the categorization along with a straightforward way to match the cases (with code completion as Chris pointed out). IMO, making this information immediately clear and with easy matching at call sites is crucial to improving how people handle errors in practice.

Again, I don't see documentation as a sufficient argument for this feature; there is no reason why the Swift compiler could not extract comprehensive information about what errors are thrown at compile time without typed errors--and with more granularity than can be documented via types (since only specific enum cases may ever be thrown in a particular function).

I am not arguing that documentation alone is a sufficient argument. That said, while the compiler *could* do such things I don't think it would be a priority any time soon (core team, please correct me if I'm wrong here).

If the compiler is to support typed errors, then certainly it must check what errors are thrown. I'm just saying it doesn't require the additional human labor of typed error annotations to reap the desired documentation benefit; the compiler need only expose that information.

Sure, on this we can agree.

However, there is no way to invoke an algorithm that categorizes errors during propagation without an annotation that specifies what that algorithm is. Using initializers and cases to perform categorization and a type annotation to drive the machinery seems like a reasonable approach.

On the other hand, this does make the assumption that an enum with a case per category is used for categorization. Perhaps some other solution that allowed for a distinct type (perhaps an existential) per category could be identified and may even be more flexible. I'm not sure what that would look like though.
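
One possible shape for "a distinct type per category" is a protocol per category, matched as an existential at the catch site. This is only a guess at what such a design might look like; `RetryableError`, `AuthenticationError`, `Timeout`, `TokenExpired` and `plan(for:)` are hypothetical.

```swift
// Hypothetical category protocols; each refines Error.
protocol RetryableError: Error {
    var retryAfter: Int { get }   // seconds
}
protocol AuthenticationError: Error {}

// Hypothetical concrete errors opting in to a category.
struct Timeout: RetryableError {
    let retryAfter = 5
}
struct TokenExpired: AuthenticationError {}
struct Unclassified: Error {}

// Callers match on the category, not on the concrete type.
func plan(for error: Error) -> String {
    switch error {
    case let retryable as RetryableError:
        return "retry in \(retryable.retryAfter)s"
    case is AuthenticationError:
        return "sign in again"
    default:
        return "give up"
    }
}
```

Note that this still assigns categories per type, so it does not address placing different values of one type into different categories, and the set of categories a function may throw remains invisible in its signature.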

···

On Aug 19, 2017, at 2:16 PM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:

On Sat, Aug 19, 2017 at 2:04 PM, Matthew Johnson <matthew@anandabits.com> wrote:

On Aug 19, 2017, at 12:43 PM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:

On Sat, Aug 19, 2017 at 08:29 Matthew Johnson <matthew@anandabits.com> wrote:

On Aug 18, 2017, at 9:19 PM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:

On Fri, Aug 18, 2017 at 8:11 PM, Matthew Johnson <matthew@anandabits.com> wrote:

On Aug 18, 2017, at 6:56 PM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:

Error handling is an afterthought all too often. The value of making it immediately clear how to match important categories of errors should not be understated.

See, this is probably where I'm failing to understand you. Every library that has its own Error-conforming types offers an ontology of errors that, at least to its authors, makes some sort of sense. At the call site, you can `catch` specific categories of errors or `switch` over specific errors. Yes, this can become a little annoying if your own classification of errors differs from the library authors' classification. However, I fail to see how typed errors make this any better, other than that you'd `catch` only one type of error but have to `switch` over cases and then `switch` over the underlying error. Only now, you've introduced this issue where, for the library authors, FooErrors have to be reclassified into BarErrors, and then into BazErrors, and then into BooErrors--to what end? It seems only to accomplish the goal of making error handling not an afterthought by causing the compiler to make it more of a nuisance.
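For readers following along, here is a minimal sketch of what matching at the call site looks like with untyped `throws` today. `FooError`, `BarError`, `perform`, and `describe` are hypothetical names invented for illustration; they do not come from any real library.

```swift
// Hypothetical library error types, invented for this sketch.
enum FooError: Error {
    case timeout
    case rejected(code: Int)
}

struct BarError: Error {
    let reason: String
}

// An untyped throwing function: the signature does not say which errors can occur.
func perform(_ step: Int) throws -> String {
    switch step {
    case 0: throw FooError.timeout
    case 1: throw FooError.rejected(code: 403)
    case 2: throw BarError(reason: "disk full")
    default: return "ok"
    }
}

// The caller matches on specific cases and types, with a catch-all at the end.
func describe(_ step: Int) -> String {
    do {
        return try perform(step)
    } catch FooError.timeout {
        return "retry later"
    } catch FooError.rejected(let code) {
        return "rejected with \(code)"
    } catch let bar as BarError {
        return "bar failed: \(bar.reason)"
    } catch {
        return "unknown: \(error)"
    }
}
```

The final `catch` is required because the compiler cannot know that `FooError` and `BarError` are the only possibilities.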

If the only use of this was to move from throwing independent types to throwing a single type with a bunch of cases you are right, there wouldn't be much value. However, the contract for individual functions in a library may vary and there are also errors from dependencies of the library that may also be thrown.

Let's assume for a moment that the error types of a library are carefully designed to offer meaningful categorization. Even then it is still very useful to know which categories may occur at a given call site. Yes, this is only documentation so we'll set it aside and not give it too much weight.

More importantly, if I depend on library X it may depend on library Y as an implementation detail which is not intended to be part of its API contract. As a user of X I should not need to be aware that Y even exists. I certainly am not going to import it and have access to its error types - I don't want a direct dependency on Y. Untyped errors are likely to lead to errors from Y leaking through the interface of X. They may be errors for which I would have a meaningful recovery strategy if my code were able to properly understand them, but it won't because my code doesn't know anything about Y. Yes, these are bugs in X but they are bugs that the language could help to prevent.
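As a rough illustration of the leak described here, the sketch that follows shows library X translating an error from its dependency Y before it crosses X's public interface. Every name (`YParseError`, `XError`, `yParse`, `xLoad`) is a hypothetical stand-in, and nothing in the language currently forces X to do this translation.

```swift
// Stand-in for an error type owned by library Y, an implementation detail of X.
struct YParseError: Error {
    let offset: Int
}

// The error vocabulary X publishes; callers only need to know this type.
enum XError: Error {
    case malformedInput(underlyingError: Error)
    case other(underlyingError: Error)
}

// Stand-in for a call into Y.
func yParse(_ text: String) throws -> Int {
    guard let value = Int(text) else { throw YParseError(offset: 0) }
    return value
}

// X's public entry point. Without the do/catch, YParseError would leak to callers
// who have never imported Y and cannot match on it.
func xLoad(_ text: String) throws -> Int {
    do {
        return try yParse(text)
    } catch let parseError as YParseError {
        throw XError.malformedInput(underlyingError: parseError)
    } catch {
        throw XError.other(underlyingError: error)
    }
}
```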

This is a good point which I hadn't considered. The compiler should warn if the public API of one library leaks non-stdlib third-party errors without re-exporting the third-party library. But again, this doesn't require typed errors.

As I have said before, none of this *requires* typed errors, they are just a natural way to approach the problem. Any solution that allows for categorization during propagation that can be matched at call sites would be acceptable.

I really believe language support of some kind is warranted and would have an impact on the quality of software. Maybe types aren't the right solution, but we do need one.

Deciding what categories are important is obviously subjective, but I do believe that libraries focused on a specific domain can often make reasonable guesses that are pretty close in the majority of use cases. This is especially true for internal libraries where part of the purpose of the library may be to establish conventions for the app that are intended to be used (almost) everywhere.

The kind of categorization I want to be able to do requires a custom algorithm. The specific algorithm used to categorize errors depends on the dynamic context (i.e. the function that is propagating it). The way I usually think about this categorization is as a conversion initializer as I showed in the example, but it certainly wouldn't need to be accomplished that way. The most important thing IMO is the ability to categorize during error propagation and make information about that categorization easy for callers to discover.
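A minimal sketch of context-dependent categorization via conversion initializers, written with what the language offers today. The types `FileError`, `ConfigFailure`, and `DocumentFailure` and the functions around them are assumptions made up for this illustration. The same underlying error lands in a different category depending on which function propagates it.

```swift
// Hypothetical low-level error shared by every file operation.
enum FileError: Error {
    case missing
    case unreadable
}

// Categories for loading configuration: a missing file is recoverable.
enum ConfigFailure: Error {
    case useDefaults(underlyingError: Error)
    case corrupt(underlyingError: Error)

    // Conversion initializer: the categorization algorithm for this context.
    init(_ error: Error) {
        switch error as? FileError {
        case .missing?: self = .useDefaults(underlyingError: error)
        default: self = .corrupt(underlyingError: error)
        }
    }
}

// Categories for opening a user document: a missing file is a real failure.
enum DocumentFailure: Error {
    case notFound(underlyingError: Error)
    case damaged(underlyingError: Error)

    init(_ error: Error) {
        switch error as? FileError {
        case .missing?: self = .notFound(underlyingError: error)
        default: self = .damaged(underlyingError: error)
        }
    }
}

// Stand-in for the shared low-level operation.
func readFile(exists: Bool) throws -> String {
    guard exists else { throw FileError.missing }
    return "contents"
}

// Each propagating function applies its own categorization.
func loadConfig(exists: Bool) throws -> String {
    do { return try readFile(exists: exists) } catch { throw ConfigFailure(error) }
}

func openDocument(exists: Bool) throws -> String {
    do { return try readFile(exists: exists) } catch { throw DocumentFailure(error) }
}
```

The explicit `do`/`catch` in each function is the boilerplate a type annotation on `throws` could, in principle, let the compiler generate.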

The output of the algorithm could use various mechanisms for categorization - an enum is one mechanism, distinct types conforming to appropriate categorization protocols is another. Attaching some kind of category value to the original error or propagating the category along with it might also work (although might be rather clunky).

It is trivial to make the original error immediately available via an `underlyingError` property so I really don't understand the resistance to wrapping errors. The categorization can easily be ignored at the catch site if desired. That said, if we figure out some other mechanism for categorizing errors, including placing different error values of the same type into different categories, and matching them based on this categorization I think I would be ok with that. Using wrapper types is not essential to solving the problem.
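To make the `underlyingError` point concrete, here is a small sketch assuming a hypothetical wrapper enum `Categorized` and a hypothetical `TimeoutError`. The catch site ignores the category entirely and reaches straight through to the original error.

```swift
// Hypothetical original error.
struct TimeoutError: Error {
    let seconds: Int
}

// Hypothetical wrapper that adds a category without hiding the original error.
enum Categorized: Error {
    case transient(Error)
    case fatal(Error)

    // The wrapped error is always one property access away.
    var underlyingError: Error {
        switch self {
        case .transient(let error), .fatal(let error):
            return error
        }
    }
}

func run() throws {
    throw Categorized.transient(TimeoutError(seconds: 30))
}

// A caller that does not care about the category can skip it.
func timeoutSeconds() -> Int? {
    do {
        try run()
        return nil
    } catch let categorized as Categorized {
        return (categorized.underlyingError as? TimeoutError)?.seconds
    } catch {
        return nil
    }
}
```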

Setting all of this aside, surely you had your own reasons for supporting typed errors in the past. What were those and why do you no longer consider them important?

My memory is certainly spotty, but as far as I can recall, I had no distinct reasons; it just seemed like a reasonable and "natural" next step that other people wanted for which I had no use case of my own in mind. Having seen the argumentation that there aren't very many use cases in general, I'm warming to the view that it's probably not such a great next step.

On Fri, Aug 18, 2017 at 6:46 PM, Matthew Johnson <matthew@anandabits.com> wrote:

On Aug 18, 2017, at 6:29 PM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:

On Fri, Aug 18, 2017 at 6:19 PM, Matthew Johnson <matthew@anandabits.com> wrote:

On Aug 18, 2017, at 6:15 PM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:

On Fri, Aug 18, 2017 at 09:20 Matthew Johnson via swift-evolution <swift-evolution@swift.org> wrote:

Sent from my iPad

On Aug 18, 2017, at 1:27 AM, John McCall <rjmccall@apple.com> wrote:

>> On Aug 18, 2017, at 12:58 AM, Chris Lattner via swift-evolution <swift-evolution@swift.org> wrote:
>> Splitting this off into its own thread:
>>
>>> On Aug 17, 2017, at 7:39 PM, Matthew Johnson <matthew@anandabits.com> wrote:
>>> One related topic that isn’t discussed is typed errors. Many third party libraries use a Result type with typed errors. Moving to an async / await model without also introducing typed errors into Swift would require giving up something that is highly valued by many Swift developers. Maybe Swift 5 is the right time to tackle typed errors as well. I would be happy to help with design and drafting a proposal but would need collaborators on the implementation side.
>>
>> Typed throws is something we need to settle one way or the other, and I agree it would be nice to do that in the Swift 5 cycle.
>>
>> For the purposes of this sub-discussion, I think there are three kinds of code to think about:
>> 1) large scale API like Cocoa which evolve (adding significant functionality) over the course of many years and can’t break clients.
>> 2) the public API of shared swiftpm packages, whose lifecycle may rise and fall - being obsoleted and replaced by better packages if they encounter a design problem.
>> 3) internal APIs and applications, which are easy to change because the implementations and clients of the APIs are owned by the same people.
>>
>> These each have different sorts of concerns, and we hope that something can start out as #3 but work its way up the stack gracefully.
>>
>> Here is where I think things stand on it:
>> - There is consensus that untyped throws is the right thing for a large scale API like Cocoa. NSError is effectively proven here. Even if typed throws is introduced, Apple is unlikely to adopt it in their APIs for this reason.
>> - There is consensus that untyped throws is the right default for people to reach for for public packages (#2).
>> - There is consensus that Java and other systems that encourage lists of throws error types lead to problematic APIs for a variety of reasons.
>> - There is disagreement about whether internal APIs (#3) should use it. It seems perfect to be able to write exhaustive catches in this situation, since everything is knowable. OTOH, this could encourage abuse of error handling in cases where you really should return an enum instead of using throws.
>> - Some people are concerned that introducing typed throws would cause people to reach for it instead of using untyped throws for public package APIs.
>
> Even for non-public code. The only practical merit of typed throws I have ever seen someone demonstrate is that it would let them use contextual lookup in a throw or catch. People always say "I'll be able to exhaustively switch over my errors", and then I ask them to show me where they want to do that, and they show me something that just logs the error, which of course does not require typed throws. Every. Single. Time.

I agree that exhaustive switching over errors is something that people are extremely unlikely to actually want to do. I also think it's a bit of a red herring. The value of typed errors is *not* in exhaustive switching. It is in categorization and verified documentation.

Here is a concrete example that applies to almost every app. When you make a network request there are many things that could go wrong to which you may want to respond differently:
* There might be no network available. You might recover by updating the UI to indicate that and start monitoring for a reachability change.
* There might have been a server error that should eventually be resolved (500). You might update the UI and provide the user the ability to retry.
* There might have been an unrecoverable server error (404). You will update the UI.
* There might have been a low-level parsing error (bad JSON, etc). Recovery is perhaps similar in nature to the second case (the 500), but the problem is less likely to be resolved quickly so you may not provide a retry option. You might also want to do something to notify your dev team that the server is returning JSON that can't be parsed.
* There might have been a higher-level parsing error (converting JSON to model types). This might be treated the same as bad JSON. On the other hand, depending on the specifics of the app, you might take an alternate path that only parses the most essential model data in hopes that the problem was somewhere else and this parse will succeed.

All of this can obviously be accomplished with untyped errors. That said, using types to categorize errors would significantly improve the clarity of such code. More importantly, I believe that by categorizing errors in ways that are most relevant to a specific domain a library (perhaps internal to an app) can encourage developers to think carefully about how to respond.
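The list above can be sketched with untyped errors roughly as follows. `RequestError`, `Recovery`, and `recovery(for:)` are hypothetical names chosen for illustration, and the mapping from status codes to strategies is an assumption based on the examples in the list, not a recommendation.

```swift
// Hypothetical error type: one type covers every way a request can fail.
enum RequestError: Error {
    case offline
    case http(status: Int)
    case invalidJSON
    case modelMismatch(field: String)
}

// Recovery strategies mirroring the list (names are illustrative).
enum Recovery: Equatable {
    case waitForReachability   // no network: update UI, monitor reachability
    case offerRetry            // transient server error such as a 500
    case showFailure           // unrecoverable error such as a 404
    case reportToDevTeam       // server returned JSON that cannot be parsed
    case parseEssentialsOnly   // model conversion failed: try a reduced parse
}

// Categorizes by likely recovery strategy rather than by where the error came from.
func recovery(for error: Error) -> Recovery {
    guard let request = error as? RequestError else {
        return .showFailure    // unknown errors get the most conservative handling
    }
    switch request {
    case .offline:
        return .waitForReachability
    case .http(let status) where status >= 500:
        return .offerRetry
    case .http:
        return .showFailure
    case .invalidJSON:
        return .reportToDevTeam
    case .modelMismatch:
        return .parseEssentialsOnly
    }
}
```

Nothing in the signature of a throwing function tells the caller that this categorization exists, which is the discoverability gap being argued about.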

I used to be rather in favor of adding typed errors, thinking that they could only be beneficial and seemed reasonable. However, given the very interesting discussion here, I'm inclined to think that what you articulate above is actually a very good argument _against_ adding typed errors.

If I may simplify, the gist of the argument advanced by Tino, Charlie, and you is that the primary goal is documentation, and that documentation in the form of prose is insufficient because it can be unreliable. Therefore, you want a way for the compiler to enforce said documentation. (The categorization use case, I think, is well addressed by the protocol-based design discussed already in this thread.)

Actually documentation is only one of the goals I have and it is the least important. Please see my subsequent reply to John where I articulate the four primary goals I have for improved error handling, whether it be typed errors or some other mechanism. I am curious to see what you think of the goals, as well as what mechanism might best address those goals.

Your other three goals have to do with what you term categorization, unless I misunderstand. Are those not adequately addressed by Joe Groff's protocol-based design?

Can you elaborate on what you mean by Joe Groff’s protocol-based design? I certainly haven’t seen anything that I believe addresses those goals well.

However, the compiler itself cannot reward, only punish in the form of errors or warnings; if exhaustive switching is a red herring and the payoff for typed errors is correct documentation, the effectiveness of this kind of compiler enforcement must be directly proportional to the degree of extrinsic punishment inflicted by the compiler (since the intrinsic reward of correct documentation is the same whether it's spelled using doc comments or the type system). This seems like a heavy-handed way to enforce documentation of only one specific aspect of a throwing function; moreover, if this use case were to be sufficiently compelling, then it's certainly a better argument for SourceKit (or some other builtin tool) to automatically generate information on all errors thrown than for the compiler to require that users declare it themselves--even if opt-in.

Bad error handling is pervasive. The fact that everyone shows you code that just logs the error is a prime example of this. It should be considered a symptom of a problem, not an acceptable status quo to be maintained. We need all the tools at our disposal to encourage better thinking about and handling of errors. Most importantly, I think we need a middle ground between completely untyped errors and an exhaustive list of every possible error that might happen. I believe a well designed mechanism for categorizing errors in a compiler-verified way can do exactly this.

In many respects, there are similarities to this in the design of `NSError` which provides categorization via the error domain. This categorization is a bit more broad than I think is useful in many cases, but it is the best example I'm aware of.

The primary difference between error domains and the kind of categorization I am proposing is that error domains categorize based on the source of an error whereas I am proposing categorization driven by likely recovery strategies. Recovery is obviously application dependent, but I think the example above demonstrates that there are some useful generalizations that can be made (especially in an app-specific library), even if they don't apply everywhere.

> Sometimes we then go on to have a conversation about wrapping errors in other error types, and that can be interesting, but now we're talking about adding a big, messy feature just to get "safety" guarantees for a fairly minor need.

I think you're right that wrapping errors is tightly related to an effective use of typed errors. You can do a reasonable job without language support (as has been discussed on the list in the past). On the other hand, if we're going to introduce typed errors we should do it in a way that *encourages* effective use of them. My opinion is that encouraging effective use means categorizing (wrapping) errors without requiring any additional syntax beyond the simple `try` used by untyped errors. In practice, this means we should not need to catch and rethrow an error if all we want to do is categorize it. Rust provides good prior art in this area.
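For comparison, Rust's `?` operator converts a propagated error into the function's declared error type through a `From` conversion. Swift has no equivalent, but a library helper can approximate it today. The sketch here assumes a hypothetical `ErrorCategory` protocol and `categorizing(as:_:)` function; neither exists in the standard library.

```swift
// Hypothetical protocol for category types that can absorb any error.
protocol ErrorCategory: Error {
    init(_ error: Error)
}

// Runs `body`, converting any error that is not already a `Category` into one.
func categorizing<Category: ErrorCategory, T>(
    as _: Category.Type,
    _ body: () throws -> T
) throws -> T {
    do {
        return try body()
    } catch let categorized as Category {
        throw categorized              // already categorized: pass through unchanged
    } catch {
        throw Category(error)          // otherwise apply the conversion initializer
    }
}

// Hypothetical error and category used to exercise the helper.
struct ParseError: Error {}

enum ImportFailure: ErrorCategory {
    case badInput(underlyingError: Error)

    init(_ error: Error) {
        self = .badInput(underlyingError: error)
    }
}

func parse(_ text: String) throws -> Int {
    guard let number = Int(text) else { throw ParseError() }
    return number
}

// The catch-and-rethrow is written once in the helper instead of in every function.
func importValue(_ text: String) throws -> Int {
    return try categorizing(as: ImportFailure.self) { try parse(text) }
}
```

The helper still has to be named at every call site, so it falls short of the plain `try` asked for above; it only shows how far a library can get without language support.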

>
> Programmers often have an instinct to obsess over error taxonomies that is very rarely directed at solving any real problem; it is just self-imposed busy-work.

I agree that obsessing over intricate taxonomies is counter-productive and should be discouraged. On the other hand, I hope the example I provided above can help to focus the discussion on a practical use of types to categorize errors in a way that helps guide *thinking* and therefore improves error handling in practice.

>
>> - Some people think that while it might be useful in some narrow cases, the utility isn’t high enough to justify making the language more complex (complexity that would intrude on the APIs of result types, futures, etc)
>>
>> I’m sure there are other points in the discussion that I’m forgetting.
>>
>> One thing that I’m personally very concerned about is in the systems programming domain. Systems code is sort of the classic example of code that is low-level enough and finely specified enough that there are lots of knowable things, including the failure modes.
>
> Here we are using "systems" to mean "embedded systems and kernels". And frankly even a kernel is a large enough system that they don't want to exhaustively switch over failures; they just want the static guarantees that go along with a constrained error type.
>
>> Beyond expressivity though, our current model involves boxing thrown values into an Error existential, something that forces an implicit memory allocation when the value is large. Unless this is fixed, I’m very concerned that we’ll end up with a situation where certain kinds of systems code (i.e., that which cares about real time guarantees) will not be able to use error handling at all.
>>
>> JohnMC has some ideas on how to change code generation for ‘throws’ to avoid this problem, but I don’t understand his ideas enough to know if they are practical and likely to happen or not.
>
> Essentially, you give Error a tagged-pointer representation to allow payload-less errors on non-generic error types to be allocated globally, and then you can (1) tell people to not throw errors that require allocation if it's vital to avoid allocation (just like we would tell them today not to construct classes or indirect enum cases) and (2) allow a special global payload-less error to be substituted if error allocation fails.
>
> Of course, we could also say that systems code is required to use a typed-throws feature that we add down the line for their purposes. Or just tell them to not use payloads. Or force them to constrain their error types to fit within some given size. (Note that obsessive error taxonomies tend to end up with a bunch of indirect enum cases anyway, because they get recursive, so the allocation problem is very real whatever we do.)
>
> John.

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution