Handling unknown cases in enums [RE: SE-0192]

QuinceyMorris · March 1, 2018, 12:32am

So, the current Swift 4 behavior is a done deal. In the error scenario we are talking about (a switch that becomes non-exhaustive when an external library is replaced), the Swift 4 switch just crashes if an unknown enum value is unhandled.

I was suggesting that in Swift 5, the switch could instead throw an Error. This has an automatic compile-time consequence: the programmer must provide code to either catch or eliminate the error, so no new syntax is needed. (Either the unknown case is handled in the catch block, or the programmer adds a “default:” to a switch somewhere.) This doesn't seem complicated to me.
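A minimal sketch of the throwing idea, emulated in today's Swift since no such switch behavior exists. The names Flavor, UnknownCaseError, describe, and label are hypothetical, and an out-of-range raw value stands in for a case added by a newer version of a library:

```swift
// Hypothetical stand-in for an enum vended by an external library.
enum Flavor: Int {
    case vanilla = 0
    case chocolate = 1
}

// Hypothetical error type; not part of any proposal.
struct UnknownCaseError: Error {
    let rawValue: Int
}

// Emulates "the switch throws on an unknown value": a raw value with no
// matching case plays the role of an enum case we were not compiled against.
func describe(rawFlavor: Int) throws -> String {
    guard let flavor = Flavor(rawValue: rawFlavor) else {
        throw UnknownCaseError(rawValue: rawFlavor)
    }
    switch flavor {
    case .vanilla: return "vanilla"
    case .chocolate: return "chocolate"
    }
}

// Because describe(rawFlavor:) throws, the compiler forces the caller to
// either catch the error or propagate it.
func label(rawFlavor: Int) -> String {
    do {
        return try describe(rawFlavor: rawFlavor)
    } catch {
        return "unknown flavor"
    }
}
```

The point of the sketch is only that the error path becomes visible at compile time; it says nothing about how a real throwing switch would be implemented.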

If you don’t like that solution, then I think it’s still not complicated. Just make it orthogonal and move on. Here’s what I mean:

We currently have the _ syntax for matching “anything”. In Swift 5, make that mean “any statically-known case” instead. We currently have the default syntax for matching “anything”. In Swift 5, make that mean “any known or unknown case”. So we would have this:

switch expr { // #1
… // handle some stuff
case _: … handle everything else we know about right now …
}

or:

switch expr { // #2
… // handle some stuff
default: … handle everything else including unknowns …
}

or:

switch expr { // #3
… // handle some stuff
case _: … handle everything else we know about right now …
default: … handle just unknowns, since everything else is already handled …
}

Scenario #1 is the only one that produces a compilation error, and only in the case of an external non-exhaustive enum. The others are the different ways of handling unknown cases.

For orthogonality, we would also have any combination of patterns similar to these:

case (match1, _): … 
case (match1, default): … 
case (_, match2): … 
case (default, match2): … 
case (_, _): … // same as 'case _:'
case (_, default): … 
case (default, _): … 
case (default, default): …  // same as ‘default:’

In every case, the distinction between _ and default tells you what’s supposed to happen to unknown cases, and there’s basically no new syntax (except for default as a pattern element, which will likely rarely get used).

This is easier for me as a client programmer, because the semantics are obvious and the syntax is predictable enough. AFAICT it covers all the possible execution flows, too. Why does it have to be more complicated or obscure?

Chris_Lattner3 · March 1, 2018, 5:39am

Hi Jordan,

I believe that you've agreed that it makes sense for the top level form to eventually be supported in nested positions as a pattern, e.g.:

switch foo() {
case (1, .A), (2, .B), (3, .C): ....
case (_, .A), (_, .B): ...
case (_, .C): ...
case (_, unknown default): ...
}

conceptually makes sense. Once you do that, this concept makes perfect sense (though would be extremely unusual to see in practice!!) in an if case:

  if case (42, unknown default) = foo() {

-Chris

Chris_Lattner3 · March 1, 2018, 6:10am

(sorry, just catching up on this thread):

There are two issues here, the semantic decision and the syntax decision. On the semantic decision, the pivot point is whether you want this to be accepted, assuming the enum is resilient and has A+B cases today.

switch foo() {
case (.A, .A), (.B, .B): ...
case (.A, .B), (.B, .A): ...
unknown default: ...  // or "unknown case:", or any other spelling.  This section is spelling invariant.
}

The question is whether it is desirable to allow this or not. Jordan is of the opinion that allowing such a thing is potentially confusing and should be disallowed: the only unknown case construct that should be allowed is if you're switching on an enum directly.

I'm pretty strongly of the opinion that there is no possibility of confusion here - this is simple composition on the behavior of default. I'm also pretty strongly of the opinion that making switch start to have magic behavior for one form of pattern is unprecedented and a bad idea. In my perspective, we should allow this, because in addition to tuple patterns (shown above) we support paren patterns and hope to support additional patterns in the future (e.g. potentially general nominal type destructuring), and enums can occur in nested positions.

Nested patterns with resilient enums in them would be obscure, but that is also what makes the value of preventing them tiny, even if preventing them were decided to be a good thing.

Once we decide on the semantic direction, the syntax decision sort of falls out. If you follow Jordan's model, we've decided that "switch over enums" is a special form with special behavior and there is a somewhat stronger argument to provide (currently unprecedented) new grammar productions for "case", given that this is a new control flow form. While "unknown case:" still doesn't make sense to me, the argument that this is a special kind of control flow statement makes room for the possibility of adding special forms that only occur in it, and explains why this could never make sense in if case.

If you follow my preferred design, then we're really just talking about the existing default semantics with a minor tweak to warning behavior. This argues strongly for a modifier of some sort on default (perhaps an attribute, perhaps a contextual keyword like unknown, whatever) and it strongly implies that we accept this in nested positions. In the core team discussion, it was observed that default: is just a synonym for case _: in patterns, so it would make sense to allow default as a verbose pattern specifier (allowing someone to write case default: if they really wanted). This allows default to work in nested positions just like _ currently does, and directly allows the unknown default syntax (however it is spelled) to work in nested productions, and thus in if case.
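For readers following along, here is a small sketch in current Swift of the two facts this argument builds on: default: and case _: behave the same at the top level, and _ already composes in nested positions. The function names are made up for illustration; nothing here uses the proposed unknown spelling.

```swift
// Top-level catch-all written with `default:`.
func viaDefault(_ n: Int) -> String {
    switch n {
    case 1: return "one"
    default: return "other"
    }
}

// The same catch-all written with the wildcard pattern.
func viaWildcard(_ n: Int) -> String {
    switch n {
    case 1: return "one"
    case _: return "other"
    }
}

// `_` used as a nested pattern element inside tuple patterns.
func nested(_ pair: (Int, Bool)) -> String {
    switch pair {
    case (42, _): return "answer"
    case (_, true): return "flagged"
    case (_, _): return "neither"   // same as `case _:` or `default:`
    }
}
```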

I think that there is another important meta point that is worth discussing here, around the problem of "potential for confusion" that seems to be motivating this entire discussion. Potential for confusion is an important factor in design, but in my opinion, any code using "unknown default" on a switch over a tuple or other enum-containing type will not suffer from it, even if the person reading the code didn't write it in the first place.

In my opinion, in practice the behavior of such code will either 1) be immediately clear to a reader, or 2) be immediately googlable to someone who wasn't familiar with unknown default. I don't think there is any reasonable likelihood that someone would look at the code, think they understand what it does and be wrong. Potential for confusion is not about "does everyone immediately know any syntax", it is much more about "does someone look at syntax and think they know what it does ... but are incorrect".

It perhaps isn't obvious, but there are highly precedented examples of this in Swift. This includes the much debated .. vs ... vs ..< operator discussion from the Swift 1 days. The point isn't that everyone will immediately know what ..< does, but if they encounter it and don't know what it is, then they know they don't know what it is. In contrast, people frequently encountered case 1..4 and thought they knew what it was, but were surprised when it didn't include 4. Use of ..< has defined away this problem.
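As a quick illustration of the range operators as they exist in Swift today (the names halfOpen, closed, and bucket are just for this sketch):

```swift
// Half-open range: includes 1, 2, 3 but not 4.
let halfOpen = Array(1..<4)

// Closed range: includes 1 through 4.
let closed = Array(1...4)

// Both operators are usable as switch patterns.
func bucket(_ n: Int) -> String {
    switch n {
    case 1..<4: return "1 to 3"
    case 4...6: return "4 to 6"
    default: return "out of range"
    }
}
```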

Another example is the decision to allow user defined operators, and encouraging people to define new operators when they are doing something semantically different than the builtin ones. If someone encounters an <<=>> operator in someone else's code, they may not know what it is, but at least they aren't tricked into thinking it is something familiar. Compare this to the C++ design of forcing people to overload the existing set of operators, which means that you can encounter code using the + operator and have it actually be completely different than the standard ring algebra operations. The use of shift left for I/O is one egregious example of this, particularly because the precedence of it is wrong :-)

In any case, my personal preference is for simple and compositional designs that eschew special cases. I've been convinced that providing a solution for this problem is important, but I don't think it is sooo important that we'd warp the base language to solve it, introducing new control flow forms, new grammar productions, and new limitations all to prevent a perception of possible confusion in weird cases that are unlikely to arise in practice anyway.

-Chris

Nevin · March 1, 2018, 6:54am

Here’s something I don’t recall being discussed:

Should the warning be raised when the unknown case / default is known to be reachable, or when it *isn’t* known to *not* be reachable?

There are situations where a switch is in fact exhaustive, covering all known cases, but the compiler is unable to prove that. Here is a simple example which today gives the error “switch must be exhaustive”:

enum Bar { case int(Int) }

func foo(_ x: Bar) {
  switch x {
  case .int(let y) where y > 0: break
  case .int(let y) where y < 1: break
  // error: switch must be exhaustive
  }
}

If Bar were from a binary library and non-frozen, then after we add an unknown case / default to that switch should the compiler give a warning?

Or should it still give a hard error as it does today, since it can’t prove that the switch is exhaustive but it does know there aren’t any unhandled *cases*?

Or something else?

• • •

Well obviously that is the bat-signal operator, and it summons Bruce Wayne.

masters3d · March 1, 2018, 6:35pm

Let's say that case default were to be allowed. Are you suggesting that case default: should function like the proposed catch-all with warnings (unknown default:) and be usable in every switch? Or are you saying that case default: would be the same as default: and case _:?

Chris_Lattner3 · March 1, 2018, 10:40pm

^ this :-)

-Chris

gwendal.roue · March 2, 2018, 8:05am

I want to add that today, case _: is equivalent to case let x:, which means that case _: can only catch known valid values.

case let x: is handier than default: in some situations, like:

// totally made-up
switch f() {
    case 1: print("1")
    case let x: print("\(x) (not 1)")
}

Without case let x:, one would have to use a temp variable:

let x = f()
switch x {
    case 1: print("1")
    default: print("\(x) (not 1)")
}

In this example, I just want to emphasize why case _: should only match valid cases. IMHO, it's conceptually the same _ as in items.map { _ in 1 }, that is to say a placeholder for an ignored but valid value.

EDIT: my last sentence makes no sense. In items.map { _ in 1 }, _ can be anything, even an unknown value. Does this mean that case _: should match unknown cases, then?????? Sorry for the confusion.

masters3d · March 2, 2018, 2:36pm

case _ matches anything, just like default:, even unknowns. As a pattern, _ also matches everything.

You do bring up a good point about case let x:. I don’t know if it should match everything including unknowns in enums.

gwendal.roue · March 2, 2018, 3:15pm

If I were to try to write a sentence that has any meaning, it would be: make case let x: use(x) match the same values as case _: doStuff().

jrose · March 2, 2018, 6:17pm

Again, this does not make sense because unknown default-or-whatever is a catch-all. It's true that this match does not always succeed, but there is no way to prevent it from producing a warning.

John_McCall · March 2, 2018, 6:39pm

I mean, it makes sense compositionally, but I have no idea why you would ever do it. Someone created an enum with no public cases and you want to get a warning if they ever make one public?

(And to be clear for other people reading along, we don't currently support non-public cases in Swift enums, which isn't expected to change in Swift 5.)

jrose · March 2, 2018, 7:10pm

Nevin:

enum Bar { case int(Int) }

func foo(_ x: Bar) {
  switch x {
  case .int(let y) where y > 0: break
  case .int(let y) where y < 1: break
  // error: switch must be exhaustive
  }
}

If Bar were from a binary library and non-frozen, then after we add an unknown case / default to that switch should the compiler give a warning?

Or should it still give a hard error as it does today, since it can’t prove that the switch is exhaustive but it does know there aren’t any unhandled *cases*?

Interesting. You're really talking about this:

func foo(_ x: Bar) {
  switch x {
  case .int(let y) where y > 0: break
  case .int(let y) where y < 1: break
  unknown: break
  }
}

I think the only sensible answer is that you'd get a warning, since the compiler doesn't know that .int is fully handled. After all, we wouldn't want to allow something to slip through if you made a typo here (e.g. < 0 instead of < 1). And for the record…

func foo(_ x: Bar) {
  switch x {
  case .int(let y) where y > 0: break
  case .int(let y) where y < 1: break
  default: break
  }
}

…this will not produce any diagnostics, just like today.
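To make the typo concern concrete, here is a sketch in current Swift where the second guard is mistyped as y < 0. The classify function and its return strings are hypothetical; the point is only that a plain default: compiles without diagnostics and quietly absorbs the value the guards missed.

```swift
enum Bar { case int(Int) }

func classify(_ x: Bar) -> String {
    switch x {
    case .int(let y) where y > 0: return "positive"
    case .int(let y) where y < 0: return "negative"   // typo: meant y < 1
    default: return "fell through"                     // silently catches .int(0)
    }
}
```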

Nevin · March 2, 2018, 7:40pm

So the proposed semantic is that unknown raises a warning, unless the compiler can prove it is unreachable. Makes sense, and glad to have that clarified.

Justin_Jia · March 2, 2018, 9:52pm

Can we use default and unknown together? I saw conflicting opinions about this in this thread.

If yes, it should still produce an error.
If no, I don’t think it should produce any warning.

QuinceyMorris · March 2, 2018, 10:57pm

It seems to me the logical thing to do would be to have case _ retain its current and proposed new meaning ("match all remaining known cases"), and to re-define default: to mean "match all remaining known and unknown cases". This recycled default: would replace unknown case: in the proposal.

I regard it as logical because default: is obviously a catch-all, while every other use of case <something> matches known cases.

Edit: And case let x: would do the right thing, too, I think.

~~Of course, both case _: and default: would continue to mean the same thing for switches over frozen enums and non-enum expressions.~~

Edit: Well, I managed to say the difference backwards, and to misidentify the scenario that produces the warning. With that straightened out, I see why your solution is correct.

~~I'm suggesting this because unknown case: seems confusing. It can be read as saying it won't match known cases, but it will. (There is already confusion on this point.)~~

If default: cannot be re-used instead of unknown case:, I'd suggest other case: or any case:, which may be clearer and have the virtue of being even shorter (which, according to the proposal, is a consideration in choosing the spelling).

Edit: And taking the above into account, I think unknown case: is the correct spelling too. Although it actually matches "everything else", it's intended for use where currently-known cases are all enumerated, eliminating the warning. Then, unknown cases are all that's left to match.

jrose · March 2, 2018, 11:56pm

Not in the form proposed today, no. The arguments mentioned in the proposal still apply.

We are not going to change the meaning of default or case _ or case let x in this proposal. That would be way too source-breaking. It's also not correct that they only match known cases. They all match any values—present or future, public or private.
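A small sketch of that last point in current Swift, using a made-up Shape enum: a binding pattern such as case let other: is a catch-all in the same way default: is, and simply gives the matched value a name.

```swift
// Hypothetical enum used only for this illustration.
enum Shape { case circle, square, triangle }

// Catch-all written as a binding pattern.
func nameWithBinding(_ s: Shape) -> String {
    switch s {
    case .circle: return "circle"
    case let other: return "not a circle: \(other)"
    }
}

// The same catch-all written with `default:`.
func nameWithDefault(_ s: Shape) -> String {
    switch s {
    case .circle: return "circle"
    default: return "not a circle: \(s)"
    }
}
```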

Chris_Lattner3 · March 4, 2018, 1:32am

I don't know what you mean. Of course "unknown default" matches everything - its runtime behavior is the same as default. "case (42, default):" doesn't always match because the first value may not be 42.

jrose · March 5, 2018, 8:18pm

If you write if case (42, unknown default) = foo() {, the compiler will always emit a warning, because it's reachable with a known case. (Except as John said in the rare case where the enum has no public cases but is not frozen.) There's no advantage to writing that over if case (42, _) = foo() {.
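For reference, the wildcard form mentioned above already works in current Swift. In this sketch the Color enum and matchesAnswer function are hypothetical; the match succeeds only when the first element is 42, whatever the enum value is.

```swift
// Hypothetical enum standing in for a library enum.
enum Color { case red, green }

func matchesAnswer(_ pair: (Int, Color)) -> Bool {
    // `_` accepts any Color; the literal 42 constrains the first element.
    if case (42, _) = pair {
        return true
    }
    return false
}
```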

It sounds like I'm the only one who wants the narrow view of "only matches enum values directly", so that's that. The last thing then is the spelling. I don't think that's a foregone conclusion, so let's see what people are thinking:

unknown:
unknown default:
default unknown:
default(unknown):
Something else?

(Note that I had previously ruled out plain unknown: because of the potential conflict with labeled break/continue, but members of the core team pointed out that we could detect that case pretty easily if it ever came up, and it's exceedingly unlikely to come up anyway. So it's back on the table. unknown case, on the other hand, is out, since the correspondence with enum case declarations isn't there.)

This poll is not binding on either me or the core team, and is not the only way to give feedback. But I wanted to make it easy for people to weigh in, especially the people who've been watching the discussion and pressing "Like" but haven't been talking.

Chris_Lattner3 · March 6, 2018, 5:49am

I agree that 'unknown default' isn't useful in if/case except as an oddity that composes out of it. That said, I'd also like to point out that 'unknown' isn't really a great word to be involved here in any case. Something more verbose and communicative is appropriate for this concept.

Nevin · March 6, 2018, 9:53am

If we are going to have a poll—even a non-binding one—please, please, please let’s have it be approval-voting style, aka “check all options you support”, or score-voting style, aka “rate each option on a scale”. The flaws of choose-one voting are well known, and we can do better.

I strongly disagree. The wording “unknown case” refers to both the fact that an unknown enum case is involved at some level, *and also* it is itself a pattern-matching case that exists to handle the unknown. Additionally, at a purely English-language level, it means “In case you get something unknown, do this”.

The word “case” is entirely appropriate, meaningful, and precedented.

We use “case” for every other pattern-matching line in a switch, whether or not an enum is the subject, and it would be highly consistent to use “case” for this new line as well.