Handling unknown cases in enums [RE: SE-0192]

jrose · March 8, 2018, 6:53pm

I don't think it is, but I'll admit to having been very close to this problem for a long time, so it may be that that's a more useful spelling for explaining it to people.

mklbtz · March 12, 2018, 11:10pm

It seems like this discussion has moved away from the original post's suggestion to build a more general-purpose pattern-matching operator like #unknown. I liked this a lot because it would allow us to be so much more specific. I feel like needing to future-proof your code would demand a lot of specificity.

Current changes in the revision PR say "it would be surprising to have such a pattern but keep the restrictions described in this proposal," but I'll admit I'm not sure what that means.

Have we decided against this suggestion? All the other models for this feature seem like an undesirable compromise to me, but I may be missing some crucial details. This thread has gotten quite long itself.

If we haven't decided against the concept altogether but didn't like the original spelling, I humbly submit #invisible for consideration. I noticed that the original post says "This whole discussion is happening because the ABI stability work is introducing a new concept to enums - invisible members." It struck me that "invisible" captures the idea as well as "unknown" does.

Cheers
Hope I'm not adding to the noise

xwu · March 12, 2018, 11:51pm

I believe we have.

jrose · March 13, 2018, 12:15am

I cleaned up the section on patterns a little in the version I have locally (not ready to push yet). Here's what it's going to look like:

However, this produces potentially surprising results when followed by a case that could also match a particular input. Because unknown acts as a catch-all, the input (.thoughtItWasDueNextWeek, true) would result in case 2 being chosen rather than case 3.
switch (excuse, notifiedTeacherBeforeDeadline) {
case (.eatenByPet, true): // 1
  // …
case (#unknown, true): // 2
  // …
case (.thoughtItWasDueNextWeek, _): // 3
  // …
case (_, false): // 4
  // …
}
The compiler would warn about this, at least, since there is a known value that can reach the unknown pattern.

As a top-level case, unknown must go last to avoid this issue. However, it's not possible to enforce the same thing for arbitrary patterns because there may be multiple enums in the pattern whose unknown cases need to be treated differently.
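The first-match-wins behaviour described above can be reproduced in today's Swift with a rough sketch. Since #unknown is only a hypothetical spelling, this uses a plain `_` as a stand-in for it in case 2 (so no warning is modelled); the `Excuse` enum and the `chosenCase` function are assumptions made up for the illustration.

```swift
// Hypothetical enum standing in for the one in the proposal text.
enum Excuse {
    case eatenByPet
    case thoughtItWasDueNextWeek
}

// Returns the number of the case that the switch selects.
func chosenCase(_ excuse: Excuse, _ notifiedTeacherBeforeDeadline: Bool) -> Int {
    switch (excuse, notifiedTeacherBeforeDeadline) {
    case (.eatenByPet, true):               // 1
        return 1
    case (_, true):                         // 2: `_` stands in for #unknown
        return 2
    case (.thoughtItWasDueNextWeek, _):     // 3
        return 3
    case (_, false):                        // 4
        return 4
    }
}

// Case 2 is tried before case 3, so the catch-all wins for this input.
print(chosenCase(.thoughtItWasDueNextWeek, true))   // prints 2
print(chosenCase(.thoughtItWasDueNextWeek, false))  // prints 3
```

Moving case 3 above case 2 would make the first call select case 3 instead, which is why the ordering matters.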

So the arguments against are a little weaker than they used to be, but it's still not the direction I want to go with this proposal, which has to get something that works and works simply in the common case.

My own problem with #invisible is that future cases aren't exactly "invisible". They're not visible, sure, but neither are they "invisible" because they're just…not there yet. Chris's characterization of this as "invisible" is therefore not something I'd suggest baking into the language. But I'll add it to the list of "other suggested names".

Nevin · March 13, 2018, 12:30am

Returning to this after some time for reflection, I wonder if the warning behavior is really as critical as we have been assuming.

Perhaps we should simply introduce the ability for frameworks to declare their enums as (non)frozen, and later if there is a hue and/or cry for the warning then we can determine the best shape and spelling with actual experience under our collective belt.

mklbtz · March 13, 2018, 3:18am

Oh yeah I see how that could get pretty messy

QuinceyMorris · March 13, 2018, 5:40am

I think I just figured this out (after getting it wrong several times)…

In the original pitch (the very first post of this topic), the subpattern #unknown was intended to match only unknown cases. That would have been fine as far as it goes, but here’s the point: There'd still need to be a final catch-all that matched everything that wasn't matched already (which is not the meaning of case #unknown:, in that definition of #unknown).

The "match-everything-else" catch-all is needed because otherwise you can't get the diagnostic messages right (for a non-frozen enum).

I think that was the point Jordan got, and has been insisting we get too:

If the switch was previously (in Swift 4) exhaustive, then adding [hypothetically] just:

case #unknown: // matches only statically unknown cases
    …

would, at some future compilation, after more cases had been added to the enum, produce an error that the switch wasn't exhaustive. Most people don't want that to be an error.

Adding:

case _: // aka 'default:', matches anything else
    …

would suppress all warnings about unhandled known cases, now and in the future. Nobody wants no warnings at all.

Adding [hypothetically] both:

case #unknown: // matches only statically unknown cases
    …
case _: // aka 'default:', matches anything else
    …

would produce a warning that case _: cannot be reached, if the switch listed currently-known cases exhaustively. The only way to get rid of the warning would be to remove one of the cases, leading to one of the other unwanted scenarios.

Changing the semantics of the #unknown subpattern to match "everything unmatched so far" leads to ambiguities between #unknown and _ in compound match cases (as explained in the most recent posts).

The only correct approach combines the two cases from the third scenario (adding both) into a single [hypothetical] case:

case #unknown, _:

which warns if and only if the switch fails to enumerate some (presumably “new”) cases known at the time of compilation. This is the desired warning.

Since this catch-all is required regardless of other uses of a [hypothetical] #unknown subpattern, it may as well have a unique spelling:

unknown default:

which is where the proposal currently stands.
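As a rough sketch of that final catch-all: the feature later shipped in Swift 5 spelled `@unknown default:`, so the example below uses that spelling rather than the `unknown default:` discussed here. The `Excuse` enum and `describe` function are hypothetical, and because the enum is declared in the same module it is effectively frozen, so the last branch cannot actually be reached in this program; only the shape of the switch is the point.

```swift
// Hypothetical enum; a real use case would be a non-frozen enum from a library.
enum Excuse {
    case eatenByPet
    case thoughtItWasDueNextWeek
}

func describe(_ excuse: Excuse) -> String {
    switch excuse {
    case .eatenByPet:
        return "eaten by pet"
    case .thoughtItWasDueNextWeek:
        return "thought it was due next week"
    // Catch-all for cases that do not exist at compile time.
    // If a known case were left out above, the compiler is expected to
    // warn (not error) that the switch should be exhaustive.
    @unknown default:
        return "unrecognized excuse"
    }
}

print(describe(.eatenByPet))  // prints "eaten by pet"
```

Replacing `@unknown default:` with a plain `default:` would compile just the same, but it would also silence the warning about any known case added later.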

Does that sound right?

MutatingFunk · March 13, 2018, 11:00am

I'm not sure what you mean by this. Could you give an example?

If the issue in question is just ensuring a warning is emitted on incorrect usage, we could just make the behaviour clearer by using a different name:

#warningDefault
#exhaustiveDefault
#fallback
#other
#unknown _
unknown _
#exhaustive _
exhaustive _

Admittedly, it's quite a difficult concept to express succinctly, but we shouldn't write the feature off just because using a (repurposed) spelling makes the semantics unclear.

Karl · March 13, 2018, 2:17pm

Okay, so #unknown cases actually mean two things:

New cases due to dynamic behaviour (non-frozen enum)
Private cases (which are the only "statically unknown cases")

In the first case, it is not possible to write an exhaustive switch over the enum without a default. New cases can always be added beyond what we know today. The library might be updated in 2 years and somebody is still running your old executable.

The second case also isn't really plausible without a default. You don't know about these cases - that's why they're private. You shouldn't have special knowledge for them outside of a regular default.

Also, switches are just a bunch of if...else... under the hood. Making #unknown match any specific list of cases would kill performance if it ever came up (which it might... and maybe even often, who knows?). I don't think the version in the first post is really plausible - It needs to be a catch-all so that it can be modelled as an else.

Just one more point: #unknown is not a "thing". It's what you have left when all known things failed to match. You should not be able to write if case(#unknown) = someValue, or catch #unknown, in the same way that this is not valid:

switch someValue {
  case #unknown: // do something
  default: // huh?
}

VladimirS · March 13, 2018, 3:36pm

[Warning: Long message]
As I understand (correct me if I'm wrong), it was decided by the core team that only Apple's frameworks that are linked at run
time (and such) will have non-frozen enums. 'Usual' Swift enums declared inside or outside the module in the same
application will have no way (at least for now) to specify whether an enum is frozen or non-frozen, so such enums will work as
they do currently (i.e. implicitly frozen).

So usually we'll have 'unknown default:' only for the SDK's enums, if we want the compiler to help us keep the switch exhaustive when new
cases are introduced in a new version of iOS, for example.
There were examples in that thread where it is reasonable to have an exhaustive switch over an SDK non-frozen enum value.

Do you want to miss (at compile time) a new case in an SDK enum by having only 'default:'? I don't. And you have to
use 'default:' (or the suggested 'unknown default:') in a switch over a non-frozen enum.
So I believe we really need such a warning, and so some kind of 'unknown default:' in a switch over a non-frozen enum.

The problem with naming, as I see it, is that name for such 'special default:' should include these meanings:

Future "public" (usual) cases that can appear at run time (our app is running on a new version of iOS with an added enum
case). "Unknown" at compile time, "non-hidden" for us.
Future "private" cases (if we ever have them) that can appear at run time (our app is running on a new version/update of
iOS with an added private enum case). "Unknown" at compile time, "hidden" for us.
New "public" cases that can appear at compile time (our app is compiled with an updated 3rd-party source file with
added enum cases). Such cases are "known" to the compiler and "non-hidden" for us, but we want to add specific branches for
the new cases to our switch later, not right now. So this 'unknown default' branch will be executed for these new cases
for now.
New/existing "private" cases that can appear at compile time (our app is compiled with an updated 3rd-party source file
with added private enum cases). Such cases are "known" to the compiler but "hidden" for us. If we already have 'unknown
default:' in our switch, it will be used to process such new private cases (and the warning will be generated). [see
thoughts about 'private:' branch below]

All those cases are unknown (or we don't want to know about them) at the moment of writing the code. So "unknown" is probably a
reasonable word (if we don't have a better one) from this point of view.

FWIW Some thoughts on 'private' cases...

It seems that (in the future) if an enum has 'private' cases, we lose the compiler's support for keeping a switch exhaustive
over such an enum, because we have to use 'default:'/'unknown default:' even if we have listed each public case in the switch.
And if it is expected that many of the SDK's non-frozen enums will have private cases, then 'unknown default:' could be
useless. I mean that we probably need to think about 'private' cases now, together with unknown/future cases.

Even for Swift enums there is the question of whether we need some new special keyword to describe a "private" case, like:

enum MyEnum {
case a, b, c
private case x, y, z
}

// exhaustive switch over public cases, but will not compile: must have a branch like 'default:' for 'private' cases.
switch myenum {
case .a : ..
case .b : ..
case .c : ..
}

// exhaustive switch, no warning/error, but 'default' will not give the compiler a chance to warn you about
// new public cases that appear in MyEnum in the future
switch myenum {
case .a : ..
case .b : ..
case .c : ..
default: ..
}

// exhaustive switch, will always warn because 'unknown default:' is known to be reachable with private cases
// ?? probably, if this is possible, we should not warn in 'unknown default' if reachable ONLY with private cases
switch myenum {
case .a : ..
case .b : ..
case .c : ..
unknown default: ..
}

// exhaustive switch, no warnings now, will error if new public case exists
switch myenum {
case .a : ..
case .b : ..
case .c : ..
private: ..
}

I'm not sure how this can work with non-frozen external C enums. If we can write

// exhaustive switch
switch external_enum {
case .one : ..
case .two : ..
private : ..
unknown default: ..
}

, and then at runtime we have some unknown(future) external_enum value sent here - is it new 'private' value or is it
new 'public' value? As I understand - we have no information about this. So then what branch should be called "private"
or "unknown" ?..

The best solution I can think of now is to

allow 'private:' branch for 'standard' Swift enums, and allow 'default:'/'unknown default:' to be used in the same
switch.
and
disallow 'private:' branch for external non-frozen(C) enums, all new private and public cases will be processed in the
same 'unknown default:' branch.
OR

don't introduce 'private:' branch but don't warn if 'unknown default:' is reachable only with private cases.

This means 'unknown default:' for a non-frozen enum will handle both new (future) cases AND all private cases. This raises
the question of whether we can write any meaningful code in that branch, as we usually aren't interested in private cases (or handle
them all in a specific way) but could be very interested in new public cases (to give a warning to the user, for example),
but we can't distinguish them. But we still want the warning produced by 'unknown default:' when there are new known cases at
compile time...

Thoughts?

Thanks for reading this.
Vladimir.

DeFrenZ · March 13, 2018, 4:47pm

My understanding was that this was more about imported C enums, which Apple SDK ones are.

QuinceyMorris · March 13, 2018, 7:07pm

"Known cases" are discriminants that can be enumerated in a switch at compile time. That excludes [hypothetical] private cases, and cases that might exist in the future but don't exist at the time of compilation.

"Unknown cases" are discriminants that might be encountered at run time, other than the discriminants that were enumerable at compile time.

Specifically, if a specific switch enumerates X discriminants explicitly out of a total of Y discriminants that could have been enumerated at compile time, the compiler knows that there might be a total of Z discriminants at run-time, where X <= Y <= Z, though the value of Z isn't known until run-time (just like array.count isn't known till run-time, but the compiler can translate array[i] just fine).

As I understand it, the original proposal was that #unknown should match a total of Z-Y discriminants. The current proposal is that unknown default: should match a total of Z-X discriminants.

Which is to say, I suppose, that #unknown is a "thing", in the same way that array[i] is a thing, even though it may refer to something that doesn't exist (yet) from the compiler's point of view.

At least, that's my mental picture.
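To make the X/Y/Z counting concrete, here is a small sketch that models discriminants as plain strings in sets. All the names and values are made up for the illustration; nothing here is real enum machinery, it only shows which discriminants each definition would cover.

```swift
// X: discriminants the switch lists explicitly (hypothetical values).
let handledInSwitch: Set<String> = ["a", "b"]
// Y: discriminants that could have been enumerated at compile time.
let knownAtCompileTime: Set<String> = ["a", "b", "c"]
// Z: discriminants that may actually turn up at run time.
let presentAtRunTime: Set<String> = ["a", "b", "c", "d", "e"]

// Original pitch: #unknown matches only the Z - Y discriminants.
let matchedByOriginalUnknown = presentAtRunTime.subtracting(knownAtCompileTime)

// Current proposal: unknown default: matches the Z - X discriminants.
let matchedByUnknownDefault = presentAtRunTime.subtracting(handledInSwitch)

print(matchedByOriginalUnknown.sorted())  // prints ["d", "e"]
print(matchedByUnknownDefault.sorted())   // prints ["c", "d", "e"]
```

The difference between the two sets is "c": a case that was known at compile time but not handled, which is exactly the one the warning is meant to flag.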

Karl · March 13, 2018, 7:41pm

Okay, let me draw some inferences from this:

The value of #unknown is context-dependent (the value X). Maybe some of your modules were written against CoolLibrary 1.0, and others against version 1.5. Wannabe-exhaustive switch statements within each module will handle different cases. Even within the same module, you might be migrating bits at a time.
Outside of a context where you try and handle other cases (e.g. if case (42, #unknown) = foo {), #unknown cannot have meaning - which cases should we take as X?
Within the context of a switch statement (i.e. the only place #unknown makes sense), it behaves indistinguishably from a default or _ (except for diagnostics)

If it takes different values in each individual switch statement, it is not its own "thing".

John_McCall · March 13, 2018, 7:47pm

#unknown is _ with a warning if there's a statically-known case that would reach it. That's it.

It has a meaning in things like if case, it's just not a useful meaning because it can only possibly fail to warn if an enum has no statically-known cases. But that's still a meaning.

QuinceyMorris · March 13, 2018, 8:33pm

Yes, it is context-dependent in that way, but so is subpattern _, or case _:, or default:, even for a frozen enum. Are they not "things"?

I wasn't redesigning the feature, just following the original pitch:

At some point during this topic, #unknown morphed into the "oxymoronic" thing, but only some of the time, and only for some people.

The point is, this later #unknown is oxymoronic (well, let's just say "confusing and potentially ambiguous") only as a pattern matching operator that can appear anywhere. As a final catch-all, it's the behavior everyone wants (hence unknown default:).

jrose · March 13, 2018, 9:03pm

FWIW, this thread is not the original pitch; it's a spin-off from a proposal (mine) that was reviewed without any form of unknown, which was then revised and brought back for informal discussion in a pull request. unknown was never going to produce errors; it must not in order to maintain source stability. Everything else falls out from that, including John's very simple explanation of its behavior.