Allowing @unknown in arbitrarily nested patterns

typesanitizer · April 7, 2021, 3:22pm

Every pattern I ever wrote is here within these walls
I'm sorry, language nerds, but I'm blocking out your calls
I've had my evolution pitch, I don't need something new
I'm afraid of what I'll implement if I follow you

Into the unknown
Into the unknown
Into the unknown

(With apologies to Kristen Anderson-Lopez and Robert Lopez.)

Motivation

Today, Swift supports @unknown default: switch cases. Here's how The Swift Programming Language (TSPL) describes the attribute's usage:

Apply this attribute to a switch case to indicate that it isn’t expected to be matched by any case of the enumeration that’s known at the time the code is compiled.

This is helpful in making sure that all cases that are known at the time of writing are covered.

However, @unknown default: only works in the outermost context of a switch statement.

This makes it more cumbersome to pattern-match on non-frozen enums when the value is nested: you need to write another switch statement just to use @unknown default:.
This also hurts the ergonomics of potential language features such as enums with non-public cases. (This point was brought up by @stephencelis.)
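For concreteness, here is a minimal sketch of today's workaround. It assumes a hypothetical two-case enum `Style` standing in for a non-frozen library enum (such as SwiftUI's `RoundedCornerStyle`, used later in this thread), nested inside an `Optional`. It is declared locally only so the sketch compiles on its own; a real library enum would come from a module built with library evolution.

```swift
// Hypothetical stand-in for a non-frozen library enum.
enum Style { case circular, continuous }

func label(_ style: Style?) -> String {
    switch style {
    case .none:
        return "none"
    case .some(let s):
        // A second switch is needed just to be able to write `@unknown default`.
        switch s {
        case .circular: return "circular"
        case .continuous: return "continuous"
        @unknown default: return "unknown"
        }
    }
}
```

Under this pitch, the inner switch could instead be flattened into outer case items such as `case .some(.circular)`, `case .some(.continuous)` and `case .some(@unknown _)`.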

Proposed Solution

We allow "unknown patterns" in arbitrary pattern positions inside a case item in switch statements.

// Module M with library evolution
public enum E { case a; case b }

// Module N
import M

func f(_ e: E) {
  switch (e, e) {
  // note: add missing case '(.a, .a)'
  // note: add missing case '(.b, .a)'
  // note: add missing case '(.a, .b)'
  // note: add missing case '(.b, .b)'
  case (@unknown _, @unknown _): ()
  }
}

We propose the spelling @unknown _, as that reminds the reader that it has the semantics of both @unknown default (warnings will be issued if known cases are not matched explicitly) and of wildcard patterns _ (it will match any value).

Detailed Design

(This section will be fleshed out in the full proposal, with changes to the grammar and so on.)

Pattern matching crash course

In the general case, Swift's switch statement is "two-dimensional" (we can ignore where clauses here, as they are not relevant for checking exhaustivity, and we'll also ignore default for now):

switch subject_expression {
case pat_11, ... , pat_1A: stmt1
case pat_21, ... , pat_2B: stmt2
...
case pat_f1, ... , pat_fZ: stmtf
}

Each individual pattern in this "pattern grid", such as pat_11 and pat_fZ, is a "case item". Each such pattern is actually a little AST of more fundamental patterns, such as literal patterns, tuple patterns, binding patterns and more. (See The Swift Programming Language for more details.)
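As a concrete illustration of this terminology (the enum `E` and function `describe` are hypothetical), the switch below has three rows. The first row holds two case items, and every case item is a small tree built from simpler patterns.

```swift
enum E { case a, b }

func describe(_ pair: (E, Int)) -> String {
    switch pair {
    // Row 1: two case items, `(.a, 0)` and `(.b, 0)`. Each is a tuple pattern
    // combining an enum-case pattern with the integer literal 0.
    case (.a, 0), (.b, 0):
        return "zero"
    // Row 2: one case item, a tuple pattern containing an enum-case pattern
    // and a binding pattern.
    case (.a, let n):
        return "a\(n)"
    // Row 3: one case item, an enum-case pattern next to a wildcard pattern.
    case (.b, _):
        return "b"
    }
}
```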

The core idea is that the compiler thinks of these patterns as "spaces" (think: sets), and switch exhaustiveness checking reduces to checking whether the "total space" (dictated by the type of the subject expression) is covered by the union of the spaces formed by the individual patterns. Concretely, each pattern's space is subtracted from the total space: if anything is left over, the switch is not exhaustive, and the compiler suggests fix-its to cover the leftover space.
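The subtraction idea can be sketched in a few lines under a deliberately simplified model: the subject is a pair of a hypothetical two-case enum `E`, each tuple element of a case item is either `_` or one specific case, and a space is represented as the explicit set of values it matches. The names `Pat`, `Pair`, `space` and `uncovered` are made up for illustration. The real compiler manipulates spaces symbolically (it cannot enumerate `Int` or recursive types), so this shows only the set algebra, not the actual implementation.

```swift
// Toy model: a hypothetical two-case enum standing in for a library enum.
enum E: CaseIterable, Hashable { case a, b }

// Each tuple element of a case item is either `_` or one specific case.
enum Pat { case wildcard, exactly(E) }

// One concrete value of the subject type (E, E); tuples are not Hashable,
// so a struct is used to store values in a Set.
struct Pair: Hashable { let first: E; let second: E }

// The values a single sub-pattern can match.
func values(_ p: Pat) -> [E] {
    switch p {
    case .wildcard: return E.allCases
    case .exactly(let e): return [e]
    }
}

// The "space" of a case item: the explicit set of values it matches.
func space(_ item: (Pat, Pat)) -> Set<Pair> {
    var result = Set<Pair>()
    for x in values(item.0) {
        for y in values(item.1) { result.insert(Pair(first: x, second: y)) }
    }
    return result
}

// Subtract every case item's space from the total space; whatever remains
// is what the switch fails to cover (what "add missing case" notes describe).
func uncovered(_ items: [(Pat, Pat)]) -> Set<Pair> {
    var remaining = space((.wildcard, .wildcard))  // total space of (E, E)
    for item in items { remaining.subtract(space(item)) }
    return remaining
}
```

For instance, `uncovered([(.exactly(.a), .wildcard), (.exactly(.b), .exactly(.a))])` leaves only `(.b, .b)`, the single case a fix-it would need to add.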

Checking unknown patterns

The change with this pitch is that while iterating through case items for checking switch exhaustiveness, the compiler also does "unknown checking"; if it sees a case item P that has @unknown _ nested somewhere inside its little pattern AST, it computes the pattern space for P, and from that, it subtracts the pattern spaces for all the case items that have been processed so far (excluding P, since P is being processed). If the difference is non-empty, it issues a warning and provides fix-its as before.

This approach is almost identical to what is done today for issuing diagnostics with @unknown default:. The chief difference is that today, the compiler does "unknown checking" once after it has processed all case items, whereas the addition of @unknown _ sub-patterns means that it needs to do "unknown checking" K times, where K is the number of case items that contain at least one @unknown _ nested somewhere inside.
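Extending the same kind of toy model (hypothetical names, finite enumeration instead of the compiler's symbolic spaces), unknown checking can be sketched as follows. Over the known cases, `@unknown _` contributes the same values as `_`; what differs is that an item containing it triggers the extra subtraction against all earlier items.

```swift
enum E: CaseIterable, Hashable { case a, b }

// A tuple element of a case item: `_`, `@unknown _`, or one known case.
enum Pat { case wildcard, unknownWildcard, exactly(E) }

struct Pair: Hashable { let first: E; let second: E }

// Over the *known* cases, `@unknown _` matches the same values as `_`.
func values(_ p: Pat) -> [E] {
    switch p {
    case .wildcard, .unknownWildcard: return E.allCases
    case .exactly(let e): return [e]
    }
}

func space(_ item: (Pat, Pat)) -> Set<Pair> {
    var s = Set<Pair>()
    for x in values(item.0) {
        for y in values(item.1) { s.insert(Pair(first: x, second: y)) }
    }
    return s
}

func containsUnknown(_ item: (Pat, Pat)) -> Bool {
    if case .unknownWildcard = item.0 { return true }
    if case .unknownWildcard = item.1 { return true }
    return false
}

// For each case item P containing `@unknown _`, subtract the spaces of all
// items processed before P from P's own space. Anything left over is a known
// value that should have been matched explicitly; the result maps the index
// of P to those values.
func unknownCheck(_ items: [(Pat, Pat)]) -> [Int: Set<Pair>] {
    var diagnostics: [Int: Set<Pair>] = [:]
    var seen = Set<Pair>()  // union of the spaces of earlier case items
    for (index, item) in items.enumerated() {
        if containsUnknown(item) {
            let leftover = space(item).subtracting(seen)
            if !leftover.isEmpty { diagnostics[index] = leftover }
        }
        seen.formUnion(space(item))
    }
    return diagnostics
}
```

Fed the case items of the first two switches in the examples that follow, the sketch reports exactly `(.a, .a)` for the `@unknown _` item. For `(@unknown _, _)` it lists all four pairs individually, whereas the compiler's notes group them as `(.a, _)` and `(.b, _)`.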

Here are some more complex examples:

// Module M with library evolution
public enum E { case a; case b }

// Module N
import M

func g(_ e: E) {
  switch (e, e) {
  case (.b, .a): ()
  // note: add missing case '(.a, .a)'
  case (@unknown _, .a): ()
  default: ()
  }
  switch (e, e) {
  case (.b, _): ()
  // note: add missing case '(.a, .a)'
  case (@unknown _, .a): ()
  default: ()
  }
  switch (e, e) {
  // note: add missing case '(.a, _)'
  // note: add missing case '(.b, _)'
  case (@unknown _, _): ()
  }
}

The semantics mimic what you could write by hand today: replacing @unknown _ with a binding pattern let x and using a nested switch x { case ... ; @unknown default }. Please feel free to ask questions if it's not clear why these examples work the way they do, or if you find the behavior to be unintuitive.
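To make that correspondence concrete, here is a sketch of a hand-written version of the second switch in `g` after its fix-it has been applied. A hypothetical local `E` stands in for `M.E` so the sketch compiles on its own, and the function name `classify` is made up.

```swift
enum E { case a, b }  // hypothetical stand-in for the library enum M.E

// Today's spelling of the pitched:
//   switch (e1, e2) {
//   case (.b, _): ...
//   case (.a, .a): ...          // inserted by the fix-it
//   case (@unknown _, .a): ...
//   default: ...
//   }
func classify(_ e1: E, _ e2: E) -> String {
    switch (e1, e2) {
    case (.b, _):
        return "b, any"
    case (let x, .a):
        // `@unknown _` becomes a binding plus a nested switch.
        switch x {
        case .a: return "a, a"
        // The nested switch cannot see that the outer `(.b, _)` case already
        // handled `.b`, so `.b` is listed again even though it is unreachable.
        case .b: return "b, a"
        @unknown default: return "unknown, a"
        }
    default:
        return "other"
    }
}
```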

Other contexts

@unknown _ is not permitted in pattern contexts other than the case items of a switch statement, such as for-each loops, if-let statements, guard-let statements and on the LHS of an assignment statement.

Alternatives

If @unknown _ is too verbose or not sufficiently clear, we could use a different syntax.

Prototype

A prototype for this feature is available here; you can see some code "in action" in a test case. At the time of writing, the core functionality is implemented; the pattern decomposition works and fix-its are provided for missing permutations. However, there is some polish work that needs to be done (testing for and improving parse errors, adding more pattern-matching tests and enforcing restrictions around usage in non-switch contexts). I'm looking for feedback before proceeding.

(Also, in case you'd like to chip in with implementation/testing but don't have much compiler contribution experience, this may be a good opportunity to pick off small tasks. Feel free to send me a forum message if you'd like to help.)

xwu · April 7, 2021, 5:02pm

Is it intentional not to permit if case @unknown _ = ..., and other uses following case but not in a switch statement? It seems overly restrictive to do so, and breaks with the existing design of Swift where generally a case is a case.

I understand not supporting @unknown _ in other contexts, but I would argue that case @unknown _, wherever it appears, is not "another" context than case @unknown _ inside a switch statement.

typesanitizer · April 7, 2021, 5:20pm

This isn't strictly true when you consider @unknown. Here's the behavior of code today:

import SwiftUI
func f(_ s: RoundedCornerStyle) { // non-frozen enum
  switch s {
  // add missing case: '.circular'
  // add missing case: '.continuous'
  @unknown case let _: ()
  }
  if @unknown case let _ = Optional.some(s) {} // error
}

Since @unknown is only allowed in switch today, it makes sense that @unknown _ would also only be allowed in switch.

However, that's not really the main reason for disallowing it. The reason I didn't allow it is that it's unclear what fix-its should be provided if it's written. Can you suggest what use you have in mind for if case @unknown _? I can think of one potential way to fix it:

if case @unknown _ = expr { body }
// [fix]->
let val = expr // avoid evaluating expr multiple times
if case pat1 = val { <stub> }
else if case pat2 = val { <stub> }
...
else if case @unknown _ = val { body }

But it's not clear that this is valuable; it seems strictly worse than writing a switch statement.

Similarly, we allow things like for case _ in expr today. It's not clear what for case @unknown _ in expr would mean and what fix-its should be provided for it.

xwu · April 7, 2021, 7:15pm

Mmm, @unknown is only allowed in switch today because default is only allowed in switch and @unknown is only allowed in front of default today. The same does not apply to case and _, so it would not be logical to maintain the restriction just because.

The use case is similar to the one you illustrate for switching over (e, e), except that if case and similar constructs allow pattern matching without exhaustiveness.

A major point of allowing case (@unknown _, @unknown _) in your example over just @unknown default is to allow the mixing and matching of unknown and known cases for the first and second tuple elements.

In a similar scenario, I may wish to switch over the first element of the tuple exhaustively, while I may not wish to for the second. It does not seem necessary to me that the user must be forbidden from annotating that the default case matching the first element is meant only for unknown resilient cases:

func g(_ e1: E, _ e2: E) {
  if case (.a, _) = (e1, e2) {
    // ...
  } else if case (.b, _) = (e1, e2) {
    // ...
  } else if case (@unknown _, .b) = (e1, e2) {
    // ...
    // I want the compiler to tell me if my assumption is wrong
    // and I missed a known case for `e1`, but I *don't* care about
    // having switched exhaustively over `(e1, e2)`.
  }
}

It seems (perhaps too simplistically) that the obvious fix-it, if it's written in a context where a known case is missing, would be to remove the @unknown:

if case @unknown _ = e1 { ... }
// fixit:
if case _ = e1 { ... }

We don’t use syntax rules to enforce things like this: for example, Swift will warn you if you write code in a function after an unconditional return statement, but such code is still syntactically well formed.

There is the trivial case of a resilient non-frozen enum with no cases, for which for case @unknown _ in expr would execute only if there are new cases added later: it may not be easily testable, but it’s actually code that could be executed, and it means something more than just what for case _ in expr would mean if an enum had no cases.

But one need not actually argue for a use case for such a trivial scenario: it’s fine to let features compose as they are and then warn users when the composition doesn’t make sense, but that’s different from trying to enforce “meaningful” code with ad-hoc syntax rules.

typesanitizer · April 7, 2021, 9:21pm

As I demonstrated earlier, @unknown also applies to case today (but only inside a switch), not just default:.

switch s {
  // add missing case: '.circular'
  // add missing case: '.continuous'
  @unknown case _: ()
}

The compiler does not reason about if-else chains today the way it does with switch statements. Here's an example (on Compiler Explorer):

switch Optional.some(0) {
  case .some(_): ()
  case .some(_): () // warning: redundant
  default: ()
}
let e = Optional.some(0)
if case .some(_) = e { }
else if case .some(_) = e { } // no warning
else { }

When you say

// I want the compiler to tell me if my assumption is wrong
// and I missed a known case for e1, but I don't care about
// having switched exhaustively over (e1, e2).

What that requires to happen under the hood is that we would need to rework how we handle if-else chains: for every if-else chain, if the chain has an @unknown _, first try to detect whether the chain is "switch-like", and if so, do the same analysis that we do for switch statements.

I suspect that the amount of implementation work for doing this alone would likely be larger than the entire implementation that we have so far.

The question comes down to: what is the utility, and is the implementation effort worth it? For me, the code pattern you've demonstrated has dubious utility; it would be clearer written directly as a switch, which the compiler already understands today.

I understand the desire for equivalent code to provide equivalent warnings, but that's not the case today with the compiler, so I don't see why new features should be held to that standard.

One could equally well argue that the "obvious" fix-it is to create an if-else chain exploding the different cases, analogous to the fix-its provided for a switch statement...

If your main point is that @unknown should be allowed in more places in the grammar, and that in places where it doesn't make sense semantically we should warn and treat it as _ instead of making it a parse error... well, we could do that. I'm not strongly opposed to it. The main reasons I didn't choose that route are:

It likely requires more checks in the compiler, making it more likely that we fail to emit a warning in a particular situation.
It's not clear what benefit it has in practice apart from having a more uniform grammar.

But it's not a big deal either way.

xwu · April 8, 2021, 1:52pm

typesanitizer:

What that requires to happen under the hood is that we would need to rework how we handle if-else chains: for every if-else chain, if the chain has an @unknown _ , first try to detect whether the chain is "switch-like", and if so, do the same analysis that we do for switch statements.

I suspect that the amount of implementation work for doing this alone would likely be larger than the entire implementation that we have so far.

The question comes down to: what is the utility, and is the implementation effort worth it? For me, the code pattern you've demonstrated has dubious utility; it would be clearer written directly as a switch, which the compiler already understands today.

I understand the desire for equivalent code to provide equivalent warnings, but that's not the case today with the compiler, so I don't see why new features should be held to that standard.

See below. I'm aware of the implementation concerns, and I would not be bothered if an implementation of this proposal (or any subsequent work) doesn't implement exhaustiveness checking and merely throws up a warning saying that it cannot perform the check at this time. But leaking such implementation concerns into the grammar punishes the user for something that's in the hands of the designer and implementer.

Yes, I think you catch my meaning accurately.

It is a simple, explainable rule to say that @unknown can decorate _ wherever you intend to match only currently unknown cases. Discoverability is important for a language to be friendly to learners, and in this case I mean specifically that users should be rewarded and not punished when they try to generalize a principle or concept they understand correctly to new scenarios. Where it doesn't make sense to apply a concept, it's natural to warn them that they're doing something not useful: this further develops their understanding of a concept and why they would want to use it.

Where it does make sense (for example, my example with non-exhaustive if case...else if case) and we simply haven't implemented it, it's reasonable to warn the user that they're right about how they're thinking about the language but we haven't (currently, or "yet") implemented that use case. Users can be well satisfied that they've understood and used the concept correctly, and the shortcoming is ours, not theirs.

Throwing up a syntax error in that scenario achieves the opposite: it faults the user by telling them that they're "holding it wrong" when the fault does not lie with them or their understanding. That is a big deal.

porglezomp · April 8, 2021, 10:08pm

I think this can still make sense as a syntax error if it's a specially cased syntax error—if the syntax error tells you that @unknown is only permitted in patterns inside a switch because that's the only place where the compiler performs exhaustivity analysis, that's still extending that language knowledge. Not handling it specifically and making it a generic "unexpected @unknown" does seem like a potential stumbling block for learners.

typesanitizer · April 10, 2021, 3:53am

I've been thinking of your comment for the past 2 days, and I'm actually convinced of the opposite by it, namely that we should not support @unknown _ outside switches (+1 to Cassie's suggestion of special-casing the diagnostic though, I had that in mind but I hadn't gotten around to implementing it) if we do not support exhaustivity checking for if-else chains.

As I already stated before:

Today the language doesn't support @unknown outside switch.
Today the compiler doesn't do any pattern-checking for if-else chains, only for switch.

Moreover, I don't know of any precedent among other mainstream languages with pattern matching for supporting exhaustivity checking on if statements/expressions; there is usually a dedicated construct like match/case that people are expected to use if they want exhaustivity checking.

There is a good reason for this: it makes it predictable when exhaustivity checking is (or is not) in effect. Even ignoring the additional verbosity, supporting exhaustivity checking with if creates a situation where perturbing the code slightly means you risk silently losing exhaustivity checking. That would be poor UX.

These two reasons make it quite unlikely that someone expects @unknown _ to work outside of a switch. However, if they do expect that, their mental model of what the compiler does and does not support is inaccurate even before this pitch. So when you say

I disagree with your implicit underlying premise. If a user expects @unknown _ to give them warnings with if let, they're wrong about how they're thinking about the language and compiler, given the state of affairs today. There is no good reason why they should expect it to work based on their prior experience with Swift or most other mainstream languages with pattern matching. If they're thinking "it could theoretically work, hence it should work," that is wishful thinking, not reasoning.

I disagree with this perspective. I think that if a feature is not implemented, the compiler should not be leading people on by saying something like "cannot perform the check at this time." That's implicitly saying "well, this could work in the future" and I do not think committing ourselves to future language directions in diagnostic messages is appropriate. That's an invitation for more bug reports to be filed ("why doesn't this work yet") and if those bug reports go unattended, that creates resentment.

xwu · April 10, 2021, 2:22pm

That’s an interesting perspective: You’re arguing that the use of @unknown is unreasonable in the scenario described earlier (reproduced below) and an anti-goal to support. I don’t think it is, but if Swift wants to make that claim, then by all means it’s fine to make that a syntax rule. I think this should be explicitly argued and evaluated on the merits in the proposal though.

xwu:

func g(_ e1: E, _ e2: E) {
  if case (.a, _) = (e1, e2) {
    // ...
  } else if case (.b, _) = (e1, e2) {
    // ...
  } else if case (@unknown _, .b) = (e1, e2) {
    // ...
    // I want the compiler to tell me if my assumption is wrong
    // and I missed a known case for `e1`, but I *don't* care about
    // having switched exhaustively over `(e1, e2)`.
  }
}

This is not a perspective made up by me: the compiler has in the past noted (and, to my knowledge, still notes) missing support in its user-facing diagnostics. I have not seen a proliferation of bug reports asking when such work will be done in situations where the diagnostics have made clear that the limitation is a practical one of engineering time and effort.

Where the diagnostics do not make the nature of such limitations clear (e.g., limitations surrounding existential type self-conformance to protocols without Self or associated type requirements), users routinely write to these forums trying to understand why something isn’t supported in cases where the limitation is not inherent to the language design, asking if they have misunderstood something. This is not a desirable result, and we should not be pleased that it happens again and again.

In the Swift Evolution process, we have now adopted the phrase “resting place” to denote a state of affairs in terms of design and/or implementation that is not the fullest imaginable execution of an idea but a sizable quantum of improvement with a reasonable result for the time being. Since this notion has been acknowledged by the core team in these terms, it has been well accepted by the community and we regularly ask whether a proposal includes a reasonable resting place when the overall concept is not implementable in one go. We do not simply pretend that it is an active anti-goal to imagine a fuller design or try to discover after-the-fact design motivations to justify practical limitations of implementation.

masters3d · April 10, 2021, 6:00pm

Can you provide some references for this? We should probably have a guidance document with these group-held notions that can help guide design authors.

xwu · April 10, 2021, 6:30pm

I believe this term first emerged during discussion of SE-0286:

typesanitizer · April 13, 2021, 5:04am

I have already argued, across my prior comments, why this behavior makes sense collectively. Supporting exhaustivity checking for if-else (1) has no precedent in Swift, (2) has no precedent in other languages, (3) primarily serves to provide an alternate (and arguably less clear) spelling for switch, (4) requires a lot of implementation effort, and (5) creates a fragile design where slightly perturbing the if-else chain means one silently loses exhaustivity checking. If these reasons collectively are not convincing, then we can agree to disagree; I don't have any additional arguments to make.

I've included a summary of my thoughts on the points you've raised about diagnostic phrasing.

My perspective is also based on discussions with senior compiler engineers; it's not something I came up with myself.
There is a difference between noting missing support and using phrases like "at this time". In my earlier comment, I said "I do not think committing ourselves to future language directions in diagnostic messages is appropriate." I am perfectly fine with noting what is not supported. Phrases like "at this time" strengthen the implication that "well, this could work in the future," and I am not in favor of having such phrasing in diagnostics.

In almost every case of something not being supported, the "why something isn't supported" is not something that can fit into an inline diagnostic, since it's a complex question. These kinds of answers are well-suited to long-form documentation such as Swift evolution proposals and the new educational notes, as well as places like the Swift forums for discussion.

Adding phrases like "at this time" to inline diagnostics, or making limitations clear, does not help with the "why something isn't supported." In fact, done poorly, it can lead people to think that something is only a matter of implementation even though that's not actually the case. Here's an example:

typealias C = some Collection
// error: 'some' types are only implemented for the declared type of properties and subscripts and the return type of functions

This only moves the question from "why doesn't this work?" to "why hasn't this been implemented?"; it doesn't answer the question of why it hasn't been implemented. Moreover, this diagnostic is misleading because it makes the person think that it is only a matter of implementation, which is not actually the case; there is a complex design question of how it ought to work in practice, especially when where clauses are involved.

That said, I would prefer to not derail this thread discussing the nitty-gritties of how diagnostics should (not) be worded. We can discuss that in a separate thread.