[Re-Proposal] Type only Unions

miku1958 · June 26, 2024, 4:20am

I've briefly drafted the definition of Common part, and if there are no problems in the next few days I'll update it in the post:

*Updated based on Nickolas's idea

What the compiler needs to do is:

when the parameter/variable type is a union type, the compiler checks that the type of the argument is a true subset of the union type

if not then report an error
if yes then compile success

protocol P {}
protocol Q {}

struct S: P, Q {}
struct T: Q {}

func test(_ value: any P | any Q) { }
test(S()) // ok
test(T()) // ok

protocol O {}
protocol P: O {}
protocol Q: P {}

struct S: O {}
struct T: P {}
struct U: Q {}

func test(_ value: any O | any P) { }
test(S()) // ok
test(T()) // ok
test(U()) // ok

class A { }
class B<T> { }
class C: B<Int> { }
class D: B<String> { }

var value: A | B<Int> = A() // ok
value = B<Int>() // ok
value = C() // ok
value = D() // fail

protocol P1 { associatedtype T }
protocol P2: P1 where T == String { }

class A: P1 { typealias T = Int }
class B: P1 { typealias T = Int }
class C: P2 { }
class D: P1 { typealias T = String }

var value: A | P2 = A() // ok
value = B() // fail
value = C() // ok
value = D() // fail

when a method is called on a union-typed value, the compiler checks whether the method signature exists in all types

if not then report an error
if they all exist then check if the return type is the same
1. if it is then return the same type
2. if it is not then return a new union type

struct S {
    var foo: () -> Int
    var bar: String = ""
}

struct R {
    func foo(_ x: String = "") -> Int { 42 }
    func foo() -> Double { 3.0 }
    var bar: [Int] = []
}

let x: S | R = R()
let y = x.foo() // y has type Int
let z = x.bar // z has type (String | [Int])

protocol P {
  associatedtype Assoc
  func getAssoc() -> Assoc
}

struct A<T> {
  public let wrapped: T
  func getAssoc() -> T { wrapped }
}

struct B {
  func getAssoc() -> Int { 42 }
}

func f<T>(x: A<T> | B) {
  let z = x.getAssoc() // z has type (T | Int)
}

let view: NSTableView | NSCollectionView

view.isOpaque = false // setter (isOpaque: Bool) can be found in both NSTableView and NSCollectionView

when the union type value is used as a parameter / error, the compiler checks whether all types can be used for this parameter:

if not then report an error
if yes then compile success

enum ErrorFoo: Error { }
enum ErrorBar: Error { }

let error: ErrorFoo | ErrorBar

throw error // throw needs a Swift.Error, and ErrorFoo and ErrorBar both meet the requirement

protocol P1 { associatedtype T }
protocol P2: P1 where T == String { }

class A: P1 { typealias T = Int }
class B: P1 { typealias T = Int }
class C: P2 { }
class D: P1 { typealias T = String }

func test1(_ value: any P1) { }

let value1: A | B
test1(value1) // ok

let value2: A | C
test1(value2) // ok

func test2<V: P1>(_ value: V) where V.T == String { }
test2(value1) // fail
test2(value2) // fail

let value3: C | D
test2(value3) // ok

when switching over a union, its behavior should be similar to switching over an Any value:

let value: A | B
switch value {
    case let value as A:
        ...
    case let value as B:
        ...
}
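For comparison, here is a minimal sketch of what switching over an Any value looks like in current Swift. The types A and B and the function describe are placeholders for illustration only. The point is that a type-casting switch over Any is never treated as exhaustive today, so a default clause is required; the union proposal would let the compiler drop that requirement when every member type is covered.

```swift
struct A {}
struct B {}

// In current Swift, a type-casting switch over `Any` cannot be proven
// exhaustive, so the `default` clause below is mandatory.
func describe(_ value: Any) -> String {
    switch value {
    case is A:
        return "A"
    case is B:
        return "B"
    default:
        return "something else"
    }
}

print(describe(A()))  // prints "A"
print(describe(42))   // prints "something else"
```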

However, to simplify implementation, the compiler can compare the type of each case with the types in the union, and if they are consistent, that type is assumed to be covered. To accomplish this, the compiler needs to support the following:

considering protocol/class inheritance, switches should allow upward or downward casting.
to reduce complexity, overlap can be supported

check whether all types in the union have been switched over; it is recommended to check a type and its subtypes only once all types have been included. If some type is not covered, the compiler should report an error.

protocol A { }
protocol B { }
protocol C: A { }

let value: A | B
// ✅ exhaustive
switch value {
    case let value as A:
        ...
    case let value as B:
        ...
}

// ✅ downward casting
switch value {
    case let value as B:
        ...
    case let value as C:
        ...
}
// ✅ upward casting
switch value {
    case let value as B:
        ...
    case let value as Any:
        ...
}

let value: B | C

// ✅ exhaustive
switch value {
    case let value as C:
        ...
    case let value as B:
        ...
}

// ✅ overlap
switch value {
    case let value as A:
        ...
    case let value as B:
        ...
    case let value as C:
        ... // This case will never be triggered, it would be nice to give a warning.
}

// ❌ error: Switch must be exhaustive
switch value {
    case let value as A:
        ...
    case let value as B:
        ...
}

// ✅ exhaustive
switch value {
    case let value as A:
        ...
    case let value as B:
        ...
    default:
        ...
}

Nickolas_Pohilets · June 26, 2024, 6:37am

IMO, that’s too restrictive while being overcomplicated. You don’t really need to compute the common type to access members of the type union. It is safe to access members which are not part of the common type, but still are common for all types.

Given x: T1 | T2 | … Tn it should be possible to access x.v if v exists in each of the Ti. Type of x.v is a union of types of v in each of the Ti.

struct S {
    var foo: () -> Int
    var bar: String = ""
}

struct R {
    func foo(_ x: String = "") -> Int { 42 }
    func foo() -> Double { 3.0 }
    var bar: [Int] = []
}

let x: S | R = R()
// explicit type is needed to choose overload of R.foo()
let y: Int = x.foo() // ok
let z = x.bar // z has type (String | [Int])
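As a rough illustration of the rule above, the same member forwarding can be written by hand in current Swift with enums. This is only a sketch: SOrR and StringOrInts are hypothetical stand-ins for S | R and String | [Int], and the foo member is left out to keep the example short.

```swift
struct S {
    var bar: String = ""
}

struct R {
    var bar: [Int] = []
}

// Hypothetical stand-in for the result type `String | [Int]`.
enum StringOrInts: Equatable {
    case string(String)
    case ints([Int])
}

// Hypothetical stand-in for `S | R`.
enum SOrR {
    case s(S)
    case r(R)

    // `bar` exists in both members, so it can be forwarded. Its type is
    // the "union" of the members' property types.
    var bar: StringOrInts {
        switch self {
        case .s(let s): return .string(s.bar)
        case .r(let r): return .ints(r.bar)
        }
    }
}

let x = SOrR.r(R(bar: [1, 2]))
let z = x.bar  // .ints([1, 2])
```

With a built-in union type the compiler would synthesize this forwarding instead of requiring the wrapper enums.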

miku1958 · June 26, 2024, 8:28am

So what the compiler needs to do is this: when the developer calls a method on a union type, check whether the method signature exists in all types. If not, report an error. If it exists in all of them, check whether the return types are the same: if they are, return that type; if not, return a new union type.

Also, the editor needs to list all the functions/properties of all types as candidates.

miku1958 · June 26, 2024, 8:30am

But this doesn't solve the problem of how to make the union type automatically conform to a protocol or be usable as a class.

Nickolas_Pohilets · June 26, 2024, 10:52am

In the general case, this problem has no correct solution, for the same reasons that (any P) does not conform to P.

There are a few hacky self-conforming protocols, including Error. We might continue to support hacks for Error, and say that A1 | A2 | ... | An conforms to Error if every A_i conforms to Error.
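To make the Error case concrete, here is a sketch of how the same thing is spelled in current Swift with a wrapper enum. FooOrBarError, run, and classify are hypothetical names, and the failed cases are added only so the enums can be instantiated. The wrapper conforms to Error because each payload does, which is the rule suggested above for unions.

```swift
enum ErrorFoo: Error { case failed }
enum ErrorBar: Error { case failed }

// Hypothetical stand-in for `ErrorFoo | ErrorBar`.
enum FooOrBarError: Error {
    case foo(ErrorFoo)
    case bar(ErrorBar)
}

func run(_ flag: Bool) throws {
    throw flag ? FooOrBarError.foo(.failed) : FooOrBarError.bar(.failed)
}

func classify(_ flag: Bool) -> String {
    do {
        try run(flag)
        return "no error"
    } catch let error as FooOrBarError {
        // Exhaustive over the wrapper's cases, no `default` needed.
        switch error {
        case .foo: return "foo"
        case .bar: return "bar"
        }
    } catch {
        return "other"
    }
}

print(classify(true))   // prints "foo"
print(classify(false))  // prints "bar"
```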

But in the general case, we should not be trying to solve this "problem". Instead,

A. Even without union type conforming to protocol, you can still:

access common members (even without common protocol or base class)
upcast to common existential type or base class
open unions like existentials (similar to SE-0352, or, if we ever get explicit opening, it should work for unions too).

B. If you need to express constraint over generic type, which can be concrete type, existential or union, then you need Existential subtyping as generic constraint.
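For readers unfamiliar with the "opening" mentioned in point A, this is what implicit existential opening (SE-0352, Swift 5.7 and later) looks like for an ordinary existential today. Shape, Circle, Square, and typeName are hypothetical names for illustration; whether and how unions would be opened the same way is part of what is being discussed, not something this sketch settles.

```swift
protocol Shape {
    var name: String { get }
}

struct Circle: Shape { let name = "circle" }
struct Square: Shape { let name = "square" }

// A generic function: `T` is bound to a concrete conforming type.
func typeName<T: Shape>(of shape: T) -> String {
    return "\(T.self)"
}

let shapes: [any Shape] = [Circle(), Square()]
var names: [String] = []
for shape in shapes {
    // `shape` has type `any Shape`. Since Swift 5.7 the call implicitly
    // opens the existential and binds `T` to the value's dynamic type.
    names.append(typeName(of: shape))
}
// names == ["Circle", "Square"]
```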

miku1958 · June 26, 2024, 3:05pm

I then thought that the problem might be simpler than I thought, and that the compiler just needs to check that all the types match when it needs to, so I re-edited my original reply. [Re-Proposal] Type only Unions - #43 by miku1958

kavon · June 26, 2024, 8:45pm

This feels like a step backwards in language design, to an era before protocols and traits: the bad old days of C++ template parameters, where you had to read a function’s implementation to understand what methods are required, such as to add a type to an existing union / pass that type into the function.

kavon · June 26, 2024, 9:15pm

This is what protocol conformance checking does.

This is what defining a protocol does: it lists all of the common members of types that conform to it. By defining a protocol in your source code, your editor doesn't have to conjure up an ad-hoc list of members shared between the types, that was inferred based on the current definition of your function. You can precisely define what members a type needs using a protocol.

wadetregaskis · June 26, 2024, 10:42pm

Perhaps it's worth noting that there's value even in a very simplistic version of this, which side-steps most of this discussion about more complex designs & functionality.

The simplest useful form is simply putting a bound on the possible types. e.g.:

let foo: Data | NSImage = someFunction()

switch foo {
    case is Data:
        …
    case is NSImage:
        …
}

It's not essential that it doesn't use existentials. It's not essential that you be able to access any members without explicit casting to a specific type. The key functionality is simply the exhaustiveness checking.

The compiler does need to understand the type constraints that flow from case, but this has precedent in [FullTypedThrows] catch (and in a sense, with if let x { … }).

The only way (to my knowledge) to achieve this today is to create a named enum manually, e.g.:

enum DataOrImage {
    case data(Data)
    case image(NSImage)
}

let foo: DataOrImage = someFunction()

switch foo {
    case .data(let data):
        …
    case .image(let image):
        …
}

Notice how much more [unnecessary] ceremony that is. And how we had to name multiple things which could otherwise have been anonymous (DataOrImage) or shared the same existing name (data & image, vs foo). And naming is hard, so this can waste more time than you might assume (and introduce bugs due to unintended shadowing etc).

Starting with such a simple but still useful feature doesn't preclude later adding extra smarts (e.g. the ability to access members common to all the types).

And I suspect the edge cases will be handled just fine using existing diagnostics. e.g. if you somehow end up with Data | Data, such as through typealiases which obscure the redundancy, that's technically fine - it just devolves concretely to Data, and if you try to switch on it the compiler will warn you if you have duplicate cases. (this is notably simpler than using tuples / variadic generics / explicit enums, where you'd have confusing duplicates distinguished arbitrarily by index or case names)

miku1958 · June 26, 2024, 10:59pm

I think this is a problem for the editor rather than the developer, ideally the editor would generate a common type for the union for the developer to look at the API, which is what I wanted to do at first, but I realized that this is not necessary for the compiler, which only needs to check that all the types match when it needs to.

ibex10 · June 27, 2024, 12:13am

The vi editor, which I still use, does not help here.

ksluder · June 27, 2024, 7:00am

Exhaustiveness checking is not trivial. Is this exhaustive?

func f(arg: Int | String) {
  switch arg {
  case let _ as Any:
    print("well, what is it?")
  }
}

Binary stability also comes into play. Is it source- or ABI-compatible to remove a type from the union that a function returns? What about adding to the type of an argument?

// can this:
public func f() -> A | B | C { }
// become this?
public func f() -> A | B { }

// what about this:
public func g(arg: A | B) { }
// becoming this?
public func g(arg: A | B | C) { }

What about subclassing?

open class Parent {
  func m(arg: A | B) -> D | E | F { }
}

class Child : Parent {
  // is this ok?
  override func m(arg: A | B | C) -> D | E { }
}

How does this work in a generic context? What does codegen for this switch look like?

func f<A, B>(arg: A | B) {
  switch arg {
    case let _ as A:
      print("It’s a \(A.self)!")
    case let _ as B:
      print("It’s a \(B.self)!")
  }
}

Retroactive conformances could create unpredictable source breaks:

struct S { }
struct T { }
struct U { }

protocol P { }
extension S : P { }
extension T : P { }

func f(arg: S | T | U) {
  switch arg {
  case let _ as P:
    print("got a P!")
  case let _ as U:
    // this case becomes redundant if U gains a conformance to P
  }
}

ibex10 · June 27, 2024, 7:35am

I have been following this topic with great interest, but I don't find this proposal really essential to have. It can already be done with the existing features, without adding more complexity to the compiler.

Do we really want a Swift army knife?

Sorry, if I hurt anybody's feelings.

wadetregaskis · June 27, 2024, 2:14pm

Yes.

Although even if the compiler doesn't recognise that - requiring instead explicit cases for each type in the union - this feature is still useful. That would simply be an obvious bit of extra smarts to add later.

Remember that many of the motivating use-cases are for relatively straight-forward things like unions of error types or otherwise disjoint types (e.g. Data | NSImage).

Is it not broadly the same, in this regard, as enums? Perhaps you could even use the @frozen attribute on typealiases of these unions.

In principle removing types from a union is binary-compatible. Adding them is not. Neither is strictly source-compatible, of course, since either may cause at least warnings (unreachable code or non-exhaustive switch).

I realise changes in the ancestors (protocol conformances, changes to class hierarchies, etc) are unique here, compared to enums. But I don't see anything that's a particular concern, irrespective of whether it is or isn't ABI-compatible. Nothing says this feature has to be binary-resilient to these definitional changes.

In principle, yes. But, again, even if this isn't supported initially, it doesn't make the feature useless by any means.

I would expect the same rules to apply as we already have regarding parameter generalisations or return value specialisations. (well, that said, I vaguely recall Swift has some weirdness here in some cases, so maybe this particular part of the language could not reproduce those bits? )

ksluder:

How does this work in a generic context? What does codegen for this switch look like?
func f<A, B>(arg: A | B) {
  switch arg {
    case let _ as A:
      print("It’s a \(A.self)!")
    case let _ as B:
      print("It’s a \(B.self)!")
  }
}

It's a good question; I'm not certain. In some cases I'm sure it'd be nice if this behaved like a constrained generic, where the compiler essentially specialises it into two versions of the function. But that might be quite non-trivial to implement (and perhaps better handled within the actual, explicit generics syntax we already have).

Even if it's simply an existential, and the switch is just expanded out to if…else if…etc, that's fine. It may preclude some use-cases (e.g. performance-sensitive stuff) but there's still plenty of use-cases that don't mind.

Embedded Swift would of course want it to not require existentials.

Consider also the conceptually equivalent:

func f<X: A | B>(arg: X) {
  switch arg {
    case let _ as A:
      print("It’s a \(A.self)!")
    case let _ as B:
      print("It’s a \(B.self)!")
  }
}

I don't know if there's reason to think that's different, or to disallow one or the other of these syntax examples, but as far as I can see they're equivalent. And thinking of it that way makes it more apparent that this should be specialisable, just fine.

Right, this is part of the binary- and source-compatibility question you posed earlier. IMO it's fine: this is essentially nothing new to Swift, and it can follow the existing conventions just fine. e.g. adding a conformance that makes a case unreachable is binary-compatible but would (ideally) prompt a warning from the compiler, when re-compiling the switch statement.

If the underlying implementation is specialised generics, I would think it's still binary-compatible - it's just that now one of your specialisations is unreachable.

Keep in mind that switch is order-dependent. You can already have overlapping cases, and the compiler just selects the first one that matches (and, generally, warns when you have a subsequent case that is unreachable because it's completely covered by an earlier case).
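A small sketch of that first-match behaviour with type-casting patterns in current Swift follows. Animal, Dog, and firstMatch are hypothetical names; the sketch only demonstrates the ordering, and makes no claim about which diagnostics the compiler emits for the shadowed case.

```swift
protocol Animal {}
struct Dog: Animal {}

func firstMatch(_ value: Any) -> String {
    switch value {
    case is Animal:
        return "Animal"
    case is Dog:
        // Overlaps with the case above. Every Dog is an Animal, so
        // this case is never selected.
        return "Dog"
    default:
        return "other"
    }
}

print(firstMatch(Dog()))  // prints "Animal"
print(firstMatch(3))      // prints "other"
```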

wadetregaskis · June 27, 2024, 2:24pm

Note also that a possible approach is to treat this like "partially-opaque" types. In the sense that, like some is opaque to the reader but the actual type is always known to the compiler, A | B could be the same. It just provides a more precise and flexible way to express the possibilities (than having to manually create a special marker protocol just to represent the union).

However, I prefer A | B actually be conceptually an existential (albeit one ideally optimised away to specialised versions where possible). Because for return values, in particular, one of these "partially-opaque types" is not nearly as useful as an existential. All the use cases I personally have require runtime flexibility in choosing the type (e.g. throwing errors, functions that genuinely need to sometimes return one and sometimes the other, etc).

ksluder · June 27, 2024, 3:43pm

wadetregaskis:

ksluder:
Exhaustiveness checking is not trivial. Is this exhaustive?
func f(arg: Int | String) {
  switch arg {
  case let _ as Any:
    print("well, what is it?")
  }
}
Yes.

In order to implement this, the compiler would need to compute the entire set of subtypes for each case of the switch, verify no overlap between any of these sets for any switch cases, and verify that there are no types in the union which are not present in any of the switch cases’ sets.

That sounds expensive, and it’s also fragile. The compiler can only see conformances from modules that are in scope at the time of use. If I go back and add a new import that makes a new conformance visible, it can make some unrelated switch case redundant.

You seem to be envisioning building this feature via an accretion of special cases. You can sometimes get away with this approach on the fringes (see, for example, result builders), but when it comes to the core language—especially the type system—thinking systematically is critical to avoid accidentally setting traps for the future or turning the compiler into an unmaintainable mess.

Specifically, it sounds like this approach would require an entirely separate implementation of switch case just for switching over a value of union type. This separate implementation would be much more restrictive, only allowing direct equality type comparisons. Would this also require a new if let case implementation? What other shared codepaths through the compiler would need to be forked to support a syntactically-similar but wholly novel feature? And do we have any reason to believe they can be reunified in the future, or is it just hope?

wadetregaskis · June 27, 2024, 6:12pm

It doesn't care if there's overlap. Switch cases are allowed to overlap, already.

Otherwise, yes.

Yeah, but, is that a big deal? There's all sorts of reasons why protocol- or class-based casting can change at runtime, already.

I wouldn't characterise it that way. I'm suggesting building something simple, useful, and [probably] extensible. Then seeing if and where to ultimately extend it.

There is indeed value in trying to think ahead, particularly to search for pitfalls or pain points. But it has to be balanced. I feel like the 'burden of proof' being demanded here, by some folks, is way higher than normal. If it had been applied to all of Swift's proposals to date, I don't think many of them would have passed.

(consider that the entire generics system is still being fleshed out and the direction has changed pretty substantially more than once over its lifetime so far - which is not "ideal" in some sense but is realistically the only way it can proceed… or heck, just review the String API )

hishnash · June 28, 2024, 4:06am

I do rather like this idea.

Is this something that would be purely resolved at runtime, or do you expect some compiler optimisations to be used when a concrete type is known by the compiler?

E.g. a function might want to expose an interface of func evaluate(_ value: A | B):

struct A {
    func computeOutput() -> Int
}

struct B {
    func computeOutput() -> String
}


func evaluate(_ value: A | B) -> Int | String {
    
    switch value {
    case let a as A:
        ....  // do some pre compute setup on type A
    case let b as B:
        ...  // do some pre compute setup on type B
    }
    return value.computeOutput()
}

If the caller of this function provides a concrete type for the value (e.g. they call evaluate(A()) directly), would the compiler then optimise away the switch checks and fall through to just calling the related code for the type A, as if the function were written as evaluate(_ value: A)? And in this case, would it then resolve that the return type is thus constrained to be Int, since that is the return value of computeOutput() for A?

let a = A()

let result = evaluate(a) 

// does the compiler think result is `Int | String`?
// or does the compiler resolve the result to be `Int`?

Furthermore, if there are, say, 3 options for evaluate(_ value: A | B | C) and the caller calls it with a type that could be either A | B (but not C), is the return type correctly constrained to the results of computeOutput() of A and B, or would it still include the return type of C().computeOutput() as well?

miku1958 · June 28, 2024, 4:36am

@wadetregaskis @ksluder
I've just updated the draft implementation, and point 4 should cover what you're discussing.

Also, regarding overrides, this is not a concern for unions: Swift does not allow parameter types to be changed when overriding.

wadetregaskis · June 28, 2024, 4:51am

Not as written, because evaluate concretely returns Int | String, irrespective of its inputs. The optimiser might be free to benefit from the knowledge of the concrete input type and therefore the concrete output type, but only if it can see into the function (so same module or @inlinable).

To get that type deduction in all cases you'd need to do something with generics, e.g.:

protocol Resulty {
    associatedtype Result
}

struct A: Resulty {
    typealias Result = Int
    func computeOutput() -> Int
}

struct B: Resulty {
    typealias Result = String
    func computeOutput() -> String
}

struct C: Resulty {
    typealias Result = Bool
    func computeOutput() -> Bool
}

func evaluate<T: A | B | C>(_ value: T) -> T.Result {
    …
}

…but maybe that's a bit over-complicated. It's pre-supposing some smarts there regarding the deduction that T is Resulty because all its possible types are. Which is clever and might be useful in some [other] cases, but which in this case could perhaps be more easily achieved by just requiring Resulty to define the common method too (func computeOutput() -> Result).

I mean, I'm not opposed to that capability, I'm just interested in not adding anything more than is really necessary, to the language (and compiler).

The somewhat similar experience with exceptions suggests that the generics system is the right way to do this sort of thing, since it already exists, and is explicitly designed for this type of type constraining and propagation.
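As a sketch of the simpler alternative mentioned above, here is the version that compiles in current Swift when Resulty itself declares the common method. The return values 42 and "forty-two" are placeholders, and note the difference from the union-constrained signature: this evaluate accepts any Resulty type, not only A, B, or C.

```swift
protocol Resulty {
    associatedtype Result
    func computeOutput() -> Result
}

struct A: Resulty {
    // `Result` is inferred as Int from this witness.
    func computeOutput() -> Int { 42 }
}

struct B: Resulty {
    // `Result` is inferred as String from this witness.
    func computeOutput() -> String { "forty-two" }
}

// The return type follows the concrete argument type at each call site.
func evaluate<T: Resulty>(_ value: T) -> T.Result {
    return value.computeOutput()
}

let number = evaluate(A())  // inferred as Int
let text = evaluate(B())    // inferred as String
```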