Structural Sum Types (used to be Anonymous Union Types)

Thank you so much for weighing in! What you said is pretty much exactly what I was going for: language design encouraging lazy error handling, just like you said.

Sorry for the sarcasm, @Douglas_Gregor. I got frustrated by being misunderstood like this and dismissed as an "angry troll" while trying to solve a problem that I (and I'm sure, many others) have. I hope discussion arguments will be preferred to be taken rationally rather than emotionally, in order to keep these forums productive.

7 Likes

my two cents here is that i can't remember many instances where i promoted an enum union to a protocol and regretted it afterwards. "exhaustiveness checking" has a funny way of becoming irrelevant once you've figured out what the various enum cases actually have in common.

the pain point for me personally is that the “protocol unions” are rarely ever important enough to justify introducing a top-level protocol for them; enum unions on the other hand have the benefit of being nestable. so for purely lexical reasons, i find that i still have to stick with the awkward old enums.

i imagine being able to nest protocols in namespaces will go a long way towards mitigating this issue.

2 Likes

That does make sense, but it still doesn't solve the performance problem. We need an option that involves zero heap allocations. The exhaustiveness checking is not just for guaranteeing that you'll never have an unexpected type, but also contains enough information to determine the static size of the value. For the purpose of error handling, my use case is the necessity to gracefully handle all possible errors, where rethrowing them or ignoring them is not an option. Solving the exhaustiveness problem would also help solve the performance problem.

are static enums actually a performance win though? an enum’s stride is the maximum size of its payload element, if you have a single large metadata struct, that just adds padding to all the other cases.

existentials and generics are not great for small POD types like Point2 or whatever. but i've found that they are excellent for abstracting over more complex data, especially if you can constrain them to AnyObject.

That's true. But with static polymorphism, the performance is reliable and if your data structures are not intended to be big (e.g. statically-typed errors), then the padding wouldn't be too big. For embedded systems, you'd probably make heavy use of static storage in the executable, which would make the padding largely irrelevant.

With existentials you do get to save some memory by avoiding the padding, but you pay for it by dynamic allocation. In this context, it comes down to low time complexity versus low space complexity. Both options have to be there, because both options are critical in different cases.

For instance, in audio processing, memory is plentiful, but responsiveness is critical, so you'd easily be willing to pay for the speed with some potentially big padding.

On the other hand, for a bare-metal microprocessor code that is working with extremely limited resources, you may need to squeeze every bit out of the available memory, so you wouldn't be able to afford to use such polymorphism much anyway.

I don’t think that’s quite correct…? Both some P and Foo<P> provide polymorphism that is resolved at compile time.

only with respect to members of P, requirements of P will still dispatch to the conforming type’s witness at run-time, unless the compiler can specialize it to a known type.

Right, my point is that in many cases, the compiler can specialize it to a known type. For starters, AIUI (others please correct me if I’m wrong), some P always specializes to a statically known type; that’s what some means. And I assume that means calls to requirements of P on a value declared as some P do not require a witness table dispatch under normal circumstances.

And isn’t it true that when a function f has a type parameter P of protocol type, the Swift compiler can at its discretion emit a type-erased version of f that uses witness table dispatch and / or a specialized version of f for specific concrete types implementing P?

At the risk of getting lost in the details, I think (again AIUI) it is not correct to say that protocols in Swift only provide dynamic polymorphism. Or perhaps I misunderstood the OP.

1 Like

this is only true if all your code lives in a single module. in larger projects, failure to specialize is a common occurrence and a major pain point, and is something that takes a significant amount of planning and effort to prevent.

2 Likes

That’s new information to me. I’ve paraphrased some P as, “It’s a specific type that conforms to P, but don’t worry about knowing which one. You don’t need to know, but the compiler knows.” So that’s not really true? There are instances where the compiler doesn’t know what concrete type your opaque type is? Does that code still compile?

1 Like

some P as a parameter type is exactly the same as func foo<T: P>(_ value: T) with all of the cross module issues that implies. And while some P as a return type acts the way you describe, I'm not sure that it's completely transparent to the compiler accross module boundries, since the underlying type is allowed to change in later versions of the code.

e.g.

// version 1.0
func foo() -> some P { return A() }
// version 1.1
func foo() -> some P { return B() }

is allowed.

1 Like

I apologize as well for overreacting; I'm going to remove my reply, and let's get back to language design.

Doug

8 Likes

Sounds amazing, a very long-awaited feature!

Also, I think it's worth looking at Scala union types Union Types | Scala 3 — Book | Scala Documentation

2 Likes

For this particular case you could use buildExpression

Does not really help, because to my understanding you then would have to convert all possible input to a common type, so I would have to convert all single items to sequences of items (e.g. by letting them conform to Sequence, but then still an iterator has to be initialized for each item). You can do that, but besides efficiency that feels like programming around a deficiency of the language which is exactly the “missing” sum types.

(It reminds me of using optionals as empty sequences to allow for optional chains in for-in loops which I really “need” in my XML processing — if I do not want to use forEach which is commonly seen as bad habit and also has some real disadvantages. It is those edge cases where I feel Swift could be more “streamlined”: Swift is such a nice language, it is a pity that in those cases you have to resort to strange or discouraged patterns.)

1 Like

I wish both anonymous sum types as well as untagged "pure" union types welcome in Swift!

I think the former will fill a nice gap in the type system where we're currently missing a light-weight anonymous enum syntax for one-off use, similar to how tuples are light-weight unnamed structs.

I think the proposed syntax (String | Int) fits nicely with the equivalent (String, Int) tuple syntax, and I think they both work with and without labels. We could reuse most of enum pattern matching syntax like suggested, with unnamed cases getting tagged like unnamed tuple elements .0(let string) etc.

I also think it would be useful with proper untagged union types, what uses unordered set unions rather than nesting for composition. It would be useful for Error propagation, but also for a new Optional-like type (e.g. Maybe) where Maybe<String> | Maybe<String> would be structurally equivalent to Maybe<String>, that is a type whose values are either .some(String) or .none, in sharp contrast to today's Optional, which would have to be composed into a nested double-optional where the values would be .none, .some(.none), or .some(.some(String)).

I can also envision proper untagged Either unions, combined with Maybe, so that Maybe<String> | Maybe<String> | Maybe<Int> is just flattened to Maybe<String> | Maybe<Int>. That is, actually just either a string or an int.

When designing this features, it is also needed to consider what happens when types being summed up can overlap.

protocol P {}
protocol Q {}
struct R: P {}
struct S: Q: {}
struct T: P, Q {}

let x: (any P | any Q) = T()
switch x {
case let p as any P:
    print("It's a P: \(p)")
case let q as any Q:
    print("It's a Q: \(q)")
}

IMO, the switch above should desugar into the following:

if let p = x as any P {
    print("It's a P: \(p)")
} else if let q = x as any Q {
   print("It's a Q: \(q)")
} else {
    fatalError()
}

So if type cases can overlap, the first matching one is executed. Order of case's in switch matters.

Now, in terms of memory layout, I think layout of Any is a good starting point - i.e.value buffer + type. As an optimization, instead of using value buffer of hardcoded size, we can compute it precisely based on types being used. E.g. value buffer for (Int | String) is max(sizeof(Int), sizeof(String)). If any of the types in question in an existential, then only value buffer part of the existential is taken into account.

I'm not sure though what to do with protocol witnesses. If we are sure we will not need to support private/multiple conformances, then it is ok not to store them and lookup dynamically when matching. But if we assume that we will need to support private/multiple conformances, then protocol witnesses need to be stored. And probably all of them. So that layout of the (any P | any Q) would be: value buffer + type + witness for P + witness for Q. And when assigning T() to x both witnesses will be stored. Then with private witnesses, it would behave like this:

let x: (any P | any Q) = T()
switch x {
case let p as any P:
    print("It's a P: \(p)") // prints "T()"
    print("P as Q: \(p as? Q)") // prints nil - when casting to any P witness for Q was lost
    print("x as Q: \(x as? Q)") // prints .some(T()) - `x` has witness for Q
case let q as any Q:
    ...
}
1 Like

Yep, always has.

Furthermore this situation already arises, e.g.:

class A {}
class B: A {}
class C: B {}

switch C() {
case is A:
    print("A")
case is B:
    print("B")
default:
    print("Other")
}

(outputs "A")

The compiler already warns that "Case is already handled by previous patterns; consider removing it". It would presumably continue to do so for sum types.

2 Likes

I've been meaning to get into contributing to the Swift compiler repository for a while now. Once I get myself familiar with the contributor's documentation, codebase architecture, and the build process, I'll start working on implementing this for the formal proposal. In the meantime, I'll try putting together a formal pitch once I have some time to spare.

5 Likes