[Re-Proposal] Type only Unions

rvsrvs · July 3, 2024, 11:45am

This is actually the key factor for me in why I'd like structural sum types in the language. Using structural product types, I can unceremoniously write zip variadically on almost all of my generic types. But to write merge I have to engage in the ceremony of creating a separate nominal type for every arity I might use. I write a lot of boilerplate merge code as a result, whereas I have a single variadic zip for every generic type.

rvsrvs · July 3, 2024, 11:51am

For reference here, look at zip in swift-async-algorithms. Since 5.9, there is no real reason why this could not be written variadically (and I'm a little surprised that it has not been). However, merge will have to remain limited in its arities basically forever. I have lots of places where I'd like to not constrain merge that way.
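To make the asymmetry concrete, here is a minimal sketch, assuming Swift 5.9 or later for parameter packs. The names `boxed`, `Either2` and `Either3` are made up for illustration and do not come from swift-async-algorithms. The product side needs a single variadic definition, while the sum side needs one nominal enum per arity:

```swift
// Product side: one definition covers every arity, because the result
// is a structural tuple built from a parameter pack.
func boxed<each T>(_ value: repeat each T) -> (repeat Array<each T>) {
    (repeat [each value])
}

// Sum side: without a structural sum type, each arity needs its own
// nominal enum (hypothetical names).
enum Either2<A, B> {
    case first(A)
    case second(B)
}

enum Either3<A, B, C> {
    case first(A)
    case second(B)
    case third(C)
}

let pair = boxed(1, "x")          // ([Int], [String])
let triple = boxed(1, "x", 2.5)   // ([Int], [String], [Double])
let merged: Either3<Int, String, Double> = .second("x")
```

A variadic merge would need something like `Either<each T>`, which is exactly the structural sum type that does not exist today.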

miku1958 · July 3, 2024, 1:22pm

The closed protocol seems to me to be a perfectly fine manual implementation of a union; the definitions and extensions should simply be automated by the compiler to save the developer time. Since current compilers are perfectly compatible with the closed protocol, I see no reason why it couldn't be used to implement unions. A union just requires more automation.

wadetregaskis · July 3, 2024, 2:08pm

Yeah, it gets you pretty much the same runtime functionality, where it works (more on that in a moment). It's quite verbose, though:

closed protocol Acceptable {}
extension Int: Acceptable {}
extension String: Acceptable {}

vs:

typealias Acceptable = Int | String

Also, would it work with protocols? e.g.

extension FixedWidthInteger: Acceptable {}

I suppose that lets you compose, e.g.:

closed protocol AcceptableOrError {}
extension Acceptable: AcceptableOrError {}
extension Error: AcceptableOrError {}

…but unless I'm mistaken, it doesn't work with generics? e.g.

closed protocol DoThingError {}
extension CancellationError: DoThingError {}
extension Failure: DoThingError {} // ❌ What is `Failure`?

func doThing<Result, Failure>(
        using closure: (Element) throws(Failure) -> Result)
        throws(DoThingError) {
    …
}

Pampel · July 3, 2024, 2:48pm

This is exactly why I mentioned primitive obsession; using String where it might be better to use EmailAddress, or Double instead of MonetaryAmount (I wouldn't use a Double for money for other reasons, but that's beside the point).

If I'm writing a finance application, I don't want someone passing a Double representing a monetary amount to a parameter expecting a Double interest rate. DebitTransaction and CreditTransaction might both have an amount: Double and a reference: String, but they mean very different things.

Code that uses primitive types, tuples or anonymous union types (or any other form of structural typing) is susceptible to these concepts being confused, but giving these things a name both makes APIs/contracts more clear and helps the compiler prevent confusions.
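As a small sketch of that point (the type and property names here are hypothetical, and integer cents are just one possible representation), wrapping the primitives in named types lets the compiler reject swapped arguments:

```swift
// Hypothetical wrapper types: distinct names for values that would
// otherwise both be plain numbers.
struct MonetaryAmount: Equatable {
    var cents: Int
}

struct InterestRate: Equatable {
    var fraction: Double   // 0.05 means 5%
}

// The signature documents intent, and the compiler enforces it.
func accrue(_ principal: MonetaryAmount, at rate: InterestRate) -> MonetaryAmount {
    let interest = (Double(principal.cents) * rate.fraction).rounded()
    return MonetaryAmount(cents: principal.cents + Int(interest))
}

let balance = accrue(MonetaryAmount(cents: 10_000), at: InterestRate(fraction: 0.05))
// accrue(InterestRate(fraction: 0.05), at: MonetaryAmount(cents: 10_000))
// would not compile: the argument types do not match the parameters.
```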

For sure.

Also agree, in part, but this isn't necessarily entirely a bad thing, it's a trade-off between convenience and security.

I completely agree, but likewise it isn't necessarily a bad thing - I don't want the compiler to assume that anything that wraps a uint32_t has IPv4 address functionality, either for autocomplete suggestions or proving correctness.

Apologies if this is getting slightly off topic.

wadetregaskis · July 3, 2024, 3:55pm

Right, but my point was more that it's an example of where you in principle do need a canonical definition for something, because exactly as you say you can't just presume every UInt32 (or equivalent) is an IPv4 address, and yet we still don't have it. Nominal typing has demonstrably failed in this case. And it's far from the only case (I still occasionally encounter packages which define their own Result type, even after all these years of having it in the stdlib - and just look at how controversial it was to include even that!).

Nominal typing is great locally but it mostly doesn't work well otherwise. And even when it is somewhat successful, it can create concerning friction and "lock-in". It also imposes tedious ceremony even in some local cases. e.g. what do I care which specific function produced T | U if all I want is a T | U? Consider:

// Lib A
typealias Result = T | U
func preferredMethod() -> Result? { … }

// Lib B
typealias Result = T | U
func fallbackMethod() -> Result { … }

// Lib C
let result = preferredMethod() ?? fallbackMethod()
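For comparison, here is a sketch of the same situation in today's Swift, with nominal enums standing in for T | U. The `LibA` and `LibB` namespaces and the case names are assumptions for illustration. The two `Result` types have identical shapes, but they are unrelated types, so the caller has to write a conversion by hand:

```swift
// Two "libraries", modelled as namespaces, each with its own nominal type.
enum LibA {
    enum Result { case text(String), number(Int) }
    static func preferredMethod() -> Result? { nil }
}

enum LibB {
    enum Result { case text(String), number(Int) }
    static func fallbackMethod() -> Result { .number(42) }
}

// LibA.preferredMethod() ?? LibB.fallbackMethod() does not compile:
// LibA.Result and LibB.Result are different types despite identical cases.

// The ceremony the caller (Lib C) has to add:
extension LibA.Result {
    init(_ other: LibB.Result) {
        switch other {
        case .text(let s): self = .text(s)
        case .number(let n): self = .number(n)
        }
    }
}

let result = LibA.preferredMethod() ?? LibA.Result(LibB.fallbackMethod())
```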

Which is not to say we should throw nominal typing out, by any means. I'm just saying, it cannot be the solution for all problems, and we cannot reject anything simply on the basis of whether or not it uses nominal typing.

Structural typing has its utility too, and I think it's reasonable for type unions to exist as part of that. Enums already exist for cases where a nominal typing approach is preferable.

As another angle, consider if callables were nominally typed. No more map taking a generic callable, now it takes a MapFunction which all your closures and functions must explicitly conform to if they want to be used with map.
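A rough sketch of what that hypothetical world might look like (the `MapFunction` protocol, its `apply` requirement and `nominalMap` are invented here purely for illustration):

```swift
// Hypothetical: every callable must explicitly conform to a named protocol.
protocol MapFunction {
    associatedtype Input
    associatedtype Output
    func apply(_ input: Input) -> Output
}

// A one-line closure now needs its own named type.
struct Doubler: MapFunction {
    func apply(_ input: Int) -> Int { input * 2 }
}

func nominalMap<F: MapFunction>(_ items: [F.Input], using f: F) -> [F.Output] {
    items.map { f.apply($0) }
}

let viaNominal = nominalMap([1, 2, 3], using: Doubler())

// What Swift actually offers: function types are structural,
// so any closure of the right shape is accepted.
let viaStructural = [1, 2, 3].map { $0 * 2 }
```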

I think that's a close analogy for union types' likely typical use, in function parameters and return / throw types. You don't care whether the parameter is an arbitrary named enum / struct / whatever, you just care that you can pass T or U (either of which could be MonetaryAmount instead of Double, if that matters for a specific given use case).

ksluder · July 3, 2024, 4:10pm

There is actually a different, long-desired language feature that directly addresses this use case: newtypes.

taylorswift · July 3, 2024, 5:01pm

in my opinion, this is just a symptom of an underdeveloped and fragmented library ecosystem, which is a common problem that afflicts many other languages besides Swift and not necessarily proof of the failure of nominal typing.

you can trace that back through multiple layers of root causes. the ecosystem is fragmented because Swift tooling tends to push people towards shipping very large library modules, which incentivizes the creation of parallel definitions for common currency types like IPv4Address, to avoid adding 10 MB of binary size. and the tooling only pushes people towards giant modules because of performance pitfalls that come with module optimization barriers and lack of developer confidence/education around performance annotations. and so on.

but we shouldn’t take that as evidence that nominal typing itself is fundamentally flawed. these libraries need to be refactored into smaller definitions-only modules, and they need to be given the correct performance annotations to make them as fast as the single-module layouts.

michelf · July 3, 2024, 9:29pm

Obviously not. FixedWidthInteger is a protocol and you can't add conformances to a protocol in a protocol extension.

Is this meant as some kind of replacement for Result<Acceptable, Error>? Why would you need that?

This won't work either for exactly the same reason: this is adding a conformance to a protocol in a protocol extension.

And even if this was allowed, I'd say adding an open protocol (Error) to a closed protocol is contradictory. The protocol can't be closed if you can add more types to it by conforming those types to Error, which would then make them part of AcceptableOrError.

To compose unions like that we would need a second feature. It's sort of a separate union thing that produces an anonymous protocol. This anonymous protocol can be closed but does not necessarily have to be.

For instance, I can think of Encodable & Decodable as a structural protocol like this:

typealias Codable = Encodable & Decodable
// is equivalent in terms of type relationship to:
protocol _StructuralProtocol1: Encodable, Decodable { /* empty */ }
extension Encodable: _StructuralProtocol1 where Self: Decodable {} // compiler trick
typealias Codable = _StructuralProtocol1

I think something similar could be done for Encodable | Decodable:

typealias X = Encodable | Decodable
// is equivalent in terms of type relationship to:
protocol _StructuralProtocol2 { /* empty */ }
extension Encodable: _StructuralProtocol2 {} // compiler trick
extension Decodable: _StructuralProtocol2 {} // compiler trick
typealias X = _StructuralProtocol2

The protocol is not closed here: that's because Encodable and Decodable are open protocols and this prevents _StructuralProtocol2 from being closed... meaning you can't switch exhaustively and the memory layout will be a regular existential box.

To get a closed protocol you'd have to use concrete types or closed protocols:

typealias Y = Float | Acceptable
// is equivalent in terms of type relationship to:
closed protocol _StructuralProtocol3 { /* empty */ }
extension Float: _StructuralProtocol3 {}
extension Acceptable: _StructuralProtocol3 {} // compiler trick
typealias Y = _StructuralProtocol3

This demonstrates how this feature is not really tied to closed protocols, but if all the protocols and types in the union are closed, then the structural protocol can become closed (with enum-like memory layout and exhaustive switching).

And for fun, mixing both & and |:

typealias X = (Int8 | Int16) & FixedWidthInteger
// is equivalent in terms of type relationship to:
closed protocol _StructuralProtocol4: FixedWidthInteger { /* empty */ }
extension Int8: _StructuralProtocol4 where Self: FixedWidthInteger {}
extension Int16: _StructuralProtocol4 where Self: FixedWidthInteger {}
typealias X = _StructuralProtocol4

Maybe there are issues I can't see, but it doesn't seem too far fetched to me that this could work.

Personally I'd start with closed protocols because this is where the memory layout and exhaustive switching would come from and there's not much point in type unions without exhaustive switching (might as well use Any). Then we can think of adding a structural union based on this.
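To illustrate why exhaustiveness matters here, the following sketch uses only features that exist in Swift today (the `closed` keyword does not). The names `describeOpen`, `describeClosed` and `AcceptableValue` are made up for the example. An ordinary open protocol forces a `default` branch, while an enum lets the compiler check that every case is handled:

```swift
// Open protocol: any module can add conformers, so a switch over the
// existential can never be proven exhaustive.
protocol Acceptable {}
extension Int: Acceptable {}
extension String: Acceptable {}

func describeOpen(_ value: any Acceptable) -> String {
    switch value {
    case let i as Int: return "Int \(i)"
    case let s as String: return "String \(s)"
    default: return "unknown conformer"   // required by the compiler
    }
}

// Enum: the set of cases is fixed, so no default is needed and adding
// a case later turns every incomplete switch into a compile error.
enum AcceptableValue {
    case int(Int)
    case string(String)
}

func describeClosed(_ value: AcceptableValue) -> String {
    switch value {
    case .int(let i): return "Int \(i)"
    case .string(let s): return "String \(s)"
    }
}
```

A closed protocol, as discussed in this thread, would aim to give the first form the guarantees of the second.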

wadetregaskis · July 4, 2024, 5:03am

Right, that's what I figured, but it's a significant limitation of that approach.

A bad example, perhaps. I just meant to demonstrate nesting - e.g. some module defines a closed protocol, and a second module wants to return either that or some other type(s). There needs to be a way to compose them, so that each layer of code can propagate through the type information of lower layers while adding its own.

In that case it's not a solution to most of my use cases, like combining multiple error types.

drkibitz · January 4, 2025, 3:25am

Where did this discussion go? Seems like it went in a direction about nominal typing versus structural. Overall, I think the case is clear that in Swift, there is a problem with the underlying concurrency mechanisms of Task cancellation and the new typed throws. I've been looking for discussions around this but haven't found any.

For me, I think if Swift supported type unions the answer would be simple, throws(E | CancellationError) -> T. Though without unions, the only other option I can think of is a standard protocol or an enum (i.e. something like TaskFailure<E>). Any thoughts on this?
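For reference, a minimal sketch of the enum option, assuming Swift 6 typed throws. `TaskFailure` is the hypothetical name from the post above; `run`, `ParseError` and `outcome` are invented for the example and are not part of any standard API:

```swift
// Hypothetical stand-in for `E | CancellationError`.
enum TaskFailure<E: Error>: Error {
    case cancelled(CancellationError)
    case failed(E)
}

// Wraps a typed-throwing body and widens its error to TaskFailure<E>.
func run<T, E: Error>(
    isCancelled: Bool,
    _ body: () throws(E) -> T
) throws(TaskFailure<E>) -> T {
    if isCancelled {
        throw .cancelled(CancellationError())
    }
    do {
        return try body()
    } catch {
        // With typed throws, `error` has type E here.
        throw TaskFailure<E>.failed(error)
    }
}

struct ParseError: Error {
    let message: String
}

func outcome(isCancelled: Bool) -> String {
    do {
        let value = try run(isCancelled: isCancelled) { () throws(ParseError) -> Int in
            throw ParseError(message: "bad input")
        }
        return "value \(value)"
    } catch {
        // `error` is TaskFailure<ParseError>, so the switch is exhaustive.
        switch error {
        case .cancelled: return "cancelled"
        case .failed(let e): return "failed: \(e.message)"
        }
    }
}
```

The cost is visible in `run`: every layer has to wrap and rewrap errors by hand, which is the ceremony a union type would remove.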

nervenes · January 4, 2025, 1:38pm

i'm a novice so take my opinion with a grain of salt

having done some python I really enjoyed how simple it was to say this is either foo or bar using the foo | bar syntax. the other day i actually got curious as to why swift and rust both don't support this feature and i asked about it on some discord channels. the answer basically was "we have enums, use them": Option is essentially foo | none, Result is foo | bar with just an inappropriate name for this purpose, and you basically should just create your own Either enum similar to Result.

while this all makes sense to me, especially in the rust world, i think swift can and should get away with introducing the python-like | syntax for this. the reason i think this is that swift already has a ton of small syntactic sugars and edge cases for all sorts of things, be it common patterns or even niche things that you're unlikely to ever have to learn and use. so i don't see why it would be unfavored to add the | syntax, as it would be syntactic sugar for wrapping two or more types in an Either enum. in my eyes this would just complement swift's syntactic sugar approach to things and make some things simpler for developers coming from python, for example.

if i have any misunderstanding, or you'd like to add to my opinion, feel free to do so and mention me so i can get notified

AlexanderM · January 4, 2025, 11:23pm

@nervenes The tweet-length summary of the objection is:

Because it encourages bad patterns, especially around precisely-typed errors

And I would editorialize in my own opinion by adding:

...and there aren't many particularly compelling use-cases that aren't related to error handling

Tino · January 5, 2025, 2:53pm

I'd be really interested as well what exactly makes type unions a no-go.
Compiler complexity? Will it blow up the type system? Interference with some other features?
Imo there is no question that type unions are a powerful and useful feature, and I only need a single example to illustrate that:
Ceylon was more or less built around type unions (at least it feels so), so their implementation of Optional<T> was simply T|Nil.
Why do I consider that to be important? Well, Swift does add plenty of exceptions and special cases just to make its optionals behave like a type union:

Of course T can be used everywhere where T|Nil is expected
try? would have gotten the current signature naturally (and never return T?? — that simply does not exist with the union approach)
Issues due to storing things in an enum would be nonexistent

I think there are (or were) several small annoyances, but there has definitely been a limitation with optional closures due to the enum implementation (Why optional closures in Swift are escaping · Jesse Squires).

I don't expect type unions to be a "cheap" addition, and there might be some odd edge cases depending on when (compile time vs. runtime) a type identity is actually revealed, but I really don't buy the argument that enums are somehow superior and thus there is no need for union types.

AlexanderM · January 5, 2025, 5:48pm

This isn't as expressive. Nested optionals are useful, and we wouldn't want to get rid of that. They were annoying with try?, but they're useful for e.g. the return value of d[key] when d: [Key: Value?].

If Optional were to be defined as a union, we'd want it to be more like Some(T) | None rather than T | None to make it nestable, similar to Haskell's Maybe.

Tino · January 5, 2025, 8:25pm

I strongly disagree with that example.
Where is the expressiveness in nil vs. some(nil)?
Especially when you could use types with actual meaning, like NoValueGiven and SetToNil instead?

AlexanderM · January 5, 2025, 11:18pm

Where is the expressiveness in nil vs. some(nil)?

In the dictionary case:

nil means "no value for that key"
Some(nil) means "there was a value for that key, and it was literally nil".

Nullable references like in Java, C#, Ruby, Python, etc. can't distinguish the two cases, so you usually need a separate dict.contains(key) lookup to differentiate between the cases. This comes up all the time in my day job in Ruby/Rails.
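As a quick sketch of the dictionary case in Swift (the `lookup` helper and the sample data are made up for illustration), a single subscript already carries both answers, with no separate contains check:

```swift
// dict[key] has type Int?? here: the outer optional is "is there an
// entry?", the inner optional is the stored value itself.
func lookup(_ key: String, in dict: [String: Int?]) -> String {
    switch dict[key] {
    case .none:
        return "no value for that key"
    case .some(.none):
        return "there was a value, and it was literally nil"
    case .some(.some(let value)):
        return "there was a value: \(value)"
    }
}

let scores: [String: Int?] = ["a": 1, "b": nil]
```

Here `lookup("b", in: scores)` and `lookup("c", in: scores)` take different branches, which a flattened T | Nil could not express.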

Imagine a user model that wants to distinguish between "this part of the form was never filled out" vs "the form was filled out, but the user has no value for it", e.g.

struct User {
  let colourPreference: Colour?
  let homePhone: PhoneNumber?
  let cellPhone: PhoneNumber?
}

It would be impossible to distinguish between "they did not state their colour preference" and "they stated that they have no colour preference".

Composable optionals solve this completely. Without them, the typical workaround for enums is to add a case like Colour.none to the Colour enum, which need I say... isn't a colour. Structs like PhoneNumber would be even more cumbersome to model.

Tino · January 6, 2025, 9:46pm

That answer does not fit the question at all. What I'm saying is that the meaning of .some(nil) is highly unintuitive, especially when compared with a return value of type ThereWasAValueForThatKeyAndItWasLiterallyNil instead (slightly too verbose, though ;-).
Nested optionals are painful, not an asset. I pity you if you are forced to do stupid stuff in your day job, but afaics Ruby completely solves your problem with Hash.new.

sspringer · January 6, 2025, 10:19pm

I do think it would be great to have this; as explained in some other comment, I do not think enums are a “full” replacement. But I am not expert enough to say anything about efficiency here. The “philosophy” of Swift, the way I understand it, is that a feature should only be implemented in a way that is efficient, i.e. can be compiled to efficient machine code. That’s nothing a Python programmer might be worried about, but maybe this is the reason Rust also does not have it?

AlexanderM · January 6, 2025, 10:50pm

A fair critique, though IMO it's mainly addressed by the fact that it "just works", without surfacing the value to the dev explicitly.

E.g. this works regardless of the level of optionality of the V type. That's the point: the dictionary just adds one more layer, and trucks along without any difference.

func doSomethingWithADict<K, V>(dict: [K: V], key: K) {
    if let value = dict[key] {
        print("There was a value, and it was: \(value)")
    } else {
        print("There was no value for the key \(key)")
    }
}

Same goes for dict[key] ?? someDefault.

This is basically the distinction between undefined and null in JS, and it doesn't solve the issue. You end up needing to take an arbitrary value, store it in a variable, and later determine if there was a value, or if it was never set. So you need something "outside" undefined, like undefined2. But now that's a valid value that might be returned or stored, so you need something outside that, so you need undefined3, .... This problem can recurse an arbitrary number of times.

but afaics Ruby completely solves your problem with Hash.new.

Partly, but that only works if there's a sensible default value to use, which is rare in the real world. 0, false, "" and nil are not good default values, almost ever. Colour.none is not a colour.

You could workaround this by leveraging Ruby's duck-typing to make yourself a bespoke sentinel value (NOT_SET = Object.new) that you can check against by object ID, but that is its own can of worms. And that trick doesn't work with static typing.