Structural Sum Types (used to be Anonymous Union Types)

No one disputes these facts. But this is about being able to formulate nice APIs. In the use case I mentioned, for example, I have to resort to protocols where I actually need sum types (aka enums) to keep the API nice, although they are not a perfect solution. So there is obviously a “gap”, and the question is what to do about it.

That’s exactly what I meant with my original question:

That line of thinking was rebuffed as providing the “wrong kind” of polymorphism:

So I have to wonder if we collectively have the same goals here?

Can we enumerate the need? What problem are we trying to solve with anonymous sum types?

This thread arose from the thread about typed throws. So I suppose one of the needs is to coalesce types and exhaustively switch over them. To reformulate without mixing throw and catch into the equation:

let result: Result<String, (ErrorA | ErrorB)> = performReturningResult()
switch result {
case .success(let value): ...
case .failure(let error as ErrorA): ...
case .failure(let error as ErrorB): ...
}
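The (ErrorA | ErrorB) spelling above is proposed syntax, not something that compiles today. As a rough point of comparison, here is a minimal sketch of the closest thing that can be written now: a hand-written nominal enum. ErrorAOrB, the fields of ErrorA and ErrorB, and the body of performReturningResult() are all made up for illustration.

```swift
// Hypothetical error types standing in for ErrorA and ErrorB from the post.
struct ErrorA: Error { let code: Int }
struct ErrorB: Error { let reason: String }

// Hand-written nominal stand-in for the anonymous (ErrorA | ErrorB).
enum ErrorAOrB: Error {
    case a(ErrorA)
    case b(ErrorB)
}

// Hypothetical operation; it always fails with ErrorB for demonstration.
func performReturningResult() -> Result<String, ErrorAOrB> {
    .failure(.b(ErrorB(reason: "disk full")))
}

// The switch is exhaustive without a default case, and the failure
// is stored inline in the enum rather than in an existential box.
func describe(_ result: Result<String, ErrorAOrB>) -> String {
    switch result {
    case .success(let value): return "ok: \(value)"
    case .failure(.a(let error)): return "A: \(error.code)"
    case .failure(.b(let error)): return "B: \(error.reason)"
    }
}
```

The cost is the extra wrapping layer (.a, .b) at every construction and match site, which is what the anonymous form would remove.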

But one of the needs is also to have efficient storage for those errors and skip existentials. On this front, protocols fail.

But I think a sealed protocol (with a fixed list of conformances) provides the correct language model for this feature, while an enum provides the correct storage model. Can't we make a type with the semantics of a protocol and the storage of an enum?

Let's try.

We can already do that (sort of) with a property wrapper. If we desugar this example again:

let x: (Int | String) = 4
switch x {
case let int as Int: break
case let str as String: break
}

But unlike earlier, we augment it with a property wrapper for the storage. It becomes this:

protocol _AnonymousIntStringUnion {}
extension Int: _AnonymousIntStringUnion {}
extension String: _AnonymousIntStringUnion {}

@propertyWrapper
enum _AnonymousIntStringStorage {
	case a(Int)
	case b(String)

	init(wrappedValue: some _AnonymousIntStringUnion) {
		switch wrappedValue {
		case let a as Int: self = .a(a)
		case let b as String: self = .b(b)
		default: fatalError("unexpected type")
		}
	}
	var wrappedValue: any _AnonymousIntStringUnion {
		get {
			switch self {
			case .a(let a): a
			case .b(let b): b
			}
		}
		set {
			switch newValue {
			case let a as Int: self = .a(a)
			case let b as String: self = .b(b)
			default: fatalError("unexpected type")
			}
		}
	}
}

func test() {
	@_AnonymousIntStringStorage var x: any _AnonymousIntStringUnion = 4
	switch x {
	case let int as Int: break
	case let str as String: break
	default: fatalError("unreachable")
	}
}

There are some issues that need compiler help:

  1. The protocol is not sealed: you can add conformances to it.

  2. The property wrapper handling the storage is attached to the variable, but we need the compiler to do that job at the type level so (Int | String) itself is stored like this everywhere.

  3. We need a guarantee that the temporary conversion to an existential when getting and setting the value through the storage wrapper can be eliminated. This already works for init(wrappedValue:), but not for the wrappedValue property, because we can't use some in the getter (switch would need to learn how to deal with our type, skipping the getter).

  4. We need a better syntax than this complicated boilerplate, obviously, and this syntax should offer a way to create such a type from variadic generics.

Use case 1

One of the problems I'm trying to solve is providing an ad-hoc structural type that can replace an enum for representing a union. After tuples gain the ability to have extensions (and supposedly these anonymous structural types do as well), representing complex data structures will become not only possible, but (with some carefully considered syntax sugar) easy and intuitive. One big example of this is the use of result builders.

As an example: when tuples gain the ability to have extensions, SwiftUI.TupleView could be gone. With the types that are being discussed in this thread, SwiftUI._ConditionalContent could be gone as well.
Any similar structural "all of" / "any of" tree data structure will become easy.

Use case 2

With typed throws, the need to express one of several possible error types will arise pretty quickly. The only way that I can see this going smoothly is if there will be a way to express such types without defining an enum for it every time.

Use case N

I can't think of anything else for the moment, but I'm certain this is not it.

A type with protocol semantics, enum storage and exhaustiveness checking would be a dream come true! The workaround you provided perfectly demonstrates how this would look and behave, roughly. With some proper compiler support, this could be optimized into something that from the ABI perspective would be indistinguishable from an enum.

Here... I'll give it a name in hopes of it becoming reality some day: enum protocol.

I do not find this use case very compelling. Either the errors are very limited and you should define an enum of possible error states or they aren’t and you should use any Error. I feel like it’s totally fair to expect an API to be purposefully defined and not ad hoc in nature.

The language is ultimately not allowed to undermine the programmer by second-guessing their decisions. That's why, despite the fact that Swift is a type-safe and memory-safe language, it still offers UnsafePointer and unsafeBitCast.

In this case, any Error is unacceptable, because it relies on heap allocation, which is unacceptable for embedded systems and performance-critical code (like audio processing).

There are cases where exhaustive error checking is absolutely necessary and there is no way any unexpected error may occur, aside from the ones that are known.

A limited-error function may catch and re-throw errors from multiple other limited-error functions. In such a case, the only reasonable solution is to throw an anonymous sum type containing both error types. Otherwise, a dedicated single-use enum would need to be defined for every such function. This is the exact same reason why we have tuples and don't force the programmer to define a struct every time a function needs to return multiple values.
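To make the "dedicated single-use enum" concrete, here is a minimal sketch of what that looks like with typed throws, assuming a toolchain that supports the throws(E) syntax from SE-0413. ParseError, NetworkError, ParseAndUploadError and all of the functions are hypothetical names invented for this example.

```swift
// Hypothetical error types for two independent steps.
struct ParseError: Error { let line: Int }
struct NetworkError: Error { let status: Int }

func parse(_ text: String) throws(ParseError) -> Int {
    guard let value = Int(text) else { throw ParseError(line: 1) }
    return value
}

func upload(_ value: Int) throws(NetworkError) {
    if value < 0 { throw NetworkError(status: 400) }
}

// The single-use enum that an anonymous (ParseError | NetworkError) would replace.
enum ParseAndUploadError: Error {
    case parse(ParseError)
    case network(NetworkError)
}

func parseAndUpload(_ text: String) throws(ParseAndUploadError) {
    let value: Int
    // Inside each catch, `error` has the concrete type thrown by the do body.
    do { value = try parse(text) } catch { throw .parse(error) }
    do { try upload(value) } catch { throw .network(error) }
}

func outcome(_ text: String) -> String {
    do {
        try parseAndUpload(text)
        return "uploaded"
    } catch {
        // `error` is statically a ParseAndUploadError, so no default case is needed.
        switch error {
        case .parse(let e): return "parse error at line \(e.line)"
        case .network(let e): return "network error \(e.status)"
        }
    }
}
```

Every function that combines a different pair of error types needs its own enum like ParseAndUploadError, which is the boilerplate the tuple analogy is pointing at.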

EDIT:

With the ability to define limited-error functions, Swift's error handling mechanism can be used as a form of control flow that isolates the happy path from the variety of unhappy paths, making the code a lot more ergonomic and maintainable in light of the many possible (yet perfectly predictable) outcomes of the operation.

Swift's error handling system is NOT built like typical exception handling (as in, errors can never unwind the stack or be accidentally ignored, if you don't count throwing main). This means that errors are a legitimate control flow mechanism that can and should be used to the fullest in order to express how an operation can end aside from the happy path. There's no good reason why this control flow should be somehow less type-safe than any other part of Swift. Type-safe means type-safe. Either the language is type-safe indiscriminately, or the language cannot be claimed to be type-safe.

I feel like we may be straying back to the original thread’s purpose here, but I also don’t see a “type-unsafe Swift” argument here. Type-erased any Error existentials are still type-safe. They don’t let you put anything “in the box” that is a non-Error so I feel like you’re misusing “type-safe” here. (I am not a Computer Scientist, so perhaps there’s a strict definition for which your assertion on type safety is sound, but I’m approaching it from the layman’s perspective.)

Right. I think what @technogen means is static typing - deterministic types at compile time rather than at runtime. Or at least that's closer to the right terms.

To the point, it's not just about efficiency (existentials vs concrete types), it's about the coding experience (e.g. knowing what type(s) you're actually dealing with up front, exhaustive case checking, etc). Runtime type-safety doesn't provide those things (even though it's much better than no type safety, of course).

It errors at compile time if you try to put a non-error in an any Error var:

var err: any Error
err = "error"
// Error: Cannot assign value of type 'String' to type 'any Error'

I don't think it's important enough to warrant a special syntax like (Int | String), but I do think it would be worth adding something like this to the standard library, with a type named something like Either<each Option>. I have encountered plenty of situations where having a type like this would be useful, especially while using the new parameter packs feature.
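For reference, a fixed-arity version of such a type is easy to sketch today. The Either name comes from the post, but the case names and the label function below are invented for illustration. As far as I know, a truly variadic Either<each Option> cannot be written as an enum with current parameter packs, so this sketch covers only the two-option case.

```swift
// Two-option sketch of the suggested library type; case names are hypothetical.
enum Either<First, Second> {
    case first(First)
    case second(Second)
}

// Exhaustive switch with inline enum storage, no existential involved.
func label(_ value: Either<Int, String>) -> String {
    switch value {
    case .first(let int): return "Int \(int)"
    case .second(let str): return "String \(str)"
    }
}
```

Compared with the (Int | String) spelling, the caller has to wrap values explicitly, for example label(.first(4)) rather than passing 4 directly.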

Yes, which is better than nothing (e.g. Python or JavaScript), but still far from ideal in some cases.

See the origin thread for details, but in short, if Swift only had "Any" (a bit like how Objective-C, by convention, used id a lot), we'd all be pretty sad, or at least our users would be because of our much buggier apps. It's a bit weird that Swift goes to great lengths to have really good type inference and static typing, and actively discourages unnecessary use of existentials, for everything except exceptions.

That protocol has exactly zero user-facing requirements (technically, there are error domain and error code requirements for Darwin platforms, but they're hidden, so for all intents and purposes they don't exist).

Anything that can have a user-provided protocol conformance can be turned into an error with a single line of code and no forethought whatsoever.

For this reason, the error versus non-error distinction is practically nonexistent.
The Error protocol is essentially a marker protocol.

Exactly! To me, the term "type-safe" means (among other things) having information about what can safely be assumed to definitely be there, and what can safely be assumed to definitely NOT be there.

In this case, static type information in the form of an exact set of error types fits the bill, while an absolutely arbitrary and completely un-actionable value wrapped in a marker protocol existential certainly does not.

If one simply has an instance of any Error, there's absolutely nothing they can do other than print it out and then cancel whatever they're doing. By definition, it's any error. Anything other than simple log-printing and triggering some sort of cancellation would require them to downcast it, which immediately invalidates the whole narrative about any Error being useful. The only reason why any Error is useful at all is to be able to execute an arbitrary throwing operation and rethrow its error without stopping to think why that error happened. At the very least, it's a lot better to do something like this instead:

enum MyError: Error {
    case firstPartFailure(any Error)
    case secondPartFailure(any Error)
    // ...
}

They're still retaining the exact error that occurred without necessarily having to care about what the error was, but they're at least no longer mindlessly rethrowing an unknown error, because they can't be bothered to actually think about proper error propagation. But even then, I'd prefer this instead:

enum MyError<FirstPartFailure, SecondPartFailure>: Error where
    FirstPartFailure: Error,
    SecondPartFailure: Error
{
    case firstPartFailure(FirstPartFailure)
    case secondPartFailure(SecondPartFailure)
    // ...
}

Because again, they have no business arbitrarily losing type information just because they're too lazy to actually think things through.
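The downcasting point made earlier can be sketched as follows. TimeoutError, QuotaError, fetch and report are hypothetical names; the only claim is about how an untyped throws function has to be handled at the call site.

```swift
struct TimeoutError: Error { let seconds: Int }
struct QuotaError: Error { let limit: Int }

// Untyped throws: the signature says nothing about which errors can occur.
func fetch(failWithTimeout: Bool) throws {
    if failWithTimeout { throw TimeoutError(seconds: 30) }
    throw QuotaError(limit: 10)
}

func report(failWithTimeout: Bool) -> String {
    do {
        try fetch(failWithTimeout: failWithTimeout)
        return "ok"
    } catch let error as TimeoutError {
        // Doing anything specific requires a downcast from any Error.
        return "timed out after \(error.seconds)s"
    } catch {
        // The catch-all is mandatory, and the compiler cannot point out
        // that QuotaError was forgotten and ends up here.
        return "unhandled: \(type(of: error))"
    }
}
```

Nothing in this version warns the caller that QuotaError was left out, which is the information a statically known set of error types would carry.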

Exactly! One of the biggest wins in Swift, coming from Objective-C, was static type information when it was needed. That holds everywhere except for errors. I don't see any good reason why this would be an exception to the rule "have as much static type information as you want, but be able to lose it if you don't want it".

Thank you so much for weighing in! What you said is pretty much exactly what I was going for: language design encouraging lazy error handling, just like you said.

Sorry for the sarcasm, @Douglas_Gregor. I got frustrated by being misunderstood like this and dismissed as an "angry troll" while trying to solve a problem that I (and I'm sure, many others) have. I hope the arguments in this discussion will be weighed rationally rather than emotionally, in order to keep these forums productive.

my two cents here is that i can't remember many instances where i promoted an enum union to a protocol and regretted it afterwards. "exhaustiveness checking" has a funny way of becoming irrelevant once you've figured out what the various enum cases actually have in common.

the pain point for me personally is that the “protocol unions” are rarely ever important enough to justify introducing a top-level protocol for them; enum unions on the other hand have the benefit of being nestable. so for purely lexical reasons, i find that i still have to stick with the awkward old enums.

i imagine being able to nest protocols in namespaces will go a long way towards mitigating this issue.

That does make sense, but it still doesn't solve the performance problem. We need an option that involves zero heap allocations. The exhaustiveness checking is not just for guaranteeing that you'll never have an unexpected type, but also contains enough information to determine the static size of the value. For the purpose of error handling, my use case is the necessity to gracefully handle all possible errors, where rethrowing them or ignoring them is not an option. Solving the exhaustiveness problem would also help solve the performance problem.

are static enums actually a performance win though? an enum’s stride is at least the size of its largest payload, so if you have a single large metadata struct, that just adds padding to all the other cases.

existentials and generics are not great for small POD types like Point2 or whatever. but i've found that they are excellent for abstracting over more complex data, especially if you can constrain them to AnyObject.
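The padding concern can be checked directly with MemoryLayout. This is a small sketch with made-up types (Small, Large, SmallOrLarge). The exact numbers depend on the platform and compiler, so none are quoted here.

```swift
// Hypothetical payloads of very different sizes.
struct Small { var x: UInt8 = 0 }
struct Large { var a: Int64 = 0, b: Int64 = 0, c: Int64 = 0, d: Int64 = 0 }

enum SmallOrLarge {
    case small(Small)
    case large(Large)
}

// Every value of the enum reserves room for its largest payload,
// so a .small value occupies as much space as a .large one.
print("Small:", MemoryLayout<Small>.size)
print("Large:", MemoryLayout<Large>.size)
print("SmallOrLarge size:", MemoryLayout<SmallOrLarge>.size)
print("SmallOrLarge stride:", MemoryLayout<SmallOrLarge>.stride)

// For comparison, the existential container itself has a fixed size
// that does not depend on what is stored behind it.
print("any Error:", MemoryLayout<any Error>.size)
```

In an array of SmallOrLarge, each element takes up the full stride regardless of which case it holds, which is the trade-off being described.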

That's true. But with static polymorphism, the performance is reliable and if your data structures are not intended to be big (e.g. statically-typed errors), then the padding wouldn't be too big. For embedded systems, you'd probably make heavy use of static storage in the executable, which would make the padding largely irrelevant.

With existentials you do get to save some memory by avoiding the padding, but you pay for it by dynamic allocation. In this context, it comes down to low time complexity versus low space complexity. Both options have to be there, because both options are critical in different cases.

For instance, in audio processing, memory is plentiful, but responsiveness is critical, so you'd easily be willing to pay for the speed with some potentially big padding.

On the other hand, for bare-metal microprocessor code that works with extremely limited resources, you may need to squeeze every bit out of the available memory, so you wouldn't be able to afford to use such polymorphism much anyway.

I don’t think that’s quite correct…? Both some P and Foo<P> provide polymorphism that is resolved at compile time.

only with respect to members of P; requirements of P will still dispatch to the conforming type’s witness at run-time, unless the compiler can specialize it to a known type.
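A small sketch of the three spellings being compared, using a made-up Shape protocol and Square type. The comments describe the general dispatch model as I understand it. Whether a given call is actually specialized is an optimizer decision and is not guaranteed by the language.

```swift
// Hypothetical protocol with one requirement.
protocol Shape {
    func area() -> Double
}

struct Square: Shape {
    let side: Double
    func area() -> Double { side * side }
}

// Generic parameter: the concrete type is known at each call site, so the
// optimizer may specialize this function. The unspecialized form still
// calls area() through the witness table.
func areaGeneric<S: Shape>(_ shape: S) -> Double { shape.area() }

// Opaque parameter: sugar for the generic form above.
func areaOpaque(_ shape: some Shape) -> Double { shape.area() }

// Existential: the value is boxed and area() is dispatched dynamically.
func areaExistential(_ shape: any Shape) -> Double { shape.area() }

let square = Square(side: 3)
print(areaGeneric(square), areaOpaque(square), areaExistential(square))
```

All three compute the same result; the difference is in how much the compiler knows statically at the call site.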