Improving the UI of generics

steve · April 17, 2019, 10:22pm

Honestly, I think we're discussing the wrong problem with protocol generics. We agree that the type system should support generic protocol existentials without type erasure, but this discussion has been mostly centered around making their associated types existential as well—being able to declare, "I want any instances of this protocol, each with any instances of its associated types", such as with AnyHashable—whereas I think it'd be more useful and pratical to start with (what I'm going to call because I don't know the proper term) "fully qualified" associated types—being able to declare "I want any instances of this protocol, each with these specific implementations for its associated types", such as with AnySequence<T>. The latter situation is something I find myself needing much more often than the former.

All the examples described above can be more easily described with these existential protocols with universal associated types, and we can later add syntatic sugar to "simplify" the syntax to something more akin to what's being discussed.

I'm going to demonstrate this concept using the syntax Sequence«Element: T» to denote the proposed non-type-erased equivalent of AnySequence<T>. I am not advocating for the use of this syntax, I'm just using it as an arbitrary notation for my examples.

One of the first examples of a proposed generic protocol syntax is:

func bar(x: Collection, y: Collection) -> [Collection] { ... }

The problems with this syntax, it's argued, is that the Element and Index associated types are not necessarily the same for both parameters passed into bar(x:y:) as well as the return type of the function itself.

Another example is then used to show that partially constraining the protocol to a specific Element type still leaves the Index type unconstrained, leaving us with the same problem.

typealias CollectionOf<T> = Collection where Self.Element == T

func bar<T>(x: CollectionOf<T>, y: CollectionOf<T>) -> [CollectionOf<T>] { ... }

This problem can be solved by requiring existential protocols have "fully qualified" associated types:

func bar<T, U>(x: Collection«Element: T, Index: U», y: Collection«Element: T, Index: U»)
    -> Collection«Element: T, Index: U»

Here, x, y, and the return value all have the same Element and Index types. I'd also argue that Sequence would be better suited for this specific example, but I understand it's intended as a demonstration with existential associated types.

Sidenote: The way I'm imagining this system, the remaining associated types Indices, Iterator, and SubSequence inherit that "full-qualification"—resolving Indices as Collection«Element: U, Index: U», Iterator as IteratorProtocol«Element: T», and SubSequence as Sequence«Element: Element» (with its Iterator resolved to IteratorProtocol«Element: Element») respectively. I notice that this specific example recursively references Collection—causing an infinite type definition. Perhaps this sort of "qualified" type wouldn't be allowed until the compiler is able to reason that the qualified type of Indices.Indices is equivalent to the type of Indices, and can handle it somehow. I'm not sure how the type system is implemented, so maybe this won't ever be possible, but I think this behavior should at least be valid for non-recursive types like Sequence.

The second section discusses Swift's ability to allow the caller of a function to specificy its return type.

func zim<T: P>() -> T { ... }

let x: Int = zim() // T == Int chosen by caller

let y: String = zim() // T == String chosen by caller

It's argued that existential generic protocols can't allow for this behavior because the caller can't specify the Element of a returned Collection without first defining the returned collection itself.

Note: I've adjusted the function signature of the following examples to use the BinaryInteger protocol instead of the Int structure. This will let the caller better define the return type later on.

func evenValues<C: Collection, I: BinaryInteger>(in collection: C) -> Collection
    where C.Element == I, I: ExpressibleByIntegerLiteral {
        return collection.lazy.filter { $0.isMultuple(of: 2) }
}

let x = evenValues(in: [1, 2, 3, 4]) // What is type(of: x).Element?

I think it should be noted that these specific examples shouldn't technically work since lazy is defined on LazyCollectionProtocol, not Collection, so I've modified the following examples to take a LazyCollectionProtocol parameter instead.

The second example illustrates what it would look like for the caller to specify the return type themselves.

func evenValues<C: LazyCollectionProtocol, I: BinaryInteger, Output: Collection>(in collection: C) -> Output
    where C.Element == I, Output.Element == I, I: ExpressibleByIntegerLiteral {
        return collection.lazy.filter { $0.isMultiple(of: 2) }
}

let x: LazyFilterSequence<[Int]> = evenValues(in: [1, 2, 3, 4]) // Wait... Why do I need to know the return type?

The proposal for opaque result types attempts to address this issue, but it can only be resolved to a specific underlying type, which isn't the existential behavior we're looking for here.

"Fully qualified" associated types can be used here to address both problems—needing to know the return type, and having non-existential behavior for opaque result types.

func evenValues<C: LazyCollectionProtocol, I: BinaryInteger>(in collection: C) -> Collection«Element: I, Index: C.Index»
    where C.Element == I, I: ExpressibleByIntegerLiteral {
        return collection.lazy.filter { $0.isMultiple(of: 2) }
}

let x: Collection«Element: Int, Index: Int» = evenValues(in: [1, 2, 3, 4]) // Don't care what it is as long as has Ints

let y: Collection«Element: UInt, Index: Int» = evenValues(in: [5, 6, 7, 8]) // Don't care what it is as long as it has UInts

A proposed syntax related to the opaque result type is based on Rust's impl keyword.

func concatenate(a: some Collection, b: some Collection) -> some Collection { ... }

If I'm understanding the intent behind this syntax correctly, I think this would just be sugar for the following "fully qualified" example.

func concatenate<T, U>(a: Collection«Element: T, Index: U», b: Collection«Element: T, Index: U»)
    -> Collection«Element: T, Index: U» { ... }

Which can be read as, "Give me any two (possibly different) Collection implementations that both have Elements of T and Indexs of U, and then I'll return some other Collection implementation with both of those associated types as well."

@Joe_Groff, you had this to say about the need to include both the Element and Index in the existential type of Collection:

Swift's design is aimed at enabling more a expressive type system to capture more interesting type-level relationships between values. The C# design would become more cumbersome if you tried to implement something like Swift's Collection hierarchy in it, since you'd need to define a type ICollection<Index, Element> and carry the index around with you everywhere. The type relationship between collections and indexes is what allows Swift's collections to approach "zero cost" in specialized code, since for instance, you know an Array is always indexed by Ints, and that a String is always indexed by valid code unit offsets represented by String.Index. Although you could express that relationship in C#, it would make ICollection not very useful as a dynamic interface type, since the Index generic argument is usually specific to a single collection family, so for instance ICollection<Int, T> would effectively be a type that can only hold Arrays. By using associated types, Swift allows you to express relationships between Collections using only the relevant associated types; you only need to refer to Index when indexing. With more flexible existential types, you'd also be able to refer to any Collection<.Element == T> to abstract over collections of a certain element type without confining yourself to a specific index. The goal of associated types is to allow for greater flexibility and expressivity, admittedly at the cost of some shorter-term awkwardness since we're missing so many key features still.

I am of the opinion that keeping a Collection without knowing its Index excludes its use for Collection-specific functionality—which is mostly centered around the ability to index its elements. It would be more appropriate to keep a Sequence in this case. Since there wouldn't be any type-erasure, it'll be possible for some client to inspect its type and cast it back into a Collection if they really wanted to get that indexing behavior back.

I'm not saying that using a Collection without knowing its index is never appropriate—there are definitely legitimate use cases, even if I can't think of one off the top of my head. I just think that there are more use cases where not needing to know the underlying implentation of a protocol and needing it to have specific associated types is more common. This is definitely the case in my experience.

tl;dr I really think we should be focusing on creating a non-type-erased "Any*" type along the lines of AnySequence<T> before introducing non-type-erased existential protocols such as AnyHashable.

PS: If you know the formal terms for anything I talked about, let me know! Doubly so if I misunderstand some type theory concept or basic axiom of the Swift type system or philosophy.