@Douglas_Gregor wrote the Generics Manifesto almost three years ago, which provided the roadmap for the core of Swift's protocols and generics features. Since then, we've implemented almost all of the basic model that it envisioned. Although there are further features we could add, we think we have a solid baseline; meanwhile, we've gained practical experience with the model in developing the standard library, we've absorbed lots of feedback from the larger Swift development community, and we've kept an eye on the parallel evolution of other programming languages. This document tries to provide a foundation for conversations about refining the generics model, not really changing the framework established by the Generics Manifesto, but considering some of its weaknesses, and how we might make it more approachable and easier to use:
- One of the biggest missing pieces from the original manifesto is generalized existentials. These have been hailed as a panacea for a wide range of problems, but as we've explored the idea, we've found that there are many use cases that existentials would never be able to address.
- In particular, although existentials would allow functions to hide their concrete return types behind protocols as implementation details, they would not always be the most desirable tool for this job. We have a gap in the generics model in allowing functions to abstract their concrete return types while still maintaining the underlying type's identity in client code, and we'll look at how that gap can be filled.
- We'll also look at our existing notation for generics and existentials. Swift follows in the tradition of similar languages like C++, Java, and C# in its generics notation, using explicit type variable declarations in angle brackets, but this notation can be verbose and awkward. We could look at what C++20 is doing with abbreviated templates, and Rust with its `impl Trait` feature, for ways to make writing generic functions more concise and fluent. Also, protocols currently do double-duty as the spelling for existential types, but this relationship has been a common source of confusion.
A lot of this thinking was scattered in the pitch and review threads for SE-244, opaque result types. The core team thought that it would be a good idea to consolidate these ideas into one document to try to pave a coherent way forward for improving the UI of Swift's existing generics model.
Type-level and value-level abstraction
Let's start by reviewing the capabilities of generic types and existentials. Generics provide type-level abstraction: they allow a function or type to be used uniformly with any type that conforms to a given set of constraints, while still preserving the identity of the specific type being used in any particular instance. A generic function introduces type variables that stand in for a specific type. This allows a function to declare that it accepts any value conforming to a protocol, such as any Collection type:
func foo<T: Collection>(x: T) { ... }
The type variable `T` abstracts away the specific `Collection` being operated on at the type level; the identity of the type is still preserved, and so the type system can preserve type relationships between different values. For example, a generic function can also declare that it accepts any two values of the same collection type, and returns an array whose elements are of that same type:
func bar<T: Collection>(x: T, y: T) -> [T] { ... }
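As a concrete illustration, here is a minimal runnable sketch of that idea. The name `pair` is hypothetical and not part of the original example; it simply returns its two arguments in an array so that the shared type `T` is visible at the call site.

```swift
// Hypothetical helper: both arguments must share one concrete collection type T.
func pair<T: Collection>(_ x: T, _ y: T) -> [T] {
    return [x, y]
}

// T is inferred as [Int], so the result is [[Int]].
let arrays = pair([1, 2], [3, 4])

// Mixing concrete types is rejected at compile time, because T cannot be
// both [Int] and Set<Int> in the same call:
let list: [Int] = [1, 2]
let unique: Set<Int> = [3, 4]
// let mixed = pair(list, unique)   // does not compile
```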
Swift also has existential types, which provide value-level abstraction. By contrast with a generic type parameter, which binds some existing type that conforms to the constraints, an existential type is a different type that can hold any value of any type that conforms to a set of constraints. Existentials allow values of varying concrete types to be used interchangeably as values of the same existential type, abstracting the difference between the underlying conforming types at the value level. Different instances of the same existential type can hold values of completely different underlying types, and mutating an existential value can change what underlying type the value holds:
let x: Collection = [1, 2, 3]
let y: Collection = Set(["one", "two", "three"])
var z = x
z = y
z = ["one": 1, "two": 2, "three": 3]
(Swift currently doesn't allow `Collection` to be used as an existential type, nor does it allow existentials to put constraints on associated types, but these would be natural language extensions.)
One of the key things that existentials allow, because an existential type is a distinct type, is that they can in turn be used to parameterize other types, which is particularly useful for heterogeneous collections:
let xyz: [Collection] = [x, y, z]
Some generic functions can instead be expressed using existential types for arguments. The function `foo` above which takes any value conforming to Collection could alternatively be phrased as:
func foo(x: Collection) { ... }
which is notationally clearer and more concise than the generic form.
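Since `Collection` itself can't be used this way yet, a protocol without associated types shows the same behavior in code that compiles today. This is a minimal sketch using the standard library's `CustomStringConvertible`; the names `describe`, `value`, and `mixed` are made up for illustration.

```swift
// An existential parameter: accepts a value of any conforming type.
func describe(_ x: CustomStringConvertible) -> String {
    return "value: " + x.description
}

// One existential variable can hold values of different underlying types.
var value: CustomStringConvertible = 1
value = "two"

// The existential type can parameterize Array, giving a heterogeneous collection.
let mixed: [CustomStringConvertible] = [1, "two", 3.5]
let descriptions = mixed.map { describe($0) }
```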
Limitations of value-level abstraction
Because existentials eliminate the type-level distinction between different values of the type, they cannot maintain type relationships between independent existential values. If the function `bar` above were written:
func bar(x: Collection, y: Collection) -> [Collection] { ... }
it's still more concise than the generic form, but it loses a lot of type information: `x` and `y` are no longer required to have the same concrete type, nor are the returned array's elements guaranteed to have the same dynamic type as `x` and `y`. With a protocol like `Collection` that has associated types, this has knock-on effects, since if the values aren't known to be of the same Collection type, it also isn't known whether their `Index` or `Element` types are the same, making almost all of the collection API type-unsafe on such an existential type:
// not type-safe, since the existential Comparable might not match the existential Collection's Index type
var start = x.startIndex
// somebody could do this and change `start`'s dynamic type:
// start = y.startIndex
var firstValue = x[start] // error
// also not type-safe, since the existential Any might not match the existential Collection's Element type
var first = x.first!
var indexOfFirst = x.firstIndex(of: first) // error
Existentials are hampered by the lack of three key features in Swift today. Although these features would push forward the frontier of what existentials can do, they won't ever quite reach the full power that generic types have. Let's first say that existentials should be able to constrain associated types. This would let the function declare that the input collections and returned array's contents all share an element type:
typealias CollectionOf<T> = Collection where Self.Element == T
func bar<T>(x: CollectionOf<T>, y: CollectionOf<T>) -> [CollectionOf<T>] { ... }
This nonetheless still doesn't give you full access to the underlying collection's API, since the Index types are still not known to be the same; the type system would still, without further language support, refuse to allow code working with an existential `CollectionOf<T>` to index itself. Having an existential type constrain Index to a specific implementation would make the existential nearly useless due to the tight coupling between collection implementations and their indexes. The next language extension for generalized existentials might be, for special cases like collection indexing, to add a feature that allows a protocol to describe its existential "self-conformance". For instance, the manually implemented `AnyCollection<T>` type erasure container does this with its `AnyIndex` type; in the future, we could allow the existential to express this directly:
extension Collection: Collection {
  // Support indexing an existential Collection with a dynamically-typed index
  subscript(index: Comparable) -> Any {
    // Indexing has a dynamic precondition that the argument be a valid index for the current collection.
    // A valid index for an existential Collection would have to dynamically match the corresponding Index
    // type, so we can check for this with a cast:
    return self[index as! Self.Index]
  }
}
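For comparison, the hand-written type eraser mentioned above can be used today. This is a small sketch with the standard library's `AnyCollection` and `AnyIndex`; the variable names are illustrative only.

```swift
// Erase the concrete collection type behind AnyCollection<Int>.
let erased = AnyCollection([10, 20, 30])

// startIndex is an AnyIndex, which wraps the underlying collection's own index.
let start: AnyIndex = erased.startIndex
let first = erased[start]

// Index arithmetic goes through the wrapper as well.
let second = erased[erased.index(after: start)]
```

Because every `AnyCollection` uses the same `AnyIndex` type, the type system cannot rule out mixing indexes that came from different underlying collections; that check is left to run time.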
Finally, we could support "opening" an existential type, allowing the dynamic type of a value to be reintroduced as a local type. This would allow for a computation involving a single existential collection's index and element values to be performed:
let <X: Collection> openedX = x // X is now bound to the dynamic type of x
let start = openedX.startIndex
let first = openedX[start] // OK, indexing X with a value of type X.Index, to get a result of type X.Element
However, even with all of these features, the best we could hope for is that an existential is able to support computations derived from a single existential value. This is still useful, and well suited to the nature of heterogeneous data structures, but when multiple related values are involved, existentials can't match what can be achieved by maintaining abstraction at the type level.
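The effect of opening a single existential can be approximated by passing it to a generic function. The sketch below assumes a Swift 5.7 or later toolchain, which accepts `any Collection` as a type and opens the existential implicitly at the call; that is not the hypothetical `let <X: Collection>` syntax shown above, and `firstDescription` is a made-up helper.

```swift
// Inside the generic function, C is bound to the dynamic type of the argument,
// so C.Index and C.Element line up and indexing type-checks.
func firstDescription<C: Collection>(of collection: C) -> String {
    guard !collection.isEmpty else { return "empty" }
    let start = collection.startIndex
    return String(describing: collection[start])
}

let numbers = [1, 2, 3]
let x: any Collection = numbers

// Passing the existential opens it: C is [Int] for this call.
let description = firstDescription(of: x)
```

Note that this only covers computations derived from the one opened value, which matches the limitation described above.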
Type-level abstraction is missing for function returns
Generics are Swift's tool for type-level abstraction in function interfaces, but they work in a way that is fundamentally in the caller's control. If a function is declared like this:
func zim<T: P>() -> T { ... }
Then this says that `zim` can return a value of any type that the caller chooses:
let x: Int = zim() // T == Int chosen by caller
let y: String = zim() // T == String chosen by caller
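To make this runnable, the sketch below fills in a hypothetical protocol `P` with a single `init()` requirement, which is an assumption of this example and not something the text above specifies. It lets `zim` construct a value of whatever type the caller asks for.

```swift
// Hypothetical stand-in for P: the requirement lets zim build a value of T.
protocol P {
    init()
}

// Int and String already provide init(), so the conformances need no body.
extension Int: P {}
extension String: P {}

func zim<T: P>() -> T {
    return T()
}

let x: Int = zim()       // T == Int chosen by caller
let y: String = zim()    // T == String chosen by caller
```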
However, it's common to want to abstract a return type chosen by the implementation from the caller. For instance, a function may produce a collection, but not want to reveal the details of exactly what kind of collection it is. This may be because the implementer wants to reserve the right to change the collection type in future versions, or because the implementation uses composed `lazy` transforms and doesn't want to expose a long, brittle, confusing return type in its interface. At first, one might try to use an existential in this situation:
func evenValues<C: Collection>(in collection: C) -> Collection where C.Element == Int {
  return collection.lazy.filter { $0 % 2 == 0 }
}
but Swift will tell you today that `Collection` can only be used as a generic constraint, leading someone to naturally try this instead:
func evenValues<C: Collection, Output: Collection>(in collection: C) -> Output
  where C.Element == Int, Output.Element == Int
{
  return collection.lazy.filter { $0 % 2 == 0 }
}
but this doesn't work either, because as noted above, the `Output` generic argument is chosen by the caller; this function signature is claiming to be able to return any kind of collection the caller asks for, instead of one specific kind of collection used by the implementation.
The standard library `AnyCollection<Int>` wrapper could instead be used to hide the return type, or in future Swift, a generalized existential:
func evenValues<C: Collection>(in collection: C) -> CollectionOf<Int>
  where C.Element == Int
{
  return collection.lazy.filter { $0 % 2 == 0 }
}
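The `AnyCollection` variant can be written in Swift today. This is a minimal sketch in which the wrapper is applied explicitly in the body, since there is no implicit conversion from the lazy filter to `AnyCollection<Int>`.

```swift
// Value-level abstraction of the result: callers only see AnyCollection<Int>.
func evenValues<C: Collection>(in collection: C) -> AnyCollection<Int>
    where C.Element == Int
{
    // Wrap the lazy filter so its concrete type does not leak into the signature.
    return AnyCollection(collection.lazy.filter { $0 % 2 == 0 })
}

let fromArray = Array(evenValues(in: [1, 2, 3, 4]))
let fromRange = Array(evenValues(in: 1...6))
```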
This gives us value-level abstraction of the return type, achieving the goal of hiding the underlying collection used by the implementation. However, this is also not as precise as we could be. The `evenValues(in:)` function returns the same collection type every time it's called, and that information is lost. In Swift today, there's no way for an implementation to achieve type-level abstraction of its return values independent of the caller's control. There's effectively a hole in the feature matrix for method API design in Swift:
| | Arguments | Returns |
|---|---|---|
| Value-level abstraction | Existentials | Existentials |
| Type-level abstraction | Generic arguments | ???? |
meaning that, if an API wants to abstract its concrete return type from callers, it must accept the tradeoffs of value-level abstraction. If those tradeoffs are unacceptable, the only alternative in Swift today is to fully expose the concrete return type.
"Reverse generics" for return type abstraction
To achieve type-level abstraction of a return type, we would need to introduce a new type system feature: something that behaves similarly to a generic parameter type, but whose underlying type is bound by the function's implementation rather than by the caller. This is analogous to the roles of argument and return values in functions; a function takes its arguments as inputs and uses them to compute the return values it gives back to the caller. We could think of type-level-abstracted return types as doing the same thing but at the type level; you give a function generic arguments as inputs, and it gives a certain return type back. We could notate this as a second generic signature to the right of the return arrow:
func evenValues<C: Collection>(in collection: C) -> <Output: Collection> Output
  where C.Element == Int, Output.Element == Int
{
  return collection.lazy.filter { $0 % 2 == 0 }
}
@orobio called this "reverse generics", and the term is apt. Inside the body of `evenValues`, the `C` generic parameter type represents a specific type, albeit one that isn't known beyond being something that conforms to `Collection` with an `Element` type of `Int`, so the body can only use members of `Collection` on the `collection` argument value. Likewise, to the caller of `evenValues`, the return type `Output` represents a specific type, unknown to the caller except that it conforms to `Collection` with an `Element` type of `Int`. Because the declaration of `evenValues` references a specific return type, the type identity of that return type can be preserved across values from different calls to the function while maintaining the abstraction. The entire interface of Collection "just works" on returned values, and Index and Element values can be shared between the results of different calls. In effect, the roles of caller and callee relative to the generic interface are reversed compared to generic arguments. This nicely fills in the hole in the feature matrix, taking the existing generic type system framework and allowing it to be deployed in a context where it formerly wasn't available.
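Opaque result types give a way to try this behavior out. The sketch below assumes a Swift 5.7 or later toolchain, where the element constraint can be spelled `some Collection<Int>`; that spelling is an assumption of this example and is not the reverse-generic notation shown above.

```swift
// The implementation, not the caller, picks the concrete result type.
func evenValues<C: Collection>(in collection: C) -> some Collection<Int>
    where C.Element == Int
{
    return collection.lazy.filter { $0 % 2 == 0 }
}

let a = evenValues(in: [1, 2, 3, 4])
let b = evenValues(in: [5, 6, 7, 8])

// Both calls use C == [Int], so a and b share one hidden result type and
// can be stored in the same homogeneous array.
let results = [a, b]

// The Collection API works on the result, including indexing with its own index.
let firstEven = a[a.startIndex]
```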
The notation also generalizes to more interesting examples, if we allow the "reverse" generic signature to use the full set of existing generics features. For instance, we could describe that a function returns two collections of the same underlying type:
func groupedValues<C: Collection>(in collection: C) -> <Output: Collection> (even: Output, odd: Output)
  where C.Element == Int, Output.Element == Int
{
  return (even: collection.lazy.filter { $0 % 2 == 0 },
          odd: collection.lazy.filter { $0 % 2 != 0 })
}
Improving the notation for generics
Type-level abstraction with generics is powerful, but in their current form, there's no denying that they are syntactically complex, and many people find them conceptually difficult. Brent Simmons coined the term "angle bracket blindness" in an early critique of Swift. In cases where a function can be expressed using either generics or existentials, the existential form is visually and conceptually simpler:
func foo<T: Collection, U: Collection>(x: T, y: U) -> <V: Collection> V
func foo(x: Collection, y: Collection) -> Collection
The existential form is clearer not just because it's shorter, but because it's also more direct in communicating the function's interface. Using generics requires introducing a new set of names for the type parameters, introducing a level of notational indirection; in the generic form, you have to first associate the arguments `x` and `y` with their respective types `T` and `U`, and then look to those type variables in the angle brackets to see what constraints apply. In the existential form, by contrast, the constraints apply directly to the values. In more elaborate generic signatures, the type variable names play an important role in being able to describe generic constraints in their full generality, but in simple cases like this, where there's a one-to-one mapping between values and type variables, the indirection arguably doesn't pay for itself.
The way that Swift describes more involved generic signatures also puts a lot of syntactic distance between concrete functions and their generalizations, making the natural process of writing an algorithm in terms of a specific model and then gradually generalizing it more difficult and sometimes more confusing than it could be. Let's consider a simple function that concatenates two arrays of Int:
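func concatenate(a: [Int], b: [Int]) -> [Int] {
  var result: [Int] = []
  // append(contentsOf:) adds every element of the argument; append(_:) takes a single element
  result.append(contentsOf: a)
  result.append(contentsOf: b)
  return result
}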
It's easy enough to generalize this to work with arrays of any type, by introducing a generic argument to stand in for `Int`:
func concatenate<T>(a: [T], b: [T]) -> [T] {
  var result: [T] = []
  result.append(contentsOf: a)
  result.append(contentsOf: b)
  return result
}
but what happens if we want to generalize further, to concatenate any two collections? We have to change the generic signature drastically:
func concatenate<A: Collection, B: Collection>(a: A, b: B) -> [A.Element]
  where A.Element == B.Element
{
  var result: [A.Element] = []
  result.append(contentsOf: a)
  result.append(contentsOf: b)
  return result
}
Going another step, maybe we want to take advantage of reverse generics to hide the concrete return type too, so we can use a lazy adapter instead of eagerly concatenating into an array:
func concatenate<A: Collection, B: Collection>(a: A, b: B) -> <C: Collection> C
  where A.Element == B.Element, B.Element == C.Element
{
  return ConcatenatedCollection(a, b)
}
At this point, the generalized declaration is no longer clearly visually related to its original concrete form.
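The lazy variant can be approximated with an opaque result type. The sketch below makes several assumptions: a Swift 5.7 or later toolchain (for `some Sequence<...>`), a hypothetical `ConcatenatedSequence` adapter written here because `ConcatenatedCollection` is not a standard library type, and `Sequence` rather than `Collection` to keep the adapter short.

```swift
// Hypothetical lazy adapter: yields every element of head, then every element of tail.
struct ConcatenatedSequence<A: Sequence, B: Sequence>: Sequence
    where A.Element == B.Element
{
    let head: A
    let tail: B

    func makeIterator() -> AnyIterator<A.Element> {
        var headIterator = head.makeIterator()
        var tailIterator = tail.makeIterator()
        var headIsExhausted = false
        return AnyIterator {
            if !headIsExhausted {
                if let element = headIterator.next() { return element }
                headIsExhausted = true
            }
            return tailIterator.next()
        }
    }
}

// The adapter type stays out of the signature; callers only see "some Sequence".
func concatenate<A: Sequence, B: Sequence>(a: A, b: B) -> some Sequence<A.Element>
    where A.Element == B.Element
{
    return ConcatenatedSequence(head: a, tail: b)
}

let joined = Array(concatenate(a: [1, 2], b: 3...4))
```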
Expressing constraints directly on arguments and returns
Other languages, including C++ and Rust, have noted these ergonomic issues with their own generics systems, and we can learn from the solutions they've proposed and adopted. C++20 introduced abbreviated function templates, which allow templated function definitions to be written with `auto` arguments instead of independent type parameters, and with concepts (C++'s rough analog to Swift protocols) directly specified on those auto parameters:
// C++
template<typename T> void foo(T x) { }
template<Regular T, Regular U> void bar(T x, U y) { }
// can be shortened in C++20 to:
void foo(auto x) { }
void bar(Regular auto x, Regular auto y) { }
Rust similarly has the `impl Trait` syntax, which can be used either in argument types, where it implicitly introduces a generic argument, or in return types, where it behaves like a "reverse generic" to abstract away part of the function's concrete return type:
// Rust
struct Concatenated<T, U> { ... }
impl<T, U> Iterator for Concatenated<T, U> { ... }
fn concat<T: Iterator, U: Iterator>(x: T, y: U) -> Concatenated<T, U> { ... }
// can also be expressed as:
fn concat(x: impl Iterator, y: impl Iterator) -> impl Iterator { ... }
This style of generic function definition addresses many of the ergonomic issues with the traditional notation. It avoids introducing the notational indirection of type variables when there's nothing to gain from them, allowing constraints to be expressed directly on arguments and returns, without having to use existential types. In simple cases, the angle bracket blinders go away completely, and we still get all of the benefits of type-level abstraction.
Swift would do well to follow this design trend. Aesthetically, Swift prefers real words to reclaimed archaisms like `auto` or abbrvs like `impl`, and since we already use `Any` as a marker for type erasers and other existential-adjacent things, I think that `some` would be a good modifier to use. It's concise, and has the right connotation of representing some unspecified yet unique type. This would look something like this:
func concatenate(a: some Collection, b: some Collection) -> some Collection
which is much less intimidating. However, there's an important piece still missing: the `where` clause constraints on the `Element` types of these collections. Swift's `where` clause syntax is very expressive, but it relies on having type names to describe constraints. For protocols like Collection, you almost always want to know something about its Element when you have a type constrained to it, and it would be unfortunate to force naming the type only to be able to do so. There are a number of different directions we could go here. For one, we could allow the argument value names to be used in place of type variables, with maybe a placeholder name like `return` for the unnamed return value:
func concatenate(a: some Collection, b: some Collection) -> some Collection
  where type(of: a).Element == type(of: b).Element,
        type(of: return).Element == type(of: b).Element
or we could look at Rust again, which has a convenient `Trait<AssocType = T>` notation for constraining associated types. An analogous feature in Swift might allow simple associated type constraints to be expressed without a where clause at all:
func concatenate<T>(a: some Collection<.Element == T>, b: some Collection<.Element == T>)
  -> some Collection<.Element == T>
One nice thing about this notation is that, if we look again at the array-specific signature:
func concatenate<T>(a: [T], b: [T]) -> [T]
then the syntactic analogy between the more specific and more generic forms is now clear: the generic form substitutes the array type syntax `[_]` one-to-one with `some Collection<.Element == _>`, so the generalization relationship is more obvious. Also, since this syntax doesn't need to name the type being constrained at all, it could be applied to writing generalized existentials too, which have the same problem of needing to describe a type without a name.
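A close relative of this signature can be tried out, assuming a Swift 5.7 or later toolchain, which accepts `some` in parameter position and lets `Collection<T>` constrain the element type. Note that the `Collection<T>` spelling is an assumption of this sketch and differs from the `<.Element == T>` notation proposed above; the eager array body is only there to keep the example short.

```swift
// Constraints are written directly on the arguments and the result;
// only the shared element type T needs a name.
func concatenate<T>(a: some Collection<T>, b: some Collection<T>) -> some Collection<T> {
    var result: [T] = []
    result.append(contentsOf: a)
    result.append(contentsOf: b)
    return result
}

// The two arguments may have different concrete types as long as T matches.
let combined = Array(concatenate(a: [1, 2], b: 3...4))
```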
Clarifying existential types
We gave existential types an extremely lightweight spelling, just the bare protocol name, partially following the example of other languages like Java and C# where interfaces also serve as value-abstracted types, and partially out of a hope that they would "just work" the way people expect; if you want a type that can hold any type conforming to a protocol, just use the protocol as a type, and you don't have to know what "existential" means or anything like that. In practice, for a number of reasons, this hasn't worked out as smoothly as we had originally hoped. Although the syntax strongly suggests that the protocol as a constraint and the protocol as a type are one thing, in practice, they're related but different things, and this manifests most confusingly in the "Protocol (the type) does not conform to Protocol (the constraint)" error. To some degree this is a missing feature in the language, but in cases with nontrivial Self or associated type constraints, a protocol existential simply can't conform to its own protocol. This is another place where Swift might do well to follow in Rust's footsteps: Rust also originally spelled its analog to existential types as bare `Trait`, but later introduced a keyword, `dyn Trait`, to make the fact that an existential type is being used explicit. If Swift did the same, we might use `any`:
// A variable that can hold any collection at all
var x: any Collection = [1, 2, 3]
x = Set(["foo", "bar", "bas"])
// A variable that can hold collections of Int only
var y: any Collection<.Element == Int> = [1, 2, 3]
y = Set([4, 5, 6])
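Something very close to this can be written, assuming a Swift 5.7 or later toolchain, which accepts `any Collection` and the constrained form `any Collection<Int>`. The `Collection<Int>` spelling is an assumption of this sketch and is not the `<.Element == Int>` notation used above.

```swift
let numbers = [1, 2, 3]
let words: Set<String> = ["foo", "bar", "bas"]
let moreNumbers: Set<Int> = [4, 5, 6]

// A variable that can hold any collection at all.
var x: any Collection = numbers
x = words

// A variable that can hold collections of Int only.
var y: any Collection<Int> = numbers
y = moreNumbers
// y = words   // rejected at compile time: Element is String, not Int
```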
By syntactically separating the spelling for the existential type from the protocol, it hopefully becomes clearer that they're different things, and it can be easier to describe the differences and why they exist. This also opens other syntactic avenues for other important generalized existentials features. Since `extension Protocol` already means "extend all types that conform to Protocol", there's currently no obvious syntax for extending only the existential type, which would be useful for being able to describe how an existential type conforms to its protocol if it does so in a nontrivial way. If existentials have explicit "any" syntax, then that could also be used to explicitly extend them:
extension any Hashable: Hashable {
  static func ==(a: any Hashable, b: any Hashable) -> Bool {
    return AnyHashable(a) == AnyHashable(b)
  }
  func hash(into: inout Hasher) {
    AnyHashable(self).hash(into: &into)
  }
}
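The hypothetical extension above leans on the behavior of the existing `AnyHashable` wrapper, which can be observed today. This is a small sketch with illustrative variable names:

```swift
let one = AnyHashable(1)
let alsoOne = AnyHashable(1)
let text = AnyHashable("1")

// Equal only when the wrapped values have matching type and value.
let sameValue = (one == alsoOne)
let differentType = (one == text)

// Because AnyHashable is itself Hashable, mixed values can share one Set.
let mixed = Set([one, alsoOne, text])
```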
Moving forward
There's a lot of material here, and if we were to move forward in this direction, there's a lot of design discussion and implementation work to do as well. This document aims to sketch a path forward, but it isn't necessarily the final word on how or even if we address the issues this document raises. It would be a good start to first discuss this document at a high level. From that point, as a strawman, here's a rough breakdown of how we could factor evolution in manageable chunks. To begin with, a particularly useful cross-section of functionality is captured in SE-244, opaque result types, which introduces the `some Protocol` syntax for type-abstracted return types. This roughly follows the progression of `impl Trait` in Rust, where it was first introduced only for return types, then was generalized to be able to appear structurally in both argument and return types. We think this is a reasonable first step because it directly addresses the biggest functionality gap in the generics model. After that first step, there are a few fairly orthogonal language change discussions we can have, some of which are already underway:
- Generalizing the `some` syntax to arguments and returns
- Generalizing return type abstraction with "reverse generics" notation
- Introducing `Protocol<.Assoc>` shorthand for constraints
- Making existential types require an explicit keyword. This one is particularly interesting because of the source compatibility concerns. Some previous discussion of this point came up in a thread about lifting the "self or associated type" constraint on existential types.
With the core generics model in place, and major technical foundations like the stable ABI completed, we should also stay vigilant for other opportunities to learn from the collective experience of using Swift to iterate on and refine the language, continuing to make Swift easier to learn and use.