[Pitch] Primary Associated Types in the Standard Library

John_McCall · April 14, 2022, 11:05pm

I don't want to unduly burden this pitch, but the Core Team would appreciate it if one of the outputs of this proposal was an update to the API design guidelines laying out how API authors should think about when to adopt a primary associated type.

The API design guidelines were originally established by the Evolution process (in SE-0023) in conjunction with a proposal to adopt them in the standard library (SE-0006). It seems sensible that proposals to adopt new features throughout the standard library, such as this one, are the logical time to establish general principles around adoption. Once you're doing that, it shouldn't be much of a stretch to draft an update to the API design guidelines.

Of course, there are also proposals (like SE-0279: Multiple trailing closures) that do not get widespread adoption in the standard library, but still ought to be discussed in the API design guidelines. We're still talking about how to fix that loophole. But in cases where there is significant adoption, this seems like the right way to do it.

lorentey · April 27, 2022, 6:36pm

Thanks very much, everyone -- I updated the document to integrate the feedback above.

- I added a new section with my suggestions for API design recommendations. I found that John's heuristic about simple prepositions captures an important point, so I pulled it directly into the guidelines. Feedback is most appreciated! The new section is reproduced below for easy commenting.
- The document no longer proposes primary associated types for LazySequenceProtocol and LazyCollectionProtocol. I still think Elements is the right choice for these, but this is mostly based on speculation rather than actual use cases. I would like to add these back, but I'd love to see some actual example cases to confirm/reject this choice. Do you have code that constrains these protocols, or that would like to return an opaque result type that uses them? Please share if you can!
- As discussed above, Clock now has Duration as its primary associated type. (SE-0329 has been amended to reflect the addition of Duration as an associated type, and the implementation has landed on both main and release/5.7.)

General API Design Guidelines

Primary associated types add a new facet to the design of protocols. For every public protocol with associated type requirements, we need to carefully consider which of them (if any) we want to mark as primary. On the one hand, we want to allow people to use the shorthand syntax whenever possible; on the other hand, we only get one chance to decide this: once a protocol gains a primary associated type annotation, most subsequent changes would be source-breaking.

Let usage inform your design.

If you are considering adding a primary associated type declaration to a preexisting protocol, then look at its existing clients to discover which associated types are typically mentioned in same-type requirements. Is there one particular type that is used overwhelmingly more than any other? If so, then it will probably be a good choice for the primary.

For example, in the case of Sequence, use sites overwhelmingly tend to constrain Element -- Iterator is almost never mentioned in where clauses. This makes it fairly clear that Element is the right choice for the primary type.
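
As a rough sketch of what this looks like at a use site (the function names are made up for illustration, they are not from the pitch), the same Element constraint can be spelled with a classic where clause or with the lightweight syntax:

```swift
// Classic spelling: a named generic parameter plus a where clause.
func sumClassic<S: Sequence>(_ values: S) -> Int where S.Element == Int {
    values.reduce(0, +)
}

// Lightweight spelling (Swift 5.7 and later): Element is the primary
// associated type of Sequence, so it goes in the angle brackets.
func sumLightweight(_ values: some Sequence<Int>) -> Int {
    values.reduce(0, +)
}

print(sumClassic([1, 2, 3]))   // prints "6"
print(sumLightweight(1...4))   // prints "10"
```

Neither function ever needs to mention Iterator, which matches the usage pattern described above.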

If you're designing a new protocol, think about which type people will most likely want to constrain. Sometimes it may not even be one you planned to have as an associated type!

For example, protocol Clock in [SE-0329] initially only had Instant as an associated type. As it turns out, in actual use cases, people are far more likely to want to constrain Instant.Duration rather than Instant itself. Clocks tend to be far too closely coupled to their instants for Instant to serve as a useful constraint target -- some Clock<ContinuousClock.Instant> is effectively just a circuitous way of spelling ContinuousClock. On the other hand, some Clock<Swift.Duration> captures all clocks that measure elapsed time in physical seconds -- a far more useful abstraction. Therefore, we decided to add Clock.Duration for the express purpose of serving as the primary associated type.
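
A minimal sketch of that abstraction, assuming a Swift 5.7 toolchain (and a correspondingly recent OS on Apple platforms). Here elapsedTime is a hypothetical helper, not a standard library API:

```swift
// Hypothetical helper: works with any clock that measures time in
// Swift.Duration, without caring which concrete clock it is.
func elapsedTime(on clock: some Clock<Duration>, during work: () -> Void) -> Duration {
    let start = clock.now
    work()
    // Instant.Duration is the same type as Clock.Duration, so this is a Duration.
    return start.duration(to: clock.now)
}

// Two different clocks, one generic function.
let continuous = elapsedTime(on: ContinuousClock()) { _ = (0..<1_000).reduce(0, +) }
let suspending = elapsedTime(on: SuspendingClock()) { _ = (0..<1_000).reduce(0, +) }
print(continuous, suspending)
```

Constraining Instant instead would have pinned the function to a single clock type.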
Consider clarity at the point of use.

To prevent persistent confusion, people familiar with the protocol ought to be able to correctly intuit the meaning of a same-type constraint such as some Sequence<Int>.

Lightweight same-type requirements share the same angle-bracketed syntax as generic type arguments, including the same limitations. In particular, the language does not support argument labels in such lists, which prevents us from clarifying the role of the type names provided. A type name such as Foo<Int, String> on its own provides no hints about the role of its generic arguments Int and String; likewise, it isn't possible to decipher the role of Character in a same-type requirement such as some Bar<Character>, unless the reader is already somewhat familiar with the protocol Bar.

The best candidates for primary associated types tend to be those that have a simple, obvious relationship to the protocol itself. A good heuristic is that if the relationship can be described using a simple preposition, then the associated type will probably make a viable primary:
- Collection of Int
- Identifiable by String
- SIMD of Float
- RawRepresentable by Int32
Associated types that don't support this tend to have a more complex / idiosyncratic role in their protocol, and often make poor choices for a primary associated type.
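
To make the heuristic concrete, here is a small sketch in which each parameter type can be read aloud with a preposition. The types User and Channel and the three functions are hypothetical, introduced only for illustration:

```swift
// Hypothetical model types used only for this example.
struct User: Identifiable { let id: String }
enum Channel: Int32 { case red = 0, green, blue }

// "Identifiable by String"
func label(for item: some Identifiable<String>) -> String {
    "item \(item.id)"
}

// "Collection of Int"
func largest(in numbers: some Collection<Int>) -> Int? {
    numbers.max()
}

// "RawRepresentable by Int32"
func code(of value: some RawRepresentable<Int32>) -> Int32 {
    value.rawValue
}

print(label(for: User(id: "abc")))   // prints "item abc"
print(code(of: Channel.green))       // prints "1"
```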

For example, Numeric has an associated type called Magnitude that does sometimes appear in same-type constraints. However, its role seems too subtle and non-obvious to consider marking it as primary. The meaning of Int in some Numeric<Int> is unlikely to be clear to readers, even if they are deeply familiar with Swift's numeric protocol hierarchy.
Not every protocol needs primary associated types.

Don't feel obligated to add a primary associated type just because it is possible to do so. If you don't expect people will want to put same-type constraints on a type, there is little reason to mark it as a primary. Similarly, if there are multiple possible choices that seem equally useful, it might be best not to select one. (See "Consider clarity at the point of use" above.)

For example, ExpressibleByIntegerLiteral is not expected to be mentioned in generic function declarations, so there is no reason to mark its sole associated type (IntegerLiteral) as the primary.
Limit yourself to just one primary associated type.

In most cases, it's best not to declare more than one primary associated type on any protocol.

While the language does allow this, [SE-0346] requires clients using the lightweight syntax to always explicitly constrain all primary associated types, which may become an obstacle. Clients don't have an easy way to indicate that they want to leave one of the types unconstrained -- to do that, they need to revert to classic generic syntax, partially or entirely giving up on the lightweight variant:
```
protocol MyDictionaryProtocol<Key, Value> {
  associatedtype Key: Equatable
  associatedtype Value
  ...
}

// This function is happy to work on any dictionary-like thing
// as long as it has string keys.
func twiddle(_ items: some MyDictionaryProtocol<String, ???>) -> Int { ... }

// Possible approaches:
func twiddle<Value>(_ items: some MyDictionaryProtocol<String, Value>) -> Int { ... }
func twiddle<D: MyDictionaryProtocol>(_ items: D) -> Int where D.Key == String { ... }
```
Of course, if the majority of clients actually do want to constrain both Key and Value, then having them both marked primary can be an appropriate choice.
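
For anyone who wants to try the two workarounds, here is a self-contained sketch. The protocol requirements (keys and the subscript), the PairList type, and the function names are assumptions filled in for illustration, since the original snippet elides them:

```swift
// Hypothetical protocol with two primary associated types.
protocol MyDictionaryProtocol<Key, Value> {
    associatedtype Key: Equatable
    associatedtype Value
    var keys: [Key] { get }
    subscript(key: Key) -> Value? { get }
}

// Hypothetical conformance backed by an array of pairs.
struct PairList<Key: Equatable, Value>: MyDictionaryProtocol {
    var pairs: [(Key, Value)]
    var keys: [Key] { pairs.map { $0.0 } }
    subscript(key: Key) -> Value? {
        pairs.first(where: { $0.0 == key })?.1
    }
}

// Approach 1: keep the lightweight syntax, but name the type we do not
// care about with an explicit generic parameter.
func totalKeyLength<Value>(_ items: some MyDictionaryProtocol<String, Value>) -> Int {
    items.keys.reduce(0) { $0 + $1.count }
}

// Approach 2: fall back to the classic generic syntax entirely.
func totalKeyLengthClassic<D: MyDictionaryProtocol>(_ items: D) -> Int where D.Key == String {
    items.keys.reduce(0) { $0 + $1.count }
}

let table = PairList(pairs: [("a", 1), ("bcd", 2)])
print(totalKeyLength(table), totalKeyLengthClassic(table))   // prints "4 4"
```

Both spellings work, but neither is as light as the single-primary case, which is the cost the guideline is warning about.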

Michael_Ilseman · May 9, 2022, 7:21pm

Do you have any examples? I've written generic constraints hundreds of times and caring about the index type is exceedingly rare for me. Knowing or constraining the index type can also be a subtle bug, e.g. RandomAccessCollection<Foo, Int> for a RAC of Foo with Int indices doesn't mean that those indices are contiguous and can be advanced using integer arithmetic.
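
A small sketch of a related pitfall with Int indices (firstElement is a hypothetical helper): knowing that Index is Int does not even tell you that the indices start at zero. ArraySlice, for instance, shares its indices with its base array:

```swift
func firstElement<C: RandomAccessCollection>(of items: C) -> C.Element?
    where C.Index == Int
{
    // Always go through startIndex. Writing items[0] here would trap
    // for any collection whose first index is not 0.
    items.isEmpty ? nil : items[items.startIndex]
}

let slice = [10, 20, 30, 40][2...]     // ArraySlice<Int>
print(slice.startIndex)                // prints "2"
print(firstElement(of: slice) ?? -1)   // prints "30"
```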

lorentey · May 10, 2022, 1:31am

I updated the pitch document & implementation to include the suggested declarations for distributed actors:

protocol DistributedActor<ActorSystem>: AnyActor, Identifiable, Hashable {...}
protocol DistributedActorSystem<SerializationRequirement>: Sendable {...}
protocol DistributedTargetInvocationEncoder<SerializationRequirement> {...}
protocol DistributedTargetInvocationDecoder<SerializationRequirement> {...}
protocol DistributedTargetInvocationResultHandler<SerializationRequirement> {...}

(Please double check this is really what we need -- in particular, DistributedActorSystem won't be able to adopt ActorID as a primary assoc.type once we ship a stdlib that declares ActorSystem as one.) @Douglas_Gregor @ktoso

If things look good, then I think this is ready for a proposal review.

Thank you very much everyone for the productive discussion!

ktoso · May 10, 2022, 2:56am

Thanks for the ping @lorentey

It seems the conclusion about this got lost or confused somewhere along the way between meetings, so here's a summary about distributed actors:

Protocols in the Distributed module are not going to adopt primary associated types right now.
We are interested in adopting primary associated types, but we first need:
- some more time to mature some related language features (to lift the restrictions about necessary concrete actor system types)
- and to gain some more experience using the types "in the wild"

before we commit to the primary types.

Adding the primary associated types later on has little downside, as abstracting over them like this is not common (and actually... not possible, until we improve some more generics features). And adopting right now with the idea that we'll get it right and solve the missing language features later is actually a high risk we'd like to avoid.

Having said that, we did take explicit steps in the protocol design and synthesis that powers distributed actors such that all types the actor uses are expressed as associated types (rather than e.g. typealiases, which we could have used in some places), so the types are future-proof to adopt primaries when we're ready to do so.

We could adopt right now for the ...Encoder/...Decoder/...Handler, but since they are very much internal to actor system libraries and never really passed around or erased... we don't think it matters, and let's do them all together when we're ready.

Extra information on how and when Distributed would benefit from primary associated types -- for those curious:

Our end goal here is to be able to express the following:

protocol Greeter<SerializationRequirement>: DistributedActor<SerializationRequirement> {
  distributed func hi() 
}

let greeter: Greeter<Codable> // since e.g. my actor system is using Codable

Note that we're not specifying the actor system type but rather what it is using for serialization; and this way we can "swap in" a mock actor system without changing any code.

Sadly, this abstraction today is not possible and we would need to improve generics a little bit to support this in distributed actors. The reason is that some distributed actor system methods need to constrain generics using associated types, like for example (simplified): remoteCall<Success: SerializationRequirement> where SerializationRequirement is an associated type on the protocol.

Such a constraint is not possible in today's Swift, and it forces us to always have a concrete ActorSystem at hand. So the ability that these primary associated types would give us cannot be used, because of other limitations in how the distributed calls work.

If we could solve this type system limitation though... we would be able to allow these abstractions, and then everything would work as expected.

As a minor side note: I am not yet sure if we need to expose ActorID as primary or not... likely not, but this again we'll learn in the coming months of using distributed actors "for real".

// edit: reworded a bit

John_McCall · May 10, 2022, 3:22am

Just to be clear as a language implementor, allowing abstraction over constraints like that is a very difficult problem which is way, way down the list if it’s even theoretically possible.

ktoso · May 10, 2022, 3:34am

Right, I do not mean to imply it is going to happen, but the usefulness of abstracting over distributed actors is somewhat gated by such a capability.

We were digging into this for quite a while and @xedin had some reasonable ideas for how we could get a manageable subset of such constraints implemented that would unlock enough expressive power for those cases... but I digress, and I don't want to derail this thread any further. We should definitely catch up about this some time soon though, I'll be in touch.

lorentey · May 10, 2022, 6:15pm

Makes sense -- I took out the distributed actor protocol additions!

John_McCall · May 18, 2022, 8:43pm

This proposal is now in review; closing this thread.