[Pitch] Primary Associated Types in the Standard Library

Since it is mostly off-topic, I suggest asking this question in a new thread in Evolution/Discussion rather than here.

Excited to see this!

Personally, I'd rather we not add any primary associated types to LazySequenceProtocol and LazyCollectionProtocol. They aren't likely to be used often, and it feels odd to have them be based on Elements rather than Element like the other sequence/collection protocols. We could always add them later if compelling use cases show up in sufficient quantity.

The distributed actors runtime proposal added a few protocols that could benefit from this feature. @ktoso should weigh in here as well, but my take is:

  • DistributedActor<ActorSystem>
  • DistributedActorSystem<SerializationRequirement>: this protocol has several associated types, but I anticipate any DistributedActorSystem<Codable> to be a Very Useful Thing.
  • DistributedTargetInvocationEncoder<SerializationRequirement>
  • DistributedTargetInvocationDecoder<SerializationRequirement>
  • DistributedTargetInvocationResultHandler<SerializationRequirement>

Definitely the right call. We should bring this in when we've dealt with the Error type, rethrowing conformances, and all of the other exciting throwing-ness of these protocols.

Doug


Thanks for the ping @Douglas_Gregor! I had skimmed this proposal and work but somehow didn't connect the dots all the way.

Yes, those look exactly right!

The two user-visible protocols especially will benefit a ton from this, because right now half of the methods on the cluster have to be annotated with actor system requirements like this:

    public func termination<Watchee>(
        of watchee: Watchee,
        ...
    ) where Watchee: DistributedActor, Watchee.ActorSystem == ClusterSystem {

which is very annoying and also makes it tricky to store them. With primary associated types this'll be:

    public func termination<Watchee>(
        of watchee: Watchee,
        ...
    ) where Watchee: DistributedActor<ClusterSystem> {

or maybe just any DistributedActor<ClusterSystem> in some places...

The DistributedActorSystem indeed has many associated types, but the only ones which matter to outside users are:

  • SerializationRequirement - the protocol that distributed method arguments are checked against
  • ActorID -- equal to the DistributedActor.ID that it assigns

I think the ID isn't all that interesting when passing a system around as any DistributedActorSystem, and we can always use where clauses if it is. But any DistributedActorSystem<Codable> seems to be the most useful...

Summary:

  • the list Doug provided seems right and this'll be very useful
    • DistributedActor<ActorSystem> (implies DistributedActor.ID as well)
    • system implementation internal protocols:
      • DistributedTargetInvocationEncoder<SerializationRequirement>
      • DistributedTargetInvocationDecoder<SerializationRequirement>
      • DistributedTargetInvocationResultHandler<SerializationRequirement>
  • Pending question:
    • I need to check whether DistributedActorSystem also needs ActorID as a primary associated type...
    • it might simplify mocking out actor systems, I think...? Will look into this.

Why not just write this?

public func termination(of watchee: some DistributedActor<ClusterSystem>)

Is there some benefit of the (traditional) expanded generics syntax that is desirable to use in core libraries, or was it just habit to reach for it?
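For reference, here is a minimal sketch of the three spellings side by side, assuming Swift 5.7 or later. To stay self-contained it avoids the Distributed module: ActorLike, ClusterSystem, OtherSystem, Worker, and Stranger are hypothetical stand-ins for DistributedActor and real actor systems, not actual API.

```swift
// Hypothetical stand-in for DistributedActor, with ActorSystem as its
// primary associated type.
protocol ActorLike<ActorSystem> {
  associatedtype ActorSystem
  var name: String { get }
}

// Hypothetical actor systems.
struct ClusterSystem {}
struct OtherSystem {}

struct Worker: ActorLike {
  typealias ActorSystem = ClusterSystem
  let name: String
}

struct Stranger: ActorLike {
  typealias ActorSystem = OtherSystem
  let name: String
}

// 1. Classic generic syntax with an explicit same-type requirement.
func terminationClassic<Watchee>(of watchee: Watchee) -> String
  where Watchee: ActorLike, Watchee.ActorSystem == ClusterSystem {
  "watching \(watchee.name)"
}

// 2. Named generic parameter constrained via the primary associated type.
func terminationConstrained<Watchee: ActorLike<ClusterSystem>>(of watchee: Watchee) -> String {
  "watching \(watchee.name)"
}

// 3. Opaque parameter: no generic parameter name at all.
func termination(of watchee: some ActorLike<ClusterSystem>) -> String {
  "watching \(watchee.name)"
}

// Storing heterogeneous watchees works through a constrained existential.
let watched: [any ActorLike<ClusterSystem>] = [Worker(name: "a"), Worker(name: "b")]

// termination(of: Stranger(name: "x")) would be rejected at compile time,
// because Stranger.ActorSystem is OtherSystem.
```

All three functions accept a Worker and reject a Stranger at compile time. The named forms remain handy when the body or other parameters need to refer to the generic type by name, which the some form does not allow.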


Out of habit here, not for any specific reason; I'm not yet used to the new some/any syntax. I'll give it all a proper look, and I think we'll include the distributed type amendments in this proposal, since Karoy intends to propose it for review soon.


Interacting with a clock seems to require knowledge of its Instant.Duration, right? Otherwise you have no way of referring to an advanced instant, or how to tell it to sleep to a certain instant.

So it sounds appropriate for Clock's primary associated type to be this associated type of its Instant, similar to how Collection<Element> shares its primary associated type with its Iterator.

Having Clock<Duration> would allow you to tell an arbitrary clock to sleep as long as you know its primary associated type. This is particularly useful for writing testable, time-based code, where your application logic can be injected with some Clock<Duration>: it can take a ContinuousClock when run in release, and a theoretical TestClock that can be manually advanced in your tests.

In Combine Schedulers we needed to parameterize a type-erased scheduler (AnyScheduler) and TestScheduler over the SchedulerTimeType in order to inject testable schedulers into application code. It would seem that without a primary associated type on Clock, anyone that would want to do something similar would need to still resort to bespoke wrapper types, like AnyClock.


I spoke at length about this with @lorentey. I retract the objection that it isn't capable of being a primary type. Instead I have a feeling that will only be needed/wanted in very specific scenarios; more often than not it will be folks using a concrete Clock type. There isn't any reason to not do it, only very boutique utility for doing it.

We already have a test clock set up: Clock.swift on the main branch of apple/swift-async-algorithms on GitHub. That seems to work pretty darned well for that specific use case; it just means that algorithms should take generalized clocks if they plan on supporting any clock.


I took a look at this, but it only seems appropriate for testing small algorithms/operators, and not larger systems that interact with time-based code. As soon as you are operating with a larger system over time (like testing an observable object in a SwiftUI app) it will need to hold onto a clock that is erased in some way.

I understand that testing can often be an afterthought, but by not providing a primary associated type here, anyone that wants to write testable, time-based code for a larger feature will be forced to write a concrete AnyClock wrapper.
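A minimal sketch of this idea, written against the Clock<Duration> spelling that the standard library adopted after the SE-0329 amendment discussed in this thread, and assuming Swift 5.7 or later (macOS 13 or later on Apple platforms). ManualClock, SessionModel, and measureElapsed(on:_:) are hypothetical names, not standard library or swift-async-algorithms API. The model stores an erased any Clock<Duration>, so production code can pass ContinuousClock() while a test passes a clock it advances by hand, with no AnyClock wrapper.

```swift
import Foundation  // NSLock

/// Hypothetical manually driven clock for tests. Mutable state is guarded
/// by `lock`, hence the `@unchecked Sendable`.
final class ManualClock: Clock, @unchecked Sendable {
  typealias Duration = Swift.Duration

  struct Instant: InstantProtocol {
    typealias Duration = Swift.Duration
    var offset: Swift.Duration

    func advanced(by duration: Swift.Duration) -> Instant {
      Instant(offset: offset + duration)
    }
    func duration(to other: Instant) -> Swift.Duration {
      other.offset - offset
    }
    static func < (lhs: Instant, rhs: Instant) -> Bool {
      lhs.offset < rhs.offset
    }
  }

  private let lock = NSLock()
  private var current = Instant(offset: .zero)

  var now: Instant {
    lock.lock(); defer { lock.unlock() }
    return current
  }

  var minimumResolution: Swift.Duration { .zero }

  /// Test hook: moves the clock forward by `duration`.
  func advance(by duration: Swift.Duration) {
    lock.lock(); defer { lock.unlock() }
    current = current.advanced(by: duration)
  }

  /// Simplification: sleeping jumps the clock to the deadline immediately.
  func sleep(until deadline: Instant, tolerance: Swift.Duration?) async throws {
    jump(to: deadline)
  }

  private func jump(to deadline: Instant) {
    lock.lock(); defer { lock.unlock() }
    if current < deadline { current = deadline }
  }
}

/// Generic helper: works with any clock that measures Swift.Duration.
func measureElapsed(on clock: some Clock<Duration>, _ work: () -> Void) -> Duration {
  let start = clock.now
  work()
  return start.duration(to: clock.now)
}

/// Hypothetical app-level model that holds an erased clock.
final class SessionModel {
  private let clock: any Clock<Duration>

  init(clock: any Clock<Duration>) {
    self.clock = clock
  }

  /// Runs `work` and reports how much time passed on the injected clock.
  func timed(_ work: () -> Void) -> Duration {
    // Passing the existential to a `some` parameter opens it implicitly.
    measureElapsed(on: clock, work)
  }
}
```

To keep the sketch short, ManualClock's sleep(until:tolerance:) jumps straight to the deadline instead of suspending until the test advances time; a clock that really waits would need to track pending continuations.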


+1 for Clock<Duration>.

By the way, the Point Free episode on controlling time in tests is simply amazing @stephencelis. Crème de la crème!


FWIW, this does work:

enum Foo {
  typealias Bar = _Foo_Bar
}

protocol _Foo_Bar<T> {
  associatedtype T
}

struct G<X : Foo.Bar<Int>> {}

Clock<Duration> would be a fine choice; unfortunately, protocol Clock does not have an associated type called Duration. It only has Instant.

It may not be too late to introduce one. A similar precedent is Sequence, which has an Element associated type even though in theory it could just be a typealias for Iterator.Element.
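A minimal sketch of that precedent applied to a clock-like protocol, assuming Swift 5.7 or later. Timekeeper and gapBetweenReadings(on:) are hypothetical: the protocol declares Duration explicitly and ties it to Instant.Duration with a where clause, the same way Sequence ties Element to Iterator.Element, which lets the derived type serve as the primary associated type.

```swift
/// Hypothetical protocol with the shape proposed for Clock: Duration is
/// declared explicitly and tied to Instant.Duration.
protocol Timekeeper<Duration> {
  associatedtype Duration
  associatedtype Instant: InstantProtocol where Instant.Duration == Duration
  var now: Instant { get }
}

// The standard library clocks already provide `now` and a nested Instant,
// so they can adopt the hypothetical protocol with empty extensions.
extension ContinuousClock: Timekeeper {}
extension SuspendingClock: Timekeeper {}

/// Constrains only the derived type: "a timekeeper of Swift.Duration".
func gapBetweenReadings(on keeper: some Timekeeper<Duration>) -> Duration {
  let first = keeper.now
  let second = keeper.now
  return first.duration(to: second)
}

// Different Instant types, one lightweight constraint.
let keepers: [any Timekeeper<Duration>] = [ContinuousClock(), SuspendingClock()]
```

ContinuousClock and SuspendingClock have different Instant types, yet both satisfy some Timekeeper<Duration>; conformers never spell Duration themselves, because it is inferred from Instant.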


Is there any reason that couldn't be amended? Basically have a similar hierarchy to Collection/Sequence?

Collection<Element> where Iterator.Element == Element
Clock<Duration> where Instant.Duration == Duration

If not, would your original proposal allow for something like the following?

any Clock<any Instant<Duration>>

I'm not sure I can come up with a reason to parameterize the Instant (much like I can't come up with a reason to parameterize Collection<Iterator>), but this would at least be an improvement over no primary associated type.

Edit: Saw your update! +1 to the idea of introducing one. It makes more sense to me as a primary type, though I'd love to hear from folks that have use cases for parameterizing over the Instant instead.


Yes, if this proves to be important, I think it may still be possible to add Clock.Duration via a small amendment to SE-0329. Cc @Philippe_Hausler @John_McCall


I'd be happy to provide some concrete examples to further motivate an amendment, if my earlier point wasn't clear. Just let me know!


I don't have a problem with allowing this adjustment if Philippe thinks it's a good idea; I wouldn't say it needs evolution approval.

An alternative approach would be to allow primary associated types to be paths, e.g. protocol Clock<Instant.Duration> { ... }, but that might not be a good idea, and it probably would need evolution approval.


I submitted these PRs to introduce Duration as an associated type requirement of Clock:

swift-evolution#1618 - Amend SE-0329 to add Clock.Duration
swift/main#42314 - [stdlib] Add Clock.Duration as an associated type requirement
swift/release/5.7#42316 - [5.7][stdlib] Add Clock.Duration as an associated type requirement


The Core Team agreed that this is an acceptable change to make retroactively without further review. Let's do it.


I don't want to unduly burden this pitch, but the Core Team would appreciate it if one of the outputs of this proposal was an update to the API design guidelines laying out how API authors should think about when to adopt a primary associated type.

The API design guidelines were originally established by the Evolution process (in SE-0023) in conjunction with a proposal to adopt them in the standard library (SE-0006). Proposals to adopt new features throughout the standard library, such as this one, seem like the logical time to establish general principles around adoption. Once you're doing that, it shouldn't be much of a stretch to draft an update to the API design guidelines.

Of course, there are also proposals (like SE-0279: Multiple trailing closures) that do not get widespread adoption in the standard library, but still ought to be discussed in the API design guidelines. We're still talking about how to fix that loophole. But in cases where there is significant adoption, this seems like the right way to do it.


Thanks very much, everyone -- I updated the document to integrate the feedback above.

  1. I added a new section with my suggestions for API design recommendations. I found that John's heuristic about simple prepositions captures an important point, so I pulled it directly into the guidelines. Feedback is most appreciated! The new section is reproduced below for easy commenting.

  2. The document no longer proposes primary associated types for LazySequenceProtocol and LazyCollectionProtocol. I still think Elements is the right choice for these, but this is mostly based on speculation rather than actual use cases. I would like to add these back, but I'd love to see some actual example cases to confirm/reject this choice. Do you have code that constrains these protocols or would like to return an opaque result type that used them? Please share if you can!

  3. As discussed above, Clock now has Duration as its primary associated type. (SE-0329 has been amended to reflect the addition of Duration as an associated type, and the implementation has landed on both main and release/5.7.)


General API Design Guidelines

Primary associated types add a new facet to the design of protocols. For every public protocol with associated type requirements, we need to carefully consider which of them (if any) we want to mark as primary. On the one hand, we want to allow people to use the shorthand syntax whenever possible; on the other hand, we only get one chance to decide this: once a protocol gains a primary associated type annotation, most subsequent changes would be source-breaking.

  1. Let usage inform your design.

    If you are considering adding a primary associated type declaration to a preexisting protocol, then look at its existing clients to discover which associated types typically get mentioned in same-type requirements. Is there one particular type that is used overwhelmingly more than any other? If so, then it will probably be a good choice for the primary.

    For example, in the case of Sequence, use sites overwhelmingly tend to constrain Element -- Iterator is almost never mentioned in where clauses. This makes it fairly clear that Element is the right choice for the primary type.

    If you're designing a new protocol, think about which type people will most likely want to constrain. Sometimes it may not even be one you planned to have as an associated type!

    For example, protocol Clock in [SE-0329] initially only had Instant as an associated type. As it turns out, in actual use cases, people are far more likely to want to constrain Instant.Duration rather than Instant itself. Clocks tend to be far too closely coupled to their instants for Instant to serve as a useful constraint target -- some Clock<ContinuousClock.Instant> is effectively just a circuitous way of spelling ContinuousClock. On the other hand, some Clock<Swift.Duration> captures all clocks that measure elapsed time in physical seconds -- a far more useful abstraction. Therefore, we decided to add Clock.Duration for the express purpose of serving as the primary associated type.

  2. Consider clarity at the point of use. To prevent persistent confusion, people familiar with the protocol ought to be able to correctly intuit the meaning of a same-type constraint such as some Sequence<Int>.

    Lightweight same-type requirements share the same angle-bracketed syntax as generic type arguments, including the same limitations. In particular, the language does not support argument labels in such lists, which prevents us from clarifying the role of the type names provided. A type name such as Foo<Int, String> on its own provides no hints about the role of its generic arguments Int and String; likewise, it isn't possible to decipher the role of Character in a same-type requirement such as some Bar<Character>, unless the reader is already somewhat familiar with the protocol Bar.

    The best candidates for primary associated types tend to be those that have a simple, obvious relationship to the protocol itself. A good heuristic is that if the relationship can be described using a simple preposition, then the associated type will probably make a viable primary:

    • Collection of Int
    • Identifiable by String
    • SIMD of Float
    • RawRepresentable by Int32

    Associated types that don't fit this pattern tend to have a more complex or idiosyncratic role in their protocol, and often make poor choices for a primary associated type.

    For example, Numeric has an associated type called Magnitude that does sometimes appear in same-type constraints. However, its role seems too subtle and non-obvious to consider marking it as primary. The meaning of Int in some Numeric<Int> is unlikely to be clear to readers, even if they are deeply familiar with Swift's numeric protocol hierarchy.

  3. Not every protocol needs primary associated types. Don't feel obligated to add a primary associated type just because it is possible to do so. If you don't expect people will want to put same-type constraints on a type, there is little reason to mark it as a primary. Similarly, if there are multiple possible choices that seem equally useful, it might be best not to select one. (See point 2 above.)

    For example, ExpressibleByIntegerLiteral is not expected to be mentioned in generic function declarations, so there is no reason to mark its sole associated type (IntegerLiteral) as the primary.

  4. Limit yourself to just one primary associated type. In most cases, it's best not to declare more than one primary associated type on any protocol.

    While the language does allow this, [SE-0346] requires clients using the lightweight syntax to always explicitly constrain all primary associated types, which may become an obstacle. Clients don't have an easy way to indicate that they want to leave one of the types unconstrained -- to do that, they need to revert to classic generic syntax, partially or entirely giving up on the lightweight variant:

    protocol MyDictionaryProtocol<Key, Value> {
      associatedtype Key: Equatable
      associatedtype Value
      ...
    }
    
    // This function is happy to work on any dictionary-like thing
    // as long as it has string keys.
    func twiddle(_ items: some MyDictionaryProtocol<String, ???>) -> Int { ... }
    
    // Possible approaches:
    func twiddle<Value>(_ items: some MyDictionaryProtocol<String, Value>) -> Int { ... }
    func twiddle<D: MyDictionaryProtocol>(_ items: D) -> Int where D.Key == String { ... }
    

    Of course, if the majority of clients actually do want to constrain both Key and Value, then having them both marked primary can be an appropriate choice.
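A compilable version of the MyDictionaryProtocol example from guideline 4, as a sketch assuming Swift 5.7 or later. The keys requirement, the PairList conformance, and the function bodies (counting keys that start with "a") are illustrative additions, not part of the guideline.

```swift
/// The hypothetical protocol from guideline 4, plus a `keys` requirement
/// so the functions below have something to work with.
protocol MyDictionaryProtocol<Key, Value> {
  associatedtype Key: Equatable
  associatedtype Value
  var keys: [Key] { get }
}

/// Hypothetical conforming type backed by an array of key-value pairs.
struct PairList<Key: Equatable, Value>: MyDictionaryProtocol {
  var pairs: [(key: Key, value: Value)]
  var keys: [Key] { pairs.map { $0.key } }
}

// Lightweight syntax: Value must still be spelled, so it becomes a named
// generic parameter even though the body never uses it.
func twiddle<Value>(_ items: some MyDictionaryProtocol<String, Value>) -> Int {
  items.keys.filter { $0.hasPrefix("a") }.count
}

// Classic syntax: constrain Key only and leave Value unmentioned.
func twiddleClassic<D: MyDictionaryProtocol>(_ items: D) -> Int where D.Key == String {
  items.keys.filter { $0.hasPrefix("a") }.count
}
```

Both functions compute the same result; the difference is only in how much the signature has to say about Value.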


I believe exposing the Index associated type is a common use case (not as common as Element, of course) when constraining a generic Collection type.

Currently, one way to avoid large nested types when using lazy collections is something like Any…Collection<T>, but that is constrained to AnyIndex, with no clear way to preserve the index type of the underlying base collection.

If, for example, I wanted to let users index with Int, which is “convenient”, I would prefer to let them write values[(values.endIndex - 4)...] instead of values[values.index(values.endIndex, offsetBy: -4)...].
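A minimal sketch of that use case, assuming Swift 5.7 or later; suffixByOffset(_:_:) is a hypothetical helper. A classic where clause constraining Index == Int keeps integer index arithmetic available, including for a lazy map over an array, whereas AnyRandomAccessCollection uses AnyIndex, on which endIndex - 4 does not compile.

```swift
/// Returns the last `n` elements using plain integer index arithmetic.
/// Accepts any random-access collection whose Index is Int, such as Array,
/// ArraySlice, or a lazy map over either.
func suffixByOffset<C: RandomAccessCollection>(_ values: C, _ n: Int) -> C.SubSequence
  where C.Index == Int {
  precondition(n >= 0, "n must be non-negative")
  // Clamp so a short collection yields everything instead of trapping.
  let start = Swift.max(values.startIndex, values.endIndex - n)
  return values[start...]
}
```

Clamping against startIndex rather than 0 matters for slices, whose indices need not start at zero.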