SE-0358: Primary Associated Types in the Standard Library

lorentey · June 1, 2022, 7:38am

(Highlights by me)

This is a valid analogue, but I think it's too narrowly applicable to be generally helpful. As I see it, there are two problems with it:

First, the role/meaning of T in FooImpl<T> isn't at all clear to me -- this type name alone includes no indication whether T stands for something like the Element type of an Array, the Base type of a Slice, the Root of a PartialKeyPath, or something completely different. Nor is this something that can be reasonably clarified through API design, as Swift provides no tools for API designers to clarify the role of such type arguments at the point of use. (Unlike with function invocations, where we can use mandatory argument labels to help clarify their meaning.)

So in my view, FooImpl<T> and BarImpl<T> do not meet any reasonable level of clarity at the point of use, and so therefore the premise of the implication quoted above is false -- the statement does not tell me anything useful about the clarity of FooAndBarProtocol<T>.
Second, this analogue does not seem to be applicable outside the Collection protocol hierarchy and its immediate neighborhood (such as AsyncSequence). General-purpose Collection types are indeed usually generic over their Element (or their base collection type, in the case of generic collection transformations, such as those conforming to LazyCollectionProtocol). However, this isn't true for specialized collections (such as String or Bitset), and it seems not to be true in general for protocols that aren't modeling containers/streams.

Identifiable, RawRepresentable, Clock, InstantProtocol do not seem rare exceptions to me -- rather, these feel like rather typical protocols outside the collection hierarchy. In my view, Strideable and OptionSet are firmly in this category as well.

(Of course, we could choose to limit the use of this new language feature to Collection-like protocols. However, that seems overly restrictive to me.)

In the specific case of Strideable, I don't see any issues about clarity at point of use whatsoever. On a superficial level, it would not be entirely unreasonable to say that Strideable.Stride feels too similar to Numeric.Magnitude for comfort. However, to me there is a very clear difference between these two protocols: the Strideable protocol consists entirely of operations taking or returning a Stride value, while Magnitude plays a far less crucial role in Numeric. Indeed, Strideable is every way as much a single-purpose protocol as Identifiable or RawRepresentable is -- all three of these protocols are built around their sole associated type, and even share the root of their name with it.

Identifiable types aren't usually generic over their ID; RawRepresentable types aren't typically generic over their RawValue, and Strideable types are rarely if ever generic over their Stride. I don't see how this indicates anything about whether or not we should allow using the lightweight constraint syntax with these protocols.

I believe I have already addressed the usefulness of constraining Stride.

In general, as long as there is a clear choice for a primary associated type, I think it's preferable to err on the side of defining one, rather than omitting it, even if the use case seems marginal. (Which I really don't think it is in Strideable's case.)

As we've seen with Clock, just because I can't think of an obvious use case for enabling the use of the lightweight syntax, it doesn't follow that nobody can. To me, the case for some Clock<Swift.Duration> only seemed obvious in hindsight. (Thanks @stephencellis for pointing it out!)

For what it's worth, the Standard Library does contain a number of useful algorithms on OptionSet that use a where RawValue: FixedWidthInteger clause; they supply default implementations to some core SetAlgebra requirements.

I don't see a reason to actively prevent this code from using the OptionSet<some FixedWidthInteger> syntax (whenever it becomes available for use in extensions).

As for usages outside of the stdlib, I can certainly imagine someone defining new algorithms, such as a method for getting the complement of an option set:

extension OptionSet<Int> {
  func complemented() -> Self {
    Self(rawValue: -1 ^ self.rawValue)
  }
}

(This may or may not be a wise thing to use, but I believe it does work.)

Or I can imagine that someone working on a new serialization facility might want to define generic methods that can work with any option set of a certain raw value:

extension MySerializer {
  func write(_ value: Int)
  func write(_ value: some OptionSet<Int>)
}

Granted, these examples have a distinct smell of desperation about them -- OptionSet is far more of a utility protocol than most of the other standard protocols.

For what it's worth, a quick code search on GitHub does readily uncover such RawValue constraints being used in actual real-life code.

Do we have a good reason to actively prevent people from using the new syntax for these cases? E.g., do you expect to repeatedly misunderstand OptionSet<Int> to mean a constraint on the Element type?

Hm; that is a remarkably high expectation for any set of API guidelines. Removing all subjective decisions seems like an unrealistic expectation, but perhaps we can tie things down a little more.

The "usefulness" guideline appears to be the weakest in this regard:

Not every protocol needs primary associated types. Don't feel obligated to add a primary associated type just because it is possible to do so. If you don't expect people will want to put same-type constraints on a type, there is little reason to mark it as a primary. Similarly, if there are multiple possible choices that seem equally useful, it might be best not to select one. (See point 2 above.) For example, ExpressibleByIntegerLiteral is not expected to be mentioned in generic function declarations, so there is no reason to mark its sole associated type ( IntegerLiteral ) as the primary.

I intended this merely as a way to carve out exceptions for cases like KeyedEncodingContainerProtocol, ExpressibleByIntegerLiteral, where there would be a clear, unambiguous choice for a primary associated type, but it seems pointless to define one, as these protocols are highly unlikely to be ever used in generic context.

Rather, this guideline has triggered arguments on whether it it'd be useful enough to allow lightweight constraint syntax on protocols where there are clearly some use cases for constraining the candidate primary type.

What if we reworded this requirement to something that provides less room for subjectivity?

Err on the side of defining a primary associated type. As long as there is a clear, unambiguously right choice for a primary associated type, and there is even a marginal use case for defining one, it's better to enable the lightweight constraint syntax rather than arbitrarily prohibit its use.

Conversely, do not feel obligated to select a primary if there are multiple possible choices that seem equally useful, or if the choice would lead to persistent confusion, or if the protocol isn't designed to be ever used in generic context. For example, ExpressibleByIntegerLiteral is not expected to be mentioned in generic function declarations, so there is no reason to mark its sole associated type (IntegerLiteral) as the primary, no matter how obvious a choice it would be.

That said, I do not wish to cut off discussions about requiring a higher level of usefulness, if fellow members of these fora feel it'd be worth exploring this a little more.

@xwu, @Karl, @benrimmington: You have recently raised objections of this nature -- do you feel it would be actively harmful to allow the lightweight syntax on unambiguous-but-perhaps-marginal cases? (Why?)

In other words, would you be willing to accept a certain number of infrequently used declarations in the interest of making the guidelines less subjective? If not, do you have a suggestion for a guideline replacing point 3 above that sets a higher bar for usefulness, without overly increasing subjectivity?

It would be extremely helpful if readers of this forum would try applying the proposed guidelines (with or without the clarification above) to protocols defined in their own code. Can you find examples where the guidelines do not apply well or where they lead to undesirable results?