[Pitch 2] Light-weight same-type requirement syntax

See above – this example (as well as others like a DictionaryProtocol for dictionary types) was addressed in the proposal as examples of protocols specifying multiple primary associated types. This was a great example of feedback on the pitch giving a clear and important use case the proposal didn't cater to. The solution of allowing multiple primary associated types fits very neatly into the proposal, which was amended to cover this case.

The dividing line here is between associated types that are about the essence of the protocol (the "primary" associated types... perhaps we can find a better name, though that name doesn't actually appear in the language in the current proposal, so it does not need to be set in stone) and associated types that are just part of the implementation mechanics of a protocol.

The primary types are the "essential" types: Element in the case of Collection, the Output and Failure types in the case of Publisher, the Key and Value types for DictionaryProtocol, the Scalar type for SIMD. One way to spot these is that they almost always match the generic placeholders of the concrete implementations: Array<Element>: Collection, Dictionary<Key, Value>: DictionaryProtocol, SIMD3<Scalar>: SIMD.
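To make the "match the generic placeholders" heuristic concrete, here is a minimal sketch in the SE-0346 syntax (Swift 5.7 or later). DictionaryProtocol is hypothetical, as in the post: the standard library has no such protocol, and the single subscript requirement is an assumption chosen to keep the example small. The function name `lookup` is likewise illustrative.

```swift
// Hypothetical protocol for illustration only; not part of the standard library.
// Key and Value are declared as primary associated types (SE-0346 syntax).
protocol DictionaryProtocol<Key, Value> {
    associatedtype Key: Hashable
    associatedtype Value
    // Assumed minimal requirement: keyed lookup.
    subscript(key: Key) -> Value? { get }
}

// Dictionary's generic placeholders Key and Value satisfy the associated
// types of the same name, so the conformance needs no extra declarations.
extension Dictionary: DictionaryProtocol {}

// Hypothetical helper: both primary associated types are constrained
// with the lightweight syntax.
func lookup(_ table: some DictionaryProtocol<String, Int>, _ key: String) -> Int? {
    table[key]
}

let ages = ["ada": 36, "alan": 41]
print(lookup(ages, "ada") ?? -1)
```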

Then there are "supporting" associated types. Types that need to vary by implementation, and need to be used in the implementation of methods. Collection.Index is the most commonly encountered one. SIMD.MaskStorage would be another. There must be an associated type to link together the type used by startIndex, endIndex, subscript and so on. But for most use cases, it can remain opaque.
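As a minimal sketch of a "supporting" associated type staying opaque (Swift 5.7 or later; `secondElement` is a hypothetical name): the function uses Collection.Index to step through the collection, yet never names or constrains it, so it accepts collections whose Index types differ.

```swift
// Hypothetical helper: returns the second element of any collection of Ints.
// `next` has the Index type of whatever collection is passed in; the function
// never needs to spell that type out.
func secondElement(of items: some Collection<Int>) -> Int? {
    guard !items.isEmpty else { return nil }
    let next = items.index(after: items.startIndex)
    return next == items.endIndex ? nil : items[next]
}

print(secondElement(of: [10, 20, 30]) ?? -1)     // Array: Index is Int
print(secondElement(of: 5..<9) ?? -1)            // Range: Index is Int
print(secondElement(of: ["a": 1].values) ?? -1)  // Dictionary.Values: Index is Dictionary.Index
```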

My contention is that constraints on these "secondary" types, which are less common but do come up when used on parameters, are going to be extremely uncommon on opaque result types, maybe even to the point of being almost entirely unused. It is counterexamples to this that I am looking for.
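For contrast, a minimal sketch of the kind of constraint on a supporting type being discussed (Swift; the function name `element(_:atOffset:)` is hypothetical): requiring `Index == Int` cannot be written with `some Collection<...>`, so the function falls back to an explicit generic signature with a where clause.

```swift
// Hypothetical helper. `C.Index == Int` constrains a supporting associated
// type; the lightweight `some Collection<...>` syntax only covers primary
// associated types, so this needs a named generic parameter and a where clause.
func element<C: Collection>(_ items: C, atOffset offset: Int) -> C.Element?
    where C.Index == Int {
    // Integer arithmetic on the index is only allowed because Index == Int.
    let i = items.startIndex + offset
    return (items.startIndex..<items.endIndex).contains(i) ? items[i] : nil
}

let slice = [1, 2, 3, 4][1...]   // ArraySlice<Int>, whose startIndex is 1
print(element(slice, atOffset: 0) ?? -1)
```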

I know that applying this change to standard library types is in ‘future directions’, but I have a question about whether a particular direction of evolution is feasible.

One particular usability problem I have with generics is this: collection methods often have to return a specialised type (e.g. various Iterator structs, Publishers.FlatMap etc. in Combine, LazyPrefixWhileSequence and other lazy views). This means that if I want to understand what I can do with the return value, I need to go to that type and look at what protocols it adopts, and then also wonder whether it offers any additional API on top of these protocols. It looks like this proposal would allow e.g. flatMap to return some Publisher<P, Failure>, which would be a lot clearer. Is this correct? If it is, this leads to two questions:

  1. Will this work from a performance point of view? My understanding is that the compiler is able to better optimise these cases because it knows the types. I’m guessing this isn’t an issue, because SwiftUI uses opaque return types, but would like to check.

  2. Is it possible to migrate from the current situation with nominal return types to opaque return types? I realise that this would be source breaking so would need to wait for Swift 6 - but is it even possible?
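Combine is not available on every platform, so here is a minimal sketch of the same idea with a standard-library lazy view instead (Swift 5.7 or later; the function names are hypothetical). The nominal version exposes `LazyFilterSequence<[Int]>`, while the opaque version only promises a sequence of Int.

```swift
// Hypothetical helper, nominal return type: callers must look up
// LazyFilterSequence to see what it offers.
func evensNominal(_ xs: [Int]) -> LazyFilterSequence<[Int]> {
    xs.lazy.filter { $0 % 2 == 0 }
}

// Hypothetical helper, opaque return type with a primary associated type:
// callers only learn "some Sequence whose Element is Int". The concrete type
// is still known to the compiler wherever it can see this function's body.
func evens(_ xs: [Int]) -> some Sequence<Int> {
    xs.lazy.filter { $0 % 2 == 0 }
}

for n in evens([1, 2, 3, 4, 5, 6]) {
    print(n)
}
```

Changing an existing public API from the first form to the second is the source-breaking migration asked about in question 2.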

Could the same notion of primary (or positional/indexed) types also be used for existentials? E.g. having any Collection<Int> instead of (or even replacing) AnyCollection<Int>. I think this would also help replace a bunch of nominal types whose names actually obscure what is important.
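A minimal sketch of the comparison, assuming Swift 5.7 or later, where constrained existentials (SE-0353) and implicit opening of existentials (SE-0352) are available; `total` is a hypothetical name:

```swift
// Today: a nominal type-erasing wrapper from the standard library.
let wrapped: AnyCollection<Int> = AnyCollection([1, 2, 3])

// Constrained existential spelled with the primary-associated-type syntax.
let boxed: any Collection<Int> = [1, 2, 3]

// Hypothetical generic helper taking an opaque parameter.
func total(_ items: some Collection<Int>) -> Int {
    items.reduce(0, +)
}

print(total(wrapped))  // AnyCollection is itself a concrete Collection
print(total(boxed))    // the existential is implicitly opened here
```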

Yes, that's correct.

Mostly yes, it's fine from a performance point of view. It's a little complicated because it's mixed in with other topics, such as whether the type has a resilient ABI (in a framework "built for library evolution") and other things like inlinability, cross-module optimization.

Basically, if the compiler has visibility into the function returning the type, then it knows what it is even if you don't, and it can optimize accordingly. So, for example, if you use an opaque result type in your own source code, then the compiler can see the function implementation and so knows the real returned type. If you use one returned from an ABI-stable library (like the standard library), then it depends on whether the function has been marked inlinable, allowing the caller's compiler to see what type it is. This is the fairly standard burden on ABI-stable library authors to decide how much flexibility to trade off. If the function is inlinable, this means the author cannot, later, return a different type. If it isn't inlinable, then the type is entirely opaque and has to be manipulated via the witness tables. But this is standard for ABI-stable libraries and goes along with other things, like whether the type is @frozen, whether other methods are inlinable, etc.

For the most part, the standard library tends to mark most stuff frozen and inlinable ("fragile") because performance is critical. But higher-level frameworks tend to skew more in favor of resilience.

It is definitely possible when aligned with a Swift 6 language mode. The ABI consequences can be worked around with various techniques. Whether the source break is worth it should be discussed as part of a future pitch.

I think so, yes! A pitch of this very idea is being worked on now. Such a concept, when combined with opening of existentials, should indeed render AnyCollection<Int> unnecessary (though since it's ABI, we'll never actually get to delete the 2,500 lines found in ExistentialCollections.swift, alas).

Thanks for the detailed reply - it sounds very positive!

I think the parallel between some Collection<Int> and Array<Int> is a useful one. It used to be the case that angle brackets after a type name meant it was a generic type. This is now muddied a little (though the distinction is still visible from the presence of some). However, I think the benefits to conciseness, and not needing to jump back and forth when mentally parsing a declaration, are probably worth it, particularly if it’s also possible to move away from unnecessary nominal return types.

I can't argue with that.

I'm honestly getting a little confused about opaque types in general. I might be wrong but I see the use cases of opaque types (parameters, result, structural et cetera) as a sugary lightweight alternative to generic parameters. And I support the pitch also because I find the excellent example, very well put by @Karl

to not be particularly problematic: if I ever found myself in the position to teach this, I'd do it in 2 steps, first desugar the opaque parameter, then add the additional constraint.
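As a generic illustration of that two-step approach (not the example referred to above), here is a sketch with hypothetical function names, assuming Swift 5.7 or later:

```swift
// Hypothetical helpers illustrating the two teaching steps.

// Step 0: lightweight form with an opaque parameter.
func dropHead0(_ items: some Collection<Int>) -> [Int] {
    Array(items.dropFirst())
}

// Step 1: desugar `some Collection<Int>` into a named generic parameter.
// This means exactly the same thing as step 0.
func dropHead1<C: Collection>(_ items: C) -> [Int] where C.Element == Int {
    Array(items.dropFirst())
}

// Step 2: with the parameter named, add a constraint the sugar cannot
// express: the result has the same type as the argument.
func dropHead2<C: Collection>(_ items: C) -> C
    where C.Element == Int, C.SubSequence == C {
    items.dropFirst()
}
```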

So, if opaque types are a lightweight, sugared form of the more complete and powerful generic signature, I'm not concerned about the expressivity limitation of the new syntax (introduced by this pitch, together with an expressivity addition) in the return position of a function: the feature that's actually missing is reverse generics, and this pitch doesn't preclude future work on that, while being very useful in itself.

Holly has covered this to a significant degree, but let me try to lay out the grand vision for where this is going.

We have an opportunity for a synthesis across several different language features:

  • We'd like generics to have stronger language connections to other things so that picking up generic programming feels more familiar, with better progressive disclosure of complexity.
  • We'd like to be able to express more advanced constraints on existential types (protocol and protocol composition types) than what you can do with just &.
  • We'd like to be able to express more advanced constraints on opaque result types than what you can do with just &.

The synthesis is quite simple. The existential and opaque result type cases require us to add a syntax to constrain the associated types of a protocol or protocol composition. This should, of course, be the same syntax for these two cases, other than the leading any or some keyword. Meanwhile, SE-0341 has introduced opaque parameter types, also written with some P, allowing the generic signature to be completely elided for some simple generic functions. By adopting the same syntax for the opaque-parameter some P as for the opaque-result some P, we can now write fairly advanced generic functions without an explicit generic signature. We only need a signature when there has to be a relationship between different components of the function (like if two collections need to have the same generic element type).
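A minimal sketch of that dividing line, assuming Swift 5.7 or later (function names are hypothetical): a self-contained constraint fits in `some`, while a relationship between two parameters still calls for a generic signature.

```swift
// Hypothetical helper. Self-contained constraint: no generic signature needed.
func containsNegative(_ xs: some Collection<Int>) -> Bool {
    xs.contains { $0 < 0 }
}

// Hypothetical helper. The two collections must share an element type,
// which relates the parameters to each other, so an explicit signature
// is still required.
func sameElements<A: Collection, B: Collection>(_ a: A, _ b: B) -> Bool
    where A.Element == B.Element, A.Element: Equatable {
    a.elementsEqual(b)
}
```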

It's fair to ask: why does eliding the generic signature help to achieve the goal of building stronger connections to other parts of the language? Well, Swift has three ways to generalize over different types of values. One of them is subclassing, and that's inherently a limited form of generalization: it only works when you've got classes with a common superclass. The other two are generics and existential types. SE-0341 lets you express simple generics with almost the same syntax as existential types, just varying between any and some. The vision here is to generalize that to any sort of self-contained constraint, so you can take that and use it uniformly throughout the language, either as any or some. That is a very strong connection. And building that connection to existential types also makes generics much more familiar to programmers coming to Swift from one of our many peer languages with weak (if any) generic systems, where they're used to working with protocol types because that's the primary tool for generalization.

So that's the vision. It's an ambitious vision, in two main ways.

First, applying this same syntax to each of these features involves a different level of implementation complexity.

  • For generics, this is pure "sugar" — it can almost be handled in the parser — and so there is no special implementation complexity.
  • For opaque result types, I believe it's not quite so simple, but it's still fairly straightforward.
  • For existential types, there's quite a bit of plumbing and generalization that will be required in both the compiler and the runtime.

So the syntax will need to be implemented for each of these cases at different times, unless we're going to hold the whole thing up for a few releases until we've solved all the problems around generalizing existential types.

Second, generality often comes at a cost. Generic signatures are very general, but allowing for that generality makes generic declarations and constraints syntactically very different from everything else.

For example, the way that you constrain the element type of an opaque Collection type (C.Element == Int) is completely different from the way that you constrain the element type of a concrete collection type (Array<Int>). That generality means that code can constrain the other associated types of Collection just as easily as Element, which is obviously necessary. However, it also creates a significant and unfamiliar gap that programmers have to cross before they can write code that's generic over collections. So if the syntax we introduce for this looks like the contents of a where clause, we'll have achieved generality, but we'll also be missing a major opportunity to build a stronger connection to other things in the language and make generics feel more like a generalization of what programmers are already familiar with. (This is particularly true for collections, since many programmers coming from other languages are used to writing e.g. Collection<MyType>, and it seems very strange to them that Swift uses exactly this syntax for generic types like Array but for some reason not for Collection.)

To make that concrete, it would be great for progressive complexity if you could simply take some existing function that works on a concrete generic type like Array and substitute Array out for some Collection:

```swift
func collect(widgets: Array<Item>) {
  for widget in widgets { /* ... */ }
}

func collect(widgets: some Collection<Item>) {
  for widget in widgets { /* ... */ }
}
```
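For comparison, here is a runnable sketch of all three spellings side by side, including the where-clause form described above (Swift 5.7 or later). `Item` and the function names are hypothetical stand-ins.

```swift
// Hypothetical element type standing in for the post's `Item`.
struct Item { let name: String }

// 1. Concrete generic type.
func countConcrete(_ widgets: Array<Item>) -> Int {
    var n = 0
    for _ in widgets { n += 1 }
    return n
}

// 2. Generic signature with a where clause: the spelling that differs most
//    from the concrete form.
func countGeneric<C: Collection>(_ widgets: C) -> Int where C.Element == Item {
    var n = 0
    for _ in widgets { n += 1 }
    return n
}

// 3. Lightweight form: same shape as (1), with Array swapped for some Collection.
func countLightweight(_ widgets: some Collection<Item>) -> Int {
    var n = 0
    for _ in widgets { n += 1 }
    return n
}

let items = [Item(name: "a"), Item(name: "b"), Item(name: "c")]
```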

So I think the concrete achievement of this vision has to include both:

  1. a fully general syntax that can express any constraint that a generic signature with a single type parameter could and
  2. a syntax that specifically lets you constrain associated types by equality with a given type.

This pitch only addresses (2). Procedurally, I think it's okay for a proposal to carve out a narrow case like that, in the interests of making incremental progress, as long as it doesn't prevent the more general case from being addressed. I don't think that's happening here. I don't know that developing the general syntax is hard in the way that it's been described a few times in this thread, but if both syntaxes are indeed necessary, it's fine to start with the narrower one that has greater immediate impact on the standard library.

I was more worried about this narrower syntax being inadequate in the short term even for common cases until @Joe_Groff helped me realize just how expressive nested some types could be. For example, some Collection<some Comparable> is a perfectly fine way of expressing what otherwise would have been <C: Collection> where C.Element: Comparable.
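A minimal sketch of that equivalence, assuming Swift 5.7 or later (`isSorted` and `isSortedExplicit` are hypothetical names, not standard-library APIs):

```swift
// Hypothetical helper. Nested `some`: no generic parameters are named.
func isSorted(_ items: some Collection<some Comparable>) -> Bool {
    zip(items, items.dropFirst()).allSatisfy { pair in pair.0 <= pair.1 }
}

// Hypothetical helper. The same requirement spelled with an explicit signature.
func isSortedExplicit<C: Collection>(_ items: C) -> Bool where C.Element: Comparable {
    zip(items, items.dropFirst()).allSatisfy { pair in pair.0 <= pair.1 }
}
```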

Moderator note: I've moved Chris's post to the language workgroups thread, which I believe is where he wanted to post it, but couldn't because it was locked. Please take any discussion of that over there.

An attempt at a potential middle ground: I strongly remain of the position that <...> should be reserved for parameterized protocols as described in the Generics Manifesto. That is why I think the middle ground should use a 'marker' on the associated types.

```swift
// today

// the order of those associated types becomes important regardless
// of whether we use a marker or shift everything into angle brackets
// during the protocol declaration
protocol P {
  @marker
  associatedtype A

  @marker
  associatedtype B

  associatedtype C
}

// no primary associated types constrained
some P
// both marked associated types are moved up into the angle brackets
some P<ConcreteA, ConcreteB>
// only A is constrained
some P<ConcreteA, _>
// only B is constrained
some P<_, ConcreteB>

// `C` is not constrainable with this syntax and would require a different syntax
// e.g.
func foo() -> <T: P> T where T.C == ConcreteC
```

In the future, when we have the chance to revisit 'actual generic protocols', we can still combine those features into the same generic type parameter list like so:

```swift
// in the future

protocol Q<A> {
  @marker
  associatedtype B

  @marker
  associatedtype C

  associatedtype D = A
}

// the actual generic type parameter comes first, followed by
// the marked associated types
some Q<ConcreteA, ConcreteB>

// primary associated types unconstrained
some Q<ConcreteA, _, _>
// or if no primary associated types need to be constrained,
// we'd simply slice off the tail of that list and only set the necessary
// generic type parameters
some Q<ConcreteA>

func bar() -> <T: Q<ConcreteA>> T where T.D == ConcreteD
```

  1. That way, on the declaration side we won't mix the generic type parameters with the primary associated types.
  2. Tradeoffs we have to make:
    • declaration order of primary associated types becomes important
    • we will have to use placeholder types in cases where we don't want to constrain a primary associated type
    • the primary associated types cannot be visibly exposed without the generalized feature
    • the ability to specify a conformance of an associated type might require the generalized feature in some cases
  3. I do believe that it would be impossible to extend protocols in such a manner if the current proposal moved primary associated types into the angle brackets on the declaration side. It would require some disambiguation between true generic type parameters and primary associated types, and I think we can agree that R</* explicitly */ generic A, /* implicitly primary */ B> or R<A, /* explicitly */ primary B> markers inside the angle brackets wouldn't be ideal, especially as we may want to introduce type labels in the future.

Would that be feasible?

IMO this is more confusing than overloading the meaning of <> because you lose the symmetry in the source between declaration and use-site.

I was very concerned with the use of <…> in the first version of this pitch because I was of the same opinion. However, as I said before, I am much happier with this iteration precisely because it doesn’t go half way.

Either the angle brackets should be reserved for “generic protocols” (even though the Manifesto has stated they are unlikely ever to be a part of Swift), and the angle brackets should not be used at either the declaration or use site for anything else; or they should not be reserved for such, and they should be declared similarly to how generic parameters are declared, since they are used similarly as well.

I would be sad to see this design revert to the earlier inconsistency. Put another way, I would be concerned that a proposal to use angle brackets for protocols didn’t either adopt them for generic protocols or rule out their use for generic protocols; the idea that they might mean one or the other at some later point is to me the least ideal state of affairs.

I would argue that symmetry shouldn't be the primary argument for burning a possible future feature. In my opinion, the main motivation for the proposed syntax remains the use site, and I just tried to present a possible middle ground which, to my non-compiler-engineer eyes, would be a fair trade-off for both parties: those who want generic protocols, and those who want to use primary associated types inside the angle brackets without explicit associated type names.

That said, I can probably somewhat accept the asymmetric feature design:

```swift
protocol P {
  primary associatedtype A
}

some P // equals `some P<_>`?
some P<ConcreteA>
```

If you want to keep room for generic protocols with the standard brackets syntax, I would rather have it so you need to write something like Collection<associatedtype Element> to get the light-weight form for the feature in this pitch. How does that look to you?

@bzamayo I do not follow. Honest question: How is this an improvement or any kind of disambiguation?

In my mind, I can picture multiple levels of sugar:

```swift
// no sugar code here at all, we only introduce a single 'primary' marker
protocol P<A> {
          ^~~ generic type parameter list, not primary assoc

  primary associatedtype B

  // non-primary
  associatedtype C
}
```

We now have to decide whether or not primary associated types must be always specified or not:

```swift
// less lightweight (not proposed)
some P<ConcreteA, .B == ConcreteB>

// lightweight: could be sugar for the example above
some P<ConcreteA, ConcreteB>

// * is this `some P<ConcreteA, _>` ???
// * or if the first example was a thing: `some P<ConcreteA, .B == _>`
some P<ConcreteA>
      ^~~~~~~~~~~ required generic type parameter
```

Because in the future when generic protocols existed, you wouldn't use the 'associatedtype' prefix to declare a generic parameter.

I still don't follow. Why do you think so? In my examples, the generic type parameter isn't representing the primary associated type, as is being proposed. Associated types (whether primary or not) and a generic type parameter on the protocol can coexist and be used simultaneously, as they are not simply substitutes for one another.

Strawman code:

```swift
protocol P<A> {
  associatedtype B
}

struct S {}

extension S: P<X> {
  typealias P<X>.B = Foo
}

extension S: P<Y> {
  typealias P<Y>.B = Bar
}

func foo<T, R: P<T>>(_: T.Type, _: R.Type) {
  print(R.B.self)
}

foo(X.self, S.self)
foo(Y.self, S.self)
```

I feel like we are talking past each other. What I was trying to suggest was an alternative syntax that still housed the primary associated types inside the angle brackets. I thought one of your issues was that this would become ambiguous if generic protocols existed in the future, so I amended it by suggesting that we prefix the associated type declarations inside the angle brackets with the 'associatedtype' keyword.

So rather than (quoting your example):

```swift
protocol P<A> {
          ^~~ generic type parameter list, not primary assoc

  primary associatedtype B

  // non-primary
  associatedtype C
}
```

You would hypothetically declare it as

```swift
protocol P<A, associatedtype B> {

  // non-primary
  associatedtype C
}
```

Ah, that makes more sense now. While I understand what you mean, I don't think this is a good approach unless we would start explicitly annotating the primary associated types inside the declaration angle brackets today.

```swift
protocol P<associatedtype A>
```

Unless we come up with another short form for that, I don't think this is the syntax we all want to see as there will be several protocols with multiple primary associated types with potentially longer names.

```swift
protocol SomeProtocol<associatedtype SomeName, associatedtype SomeOtherName>
```

Requiring associatedtype to be written before each primary associated type inside the angle brackets, or having just a single associatedtype in the list, feels strange to me. One more thing: if protocols had a generic type parameter, we may want to use it as the default type for a potential primary associated type.

The version with a marker seems more straightforward:

```swift
protocol P<A> {
  primary associatedtype B = A
}

extension SomeType: P<X> {} // SomeType.P<X>.B == X
extension SomeType: P<Y> {
  typealias P<Y>.B = Foo // default overridden
}
extension SomeType: P<Z, Bar> {} // SomeType.P<Z>.B == Bar
```

Generally there is a strong expectation that declaration order inside type definitions doesn't matter. There are exceptions (like @frozen structs) but those are special cases most users aren't aware of.

This is the main reason why we don't have Comparable synthesis for structs, but do have it for enums, where ordering matters in other ways (raw values and case iteration).
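A small sketch of that difference (the type names are hypothetical): since SE-0266 (Swift 5.3), an enum without raw values or associated values gets a synthesized Comparable conformance that orders cases by declaration order, whereas a struct has to implement `<` itself.

```swift
// Hypothetical enum. Synthesized Comparable: cases compare in declaration order.
enum Priority: Comparable {
    case low, medium, high
}

// Hypothetical struct. No Comparable synthesis for structs: without this `<`,
// the conformance fails to compile. The implied Equatable conformance still
// gets a synthesized ==.
struct Version: Comparable {
    var major: Int
    var minor: Int

    static func < (lhs: Version, rhs: Version) -> Bool {
        (lhs.major, lhs.minor) < (rhs.major, rhs.minor)
    }
}
```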

On the other hand, it's well understood that the order of the generic placeholders on a type has meaning, and this understanding should transfer directly to primary associated types.

Wouldn't that still apply to my suggestion, though? We could document and teach that the order of primary associated types at the sugared use site follows their top-down declaration order.

```swift
// proposed
protocol DictionaryProtocol<Key: Hashable, Value> {
  ...
}

// my suggestion, which might be less discoverable at a glance,
// but keeps some known syntax integrity and future extensibility intact
protocol DictionaryProtocol {
  // declaration order for `primary associatedtype` needs to be documented
  // as it becomes important, but only for the sugared use site
  primary associatedtype Key: Hashable
  primary associatedtype Value
  ...
}

// nothing changes for the use site
some DictionaryProtocol<SomeKey, SomeValue>
```

It does feel like the primary markers would only enable a few compiler checks:

  • if there is a primary marker, allow the sugared syntax P<Something>
  • link each position in the angle-bracket list to the corresponding primary associated type, in declaration order

At the end of the day, we're only talking about sugar syntax for a same-type constraint. The general feature, as John previously mentioned, still does not rely on any kind of ordering of associated types (primary or not).