[Pitch] Light-weight same-type constraint syntax

Tino · November 4, 2021, 8:48pm

But exactly this won't be possible with the pitched change (will it??)

However, with generic protocols…

protocol AnyIterator<Element> {
  // no associated objects, but
  mutating func next() -> Self.Element?
}

protocol AnyCollection<Element> {
   // no associated objects, but maybe some methods that need no parameters but Element
  func makeIterator() -> AnyIterator<Element>
}

extension Iterator: AnyIterator<Element> {} // hey, that's easy - everything is already there!
extension Collection: AnyCollection<Element> {}

would be good enough for me — especially when you consider

protocol CollectionWithIndex<Index> { // too bad we can't have named parameters for generics :-/
    // Collection-requirements…
}

extension Collection: CollectionWithIndex<Index> {}

protocol CollectionWithElementAndIndex<Element, Index>: AnyCollection<Element>, CollectionWithIndex<Index> {}

var list: CollectionWithElementAndIndex<Int, Int> = [3, 1, 4, 1, 5, 9, 2]
// do some collection-stuff
list = someOtherCollectionOfInts
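
For comparison, something close to the last snippet is already possible with the standard library's type-erasing wrapper AnyCollection<Element>, at the cost of wrapping each value explicitly. A minimal sketch (the variable names are only illustrative, and this erases the index type rather than exposing it):

```swift
// Standard library type eraser: hides the concrete collection type
// and keeps only the element type.
var list: AnyCollection<Int> = AnyCollection([3, 1, 4, 1, 5, 9, 2])
let firstCount = list.count

// Reassign to a different concrete collection with the same Element.
list = AnyCollection((1...3).lazy.map { $0 * 2 })
let doubled = Array(list)
```

The pitched and sketched syntaxes would mostly remove the explicit AnyCollection(...) calls; the underlying idea of "some collection of Int, whatever its concrete type" is the same.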

Karl · November 4, 2021, 8:54pm

If we did have a shorthand for protocol constraints, I think it's very important that it handle subtype constraints, too. I don't think they can be separated; same-type constraints alone aren't worth the radical new syntax.

IMO, same-type constraints often aren't what you want, especially as a beginner to generics. Imagine I have some algorithm which starts as operating on an array of strings:

func frobnicate(strings: [String])

Then one day I learn about Swift's cool lazy collection views, but this code doesn't work with them. So I try to make it generic to any Collection, using this obvious syntax that we've decided to make so lightweight:

func frobnicate<C: Collection<String>>(strings: C)

// or:
// func frobnicate(strings: some Collection<String>)

OK - that works. Later, I realise that all of these Strings live as separate heap allocations with their own lifetimes, and creating all of those strings is slowing my App down. I used String when I wrote this function, because that's the default text type people should reach for -- but it could work just as well with Substring.

Swift has a protocol to make that kind of processing easier - StringProtocol. So I change my function again:

func frobnicate<C: Collection<StringProtocol>>(strings: C)

// or:
// func frobnicate(strings: some Collection<StringProtocol>)

Now something very interesting happens - my function will work for Arrays of strings and substrings, but not for other collections:

protocol MyStringProtocol {}
extension String: MyStringProtocol {}
extension Substring: MyStringProtocol {}

func frobnicate<C>(
  strings: C
) where C: Collection, C.Element == MyStringProtocol {}

let arrayOfStrings: [String] = ["hello"]
frobnicate(strings: arrayOfStrings) // OK

let lazyCollectionOfStrings = (0..<10).lazy.map { String($0) }
frobnicate(strings: lazyCollectionOfStrings) // ERROR - 'String' and 'MyStringProtocol' must be equivalent

See the problem? Collection<StringProtocol> means a collection of existentials - and while Array has special implicit conversions, other collections don't. What the developer actually wanted to write was Collection where Element: StringProtocol - with a subtype constraint for the element type, not a same-type constraint.
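
A minimal sketch of the subtype-constrained version, written in today's where-clause syntax. The function body and the variable names are stand-ins chosen for illustration; only the constraint matters here:

```swift
// Conformance (subtype) constraint: Element may be any single type that
// conforms to StringProtocol, such as String or Substring.
func frobnicate<C: Collection>(strings: C) -> Int where C.Element: StringProtocol {
    // Stand-in body: total number of characters.
    strings.reduce(0) { $0 + $1.count }
}

let arrayOfStrings: [String] = ["hello"]
let lazyCollectionOfStrings = (0..<10).lazy.map { String($0) }
let substrings = "a bc def".split(separator: " ") // [Substring]

let a = frobnicate(strings: arrayOfStrings)          // Array of String
let b = frobnicate(strings: lazyCollectionOfStrings) // lazy collection of String
let c = frobnicate(strings: substrings)              // Array of Substring
```

All three calls compile, because no conversion to an existential is needed: the element type only has to conform.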

With SE-0306, all protocols will be usable as existentials, so things like Collection<StringProtocol>, Collection<Numeric>, etc. would be valid. That adds a whole new dimension to the problem, IMO.

I think that observation applies generally - when a subtype constraint will do what you want, it's generally preferable to a same-type constraint. They definitely do have uses (Collection where Element == UInt8 is very important for working with sources of bytes, for example), but the whole point of generic programming is to loosen your algorithms to be based on semantics and capabilities rather than specific types.
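
To illustrate the byte-source case where a same-type constraint is the right tool, here is a small sketch. The checksum function is hypothetical and deliberately trivial:

```swift
// Same-type constraint: the algorithm needs exactly UInt8 elements.
func checksum<Bytes: Collection>(_ bytes: Bytes) -> UInt8 where Bytes.Element == UInt8 {
    // Wrapping addition, so overflow does not trap.
    bytes.reduce(0) { $0 &+ $1 }
}

let fromArray = checksum([1, 2, 3] as [UInt8])
let fromUTF8 = checksum("abc".utf8) // String.UTF8View is a Collection of UInt8
```

The function is still generic over the source (array, UTF-8 view, and so on); only the element type is pinned.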

So what I'm saying is: I don't think same-type constraints alone are worth this fuss.

filip-sakel · November 5, 2021, 4:21pm

Are you proposing that <T: Collection<Numeric>> be interpreted as <T: Collection> where T.Element: Numeric?

I think you're referring to SE-0309.

xedin · November 5, 2021, 4:33pm

Note that the pitch proposes that in requirement positions, like the one you mentioned, the new syntax is a shortcut for the long version with the associated type name in a where clause, so <T: Collection<String>> would indeed be <T: Collection> where T.Element == String. We could support more bound kinds too if that is desirable, although the idea here is to unify the angle brackets between protocols and generics, which only support same-type constraints.

jayton · November 5, 2021, 5:46pm

I think this is more evidence that you’re trying to unify things that are actually more different than they appear, and supports my contention that this will make it harder to go from beginner generic programmer to actually understanding how stuff works.

xedin · November 5, 2021, 5:50pm

I'm not sure what you mean by more evidence. We did unify the behavior in the pitch with generic parameters in the where clause e.g.

protocol P {
}

class A<T: P> {
}

func test<T: A<P>>(_: T) {}

Doesn't work either because P is considered a same-type constraint in <T: A<P>> context.
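
A sketch of how the same intent can be spelled today with an extra generic parameter, so that the argument of A is constrained by conformance rather than being the existential P. The type S and the returned string are assumptions added only to make the example self-contained:

```swift
protocol P {}
struct S: P {} // hypothetical conforming type

class A<T: P> {}

// func test<T: A<P>>(_: T) {}
// Rejected: the existential type P does not itself conform to P.

// U is some concrete type conforming to P; T is A<U> or a subclass of it.
func test<U: P, T: A<U>>(_ value: T) -> String {
    "\(U.self)"
}

let name = test(A<S>())
```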

Karl · November 5, 2021, 9:51pm

I do think it makes sense: since T is a single type conforming to Collection, and associated types can only be bound to a single type per conformance, a reasonable interpretation of something like T: Collection<Numeric> would be "a Collection whose element type conforms to Numeric".

(As opposed to: A collection of existential any Numerics, a type which doesn't even conform to Numeric!)

So I wouldn't blame developers who get confused by this. We can all agree that the generics system needs to be more intuitive, and whilst this shorthand will help abbreviate some constraints, I don't think it will actually make the system simpler or easier to learn.

As soon as you need a subtype constraint (which IMO you should prefer whenever possible), you'll face an even higher wall than you did before, because we so heavily optimised syntax for the other thing (does that sound familiar?). If we had a more balanced approach to constraint shorthands, it would be simpler to see what is happening and why changes have the effect they do.

// Bonus - subtype constraints are even easier than same-type.
func frobnicate(strings: Collection<.Element: StringProtocol>)
func frobnicate(strings: Collection<.Element == StringProtocol>)

It might also be worth considering going the other way - allowing subtype constraints to be used in more places, so people can just write where Element: Int or where Element: String, to allow those constraints to be more easily relaxed to protocol constraints.

It may not work for classes, but at least when the constraint is a value type, we can be relaxed and just DWIM.

// The compiler knows what you mean, can't it just be chill?
func frobnicate(strings: Collection<.Element: String>)

xwu · November 5, 2021, 10:47pm

Hear hear.

Why can’t we just write everywhere what we already do in extension declarations: Collection where Element: StringProtocol?

jrose · November 5, 2021, 10:59pm

It's ambiguous in return types:

func frobnicate<Element>(_ x: Element) -> Collection where Element: StringProtocol

Even if it were accepted one way or the other, the compiler would still have to handle people writing the "wrong" way and nudge them toward the right one.

xwu · November 6, 2021, 1:24am

jrose:

It's ambiguous in return types:
func frobnicate<Element>(_ x: Element) -> Collection where Element: StringProtocol
Even if it were accepted one way or the other, the compiler would still have to handle people writing the "wrong" way and nudge them toward the right one.

The example is already accepted unambiguously to refer to the generic parameter, unless I’ve entirely lost it.

For which reason one would naturally expect—or, at least, I would naturally expect—that parens would be required to refer to the associated type if both were to use where notation:

func frobnicate<Element>(
  _ x: Element
) -> (Collection where Element: StringProtocol)

Might be subtle, but such is the role of parens generally that I don’t think it’d be shocking to anyone.
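
A small sketch of the current behaviour being described: a trailing where clause constrains the function's own generic parameter named Element. The concrete return type [Element] and the body are stand-ins, since a bare Collection return type is not the point here:

```swift
// The where clause applies to the generic parameter `Element`
// declared in the angle brackets.
func frobnicate<Element>(_ x: Element) -> [Element] where Element: StringProtocol {
    [x, x]
}

let repeated = frobnicate("ab")            // Element is String
let sliced = frobnicate("abc".dropFirst()) // Element is Substring
```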

DevAndArtist · November 6, 2021, 10:01am

In a generic world the last part of your examples would be completely valid. In fact, this falls under the umbrella of the 'generalized supertype constraint' which is also part of the generics manifesto. Not to mention that Never will be a reasonable sub-type of any value type which further justifies the where Element: String supertype constraint.

This is also one of the reasons I previously mentioned

DevAndArtist:

While this discussion is focused only on same-type constraints, can I ask why we shouldn't ever explore a sub-type constraint at the use site?

Bikeshedding code:
// It's not important what concrete type `.Element` would have,
// it's only important that it conforms to `Foo`
_: some Collection<.Element: Foo, .Index == Int>

why I personally would prefer a more flexible and unambiguous syntax such as:

Karl:

// Bonus - subtype constraints are even easier than same-type.
func frobnicate(strings: Collection<.Element: StringProtocol>)
func frobnicate(strings: Collection<.Element == StringProtocol>)

Karl · November 7, 2021, 1:11am

It might be worth sketching out another idea I had for how we could make generic functions a bit easier to read and write, since it also involves repurposing angle brackets (albeit in a different way).

I mentioned in the other thread how I'd like us to unify existentials and generics in contexts where they are the same thing, and one of those contexts is using an existential function parameter as an anonymous generic type:

// Today:
func frobnicate<C>(strings: C) where C: Collection, C.Element: StringProtocol

// If unified with existentials, this would be the same as:
func frobnicate(strings: Collection where Element: StringProtocol)

But the other thing I think is quite interesting is reintroducing angle brackets within the parameters for named generic types:

// How do we simplify this?
func frobnicate<C>(
  strings: C,
  from index: C.Index
) where C: Collection, C.Element: StringProtocol

// Bring the generic signature in-line:
func frobnicate(
  strings: <C: Collection where Element: StringProtocol>,
  from index: C.Index
)
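
For reference, the "How do we simplify this?" form compiles today once it is given a body. A sketch, where the body and the return type are assumptions for illustration; it shows why the parameter needs a name (C) as soon as a second parameter refers to C.Index:

```swift
func frobnicate<C>(
    strings: C,
    from index: C.Index
) -> [String] where C: Collection, C.Element: StringProtocol {
    // Stand-in body: uppercase every element from `index` to the end.
    strings[index...].map { $0.uppercased() }
}

let words = ["a", "b", "c"]
let tail = frobnicate(strings: words, from: 1)
```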

Currently, when you read a generic function, you see (in order)

func {function-name}<X, Y, Z>(
  names: X, positions: Y, colors: Z
) -> Int where
X: Collection, X.Element == String,
Y: Collection, Y.Element == Coordinate,
Z: Collection, Z.Element == Color

1. func. Simple enough.
2. {function-name}. A short description of the function, chosen to make sense at the call-site.
3. A prelude <X, Y, Z>, where the function defines some placeholder types called X, Y, and Z which it will use later in some way. These are usually poor quality names, because they don't have any meaning besides what they represent in the signature to follow.
4. The function arguments - with descriptive names (strings, names, colors), but illegible type information (C, X, Z, etc). We can likely guess from the argument names what these generic types will involve, but that information is still not written out for us yet.
5. The function's return type (if any)
6. The list of constraints which finally tell us what X, Y, and Z actually are. Flick back and forth between the function signature and the constraints a few times while you match them up. Also, note that we need to keep repeating the names X, Y, and Z, because they share a single constraints list (X: Collection, X.Element == String, etc).

When you step back and look at it, we chop up the information about generic types and scatter it about the function signature like we're almost trying to hide it. Deciphering a generic function is a lot more convoluted than your typical, non-generic function, and information locality is a big reason for it, IMO.

When I see a function argument with a name like strings: or names:, that is usually immediately followed by a colon, then the argument's type. If the examples above were not generic functions, I would be seeing nice, legible signatures such as:

func frobnicate(strings: [String])

func plotMarkers(names: [String], positions: [Coordinate], colors: [Color])

I can read those in a single left-to-right pass with no backtracking.

If we brought this same approach to generics, I think it could make even complex signatures much simpler:

// Non-generic, for comparison.
func plotMarkers(
  names: [String],
  positions: [Coordinate],
  colors: [Color]
) -> Int

// Today:
func plotMarkers<Names, Positions, Colors>(
  names: Names, positions: Positions, colors: Colors
) -> Int where
Names: Collection, Names.Element == String,
Positions: Collection, Positions.Element == Coordinate,
Colors: Collection, Colors.Element == Color

// Possible future:
// Level 1 - anonymous generic parameters.
func plotMarkers(
  names: Collection where Element == String,
  positions: Collection where Element == Coordinate,
  colors: Collection where Element == Color
) -> Int

// Possible future:
// Level 2 - named generic parameters.
func plotMarkers(
  names: <Names: Collection where Element == String>,
  positions: <Positions: Collection where Element == Coordinate>,
  colors: <Colors: Collection where Element == Color>
) -> Int

Of course, I'm biased, but I think this is perhaps a more promising future direction for angle brackets. The reason I bring it up with this pitch is that, if we used angle brackets for same-type constraints as well, I think this idea would be significantly less appealing.

filip-sakel · November 7, 2021, 9:23am

Karl:

func plotMarkers(
  names: <Names: Collection where Element == String>,
  positions: <Positions: Collection where Element == Coordinate>,
  colors: <Colors: Collection where Element == Color>
) -> Int

I don’t think this direction will be intuitive. Named generic parameters are named so that they can be referred to in constraints with other generic parameters or types. Thus, placing them on a specific function parameter, while not exclusively targeting that parameter, would be misleading.

Two notes on this:

I think the bare collection name should be prepended with some, to avoid the existentials-vs-generics ambiguity.

Requiring that Collection.Element be referred to as .Element would be more consistent and would allow otherwise ambiguous uses:

func append<Index>(
  // 'Index' would be ambiguous despite '.Index'
  a: some Collection where .Index == Index, 
  b: some Collection where .Index == Index
) -> (some Collection where .Index == Index)

Going back to concerns regarding the actual pitch:

Not only does this create consistency problems as demonstrated by @xedin's example; it also doesn't address the underlying problem. Existentials are constantly being confused with generics. Many beginners don't understand the difference between func f(_: Numeric) and f<T: Numeric>(_: T) — I too was oblivious to it for a long time. Even now, people unknowingly refer to Collection or arithmetic-protocol existentials, unknowingly incurring significant performance penalties. All that's to say that this issue deserves a holistic solution. Introducing a limited solution will not resolve the underlying issue and will only introduce inconsistencies, making Swift less comprehensible.
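
A minimal sketch of the difference between the two spellings, using a hypothetical Shape protocol rather than Numeric so that it compiles on older toolchains as well. All names here are illustrative:

```swift
protocol Shape { var area: Double { get } }

struct Square: Shape {
    var side: Double
    var area: Double { side * side }
}

struct Rect: Shape {
    var width: Double
    var height: Double
    var area: Double { width * height }
}

// Existential parameter: each element is a box that may hold a different
// concrete type. (Newer Swift versions spell this `[any Shape]`.)
func totalArea(_ shapes: [Shape]) -> Double {
    shapes.reduce(0) { $0 + $1.area }
}

// Generic parameter: T is one concrete type chosen by the caller.
func totalArea<T: Shape>(of shapes: [T]) -> Double {
    shapes.reduce(0) { $0 + $1.area }
}

let mixed = totalArea([Square(side: 2), Rect(width: 1, height: 3)])
let uniform = totalArea(of: [Square(side: 1), Square(side: 2)])
// totalArea(of: [Square(side: 2), Rect(width: 1, height: 3)])
// Does not compile: there is no single concrete T for both elements.
```

The two functions look almost identical at the call site, which is part of why the distinction is easy to miss.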

Jumhyn · November 7, 2021, 3:11pm

Without straying too far from the pitch at hand, I just want to say I agree wholeheartedly with this sentiment. I think I have a pretty strong mental model that separates each generic parameter's 'primary' constraints (those that don't reference any other generic parameters) from its 'secondary' constraints (those that reference other generic parameters). IMO, it would be great to be able to specify the 'primary' constraints up front to avoid the back-and-forth kind of reading that you note.

Do we have an idea of how common these problematic uses of existentials are? I have been under the impression for some time that many uses of existentials are isomorphic to the equivalent generic signature, and shouldn't incur any performance or optimization penalty.

Tino · November 7, 2021, 4:56pm

Karl:

func {function-name}<X, Y, Z>(
  names: X, positions: Y, colors: Z
) -> Int where
X: Collection, X.Element == String,
Y: Collection, Y.Element == Coordinate,
Z: Collection, Z.Element == Color

func. Simple enough.

{function-name}. A short description of the function, chosen to make sense at the call-site.

A prelude <X, Y, Z>, where the function defines some placeholder types called X, Y, and Z which it will use later in some way. These are usually poor quality names, because they don't have any meaning besides what they represent in the signature to follow.

Don't know if it's intentional ;-), but I think when you want to avoid scattering, the example for the status quo should be written as

func {function-name}<Names: Collection, Positions: Collection, Colors: Collection>(
  names: Names, positions: Positions, colors: Colors
) -> Int where Names.Element == String, Positions.Element == Coordinate, Colors.Element == Color

(there's also nothing that enforces C++-style parameter names).
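
A runnable sketch of that status-quo spelling, with the conformance constraints inline and only the same-type constraints in the where clause. Coordinate and Color are hypothetical stand-ins for the thread's types, and the body is a placeholder:

```swift
// Hypothetical stand-ins for the types used in the thread.
struct Coordinate { var x: Double; var y: Double }
enum Color { case red, green, blue }

func plotMarkers<Names: Collection, Positions: Collection, Colors: Collection>(
    names: Names, positions: Positions, colors: Colors
) -> Int
where Names.Element == String, Positions.Element == Coordinate, Colors.Element == Color {
    // Stand-in body: number of complete (name, position, color) triples.
    min(names.count, positions.count, colors.count)
}

let plotted = plotMarkers(
    names: ["home", "work"],
    positions: [Coordinate(x: 0, y: 0), Coordinate(x: 1, y: 2)],
    colors: [Color.red, .blue, .green]
)
```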

NSExceptional · November 7, 2021, 7:27pm

Is there any reason we can't have both syntaxes? All the benefits of this proposal don't seem to clash with the idea of using leading dot syntax to specialize specific associated types. I don't see why we can't have all of these:

extension Collection<String> { }
extension Collection<.Index == Int> { }
extension Collection<.Index == Int, .Element == String> { }

I like the brevity that comes with being able to specialize primary associated types in angled brackets, and the intent conveyed in code by denoting an associated type as primary. But I also value being able to constrain any associated type, so I would really like to be able to use both of these syntaxes.
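
For comparison, this is how the same constraints are written today with a where clause on the extension. The two computed properties are hypothetical helpers, added only so the extensions have something to demonstrate:

```swift
extension Collection where Element == String {
    // Hypothetical helper: total character count of all elements.
    var totalLength: Int { reduce(0) { $0 + $1.count } }
}

extension Collection where Index == Int, Element == String {
    // Hypothetical helper: the element at the midpoint, if any.
    var middle: String? { isEmpty ? nil : self[startIndex + count / 2] }
}

let words = ["a", "bb", "ccc"]
let length = words.totalLength
let mid = words.middle
```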

xedin · November 9, 2021, 7:57pm

We didn't include it in the pitch because the primary aim is not to increase the expressiveness of the language but to unify syntax and aid with progressive disclosure. This is something that I suggested some time ago in this thread as well. I am open to the idea, at least, if we make it consistent across protocols and generic types; I'm not sure how others feel about it.

hborla · November 9, 2021, 8:37pm

To frame this in a slightly different way, this pitch is not intended to be a fully generalized mechanism for constraining associated types. The specific goal is to provide folks with a stepping stone toward the fully generalized notation, which can be really hard for novice programmers to grasp. My personal opinion is that allowing fully generalized constraints in angle brackets, e.g. <.Index == Int> or <.Element: SomeProtocol>, does not actually achieve this goal. At that point, I think it makes sense for the programmer to learn about the where clause and write such constraints there.

hooman · November 9, 2021, 11:55pm

(Emphasis mine)

I fully agree with this. This same-type constraint is a shorthand notation to let us introduce novice developers to some very useful subset of generics without going into details early. It is not a replacement for the existing full syntax and should not parallel it.

filip-sakel · November 13, 2021, 8:50am

You’re right. A lot of existentials, especially pre-SE-0309, should be isomorphic to generics. However, I don’t know how much the compiler can optimize more complex protocols with associated types, like Collection. I imagine boxing arithmetic types won't be that efficient either.

Nevertheless, my point is that yes, users being able to tell apart generics from existentials is important. But it requires a much larger effort. Hence, we shouldn’t hold back this pitch with the tangible, more impactful benefit of unifying the generics syntax between concrete types and protocol constraints.