Improving the UI of generics

dwaite · April 14, 2019, 7:39am

(Feel free to correct my statements, everyone; I am after all just a self-proclaimed compiler enthusiast)

This is at least partially because Objective-C, in the cases where it suspects it will be given a type, will inline assumptions and even the code corresponding to the call if the value given is of the expected type. This is typically where you start seeing advantages from profile-based optimizations, but even then your generic algorithms may suffer since they are called with a variety of types.

The overhead of dynamic dispatch is often not just in the pointer lookups, but in killing the ability to optimize the code across the call boundary, such as by inlining or elimination of code paths.
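
A minimal Swift sketch of the distinction, assuming a Swift 5.6 or later toolchain for the `any` spelling. `Shape`, `Square`, and both functions are hypothetical names. Both functions compute the same result; the difference is only in how much the compiler can see across the call boundary.

```swift
protocol Shape {
    var area: Double { get }
}

struct Square: Shape {
    var side: Double
    var area: Double { side * side }
}

// Generic version: the concrete type S is known at each call site,
// so the optimizer is free to specialize the function and inline `area`.
func totalArea<S: Shape>(_ shapes: [S]) -> Double {
    shapes.reduce(0) { $0 + $1.area }
}

// Existential version: each `area` access is dispatched dynamically
// through the value's protocol witness table.
func totalAreaDynamic(_ shapes: [any Shape]) -> Double {
    shapes.reduce(0) { $0 + $1.area }
}

let squares = [Square(side: 1), Square(side: 2)]
print(totalArea(squares))
print(totalAreaDynamic(squares))
```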

Jean-Daniel · April 14, 2019, 2:27pm

Objective-C never inlines any method invocation.
Objective-C does not support final classes, and even if the language did support them, any class may be subclassed at runtime (see Key-Value Observing, for instance). So inlining a method call would break the language's assumptions.

Joe_Groff · April 15, 2019, 6:21pm

Sure. I think this case falls well below the threshold of a "more complex situation", though, because A, B, and Result don't have any real identity beyond "the type of a/b/the return value", and so being able to refer to them directly as such strikes me as a lower-complexity state; you have fewer names, and it is straightforward to refer to the unnamed things in terms of named ones. If there were multiple arguments or returns involving the same type, that's the point at which I think it'd become useful to name the type independently.

Similarly, referring to a protocol with some of its associated types constrained is such a common thing that it seems unfortunate to me that today it forces you into a situation where you need to introduce a where clause, which is itself a pretty big complexity step up from applying constraints to type parameters, and need to use a notation that relies on naming type parameters to describe the constraint.
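
To make the comparison concrete, here is a small sketch. The first function uses the where-clause spelling described above. The second assumes a Swift 5.7 or later toolchain, where opaque parameter types and primary associated types allow a lighter form. Both function names are made up for illustration.

```swift
// Where-clause spelling: a named type parameter is needed only so that
// the constraint on Element has something to refer to.
func total<C: Collection>(of values: C) -> Int where C.Element == Int {
    values.reduce(0, +)
}

// Lightweight spelling (assumes Swift 5.7+): no named parameter, no where clause.
func totalShort(of values: some Collection<Int>) -> Int {
    values.reduce(0, +)
}

print(total(of: [1, 2, 3]))
print(totalShort(of: Set([4, 5])))
```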

Moximillian · April 17, 2019, 7:25pm

Yes, for simple cases shorthands are fine, a return keyword might be fine for it.

But in addition to shorthands, consider those people who are not so intimately familiar with these relatively advanced concepts and syntax.

People have learned generics: they have named types, they are similar to what several other languages offer, and people have learned to use conformances, joined conformances and where clauses with them.

Now they try to use existentials. These are a bit similar, except they don't (really) have names, and people mistake the Protocol for the type's name. But the where clauses are the same, so some parts are familiar.

Then we have some. Sure, simple cases are fine. But if you get into situations similar to where clauses, you're confused again, just like with existentials.

The more I think about this, the more it feels like a fundamental flaw in Swift's syntax design (especially in protocols), and it is all the more important that both existentials (any) and reverse generics (some) have a named type syntax, both for ease of use and for handling complexity. This is irrespective of whether there is a shorthand syntax for them.

EDIT:
And by named type syntax I refer to the one mentioned before:
any Protocol A
some Protocol S

anandabits · April 17, 2019, 7:36pm

I don't necessarily agree with this. Avoiding names requires language magic that can be avoided with sugar that makes it concise to provide names directly in the argument list. I'm not necessarily opposed to the magic if people really want it, but I don't think it's a substitute for being able to name generic parameters directly in the argument list. One of the first things I requested (via radar) when Swift was first announced was the ability to introduce and name generic parameters directly in the argument list (avoiding angle brackets). The some Collection C syntax does a good job of fulfilling that request.

Big +1 to this. The indirection necessary when constraints only apply to one argument is really unfortunate.

I agree with Joe that names should not be required. They are often not necessary at all. But when they are it should be possible to introduce them without using angle brackets. There should be progressive disclosure of complexity in the syntax that matches the progressive increase in complexity of the semantics. As has been discussed abundantly, angle brackets introduce indirection that exceeds inline syntax and is usually unnecessary.

Moximillian · April 17, 2019, 7:45pm

Angle brackets are more appropriate for shortcut syntax if anything. Updated my post with what syntax I was thinking regarding named types.

Joe_Groff · April 17, 2019, 7:46pm

I think the some/any Protocol T syntax is great, to be clear.

If we don't get all the way to path-dependent types for existentials, any P T might at least be a reasonably lightweight way of explicitly opening the type:

let x: any Collection X = returnsAnyCollection() // opens existential type as X

let first = x.first // has type X.Element?
let start = x.startIndex  // has type X.Index
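
The `any Collection X` spelling above is hypothetical. As a rough sketch of a similar effect, assuming Swift 5.7 or later: passing an existential to a generic function opens it implicitly, and the generic parameter then plays the role of the name `X`. `describeOpened` is an illustrative name, not an existing API.

```swift
// Inside the function, X names the dynamic type of the opened existential,
// so values can be typed in terms of X.Element and X.Index.
func describeOpened<X: Collection>(_ x: X) -> String {
    let first: X.Element? = x.first
    let start: X.Index = x.startIndex
    let hasFirst = first != nil && start != x.endIndex
    return "Element=\(X.Element.self) Index=\(X.Index.self) hasFirst=\(hasFirst)"
}

let boxed: any Collection = [10, 20, 30]

// The call opens `boxed`; here X is bound to Array<Int>.
print(describeOpened(boxed))
```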

steve · April 17, 2019, 10:22pm

Honestly, I think we're discussing the wrong problem with protocol generics. We agree that the type system should support generic protocol existentials without type erasure, but this discussion has mostly centered on making their associated types existential as well: being able to declare, "I want any instances of this protocol, each with any instances of its associated types", such as with AnyHashable. I think it'd be more useful and practical to start with what I'm going to call (because I don't know the proper term) "fully qualified" associated types: being able to declare, "I want any instances of this protocol, each with these specific implementations for its associated types", such as with AnySequence<T>. The latter situation is something I find myself needing much more often than the former.

All the examples described above can be more easily described with these existential protocols with universal associated types, and we can later add syntactic sugar to "simplify" the syntax to something more akin to what's being discussed.

I'm going to demonstrate this concept using the syntax Sequence«Element: T» to denote the proposed non-type-erased equivalent of AnySequence<T>. I am not advocating for the use of this syntax, I'm just using it as an arbitrary notation for my examples.

One of the first examples of a proposed generic protocol syntax is:

func bar(x: Collection, y: Collection) -> [Collection] { ... }

The problem with this syntax, it's argued, is that the Element and Index associated types are not necessarily the same for both parameters passed into bar(x:y:) as well as the return type of the function itself.

Another example is then used to show that partially constraining the protocol to a specific Element type still leaves the Index type unconstrained, leaving us with the same problem.

typealias CollectionOf<T> = Collection where Self.Element == T

func bar<T>(x: CollectionOf<T>, y: CollectionOf<T>) -> [CollectionOf<T>] { ... }

This problem can be solved by requiring existential protocols have "fully qualified" associated types:

func bar<T, U>(x: Collection«Element: T, Index: U», y: Collection«Element: T, Index: U»)
    -> Collection«Element: T, Index: U»

Here, x, y, and the return value all have the same Element and Index types. I'd also argue that Sequence would be better suited for this specific example, but I understand it's intended as a demonstration with existential associated types.
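
For comparison, the same "matching Element and Index" requirement can already be written with ordinary generics and a where clause. This is only a sketch of the constraint, not of the proposed existential form: the body and the array return type are assumptions made so that the example runs.

```swift
// X and Y may be different Collection implementations, but they must
// agree on both Element and Index.
func bar<X: Collection, Y: Collection>(x: X, y: Y) -> [X.Element]
    where X.Element == Y.Element, X.Index == Y.Index {
    Array(x) + Array(y)
}

// Array<Int> and ArraySlice<Int> both have Element == Int and Index == Int.
let joined = bar(x: [1, 2], y: ArraySlice([3, 4]))
print(joined)

// A Set<Int> argument would be rejected at compile time:
// Set<Int>.Index is not Int, so the Index constraint fails.
```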

Sidenote: The way I'm imagining this system, the remaining associated types Indices, Iterator, and SubSequence inherit that "full-qualification"—resolving Indices as Collection«Element: U, Index: U», Iterator as IteratorProtocol«Element: T», and SubSequence as Sequence«Element: Element» (with its Iterator resolved to IteratorProtocol«Element: Element») respectively. I notice that this specific example recursively references Collection—causing an infinite type definition. Perhaps this sort of "qualified" type wouldn't be allowed until the compiler is able to reason that the qualified type of Indices.Indices is equivalent to the type of Indices, and can handle it somehow. I'm not sure how the type system is implemented, so maybe this won't ever be possible, but I think this behavior should at least be valid for non-recursive types like Sequence.

The second section discusses Swift's ability to allow the caller of a function to specify its return type.

func zim<T: P>() -> T { ... }

let x: Int = zim() // T == Int chosen by caller

let y: String = zim() // T == String chosen by caller
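
A runnable version of this snippet, under the assumption that P requires a no-argument initializer (the original elides both the protocol and the function body):

```swift
// Assumed requirement: every conforming type can be default-constructed.
protocol P {
    init()
}

extension Int: P {}
extension String: P {}

// The caller's type annotation decides what T is.
func zim<T: P>() -> T {
    T()
}

let x: Int = zim()     // T == Int chosen by caller
let y: String = zim()  // T == String chosen by caller
print(x, y.isEmpty)
```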

It's argued that existential generic protocols can't allow for this behavior because the caller can't specify the Element of a returned Collection without first defining the returned collection itself.

Note: I've adjusted the function signature of the following examples to use the BinaryInteger protocol instead of the Int structure. This will let the caller better define the return type later on.

func evenValues<C: Collection, I: BinaryInteger>(in collection: C) -> Collection
    where C.Element == I, I: ExpressibleByIntegerLiteral {
        return collection.lazy.filter { $0.isMultiple(of: 2) }
}

let x = evenValues(in: [1, 2, 3, 4]) // What is type(of: x).Element?

I think it should be noted that these specific examples shouldn't technically work since lazy is defined on LazyCollectionProtocol, not Collection, so I've modified the following examples to take a LazyCollectionProtocol parameter instead.

The second example illustrates what it would look like for the caller to specify the return type themselves.

func evenValues<C: LazyCollectionProtocol, I: BinaryInteger, Output: Collection>(in collection: C) -> Output
    where C.Element == I, Output.Element == I, I: ExpressibleByIntegerLiteral {
        return collection.lazy.filter { $0.isMultiple(of: 2) }
}

let x: LazyFilterSequence<[Int]> = evenValues(in: [1, 2, 3, 4]) // Wait... Why do I need to know the return type?

The proposal for opaque result types attempts to address this issue, but it can only be resolved to a specific underlying type, which isn't the existential behavior we're looking for here.
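
A sketch of the opaque-result-type version, assuming Swift 5.7 or later so that the Element type can be stated as `some Collection<C.Element>`. The signature is simplified from the examples above. The caller no longer needs to spell the lazy filter type, but every call with the same C still gets the same single underlying type.

```swift
func evenValues<C: Collection>(in collection: C) -> some Collection<C.Element>
    where C.Element: BinaryInteger {
    // The underlying type is the lazy filter wrapper, hidden from callers.
    let filtered = collection.lazy.filter { $0.isMultiple(of: 2) }
    return filtered
}

let evens = evenValues(in: [1, 2, 3, 4])
print(Array(evens))
```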

"Fully qualified" associated types can be used here to address both problems—needing to know the return type, and having non-existential behavior for opaque result types.

func evenValues<C: LazyCollectionProtocol, I: BinaryInteger>(in collection: C) -> Collection«Element: I, Index: C.Index»
    where C.Element == I, I: ExpressibleByIntegerLiteral {
        return collection.lazy.filter { $0.isMultiple(of: 2) }
}

let x: Collection«Element: Int, Index: Int» = evenValues(in: [1, 2, 3, 4]) // Don't care what it is as long as has Ints

let y: Collection«Element: UInt, Index: Int» = evenValues(in: [5, 6, 7, 8]) // Don't care what it is as long as it has UInts

A proposed syntax related to the opaque result type is based on Rust's impl keyword.

func concatenate(a: some Collection, b: some Collection) -> some Collection { ... }

If I'm understanding the intent behind this syntax correctly, I think this would just be sugar for the following "fully qualified" example.

func concatenate<T, U>(a: Collection«Element: T, Index: U», b: Collection«Element: T, Index: U»)
    -> Collection«Element: T, Index: U» { ... }

Which can be read as, "Give me any two (possibly different) Collection implementations that both have Elements of T and Indexes of U, and then I'll return some other Collection implementation with both of those associated types as well."

@Joe_Groff, you had this to say about the need to include both the Element and Index in the existential type of Collection:

Swift's design is aimed at enabling a more expressive type system to capture more interesting type-level relationships between values. The C# design would become more cumbersome if you tried to implement something like Swift's Collection hierarchy in it, since you'd need to define a type ICollection<Index, Element> and carry the index around with you everywhere. The type relationship between collections and indexes is what allows Swift's collections to approach "zero cost" in specialized code, since for instance, you know an Array is always indexed by Ints, and that a String is always indexed by valid code unit offsets represented by String.Index. Although you could express that relationship in C#, it would make ICollection not very useful as a dynamic interface type, since the Index generic argument is usually specific to a single collection family, so for instance ICollection<Int, T> would effectively be a type that can only hold Arrays. By using associated types, Swift allows you to express relationships between Collections using only the relevant associated types; you only need to refer to Index when indexing. With more flexible existential types, you'd also be able to refer to any Collection<.Element == T> to abstract over collections of a certain element type without confining yourself to a specific index. The goal of associated types is to allow for greater flexibility and expressivity, admittedly at the cost of some shorter-term awkwardness since we're missing so many key features still.

I am of the opinion that keeping a Collection without knowing its Index excludes its use for Collection-specific functionality—which is mostly centered around the ability to index its elements. It would be more appropriate to keep a Sequence in this case. Since there wouldn't be any type-erasure, it'll be possible for some client to inspect its type and cast it back into a Collection if they really wanted to get that indexing behavior back.

I'm not saying that using a Collection without knowing its index is never appropriate; there are definitely legitimate use cases, even if I can't think of one off the top of my head. I just think it is more common to not need to know the underlying implementation of a protocol while still needing it to have specific associated types. This is definitely the case in my experience.

tl;dr I really think we should be focusing on creating a non-type-erased "Any*" type along the lines of AnySequence<T> before introducing non-type-erased existential protocols such as AnyHashable.

PS: If you know the formal terms for anything I talked about, let me know! Doubly so if I misunderstand some type theory concept or basic axiom of the Swift type system or philosophy.

anandabits · April 18, 2019, 12:42am

AnySequence<T> is not “fully qualified”. It erases the Iterator and SubSequence associated types. We really do need the ability to start with a fully type-erased existential and add the intended constraints to associated types. The equivalent of AnySequence would be to bind Element to a concrete type while leaving Iterator and SubSequence unspecified. The proposal (which I like) is to use the syntax any Sequence<.Element == T> to do this.
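
As a sketch of what "bind Element, leave the rest unspecified" looks like in practice: assuming Swift 5.7 or later, the constrained existential is spelled `any Sequence<Int>` rather than with the `.Element ==` notation discussed here. The function names below are hypothetical.

```swift
// A generic helper that needs only the Element type to be known.
func sum<S: Sequence>(_ values: S) -> Int where S.Element == Int {
    values.reduce(0, +)
}

// Element is fixed to Int; Iterator and everything else stay erased.
// The call to `sum` opens the existential implicitly.
func sumAny(_ values: any Sequence<Int>) -> Int {
    sum(values)
}

let array: [Int] = [1, 2, 3]
let set: Set<Int> = [10]
let range = 4...5

// Three unrelated concrete types, one existential parameter type.
print(sumAny(array) + sumAny(set) + sumAny(range))
```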

Moximillian · April 18, 2019, 9:01pm

If/when introducing any to Swift protocols, it would be a great opportunity to clean up .self, .Type and .Protocol in relation to protocols.

So:
(any Collection).self is the existential type ???.Type
Collection.self is the protocol type: Collection.Type

EDIT: to avoid confusion like this: Type checking inconsistency with generic metatypes

jrose · April 18, 2019, 9:02pm

I don't think we can change what MyProto.Type means. Apart from breaking source compatibility, it's also the first thing people reach for. I could see us deprecating it, though, and maybe coming up with a new syntax like (protocol Collection).Type for the protocol metadata type.

anandabits · April 18, 2019, 9:05pm

If we can deprecate it, can't we eventually eliminate it? I posted some related thoughts upthread and the following discussion with @DevAndArtist went in this general direction.

jrose · April 18, 2019, 9:17pm

Yeah, sorry, I'm ignoring the "multi-release deprecation cycle" approach to removing things for now. (Module stability is going to make source stability even more important anyway, so I'm not sure we'd ever be able to remove the old stuff from the compiler, but we could maybe get to a point where we refuse to accept it in new source files.)

Moximillian · April 18, 2019, 9:22pm

“Just” deprecating can already get quite far if the canonical terms are good. So, for example, compilation error messages from Xcode would use the better type terms.

anandabits · April 20, 2019, 1:32am

While I was working on a library today I realized that it would be really nice to be able to define an opaque typealias in a protocol. As with opaque result types, this would allow library authors to hide concrete types in some cases where that is not possible today.

Here’s an example:

protocol P {
    associatedtype X
}
protocol SP {
    func foo()
}
private struct S<T: P>: SP {
    func foo() {}
}
protocol Q {
    associatedtype A: P
    typealias B: some SP = S<A>

    // today the library must make S public
    // with opaque typealias in the protocol declaration
    // conforming types are only able to use API available on SP
    // and S does not need to be public
    func bar(_ b: B)
}
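
For reference, this is the kind of hiding that opaque result types already allow on a function, which the opaque typealias idea would extend to protocol declarations. A minimal sketch with simplified, hypothetical declarations (foo returns a String here only so that the result can be observed):

```swift
protocol SP {
    func foo() -> String
}

// The concrete type stays private to this file.
private struct S: SP {
    func foo() -> String { "S.foo" }
}

// Callers see only "some SP" and can use just the API declared on SP.
func makeSP() -> some SP {
    S()
}

let value = makeSP()
print(value.foo())
```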

TellowKrinkle · April 20, 2019, 3:33am

From my understanding, while you wouldn't know what type the index is, you would know that you can take one and give it to the same Collection's subscript in return for an Element or give it to the collection's index(after:) method to get the index of the next element.
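
A small sketch of that point: the function below never learns what the index type is, yet it can still walk the collection by handing each index back to the collection that produced it. `elements(of:)` is an illustrative name.

```swift
func elements<C: Collection>(of collection: C) -> [C.Element] {
    var result: [C.Element] = []
    var index = collection.startIndex
    while index != collection.endIndex {
        // The index is opaque here; only the collection knows how to use it.
        result.append(collection[index])
        index = collection.index(after: index)
    }
    return result
}

// Works the same for Int-indexed arrays and String.Index-indexed strings.
print(elements(of: [1, 2, 3]))
print(elements(of: "abc"))
```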

DevAndArtist · April 24, 2019, 12:10pm

I know that opaque types were just accepted and implemented, but I already know a use case where an opaque type needs to be nested inside a generic type like Optional.

protocol P { ... } // Note that it does not refine `AnyObject`
class MyClass {
  // cool but does not work with generics
  weak var p: (any P & AnyObject)? 

  // solution by using opaque types
  weak var p: (some P & AnyObject)?
}

Edit: Or wait, opaque types can only be used as return types which means we can't have stored properties with opaque types? @Joe_Groff did I get this wrong originally or can we make the above example possible in the future?

Also, is any AnyObject on a pure Swift runtime (not on Apple platforms) itself an AnyObject?

If not then we can't make extension any AnyObject: AnyObject {} which would be sad. Therefore allowing something like weak var p: (some P & AnyObject)? would be great.

jrose · April 24, 2019, 3:23pm

You definitely can't have a stored property with an opaque type unless it has an initial value, because an opaque type is fixed for the lifetime of the program, and there's no way to pick that type without that initial value.
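
A minimal sketch of the rule, assuming a toolchain with opaque result types (Swift 5.1 or later). `Holder` is a hypothetical type. The initial value is what fixes the underlying type.

```swift
struct Holder {
    // Allowed: the underlying type is inferred from the initial value.
    let items: some Collection = [1, 2, 3]

    // Not allowed: with no initial value there is nothing to infer
    // the underlying type from.
    // var later: some Collection
}

let holder = Holder()
print(holder.items.count)
```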

DevAndArtist · April 24, 2019, 3:47pm

Okay fine then, but I have two questions left.

Is any AnyObject a sub-type of AnyObject in a pure Swift runtime (not on Apple platforms)?
Are existentials in general reference types?

jrose · April 24, 2019, 5:03pm

AnyObject-the-"protocol" doesn't actually have any run-time representation. In a world with any AnyObject, there is no such type AnyObject. At the implementation level, you can think of it more like a guarantee about the type, like "fits in 64 bits" or "can be copied using memcpy", albeit one that shows up a lot more often than either of those.

In general, class-bound existentials are not compatible with AnyObject because they also carry their conformance information alongside the class pointer. @objc existentials are the exception to this since they don't use conformance information to invoke requirements.
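
The layout difference can be observed with MemoryLayout. A small sketch, assuming Swift 5.6 or later for the `any` spelling. `Marker` is a hypothetical class-bound protocol with no requirements.

```swift
protocol Marker: AnyObject {}

let word = MemoryLayout<UnsafeRawPointer>.size

// AnyObject is stored as just the class pointer.
let anyObjectSize = MemoryLayout<AnyObject>.size

// A class-bound protocol existential also carries its conformance
// (a witness table pointer) next to the class pointer.
let classBoundSize = MemoryLayout<any Marker>.size

print(anyObjectSize == word)
print(classBoundSize == 2 * word)
```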