How to do dynamic dispatch?

CTMacUser · September 3, 2018, 2:49am

I tried these in a playground:

/**
 Returns the endIndex of the shorter of the two given collections, and the corresponding index for the longer.

 - Parameter first: The first collection to be measured.
 - Parameter second: The second collection to be measured.

 - Returns: `(first.index(first.startIndex, offsetBy: x), second.index(second.startIndex, offsetBy: x))`, where *x* is defined as `min(first.count, second.count)`.

 - Complexity: O(1) since both `first` and `second` conform to `RandomAccessCollection`.
 */
fileprivate func lastCorrespondingIndexes<T: RandomAccessCollection, U: RandomAccessCollection>(_ first: T, _ second: U) -> (T.Index, U.Index) {
    print("Dispatch to RandomAccessCollection lastCorrespondingIndexes")
    let shortCount = min(first.count, second.count)
    return (first.index(first.startIndex, offsetBy: shortCount), second.index(second.startIndex, offsetBy: shortCount))
}

/**
 Returns the endIndex of the shorter of the two given collections, and the corresponding index for the longer.

 - Parameter first: The first collection to be measured.
 - Parameter second: The second collection to be measured.

 - Returns: `(first.index(first.startIndex, offsetBy: x), second.index(second.startIndex, offsetBy: x))`, where *x* is defined as `min(first.count, second.count)`.

 - Complexity: O(*m*) where *m* is the length of the shorter collection.
 */
fileprivate func lastCorrespondingIndexes<T: Collection, U: Collection>(_ first: T, _ second: U) -> (T.Index, U.Index) {
    print("Dispatch to general Collection lastCorrespondingIndexes")
    guard let lastCommonNonEndIndexes = Array(zip(first.indices, second.indices).suffix(1)).last else {
        // At least one collection was empty.
        return (first.index(first.startIndex, offsetBy: +1, limitedBy: first.endIndex) ?? first.endIndex, second.index(second.startIndex, offsetBy: +1, limitedBy: second.endIndex) ?? second.endIndex)
    }

    // At least one of the returned elements will be its collection's endIndex.
    return (first.index(after: lastCommonNonEndIndexes.0), second.index(after: lastCommonNonEndIndexes.1))
}

And put when I tried dispatching them within a struct initializer:

/// A collection of ordered pairs from partially-optional elements from two given sub-collections.
public struct ZipLongest2CollectionToo<FirstBase: Collection, SecondBase: Collection> {

    /// Indicator of which collection is longer, and where the uncorresponding suffix of the longer collection begins.
    enum SuffixMarker {
        /// The first collection is longer.
        case longerFirst(FirstBase.Index)
        /// The second collection is longer.
        case longerSecond(SecondBase.Index)
        /// The collections are equal in length.
        case equalLength
    }

    /// The first sub-collection.
    let base1: FirstBase
    /// The second sub-collection.
    let base2: SecondBase
    /// Where the uncorresponding suffix of the longer collection begins, or double `nil` if the collections are the same length.
    let suffixMarker: SuffixMarker

    /**
     Creates a collection vending pairs of elments from two given sub-collections.

     The elements are `Optional`s of each collections' `Element` type, so when we access beyond the length of the shorter sub-collection, `nil` is returned in that place instead of choking.

     - Parameter first: The collection that is the source of the first part of returned elements.
     - Parameter second: The collection that is the source of the second part of returned elements.
     */
    public init(_ first: FirstBase, _ second: SecondBase) {
        base1 = first
        base2 = second

        let (firstCommonEnd, secondCommonEnd) = lastCorrespondingIndexes(base1, base2)
        switch (firstCommonEnd == base1.endIndex, secondCommonEnd == base2.endIndex) {
        case (true, false):
            suffixMarker = .longerSecond(secondCommonEnd)
        case (false, true):
            suffixMarker = .longerFirst(firstCommonEnd)
        case (true, true):
            suffixMarker = .equalLength
        default:
            preconditionFailure("Can't be reached, since it means that lastCorrespondingIndexes(_: _:) didn't finish")
        }
    }

}

The general version of lastCorrespondingIndexes is always called, even when the collections both conform to RandomAccessCollection. Is this because the struct specifies only Collection? Or is there something else I'm missing?

SDGGiesbrecht · September 3, 2018, 3:24am

Yes. The compiler selects the most specific satisfiable generic function out of a set of overloads at compile time.

If you want to do dynamic dispatch, use a protocol. Protocol requirements are dispatched dynamically:

protocol OptimizedZippable : Collection {
    func lastCorrespondingIndexes(_ other: T) -> (Self.Index, T.Index) where T : Collection
    func lastCorrespondingIndexes(_ other: T) -> (Self.Index, T.Index) where T : RandomAccessCollection
}
extension OptimizedZippable { // (where Self : Collection)
    func lastCorrespondingIndexes(_ other: T) -> (Self.Index, T.Index) where T : Collection { ... }
    func lastCorrespondingIndexes(_ other: T) -> (Self.Index, T.Index) where T : RandomAccessCollection { ... }
}
extension OptimizedZippable where Self : RandomAccessCollection {
    func lastCorrespondingIndexes(_ other: T) -> (Self.Index, T.Index) where T : Collection { ... }
    func lastCorrespondingIndexes(_ other: T) -> (Self.Index, T.Index) where T : RandomAccessCollection { ... }
}

public struct ZipLongest2CollectionToo<FirstBase: OptimizedZippable, SecondBase: OptimizedZippable > {
    ...
}

extension Array : OptimizedZippable {}
extension ArraySlice : OptimizedZippable {}
extension ContiguousArray : OptimizedZippable {}
extension String : OptimizedZippable {}
extension Substring : OptimizedZippable {}
extension AnyCollection : OptimizedZippable {}
...

anthonylatsis · September 3, 2018, 3:24am

As far as I know, the compiler will statically resolve to the overload with the best-matching signature. And that is T: Collection in your case.

A simple example

func foo<T: RandomAccessCollection>(_ arg: T) { print("RandomAccessCollection") }

func foo<T: Collection>(_ arg: T) { print("Collection") }

func doo<T: Collection>(_ arg: T) {
  foo(arg)
}

doo([1, 2, 3])
doo("44444444")

foo("44444444")
foo([1, 2, 3])

// Collection
// Collection
// Collection
// RandomAccessCollection

Here's a very similar case, by the way:

SDGGiesbrecht · September 3, 2018, 3:37am

The link @anthonylatsis posted has much more detail.

If only you could have posted a few seconds earlier, @anthonylatsis... :)

CTMacUser · September 4, 2018, 4:08am

So is there a fix? Since the struct is supposed to take arbitrary collections, making a special protocol and priming every potential type in advance is immediately a non-starter. Is there some way to take advantage of class hierarchies?

The point of the function pair is to allow O(1) for a pair of random-access collections and O(n) otherwise. The RAC algorithm would be O(2 * shorter + longer) if the collections weren't both random-access instead of O(shorter) for the general algorithm.

anthonylatsis · September 4, 2018, 5:18am

** Sigh ** I was really hoping this would work. The idea was to overload the initializer so we control which overload of lastCorrespondingIndexes is picked with the signature. At least I found an interesting bug. @Slava_Pestov

struct Foo<U: Collection> {
  init<T: Collection>(_ arg: T) where T == U { ... }
  init<T: RandomAccessCollection>(_ arg: T) where T == U { ... }
}

// Trying to hack around T == U
extension Collection { typealias KEK = Self }

struct Foo<U: Collection> {
  init<T: Collection>(_ arg: U) where T.KEK == U.KEK { ... } 
  init<T: RandomAccessCollection>(_ arg: T) where T.KEK == U.KEK { ... }
// 'T' does not have a member type named 'KEK'; did you mean 'KEK'?
// 'U' does not have a member type named 'KEK'; did you mean 'KEK'?
}

It would be nice to have same-type constraints allowed between generic parameters coming from different signatures.

SDGGiesbrecht · September 4, 2018, 7:20am

I don’t understand what that is supposed to do.

In the first initializer, T is never used in the signature, init(_ arg: U), so it references nothing.

In the second initializer, the generic member U does not appear in initializer signature, init(_ arg: T) so the compiler has no way of knowing what concrete type it is supposed to be initializing. (i.e. it only knows Foo<_>, when it needs to know Foo<[Int]>, Foo<String>, or some other concrete type.)

Are you fishing for one of these?:

struct Foo<U> where U : Collection {
    init(_ arg: U) { ... }
}
extension Foo where U : RandomAccessCollection {
    init(_ arg: U) { ... }
}

struct Foo<T, U> where T : Collection, U : Collection {
    init(_ t: T, _ u: U) { ... }
}
extension Foo where T : RandomAccessCollection, U : RandomAccessCollection {
    init(_ t: T, _ u: U) { ... }
}

This will allow outside code to reach the optimized implementation, but it is still not dynamically dispatched. The optimized overload will only be used when the call site of the initializer knows the member conforms to RandomAccessCollection.

_ = Foo("") // Unoptimized. (Not RandomAccessCollection.)
_ = Foo([Character]()) // Optimized. (RandomAccessCollection.)

func opaque<V>(_ arg: V) where V : Collection {
    _ = Foo(arg)
}

opaque("") // Unoptimized. (Not RandomAccessCollection.)
opaque([Character]()) // Unoptimized. (Despite `RandomAccessCollection`)
// The optimization is missed because `opaque` only operates on `Collection`.

anthonylatsis · September 4, 2018, 7:47am

That's a typo, it was supposed to be T.

It actually does appear, implicitly. The uncurried signature would be <U: Collection, T: RandomAccessCollection where T == U>(_ arg: T) -> Foo<U>. The same-type constraint resolves the type to be initialized, which is Foo<T> or Foo<U> (U == T). The problem is that same-type constraints between generic parameters aren't supported whatsoever. Other constraints are fine:

class Foo<T> {
  func foo<U: Collection>(_ arg: U) where T: Numeric, T == U.Element { ... }
}

As I mentioned, it would be nice to have these same-constraints allowed and thus supported by the compiler. What it's supposed to do then is allow to differently constrain a generic parameter for specific input while still binding it to an external generic parameter. That's exactly what I wanted by trying to hack around it with KEK.

Slava_Pestov · September 4, 2018, 8:26pm

That's not the issue here -- it looks like we don't consider typealiases in extensions when resolving same-type constraints. cc @Douglas_Gregor

anthonylatsis · September 4, 2018, 8:42pm

It seems so, but not always. It works with a custom protocol. The standard library is precompiled or dynamically linked, right? Maybe it has something to do with that. I'll file a bug in a moment.

protocol P {
  associatedtype A: Collection
}
extension P {
  typealias R = A.Element
}
struct A<T: P> {
  func foo<U: Collection>(_ arg: U) where U.Element == T.R {}
}

P.S. I understand T.Assoc1 == U.Assoc2 is fine, what I meant was same-type constraints directly between generic parameters in some cases.

Edit – [SR-8692] Type aliases in stdlib protocol extensions not considered in generic constraints · Issue #51205 · apple/swift · GitHub

SDGGiesbrecht · September 4, 2018, 9:27pm

I see what you mean now, @anthonylatsis. I concur that it should not trip the compiler.

Aren’t these really the same thing, then, just written with the constraint in a different place?:

struct Foo<U: Collection> {
  init<T: RandomAccessCollection>(_ arg: T) where T == U { ... }
}

struct Foo<U> where U : Collection {}
extension Foo where U : RandomAccessCollection {
    init(_ arg: U) { ... }
}

The second one compiles fine. Can you get whatever you are trying to hack to work the second way?

anthonylatsis · September 4, 2018, 9:52pm

How could I miss that, you're right. Sorry for not paying enough attention to your call on constrained extensions.

@CTMacUser, try this out. Not dynamically dispatched, but you can control which function to resolve to by overloading the initializer in a constrained extension.

struct Foo<U: Collection> {
  init(_ arg: U) {
    foo(arg)
  }
  func foo<T: RandomAccessCollection>(_ arg: T) { print("RandomAccessCollection") }
  func foo<T: Collection>(_ arg: T) { print("Collection") }
}

extension Foo where U: RandomAccessCollection {
  init(_ arg: U) {
    foo(arg)
  }
}

Foo.init([4, 5, 6])
Foo.init("rrrrrrrrr")

// RandomAccessCollection
// Collection

Jens · September 5, 2018, 1:40pm

But that won't work in a generic context (as in bar below):

struct Foo<U: Collection> {
    init(_ arg: U) {
        foo(arg)
    }
    func foo<T: RandomAccessCollection>(_ arg: T) { print("RandomAccessCollection") }
    func foo<T: Collection>(_ arg: T) { print("Collection") }
}

extension Foo where U: RandomAccessCollection {
    init(_ arg: U) {
        foo(arg)
    }
}

func bar<T: Collection>(_ c: T) {
    let _ = Foo.init(c)
}

bar([4, 5, 6]) // Collection
bar("rrrrrrrrr") // Collection

PS: I might have misunderstood the requirements of the OP, sorry if that is the case.

CTMacUser · September 6, 2018, 2:02am

This worked, thanks. (I kept the dispatched functions outside of the struct.)

But why? Does the extension's version of the initializer shadow over the base version when both have the same signature? Could I have a version for Collection in the base definition, one for BidirectionalCollection in an extension that's defined for bidirectional collections, and a third for RandomAccessCollection in yet another extension that's defined for random-access collections?

anthonylatsis · September 6, 2018, 2:05am

The initializer in the extension has a different generic signature (don't forget that the constraints on U are different). The one whose signature most accurately satisfies the type you passed as an argument is statically resolved as the one to be called. That should answer your last question (yes).

SDGGiesbrecht · September 6, 2018, 2:26am

In case you really meant “when both have the same name” (the part you actually have to type at the call site):

Basically yes,

Of all the functions/initializers with the same name available to the compiler at a particular call site, it always chooses the one with the most specific satisfiable constraints.

RandomAccessCollection is more specific than just Collection, so as long as the type conforms to it, the compiler will prefer its overload and dispatch to it directly. Where the type is not random‐access, it does not satisfy the constraints and so that overload is struck from the list, leaving the compiler to fall back to the next most specific: the simple Collection variant.

This is still not dynamic dispatch. Whenever the compiler is unaware of the conformance to RandomAccessCollection at the call site, such as in the opaque function below (because V only satisfies Collection), it will only know about the unoptimized variant, and that is where it will dispatch the call. This is precisely the snag you hit at the very beginning: Each calling function (recursively) must by declared twice—once with each variant—or else it will inadvertently block access to the optimized implementation. What you did to fix your problem was to walk back up the call tree one function and add its dual declaration. You will need to continue walking back up the tree through as many layers as you care about.

SDGGiesbrecht · September 6, 2018, 2:26am

Yes.