Dynamically-dispatched protocol extension members

This is just a pitch; I don't have an implementation. I'd like to gather feedback from the community about how desirable this feature would be, and from the compiler team about how it might be implemented.


Motivation

A protocol's members may be either dynamically or statically dispatched, depending on how they are declared. Protocol requirements are dynamically dispatched, meaning that they resolve to conformance-specific implementations in generic code, whereas members defined in extensions are statically dispatched, meaning that they may not be overridden, and generic code will not use any such overrides even if they are available.

Dynamic dispatch is very important when optimising certain generic operations. For example, Array makes use of multiple hidden protocol requirements in Sequence to optimise copying from types with compatible backing storage and from other contiguous sequences. Such optimisations are not available to types outside of the standard library, as they cannot add requirements to Sequence, Collection, or any other standard library protocols.
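To make that concrete, here is a hedged sketch of how generic code can already exploit one such customization point that did become public API: if I recall correctly, SE-0237 added withContiguousStorageIfAvailable(_:) to Sequence as a requirement whose default returns nil. Generic code can use it for a fast path, but third-party code cannot add new hooks like this to Sequence itself (fastCopy is just an illustrative name):

// Generic copy with a contiguous fast path, built on a standard-library
// customization point that third-party code could not have added itself.
func fastCopy<S: Sequence>(_ source: S) -> [S.Element] {
  if let copied = source.withContiguousStorageIfAvailable({ buffer in Array(buffer) }) {
    return copied          // contiguous fast path (Array, ContiguousArray, ...)
  }
  return Array(source)     // element-by-element fallback
}

let bytes: [UInt8] = [1, 2, 3]
print(fastCopy(bytes))     // [1, 2, 3], via the fast path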

This proposal seeks to allow members defined in protocol extensions to be dynamically dispatched, even if defined outside of the module which owns the protocol definition. This means that if you have an algorithm which can work on a Collection, but has an optimised implementation for, say, RandomAccessCollection, and that function is marked as using dynamic dispatch, generic code with the constraint <C: Collection> will resolve calls to the optimised version when given a RandomAccessCollection.

Proposed solution

Swift already has the dynamic keyword, which allows members of reference types to opt in to dynamic dispatch. This proposal would enable the same keyword to be used for members in protocol extensions.

Take the following example:

protocol MyProto {}
extension MyProto { 
  func myMethod() { print("Protocol method") } 
}

struct NoOverride: MyProto {}

struct HasOverride: MyProto {
  func myMethod() { print("Struct method") }
}

func call_generic<T: MyProto>(_ object: T) { 
  object.myMethod() 
}

let defaultObject = NoOverride()
defaultObject.myMethod() // "Protocol method"
call_generic(defaultObject) // "Protocol method"

let object = HasOverride()
object.myMethod() // "Struct method"
call_generic(object) // "Protocol method"

This snippet illustrates a common source of confusion for developers who are relatively new to Swift. By marking the method myMethod as dynamic, the developer opts in to the same effective behaviour as a protocol requirement with a default implementation. This means that the specialised implementation in HasOverride will be called in generic code, rather than the default implementation used for other conformances to MyProto:

extension MyProto { 
  dynamic func myMethod() { print("Protocol method") } 
}

struct NoOverride: MyProto {}

struct HasOverride: MyProto {
  func myMethod() { print("Struct method") }
}

func call_generic<T: MyProto>(_ object: T) { 
  object.myMethod() 
}

let defaultObject = NoOverride()
defaultObject.myMethod() // "Protocol method"
call_generic(defaultObject) // "Protocol method"

let object = HasOverride()
object.myMethod() // "Struct method"
call_generic(object) // "Struct method"     <-- This.

Detailed design

As it does for class members, dynamic will support both properties and functions defined in protocol extensions. Using a dynamic protocol member from generic code should resolve to the same implementation as calling the protocol member directly on the value's dynamic type.

Specifically, it should be possible to override a dynamic protocol member in a refined protocol, as protocol requirements currently can:

extension Collection {
  dynamic func myAlgorithm() -> [Element] {
    /* default implementation */
  }
}
extension RandomAccessCollection {
  func myAlgorithm() -> [Element] {
    /* optimised implementation */
  }  
}

func doIt<T: Collection>(_ items: T) -> [T.Element] {
  return items.myAlgorithm()
}

let array = [1, 2, 3, 4]
array.myAlgorithm() // should call RandomAccessCollection version, as the best match for Array.
doIt(array) // should call RandomAccessCollection version, as the best match for Array.

The override keyword is not required, mirroring the behaviour of protocol requirements with specialised default implementations. Specialised implementations also do not need to repeat the dynamic keyword (although doing so is allowed), mirroring overrides in reference types (which do, however, require the override keyword).
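
For reference, here is a hedged sketch of the existing behaviour being mirrored, using a made-up Animal/Dog hierarchy: a requirement with a base default plus a specialised default in a refined protocol needs neither override nor any other keyword, and the specialised version is the one generic code sees:

protocol Animal {
  func sound() -> String              // requirement (customization point)
}
extension Animal {
  func sound() -> String { "..." }    // base default
}
protocol Dog: Animal {}
extension Dog {
  func sound() -> String { "Woof" }   // specialised default; no `override`
}

struct Beagle: Dog {}

func makeSound<A: Animal>(_ animal: A) -> String {
  animal.sound()
}

print(makeSound(Beagle()))            // "Woof"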

Source compatibility

This is an additive change and will not change any behaviour in existing programs.

Effect on ABI stability

This feature will almost certainly require an updated runtime.

Effect on API resilience

Adding or removing dynamic is an ABI-breaking change, as it is for classes.

Alternatives considered

  • Make dynamic dispatch the default

    This would be a source-compatible but potentially behaviour-altering change and needs to be considered carefully as its own proposal. The goal of this proposal is only to allow dynamic dispatch for protocol extension members, not to change which kind of dispatching is used implicitly.

  • Require override or repeating dynamic for specialised implementations.

    There is an argument to be made for this, but protocol requirements don't ask you to do it today. The difference between protocol requirements and extension members is already confusing enough for developers; adding more syntax differences would likely do more harm than good.

  • Do nothing

    One workaround being used today is to define a secondary protocol which duplicates the members which should be dynamically-dispatched:

    protocol MyProto_Extensions {
      func myMethod()
    }
    extension HasOverride: MyProto_Extensions {}
    
    func call_generic<T: MyProto>(_ object: T) { 
      guard let instance = object as? MyProto_Extensions else { object.myMethod(); return }
      instance.myMethod() 
    }
    

    This basically re-implements dynamic dispatching using retroactive conformances and a dynamic cast. It isn't a very obvious solution to something that should be simple, doesn't work for protocols with associated types (like Collection & friends), and only allows overrides on a per-concrete-conformance basis (you couldn't make all RandomAccessCollections conform to an "extensions" protocol, because protocols can't conform to other protocols).
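
    As a rough sketch of the associated-type limitation (FastAlgorithms is a hypothetical name, and the failing cast is shown commented out with roughly the diagnostic Swift produces today):

    protocol FastAlgorithms {
      associatedtype Element
      func myAlgorithm() -> [Element]
    }

    func doIt<C: Collection>(_ items: C) -> [C.Element] {
      // The cast-based workaround is unavailable here:
      // error: protocol 'FastAlgorithms' can only be used as a generic constraint
      //        because it has Self or associated type requirements
      // if let fast = items as? FastAlgorithms { ... }
      return Array(items)
    }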

20 Likes

+1 on this one. I really like the idea.

I would like dynamic dispatch to be the default, with the source code converter adding a static or nonvirtual keyword, but I could deal with having dynamic in the extensions instead.

4 Likes

I would opt for this idea, as it would give more freedom: each implementation of the protocol would have a choice of how the method is dispatched.

Either of these two options gets a +1 from me.

Yes please, let's finally fix this! I'm fine with either the proposal or the alternative of making it the default behavior; anything to get this into the language sooner rather than later.

+1, this is how it should have been all along. I also agree with @Panajev that dynamic should be the default.

I'd love to have something like this; my main concern would be implementability. In the absence of an implementation, do you at least have a sketch of how it could be made to work at the ABI/codegen level? One way to express such a sketch would be in terms of equivalent generated code, e.g. something like the workaround you illustrate in the "Do nothing" bullet.

6 Likes

I'm glad there's support for this feature! We see people confused about protocol dispatching from time to time, but AFAIK there hasn't really been a serious discussion about adding opt-in dynamic dispatch to protocol extensions.

FWIW, this is actually in the generics manifesto, under "maybe":

There are a number of features that get discussed from time-to-time, while they could fit into Swift's generics system, it's not clear that they belong in Swift at all. The important question for any feature in this category is not "can it be done" or "are there cool things we can express", but "how can everyday Swift developers benefit from the addition of such a feature?". Without strong motivating examples, none of these "maybes" will move further along.

IMO, there are plenty of strong motivating examples for this feature, so it would be very useful to everyday Swift developers. I think that section is talking about making dynamic dispatch the default, rather than opt-in.


I guess we would construct a lookup table like we do for protocol conformances. I still need to look more closely at the corner cases surrounding how exactly protocol witness tables are constructed (e.g. if you use a protocol's default implementation, but then somebody in another module implements the requirement directly on your type, which version does Swift choose to call? Are protocol requirements entirely resolved when the first module is compiled?)
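
To make that corner case concrete, this is the sort of scenario I mean (sketch only; module names are invented and shown inline as comments):

// Module A
public protocol P {
  func f() -> String
}
public extension P {
  func f() -> String { "default" }
}

// Module B
// import A
public struct S: P {}          // satisfied by the default when B is compiled

// Module C
// import A, B
extension S {
  func f() -> String { "retroactive" }
}
// Which implementation does generic code constrained to P see for S,
// and is that decided when module B is compiled or at runtime?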

The secondary protocol example shows a basic way we could implement this, if we could work around its problems. If we collected all dynamically-dispatched members into a protocol behind the scenes, it would look something like this:

protocol Collection_Extensions {
  func myAlgorithm()
}
extension Collection_Extensions where Self: Collection {
  func myAlgorithm() {
    print("Collection default")
  }
}
extension Collection_Extensions where Self: RandomAccessCollection {
  func myAlgorithm() {
    print("RandomAccessCollection default")
  }
}

func doIt<C: Collection_Extensions>(_ val: C) {
  val.myAlgorithm()
}

extension Array: Collection_Extensions {}
doIt([1,2,3]) // RandomAccessCollection default

So then the only question is how to make all Collections conform to that protocol (or rather, how do we tell the compiler & runtime that everything which conforms to Collection also has a conformance somewhere for this extension protocol?). Also, we would need separate protocols to collect fileprivate and private members (although those would hopefully get devirtualised and optimised away).

There are still open questions about the implementation, no doubt, but it looks do-able.

1 Like

How can we help move this along? Looking at the Swift 6 roadmap post, this fits in nicely with #3: Invest in user-empowering language directions. It can be difficult at present to design some protocol-based solutions because of lingering gotchas with how implementations are dispatched. I often have to steer clear entirely of anything involving extensions and view controllers, for example.

2 Likes

This feature is a big +1 for me.

I think that this feature would be incredibly helpful, in particular for writing generic collection algorithms. Without it, in a generic context, algorithms use the default implementation on the protocol rather than the implementation provided by the concrete type. Consider the following example:

Let's say I have an OrderedSet collection type.

struct OrderedSet<Element>: MutableCollection { ... }

Now, let's consider the swapAt(_:_:) function on MutableCollection.

MutableCollection provides a default implementation of this function:

extension MutableCollection {
    public mutating func swapAt(_ i: Index, _ j: Index) {
        guard i != j else { return }
        let tmp = self[i]
        self[i] = self[j]
        self[j] = tmp
    }
}

Due to the semantics of an ordered set, this implementation does not work properly for OrderedSet. If it were used, it would not swap the elements:

var orderedSet: OrderedSet = [...]

// Indices intended to be swapped
var i: Int = ... // some index
var j: Int = ... // some other index

// Algorithm performed by `MutableCollection`'s default
// implementation of swapAt(_:_:)
let tmp = orderedSet[i]

// Operation is not performed because we cannot assign a non-unique element to an ordered set
orderedSet[i] = orderedSet[j]

// Since the previous assignment was not performed, the value of `tmp` is not unique
// with respect to `orderedSet` as `orderedSet[i]` remained unchanged. And as such,
// this operation is not performed for the same reason as the previous one.
orderedSet[j] = tmp

Because of this, OrderedSet provides its own implementation of swapAt(_:_:).

struct OrderedSet<Element>: MutableCollection {
    
    // ...
    
    mutating func swapAt(_ i: OrderedSet.Index, _ j: OrderedSet.Index) {
        guard i != j else { return }

        // Disables uniquing of ordered set
        self._enforceUniquing = false
        
        // Because uniquing enforcement has been turned off, the following
        // operations work as expected
        let tmp = self[i]
        self[i] = self[j]
        self[j] = tmp
        
        // Re-enables uniquing of ordered set
        self._enforceUniquing = true
    }
}

This implementation now works, so swapAt(_:_:) should work as expected when called on a concrete instance of OrderedSet.

var orderedSet: OrderedSet = [0, 1, 2, 3, 4]
orderedSet.swapAt(0, 4)

print(orderedSet)
// Prints "[4, 1, 2, 3, 0]"

The problem now arises when using OrderedSet with a generic algorithm on MutableCollection that calls swapAt(_:_:).

extension MutableCollection {
    mutating func genericSwapAt(_ i: Self.Index, _ j: Self.Index) {
        swapAt(i, j)
    }
}

Since this is in a generic context, genericSwapAt(_:_:) uses the default implementation of swapAt(_:_:) rather than the implementation provided by OrderedSet. As such, genericSwapAt(_:_:) does not work as expected when called on OrderedSet.

var orderedSet: OrderedSet = [0, 1, 2, 3, 4]
orderedSet.genericSwapAt(0, 4)

print(orderedSet) 
// Prints "[0, 1, 2, 3, 4]"

While the generic algorithm in the above example is fairly contrived, this has many practical implications. For example, since in-place generic sorting algorithms rely heavily on swapAt(_:_:), they don't work properly with OrderedSet. More broadly, providing custom implementations for protocol members allows conforming types to play to their strengths. This is especially relevant for custom collections, since different collections have different complexities for the same operations, and supplying their own implementations in place of the defaults lets them meet and optimize those complexity guarantees. That is all well and good when calling these functions on a concrete type, but the benefits are lost in a generic context, which is exactly where collection algorithms usually live.
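
For instance, here is a hedged sketch of the kind of in-place generic algorithm in question: a simple selection sort written purely against MutableCollection. Every swapAt(_:_:) call in it is statically dispatched to the protocol's default today:

extension MutableCollection where Element: Comparable {
  /// A simple in-place selection sort written only against MutableCollection.
  mutating func selectionSort() {
    var i = startIndex
    while i != endIndex {
      var minIndex = i
      var j = index(after: i)
      while j != endIndex {
        if self[j] < self[minIndex] { minIndex = j }
        j = index(after: j)
      }
      // Always the protocol's default swapAt today, never OrderedSet's version.
      swapAt(i, minIndex)
      i = index(after: i)
    }
  }
}

var numbers = [3, 1, 2]
numbers.selectionSort()
print(numbers)   // [1, 2, 3]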

This feature would enable types to play along better with generic algorithms, and having run into this myself when creating my own custom collection, I think that this would be a welcome addition to Swift.

Isn't the problem here actually that OrderedSet can't conform to MutableCollection? I don't see how you can provide these semantics if you just ignore assignments:

2 Likes

I was only using OrderedSet for the sake of the example. Also the statement is true in the hypothetical OrderedSet type iff the value of x does not already exist in said ordered set.

I don't think you can say that a type fulfils the semantic requirements of a protocol because it does so for a subset of instances.

I think that any implementation in the protocol, whether statically or dynamically dispatched, should always be correct with respect to the semantics of the protocol (as swapAt appears to be for the semantics of MutableCollection). You should ideally never have a case where overriding the implementation is required for correctness, though you may want to override it for other reasons, like performance or logging. Do you know of any examples where this isn't the case? And should they be considered bugs, either in the implementation or the documentation of the semantic requirements?

1 Like

So the example that originally prompted me was a LinkedList type I was making. Because I made my inner node type a class and the linked list type itself a struct, I had to implement COW semantics on my own. I did this by copying the nodes, in any mutating function that touches indices, whenever the list's id property was not uniquely referenced (meaning a copy of the list had been made). The problem arose when I tried to use swapAt(_:_:) from a generic context after copying the list.

extension MutableCollection where Element: Comparable  {
    mutating func customSwapAt(_ i: Index, _ j: Index) {
        guard i != j else { return }
        let tmp = self[i]
        self[i] = self[j]
        self[j] = tmp
    }
}

var list: LinkedList = [0, 1, 2, 3, 4]
var copy = list
list.customSwapAt(list.startIndex, list.index(before: list.endIndex)) // Error here

Let me explain why this failed.

In my LinkedList type, every LinkedList.Index had a weak listID property that was set to the id of the list from which it was derived. If a list is indexed by an index with a listID that does not match that of the list it is attempting to access, the operation fails. Also, to properly implement COW, upon a subscript assignment operation, if the LinkedList's id is not uniquely referenced, then said linked list is copied and the matching index in the copy is found with which to perform the assignment operation.
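
For readers following along, here is a minimal sketch of the index design I just described. The type and member names (ListID, listID, node) are assumed from the description rather than copied from the real code:

// Identity object used for uniqueness checks and index validation.
final class ListID {}

struct LinkedList<Element> {
  final class Node {
    var value: Element
    var next: Node?
    init(_ value: Element) { self.value = value }
  }

  var id = ListID()              // shared until the list's storage is copied

  struct Index {
    weak var listID: ListID?     // the list this index was derived from
    var node: Node?
  }

  // Subscript reads check `index.listID === id` and fail otherwise;
  // subscript writes copy the nodes (and mint a new `id`) when `id`
  // is not uniquely referenced.
}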

This causes a problem in customSwapAt(_:_:) because:

var list: LinkedList = [0, 1, 2, 3, 4]
var copy = list  // Because of COW, these two still share the same nodes
let i = list.startIndex
let j = list.index(before: list.endIndex)
// Inside customSwapAt(_:_:), `self` is `list`:
let tmp = self[i]

// Since `id` is not uniquely referenced because of `copy`, the linked list 
// is copied, gets a new unique `id`, and the assignment operation is performed
self[i] = self[j] 

// Since `id` has been changed and `j` is still a member of `copy`, this operation
// fails as `j` is not a valid index of `self`
self[j] = tmp

In my custom implementation of swapAt(_:_:) I am able to account for this and respond correctly:

public struct LinkedList<Element> {

    //...

    public mutating func swapAt(_ i: LinkedList<Element>.Index, _ j: LinkedList<Element>.Index) {
        var (i, j) = (i, j)
        // Copy first when the node storage is shared (COW), as described above
        if !isKnownUniquelyReferenced(&id) {
            // Get corresponding indices for `i` and `j` in new linked list
            let updatedIndices = copyNodes(updating: i, j)
            i = updatedIndices[0]
            j = updatedIndices[1]
        }
        let temp = i.node?.value

        if let value = j.node?.value {
            i.node?.value = value
        }
        if let value = temp {
            j.node?.value = value
        }
    }
}

I know that was a very long-winded example, but that is where this really hit me. I understand that this might be considered a bad implementation by your standards, but I felt that I should be able to better enforce the semantics of my type.

Oh, it's nothing about it being a bad or good implementation, I'm just not sure that your type can conform to MutableCollection with the current implementation, and overriding swapAt doesn't fix that.

This would be a hugely beneficial addition. The lack of it bites us regularly at my workplace. We have had several situations where a protocol and default implementation is added to a class like UIViewController, and it works great. At some point in the future there is a need to customize the behavior for one particular view controller subclass. Unfortunately, at this point we usually need to do a refactor because there is no easy way to customize the behavior.

1 Like

Just for the record, there was a point when we were redesigning the Collection protocols and we intentionally threw the use case of a linked-list-with-value-semantics “under the bus,” making it a very difficult thing to do correctly, because that allowed us to make other design choices that we considered more important. Part of the justification was that when you really want a linked list rather than some other data structure, you probably want it for its very "reference-y" features, like list splicing. So if this is hard for you… don't be surprised!

2 Likes

+1 from me. Have hit this issue many times at the intersection of protocols and generics.

A MutableCollection type has to support arbitrary mutations on an element (or elements). If your type's semantics doesn't support every potential tuple product (of supported lengths) of element values, then it can't conform to MC.

This is like Set, Dictionary, and many user-defined collections (including some of mine) with respect to RangeReplaceableCollection. They all support unregulated removals, but not unregulated insertions. Such types cannot conform to RRC, which requires both unregulated removals and insertions (as well as replacements). At best, they can support a custom copy of the removal subset of RRC (which does not have a base protocol for just that).

1 Like

Hope to see some traction on this, and more from the OP @Karl.

I'm not versed in the Swift compiler but would love to know how to help get this ready for review.