PermutableCollection protocol

Permutable adj. - capable of changing sequence

Introduction

This proposal aims to add a PermutableCollection protocol that would sit between Collection and MutableCollection within the collection protocol hierarchy.

Motivation

There are instances where one desires the ability to move and sort elements, while disallowing (or constraining) the mutation of an element. It is a notable ommission given the sheer number of less consequential collection protocols. When creating custom collection types, a PermutableCollection protocol would be preferred in any ordered collection that enforces element uniqueness and/or identity. Use cases would include the forthcoming OrderedDictionary and OrderedSet. For immutable and mutable collections, it could also serve to provide unified access to the underlying (and presumably highly optimized) sorting algorithms used by Swift.

Design

The protocol design is simple with a single swapAt requirement

/// A collection that supports sorting.
protocol PermutableCollection<Element> : Collection where Self.SubSequence : PermutableCollection {

  mutable func swapAt(_ i: Index, _ j: Index)
  
}

Through the swapAt function, additional sorting functions implementations are added via extension

extension PermutableCollection {

  mutating func move(fromOffsets source: IndexSet, toOffset destination: Int) {
    // move algorithm enacts changes via swapAt()
  }
    
  mutating func partition(by belongsInSecondPartition: (Element) throws -> Bool) rethrows -> Index {
    // partition algorithm enacts changes via swapAt()
  }
  
  mutating func sort() where Self: RandomAccessCollection, Self.Element: Comparable {
    // partition algorithm enacts changes via swapAt()
  }
  
  // ... more permutation operations that mimic those available for MutableCollection
  
}

I have created a working (but unoptimized) example of PermutableCollection and posted it here.

Alternatives

Alternative #1: when sorting functionality is needed, add it manually.

This certainly works, but requires significant boilerplate code. And without an underlying protocol, the existence of functions and their naming is not strictly enforced. When added manually, there is unfortunately no guarantee that the sorting algorithms are equivalent to those in MutableCollection and RangeReplaceableCollection

Alternative #2: just use MutableCollection which 'automatically' implements various sorting algorithms

While this approach quickly adds the desired sorting functions (swap, partition, move, sort, etc), it makes a promise that each element can be mutated at any time to any value. There are many situations where this is promise is undesirable. Furthermore, the majority of the permutation operations result in transient duplicate values during the permutation. I anticipate that many of the use-cases for PermutableCollection would be in situations where some form of uniqueness is enforced and duplicates (even transient ones) may violate the contraints of the collection. I can also envision a collection where the indices may be in a separate store than the elements. The way that MutableCollection permutations are implemented, it is not possible to update the indices without updating the elements as well (though a hacky version of this version might be possible if the elements were all uniquely identified).

Potential Shortcomings

  • There are forseeable compatibility issues with MutableCollection and RangeReplaceableCollection
  • Performing the permutable operations strictly via swapAt may result in performance degredation

Conclusion and Future Directions

PermutableCollection seems like an overlooked protocol in Swift that would see immediate use in the forthcoming OrderedSet and OrderedDictionary. In addition to allowing for easier creation of new collection types, it also has the advantage of consolidating all of the permutation operations via the swapAt function.

Though I anticipate some compatibility issues, it may also be desirable to have MutableCollection and RangeReplaceableCollection route all of their permutation operations through swapAt as well. This would foster more consistent behaviour of the various collection protocols. In such a scenario, all of the collection protocols (and not just PermutableCollection) could 'override' the handling of permutation operations by defining a swapAt function.

3 Likes

Setting aside the merits of the proposal, I don't think that we could actually make this change in an ABI stable fashion at present. @Slava_Pestov, is that right?

3 Likes

Yeah, the set of base protocols of a protocol must be fixed right now; there's no mechanism to declare a "default" associated conformance for Self: P.

3 Likes

Would that be a considerable feature for something like Swift 6? (The mechanism for back deploying protocols would seem useful alongside that, too.)

Sorry, I’m not sure it’s feasible to add at this point, as it wasn’t part of the resilience model.

Out of curiosity, what would it take to make it possible to insert a new intermediate protocol between two existing protocols in a non-breaking way?

I don't have the slightest idea, but Library Evolution in Swift mentions:

New protocol requirements can be added to protocols, as long as the new requirement has a default implementation defined in a protocol extension.

So maybe we could imagine that PermutableCollection requirements would be added to the parent protocol Collection, with default implementations that trap, along with a compiler feature that would hide those methods from mere Collection instances so that only PermutableCollection can use it?

:dog: I have no idea what I'm doing.

That doesn't get MutableCollection to refine PermutableCollection, which means that you cannot use a MutableCollection in generic code constrained on PermutableCollection. So either:

  1. You can use these methods in generic code constrained only on Collection, which will fail when called with a type that doesn't actually conform to PermutableCollection

or

  1. You cannot write an algorithm constrained to PermutableCollection and call it with a generic type constrained to MutableCollection

either of these limitations defeats the point of trying to insert it into the collection hierarchy. You would get the same benefit by just introducing the new protocol and conforming concrete types, without any shenanigans. In order to get the benefit you need both the APIs themselves and the partial order on constraints.

If we really needed to, I think we could retrofit this sort of thing into the existing ABI in terms of existing functionality. Another way to look at a protocol P: Q relationship is that P has an associatedtype Super: Q where Super == Self. If we know a protocol refinement is retroactive, maybe we could implement it that way and treat the refinement as sugar for the underlying associated type slot.

10 Likes

Thank you to @scanon @Slava_Pestov @pyrtsa @gwendal.roue @Joe_Groff for your input.

I have encountered (and now mostly understand) the difficulties that you have mentioned in having MutableCollection refined by PermutableCollection. I expected that "inserting" the collection into the hierarchy would be a heavy lift. My expectations were low.

But, I still don't understand if inserting a new protocol within a protocol hierarchy CANNOT be done or WILL NOT be done in the future. Do any of the current collection protocols somehow "override" this limitation? If so, can these overrides be made available within the language itself? Perhaps some more clarification would be helpful on these points.

While I understand that "inserting" the protocol in the hierarchy is difficult, I still wonder whether you all think that PermutableCollection has merit as a protocol inheriting from collection? It's inability to be refined by MutableCollection could even be considered a feature... why would you need both simultaneously?

My prototype version (see above for link) appears to run successfully when used with a concrete type that conforms to both RandomAccessCollection and PermutableCollection. Is there a situation where I will get unanticipated results that I am not thinking of?

It cannot be done unless someone figures out a clever workaround. No protocol has been inserted into an existing protocol hierarchy anywhere in the standard library since Swift has achieved ABI stability on Apple platforms.

4 Likes

No, there's no "override" mechanism. A protocol cannot be modified to refine another protocol after it is introduced in a binary-stable manner; whatever protocols it refines when the module becomes binary stable, it refines exactly that set forever.

As Joe mentioned, it might be feasible to relax this at some future point, but that's the state of the language today.

3 Likes

Answering this question would require seeing some more examples. What generic algorithms would we actually write against this protocol? Anything other than sort? Sorting is useful, but probably does not justify a new protocol on its own. In particular, what are the use cases to write generic code constrained against being able to sort, but which do not want the rest of mutable collection?

What concrete types would conform to this protocol, but not mutable collection? You mentioned a few. Are there others?

In general, we could slice up the collection hierarchy arbitrarily finely; we could have a separate protocol for each collection API. There's some value in that, but there's also a lot of value in comprehensibility from not doing that. We try to pick a relatively small number of slices that provide maximum utility and minimum confusion.

3 Likes

I actually think this is the BEST reason for such a protocol. As it stands now, I do not have access to a sorting algorithm that operates strictly through swapping. Though I managed to hack one together for demonstration purposes it requires various (and likely costly) intermediate steps. Is there such an algorithm present in the standard library?

I mentioned OrderedDictionary and OrderedSet earlier. I have also found the need for such functionality when dealing with the storage and ordering of CoreData entities (where I can not directly sort them, but I can swap the property that ultimately sorts them).

As mentioned above, it would also be useful in creation of collection types that have a constraint on mutation operations. I am working on one such protocol tentatively titled as IdentifiableCollection (the concrete version is IdentifiedSet) that would inherit from PermutableCollection. (off topic: I am convinced that SwiftUI is missing a major collection type that can be safely used with Binding, List, and ForEach that would benefit from such functionality). This protocol prohibits the mutation of an identifier (which throws an error in SwiftUI). It would benefit from sorting and moving operations included within the protocol rather than added via boilerplate for each concrete implementation.

My other argument for a PermutableCollection protocol would be something like "that's how it should work" or perhaps I would say that it is more "comprehendable". Permutation preceeds mutation... it seems the natural order of things.

i really wish we had an “AppendableCollection” protocol, that’s like a RangeReplaceableCollection but only supports appends to the end.

Any kind of in-place rearrangement, really. And I think the need is considerable once you have non-copyable types, because you can no longer do that by relying on MutableCollection (unless we add some other way to avoid temporary copies instead of the current naive implementations).

However, I think this also leads to another problem which is that we can't just provide a retroactive implementation of swapAt "for free" via MutableCollection.subscript if we insert the protocol, because then that implementation will not work for non-copyable types, at least not again without some other compiler feature in addition.

8 Likes

+1

Beginner question: if the issue with inserting a new protocol into an existing protocol hierarchy is that this would be a source-breaking change, can’t we just wait to add PermutableCollection (and any other such protocols) to Swift 6?

Inserting a new protocol in a hierarchy is also binary-breaking, and Swift 6 must remain binary compatible with Swift 5 code.

3 Likes

OrderedSet and OrderedDictionary.Elements are indeed collection types that would want to conform to PermutableCollection but not MutableCollection. Do we have any other examples, though?

Not having a good enough answer to this is one of the reasons why swift-collections did not end up shipping a PermutableCollection protocol as a package construct. (Even though I do find it intellectually painful that the ordered collections roll their own implementations for common reordering operations.) I think we need to find a wider class of constructs that would want this protocol before considering adding it to the swift-collections package, much less the stdlib -- especially if we want to do extra work to insert it in between Collection and MutableCollection.

(It would be possible to simply add PermutableCollection as a sister protocol to MutableCollection, with neither refining the other. But we should probably have a better justification than just two types.)

Another reason for omitting PermutableCollection from the package was that it isn't clear to me that generic algorithms over swapAt would be efficient enough to embrace. For example, OrderedSet keeps its items in a simple Array value, and maintains a hash table containing the cached array index of each item. To keep this side table up to date, every OrderedSet.swapAt call has to perform two hash lookups to update the positions of the items it touches.

For a generic partition/sort/shuffle/reverse/etc. implementation that is expressed as a series of swapAt invocations, I expect that updating these cached positions during every step would be quite wasteful: we only need the side table to perform efficient lookups/containment checks, and these common reordering algorithms don't actually need to look up items halfway through the permutation. Discarding the hash table, delegating to Array.partition/.sort/.shuffle/.reverse/etc, and then reconstructing the side table from scratch once we have the final ordering ends up being significantly more efficient -- and so that is what OrderedSet is currently doing. Even if we had the protocol, OrderedSet would likely still want to include custom implementations for these common operations, even in generic context. Allowing such bulk operations to temporarily break invariants is a useful optimization that seems difficult to express in generic code.

(Of course, this argument would not carry as much water if we had plenty of other examples for permutable-but-not-mutable collection types, including some that doesn't have OrderedSet's overhead. Do we have some?)

5 Likes