This pitch assumes that the pitch Make Collection Super-Convenient and Retire Sequence moves ahead. If you are not familiar with that you may wish to read it first.
TLDR; Many algorithms need to process every element of a collection. This is a significant and pervasive requirement which results in an infinite loop and / or a crash when it is not met. We should elevate this requirement from a precondition on algorithms to a semantic requirement of a new MaterializableCollection
protocol and move the relevant algorithms from Collection
to this protocol.
Finite and infinite collections
The topic of finite vs infinite sequences and collections has come up in quite a few Swift Evolution threads. @dabrahams summed up the reason no meaningful progress has been made in this area in the above mentioned pitch:
When it comes to termination or memory exhaustion, there’s little practical difference between an infinite collection and one that is simply huge. We don’t have a problem with 0...UInt64.max as a collection, and yet no reasonable program can process all of its elements.
In other words, what we care about in practice when we talk about "finite" collections is termination of the algorithm. The mathematical notion of finitude is a necessary but not sufficient constraint. Termination requires a finite collection that is small enough such that every element can be materialized in practice so that it can be processed.
Problems with the current design
The current design does not encode the requirement of materializability in the langauge. Instead, the standard library uses precondition documentation on specific algorithms. The implied expectation is that user-defined code that is generic over Collection
will do the same when necessary and callers will be diligent in meeting this precondition.
Unfortunately this precondition is quite pervasive. Roughly half of the members available on Collection
require every element to be processed in the general case. I haven't done any analysis, but I would speculate that the ratio is even higher for third party extensions.
In practice, most types conforming to Collection
are able to materialize all elements. There are several reasons for this. The most commonly encountered collections store their elements, thus the elements have already been materialzed. Further, instances of finite collections such as ranges that do not store their elements are often (usually?) small enough to materialize their elements on demand without trouble. Most importantly, infinite collections are not allowed (for example a lazy mathematical sequence).
Complexity and efficiency
@dabrahams provided very useful criteria for evaluating when a new protocol should be introduced:
When we discover that an already-useful protocol lacks the requirements needed to support an important usage, we have to decide where to add those requirements: to the original protocol, or to a new refinement of the original protocol. To make that decision, we ask whether any known models of the protocol would be unable to efficiently support the new requirement. This is, for example, why BidirectionalCollection and RandomAccessCollection are distinct protocols: random access is an important capability for some algorithms, but there is no efficient way for important collections such as Dictionary and String to support it.
Collection
clearly does not require the ability to materialize all elements of a conforming type yet such a requirement is necessary for a large number of algorithms. This begs the question: are any known models of Collection
unable to efficiently materialize all of their elements? The answer depends on how we define "efficiently". I argue that we should not restrict the meaning of "efficiently" to Big-O complexity.
It requires quite a stretch of the imagination to say that an infinite collection or even 0...Int.max
is able to efficiently support an O(n) algoirthm in practice. In such cases theoretical Big-O complexity is irrelevant to actual usage in real world programs. The logic that argues for BidirectionalCollection
and RandomAccessCollection
also argues for MaterializableCollection
.
The pitch
As stated previously, I propose introducing a MaterializableCollection
protocol. All members which must process every element in the general case will be moved from Collection
to the new protocol. Note: this includes count
, thus allowing infinite collections without requiring them to implement count
.
Some requirements and extensions of Collection
can be implemented by BidirectionCollection
or a RandomAccessCollection
without having to materialize every element in the general case. For example, all RandomAccessCollection
s will be able to implement count
without needing to traverse the collection and all BidirectionalCollection
s will be able to work from the end of the collection without traversing the collection, thus supporting members such as last
, dropLast
, etc. Members for which this is the case will be moved to both MaterializableCollection
and either BidirectionalCollection
or RandomAccessCollection
.
I have spent some time doing a more detailed analysis of how individual members would be impacted but am omitting that in this pitch. If there is sufficient interest I intend to collaboarate with anyone interested to flesh out a more complete detailed design.
Collections such as Array
and String
that store their elements interanlly will conform to MaterializableCollection
. Collections that do not store their elements internally may conform to MaterializableCollection
, but only when all instances are statically known to be small enough to be materialized in practice (for example, AllCases
collections may fall into this category).
Collections such as CountableRange
will not conform to MaterializableCollection
directly. Instead, they will provide a materializable
view which allows users to assert that the instance they are working with is materializable. The standard library would provide a generic MaterializableCollectionWrapper
type (this name is a strawman) that can be used with any Collection
.
Migrating existing code
This pitch has a significant impact on source compatibility which will need to be carefully considered.
Migrating code that uses concrete Collection
types that do not conform to MaterializableCollection
Concrete types that conform to Collection
but will not conform to MaterializableCollection
(such as CountableRange
) will no longer have members which require materializability. Code that uses these members on instances of the concrete type will need to use the materializable
property to assert that this usage is valid.
Migrating Collection
constraints
Generic code using a Collection
constraint as well as members that require materializability will need to be migrated to require MaterializableCollection
instead. This will have ripple-effects which will need to be addressed by either increasing a generic constraint to MaterializableCollection
or using the materializable
view.
Migrating user-defined Collection
conformances
Many user-defined collections will need to be updated to conform to MaterializableCollection
instead of just Collection
.
Open questions
What terminology should be used?
The term "materializable" was chosen as it represents the requirement in a very concrete way but there are alternatives that could be considered. Most notably, Haskell defines the type class Traversable
which is defined as the "class of data structures that can be traversed from left to right, performing an action on each element." I would be happy to revise this pitch to use the name TraversableCollection
if feedback indicates that is a better choice.
Should the materializable
view be provided on Collection
itself or only on concrete types?
If we are going to support infinite collections there will be some conforming types for which no instances are truly materializable. On the other hand, the next section discusses an adanced use case where it may be useful to "pretend" that they are materializable.
How does MaterializableCollection
interact with MutableCollection
?
Mutability seems to imply that elements are stored and includes algorithms such as sort
which imply processing of every element. Should MutableCollection
refine MaterializableCollection
? If so, what is the impact?
Is there a path to a similar design which would allow a smoother migration?
Instead of moving requirements from Collection
to MaterializableCollection
perhaps we could phase in the materializable requirement on Collection
itself and repurpose Sequence
or introduce a new protocol with the more relaxed requirment of a "multi-pass, potentially infinite" collection.
Footnote on advanced use cases
@dabrahams mentioned and advanced use case for using O(n) algorithms on infinite collections:
You just expect an O(N) algorithm on an infinite collection not to terminate. Furthermore, I don’t think I’d want to outlaw the use of O(N) algorithms on infinite collections. I lost the battle long ago against side-effectful parameters to algorithms like reduce, which is one reason we have .lazy (there are other good reasons too, though). Given that we’re not re-litigating that question, I have no problem with someone calling reduce in a thread on a potentially infinite collection of messages from another process, and using its side-effects to do whatever they want.
It is possible to support this use case by using a materializable
view on the infinite collection. This doesn't align with the "materializable" terminology precisely but that seems like an acceptable drawback to me.