While I was having a deeper look into the Dictionary API, I wondered if we did the right choice making Dictionary conforming to Collection.
The problem I am seeing is that Dictionary has methods like first or firstIndex that express a sense of order, i.e.: taking the "first" element.
As we know a Dictionary does not have a predictable order and those methods (in my opinion) conflict with this understanding.
This may lead, in particular to early developers, to doubts, like, why first does not return the first element introduced in the dictionary? Or runtime errors, because of not realizing this behavior.
I do understand that, for experienced developers who have an understanding of how Dictionary works, it is not the case and there won't be any doubt, but errors may still happen when using first without realizing it was actually on a Dictionary.
Shouldn't we try to avoid having usage doubts as much as possible?
I was thinking that deprecating those methods in favor of the existing randomElement (which reflects the fact that getting a first out of a Dictionary result in a random element) may be a better approach.
But before proposing that, as maybe I am missing something, I would love to hear your thoughts or use cases you have around Collection related methods for Dictionary and if those use cases can be replaced by randomElement.
Even if first is in Collection, it is there from the sheer fact the iterating though Sequence (and so, get the first item) may be destructive. Your reasoning would apply equally well to Sequence that there may be some semblance of order in Sequence
To that note, the usefulness of Sequence are mostly about the ability to iterate through them.
Even if this were a good idea (which I'm going to leave aside), removing this conformance would be a source- and binary-incompatible change, and therefore isn't something that Swift would be able to do until some hypothetical future point where we decided to take breaking changes again.
Thanks to both comments, maybe I was not clear enough and want to clarify: I was not suggesting to remove conformance to either Collection or Sequence but, maybe, to deprecate and offer an alternative to some of its methods when used on a Dictionary.
In fact, since no solution has been proposed yet, I would prefer to focus more on understanding use cases and/or direct usage of those methods (first, firstIndex etc...) instead
Edit: I modified the title of the original post to better reflect this goal.
Dictionary is a Collection because it has O(1) element access, can return its count in O(1), and supports non-destructive iteration. The rest is there because a choice was originally made to not differentiate between such a collection and an ordered collection.
There have been long discussions in the past about adding layers to the Sequence-and-Collection protocol family, primarily to split a hypothetical OrderedCollection from Collection. Needless to say, it didn't go anywhere.
See [Pitch] Make Collection Super-Convenient and Retire Sequence
I consider the current hierarchy of Collection protocols to be seriously flawed, but it most likely won't change anymore — those who are in charge still seem to be fine with it, and the chains of compatibility are much stronger now than in the past.
Just for the record, that is not a requirement of Collection and should not generally be assumed to be the case. That performance requirement is instead the domain of the more specific RandomAccessCollection, to which Dictionary does not conform (though it does happen to supply count in O(1) anyway):
For example, a random‐access collection, which can measure the distance between two indices in O(1) time, can calculate its count property in O(1) time. Conversely, because a forward or bidirectional collection must traverse the entire collection to count the number of contained elements, accessing its count property is an O(n) operation.
Correct me if I'm wrong, but first and randomElement are not the same. first will always returns the same element if called repeatedly on a (non mutated) dictionary, while randomElement offer no stability at all.
I don't think that is a guarantee, although that is what happens within a single execution. If you create the same dictionary, with the same elements added in the same order in another execution the dictionary can end up in another order (because the default hash function deliberately attempts to start from a different seed on each execution).
As long as Dictionary conforms to Collection, it must supply all of Collection's requirements, including first; that's non-negotiable. That said, there is a good argument that Dictionary should not be-a Collection but should instead have-a Collection of key-value pairs… but it may not be the argument you're thinking of.
The usual line is that ordering is fundamental to what a Collection is (which I agree with), and Dictionary is in some sense “unordered” (which I think is flimsy logic at best—Dictionary has an order, because it repeatably yields the same sequence elements on subsequent traversals).
The argument I find more compelling is that Dictionary's conformance to Equatable and Collectionare in conflict. As the Equatable documentation says, “equality implies substitutability—any two instances that compare equally can be used interchangeably in any code that depends on their values.” But two equal dictionaries may have different orders, and code that depends on the dictionary's value as a collection will have different behavior depending on which dictionary it is passed.
Based on this conflict it is easy to come up with unimpeachably correct code that operates on equatable collections, but is broken when passed a Dictionary.
We could technically solve this by declaring that a Dictionary's conformance to Collection is not part of its value. Personally I think that creates an unusable mental model, because conformance to Collection exposes a huge amount of API (this is just a fancy way of saying, as you did, that it “creates usage doubts.”)
A small amount of fancy semantic footwork is required even if a Dictionary has-a Collection of key-value pairs: the property representing that Collection has to be declared to be not part of the Dictionary's value. Personally I think that creates a tenable user model because it's just one piece of API that can be documented as such, and—crucially—it is available only in contexts where the programmer knows they have a Dictionary: it's not a problem that is just by virtue of crossing into generic Collection code.
All that said, the last time I brought this issue up with the core team, the verdict was that the harm that would be caused by fixing the problem—forcing substantial amounts of code to change, invalidating lots of extant tutorials on the web, etc.—is worse than the harm caused by the problem itself. And as much as I loathe having demonstrable incorrectness in the standard library design, I find it hard to disagree.
Thanks for the detailed reasoning, it does make sense and it feels more a solid argument.
Do you recall if it was presented to the core team a phased migration plan? Like implementing the alternative and deprecating (but keeping) the old functionality for an amount of time. Was this also considered “harm”?
Yes, really ;-)
Wether something is a problem or not is somewhat subjective, but the more I learn about this large family of protocols, the more doubts I get. Probably there are some odd things that actually have a really good explanation, but for a design which predates Evolution, there is no place to look for such justifications.
My general impression is that the hierarchy cares much more for (I'd say premature) optimization issues than for code that actually makes sense:
We have BidirectionalCollection, which afaics is just about efficiency, but we have no notion of fundamental concepts like finite vs. infinite, single- vs multi-pass - and while all collections have (at least an implicit) order, there's still nothing like SortedCollection.
So I can write an generic function that can avoid some theoretical* complexity pitfalls, but for doing a binary search, I have to either roll my own protocol - or introduce another bunch of issues similar to Set.reverse, Dictionary.elementsEqual or infiniteSequence.shuffled.
"Changing things is hard" is definitely a pragmatic explanation, but not a very satisfying one - and in theory, there has been a timeframe where the issues could at least have been discussed seriously.
* Yes, getting the predecessor in a (singly) linked list is somewhat stupid, and accessing the last element is expensive... but at the same time, Collection already inherits dropLast and suffix; and we don't even have neither singly nor doubly linked list in the stdlib (do we? ;-).
This discussion is drifting slightly off-topic, but personally I don't subscribe to the idea of Sequence being single-pass, whatever its documentation says; IteratorProtocol is the one, true, single-pass type! It mutates when producing elements and once exhausted, will never return anything again. Sequence is able to produce iterators without mutating, so it seems obviously to be the multi-pass type.
IteratorSequence is bad and shouldn't exist. Other than that, things are fine IMO.
We could just change the documentation for Sequence. Any actual single-pass sequences in the wild almost certainly won't work with anything.
BidirectionalCollection is about whether you can walk a collection backwards. As an example, Collection has no .last, or lastWhere, but BidirectionalCollection does. The base operation of Collection is index(after:): the base operation of BidirectionalCollection is index(before:). (To complete the story, the base operation of RandomAccessCollection is index(_:offsetBy:)).
I'll try to present some of those justifications here, then. The collection hierarchy is by no means perfect, but I don't think it's as bad as you say. Hopefully this will provide some perspective:
Let me start by saying that it seems from your post that you and I judge the appropriateness of protocols by very different criteria. My criteria are based on the lessons learned by Alexander Stepanov and his collaborators in their work on generic programming, the discipline that protocols were designed to serve. I'm sorry if that sounds pompous; I only mention it because I want you to understand that the choices are not arbitrary, but are based on existing practice, solid experience, and a coherent set of principles.
The ability to use a library, and especially to compose its parts, and reason about the efficiency of the composite is a fundamental feature of generic programming. I consider basing protocol distinctions on efficiency to be legitimate and often desirable. It's not uncommon that the choice of an algorithm's best implementation depends on the protocols to which its operands conform (see rotate for example).
Usually we try to combine efficiency distinctions with syntactic differences in the set of basis operations, by having requirements that only apply to the more efficient protocol. This practice helps with usability and qualitatively, tends to makes protocol differences based on efficiency “feel” more substantial.
That said, in the move from standalone incrementable indices to indices moved by collection methods, the differences in basis operations between BidirectionalCollection and RandomAccessCollection was lost. That's arguably a flaw; to avoid it we would have had to remove the index(:offsetBy:) algorithm from Collection and only make it available to RandomAccessCollection.
OK, there's a lot here, so I'll take one thing at a time:
I disagree with the premise that finite vs. infinite is a fundamental concept that should be represented in the type system, because there exist finite collections that are, for all practical purposes, infinite. Take for example the finite collection 0...Int64.max: there is almost no operation you'd want to statically disallow for an infinite collection but that should be used on this value of Range<Int64>.
Single- vs multi-pass is captured: that's the distinction between Sequence and Collection. As I noted in the posting you referenced, the notion of single-pass is not really well-representable in a language without move-only types, and that accounts for some weaknesses in the current design, but we did make the single-vs-multipass distinction as best the language allowed.
SortedCollection is not needed for binary search. In fact, you want a binary search that doesn't require sortedness to be represented in the type system, because it should be possible to binary search in an Array after it has been sorted. That is, sortedness should be a precondition on the operand of the algorithm. In fact, here's a little gift for you. If you're interested in the background behind the naming choice you can read this and this. There may be a use for a SortedCollection protocol, but so far I haven't found one that justifies its conceptual weight.
I'm not sure what you're arguing for here, but I consider the fact that dropLast and suffix are exposed on Collection to be design bugs caused by the fact that Collection refines Sequence (which is flawed). So I wouldn't want to use that fact to justify any new design decisions.
One would think that, wouldn't one?! It turns out, though, that if you want one protocol to support for…in loops, and you want to be able to uniformly loop over both an immutable Array and a single-pass stream that is consumed by iteration, you end up needing Sequence: a thing you can get an iterator from without mutating it (try working out the design—you'll see!)…
…unless, that is, you have first-class language support for move-only types, by which I mean:
There's a category of operations, known to the type system, that consume values of move-only type (ending their lifetime), but automatically receive a copy of any copyable value.
Single-pass algorithms (and for…in) belong to that category.