Should Dictionary support first related methods from Collection conformance?

dabrahams · January 4, 2020, 10:51pm

It wasn't presented that way, but that option was considered.

beccadax · January 5, 2020, 1:26am

The combination of first and dropFirst() is sometimes used in “reduce-style” algorithms which need to visit every element of a collection but are not sensitive to the order of elements. These algorithms work just fine with Dictionary since first will return the same element dropFirst() removes as long as you don’t mutate the collection between the two calls.

For instance, consider this implementation of max():

extension Collection {
    func myMax(
        by areInIncreasingOrder: (Element, Element) throws -> Bool
    ) rethrows -> Element? {
        guard var result = first else { return nil }
        for elem in dropFirst() where areInIncreasingOrder(result, elem) {
            result = elem
        }
        return result
    }
}

It works just fine on a Dictionary:

print(
    “key of largest value:”,
    someDictionary.myMax(by: { $0.value < $1.value })?.key
)

randomElement() would not work here because there is no equivalent to dropFirst() for it. Moreover, even if we did have a dropRandomElement() which returned the elements the previous randomElement() hadn’t, or some other way to split a Dictionary into first and rest with a random “first element”, we would be giving up performance for no good reason—generating a random number is relatively slow and it buys us nothing here.

I’ve shown this as an extension on Collection, but it would make perfect sense in an extension directly on Dictionary, too. In fact, I’d argue that there is no better implementation for it on Dictionary than this one.

Sometimes you just want to get the first element the iteration will return to you, no matter what arbitrary order the iteration might be in. That’s what first is for, and it’s a perfectly coherent thing to want even on a Dictionary.

jayton · January 5, 2020, 10:41am

It’s far from obvious why you wouldn’t use an iterator for this.

Avi · January 5, 2020, 10:53am

Iteration implies order, just as much as, if not more so than, first does. I'm not understanding how using iteration for that particular task would change anything.

CTMacUser · January 5, 2020, 5:18pm

Not exactly. The for-in system only requires that an element-baring instance allows the loop to visit each element exactly once. That does not mean said visitation order is part of that instance's type's semantic! That's the whole problem with Sequence et al.; its significant methods do imply that semantic, which is an over-specification. It's the cause of the whole "Dictionary and Set aren't really sequences" problem.

An improved hierarchy would be:

Collective (i.e. Sequence without the dropFirst, prefix, suffix, etc. parts)
PersistentCollective: Collective (safe for multi-pass)
Sequence: Collective
typealias PersistentSequence = PersistentCollective & Sequence
IndexedCollective: PersistentCollective (i.e. Collection without the startIndex, endIndex, etc. parts; it wouldn't have range-based indexing either, since Index would be Equatable instead of Comparable)
Collection: Sequence & IndexedCollective

But inserting new base protocols is ABI-breaking, even if the members are distributed in a way that still preserves source compatibility. I did once see a post on this forum suggesting a way to make a protocol conform to another after the fact (i.e. like an extension). If that's added, then we could reexamine adding the above protocols.

Even if we did, switching Dictionary and Set to IndexedCollective would be source-breaking. We would have to make a NeoDictionary and NeoSet with the reduced requirements, change the implementations of the first versions to use the second ones, then transition users to prefer the second ones under most circumstances.

jayton · January 6, 2020, 9:24am

I feel it’s quite different to say “a dictionary can provide an iterator that yields its pairs in some order” and “a dictionary has some order”. If this kind of algorithm can be rewritten in terms of iterators in general, which I believe is the case, it negates Brent’s conclusion:

Avi · January 6, 2020, 9:25am

I see. Thank you for explaining.

jenox · January 6, 2020, 11:30am

I might be missing something here, but even if we were saying the dictionary or set can only provide an iterator over its element in some order rather than being a collection of elements in some order itself wouldn't solve all problems, would it?

The underlying conflict of these unordered collections vending an iterator and their Equatable conformance would remain. And I really can't think of a realistic way to solve this problem: Iterating over unordered collections has to happen in some order. And the moment this order is nondeterministic or depends on things other than the value of such a collection, i.e. the order in which the elements were inserted, we have this fundamental conflict with Equatable: suddenly these instances are not substitutable any more!

dabrahams · January 6, 2020, 5:11pm

Threading #FAIL

Many of the postings in this topic appear to be replies to earlier postings, but are missing references to the things they're replies to, so I can't tell to whom I should respond. I don't know if this is because respondents are using the global "reply" button instead of the one below the posting to which they're replying, or because of a bug in Discourse, or something else, but it's definitely making it harder to participate.

@forum_admins Is there, perhaps, an update to Discourse that would make this better?

Jumhyn · January 6, 2020, 5:45pm

I've experienced that the button below individual posts is hit-or-miss as to whether it actually ends up linking the replied-to comment. It's happened enough that I'm reasonably sure (though not 100% sure!) it's not just a mistake on my part hitting the global reply button. I pretty much always quote now when making a direct response to avoid losing context.

cukr · January 6, 2020, 5:56pm

the link doesn't show up when you reply to the last message, otherwise it does
It's consistent for me.

dabrahams · January 6, 2020, 6:05pm

Oh, that's a terrible bug in Discourse, because new posts can come in while you're replying! @forum_admins I really do hope there's an update that could fix this problem.

cukr · January 6, 2020, 6:10pm

Is there a reason you think that the link won't show up if new posts come while you're replying? While I'm writing a post, discourse still knows that it's a reply to your post.

dabrahams · January 6, 2020, 6:16pm

It's not conclusive of course, but this post is one reason. It's clearly not a reply to the post just above it.

michelf · January 6, 2020, 7:02pm

My observation is that Discourse will show which post a reply is to in the web interface unless it's a reply to the post just above it. It's probably a feature to declutter things. But it tracks this information fine since anyone receiving emails will get a clear mention of the post it is replying to.

Here I'm replying to Dave's post just above to show my point.

(This topic should probably go to a separate thread in the Site Feedback section.)

dabrahams · January 6, 2020, 7:14pm

Actually I took it here. @forum_admins it does seem that there's a setting you can adjust; see the reply to my posting.

Karl · January 7, 2020, 5:02am

I'm not sure that should even be a goal - uniform iteration seems to be saying that we want a single protocol or syntax to represent both the multi-pass sequence (the Array) and the single-pass stream, so the problem is locked-in to the question. One is a mutating operation and the other is not, so it's unclear to me why they should look the same. That mutability carries important semantic connotations - namely, about whether or not iteration change any state (and therefore may not be able to be repeated).

Iterators are definitely single-pass - once they return a nil, that's it, all they will ever return is nil. Sequence claims to be single-pass, but supports creating an iterator without mutating (which somehow is not the same as non-mutating iteration...? ) - oh, and since almost all real-world Sequences are multi-pass, almost no 3rd-party code gets tested against single-pass sequences, so following that particular piece of documentation is unlikely to be a good idea. If you're actually writing a single-pass sequence, you should expose an Iterator.

It seems to me that changing Sequence to be multi-pass is the simplest way to fix our collection hierarchy (even just to accept reality). Then we can talk about whether Sequence (a multi-pass sequence without a notion of position) makes sense as an abstraction level between IteratorProtocol and Collection (I guess probably not).

I'm not sure how move semantics would help with this - except perhaps as a general variable "poisoning" tool, so we can say "don't iterate this variable more than once" in the same way we say "don't use this variable after its contents have moved out". But for a single-pass sequence, I don't think either the elements or sequence itself need to be move-only IIUC.

Tino · January 7, 2020, 12:28pm

We seem to have a different interpretation of the word "can" ;-) - or can you give a single example of a Collection which cannot be walked backwards?

Even if this

is "fixed", you could still copy the old implementation from the stdlib - it's just not very efficient.

lukasa · January 7, 2020, 1:44pm

Fine, I will grant that if you have no concerns about complexity then all Collections can be walked backwards, at the very least by way of the trivial algorithm Array(myCollection). What I mean by "can be walked backwards" is "attempting to walk this collection backwards does not perform worse than O(n)", as if it does then I can just create an Array from my Collection and be done with it.

A very straightforward example of such a Collection is a singly-linked list. The natural Index for such a list is simply a pointer to the list element. Calculating the next index from the current one is very cheap (use the next pointer), meeting the soft O(1) criterion on that operation that Collection asks for. Attempting to find the previous Index can only be done by walking from the head until you find the current element, and then returning the previous one. This is an O(n) operation, so the complexity for attempting to walk the entire list this way is O(n^2).

The implementation in the stdlib forbids negative indices on non-bidirectional Collections.

glessard · January 7, 2020, 1:44pm

I’m not sure what you’re saying. The Collection sub protocols are about the algorithmic efficiency of specific operations. Pointing out that you can reproduce operations inefficiently does what?