What's unconditionally calling count?

I'm making a new type that conforms to LazySequenceProtocol, LazyCollectionProtocol, and BidirectionalCollection. Right now, I'm testing just the Sequence members, which are all customized rather than defaulted from Collection. But my breakpoints fire on both underestimatedCount and count (in that order). I can't figure out what's calling count, even when I wrap my new collections with AnySequence(myWrapper) first.

When I take out the Collection conformances with "#if false," the test code still works, but calls underestimatedCount twice each time instead.
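
For reference, here's a stripped-down sketch of the setup (the type and member names are placeholders, not my real code, and the lazy conformances are omitted):

struct MyWrapper<Base: Collection>: Sequence {
  var base: Base

  // Customized Sequence members; my breakpoints live here.
  func makeIterator() -> Base.Iterator { base.makeIterator() }
  var underestimatedCount: Int { base.count }
}

#if false // Toggling the conformances changes which members get called.
extension MyWrapper: Collection, BidirectionalCollection {
  // startIndex, endIndex, subscript, index(after:), index(before:), ...
}
#endif

let wrapped = MyWrapper(base: 1...5)
let array = Array(AnySequence(wrapped)) // count still gets hit somehow.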

Not all Sequence requirements are documented. Some of them use that fancy leading-underscore thing that only the standard library can do, to hide requirements which still need to be dynamically dispatched, precisely so generic code can customize this kind of behaviour.

There are three of them: _customContainsEquatableElement, _copyToContiguousArray, and _copyContents. I'm guessing you were creating an Array from your custom collection?
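
Paraphrasing from the stdlib sources (double-check against the real files before relying on the exact shapes), their declarations look like this:

// Underscored Sequence requirements, paraphrased from the stdlib.
// Returns nil when containment can't be answered cheaply, letting
// generic code fall back to a linear search.
func _customContainsEquatableElement(_ element: Element) -> Bool?

// Array(someSequence) and ContiguousArray(someSequence) funnel through this.
func _copyToContiguousArray() -> ContiguousArray<Element>

// Copies into preallocated memory, returning the partially-consumed
// iterator and the index one past the last element written.
func _copyContents(
  initializing buffer: UnsafeMutableBufferPointer<Element>
) -> (Iterator, UnsafeMutableBufferPointer<Element>.Index)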

Yes, I was testing converting my custom collection to an Array. I already implement _customContainsEquatableElement. The default implementation of _copyContents is in "Sequence.swift," and it doesn't touch count. That file doesn't have the default implementation of _copyToContiguousArray, though; it isn't in "ContiguousArray.swift" either, but in the related "ContiguousArrayBuffer.swift." That file has the default implementation for Sequence, another for Collection, and a third for _ContiguousArrayBuffer.
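
If I'm reading it right, the Collection overload boils down to something like this (my paraphrase in terms of public API, not the literal stdlib source):

// Paraphrase of the Collection overload in "ContiguousArrayBuffer.swift":
// it asks for count up front so it can allocate exactly once, instead
// of growing from underestimatedCount the way the Sequence overload does.
func _copyCollectionToContiguousArray<C: Collection>(
  _ source: C
) -> ContiguousArray<C.Element> {
  let count = source.count // <-- the call my breakpoint catches
  var result = ContiguousArray<C.Element>()
  result.reserveCapacity(count)
  result.append(contentsOf: source)
  return result
}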

I could override _copyToContiguousArray to relieve my concern.

If you really want to, sure. The standard library authors have decided that calling Collection.count and allocating the correct number of elements up front is always preferable to allocating space for Sequence.underestimatedCount elements and potentially re-allocating later. If I had to override that, I would consider very carefully whether I was doing the correct thing.

Also, (insert usual caveats about using underscored functionality).

My implementation of underestimatedCount always returns either the exact count, or Int.max if the true count is too large to represent. All my new _copyToContiguousArray method does is produce a custom preconditionFailure message when that Int.max ceiling is hit. (ContiguousArray can't make such arrays anyway, since they'd already be past 8 exbibytes (64-bit) or 2 gibibytes (32-bit) in size.)
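
Concretely, it's something like this inside my sequence type (simplified; combinationCount is a stand-in for my real overflow-checked counting helper):

// Exact when representable, Int.max otherwise.
var underestimatedCount: Int {
  return combinationCount(elementCount, choiceCount) ?? Int.max
}

func _copyToContiguousArray() -> ContiguousArray<Element> {
  precondition(underestimatedCount < Int.max,
               "Too many combinations to materialize as an Array")
  var result = ContiguousArray<Element>()
  result.reserveCapacity(underestimatedCount)
  for element in self { result.append(element) }
  return result
}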

Hmm... I'm not entirely sure what you're saying here, but it doesn't sound like this is a valid conformance to Collection. It all comes down to what you mean by "or Int.max if said answer is too large", and both possibilities point towards an invalid conformance AFAICT:

  • Do you mean that your Collection has more than Int.max elements? That's not a valid Collection.
  • Do you mean that you're using Int.max as a sentinel value? In that case, it's not an underestimate, and any other generic code which uses underestimatedCount will also fail (see the sketch after this list). Prefer to return 0 if you really can't give a closer estimate, but never return a value greater than the actual count.
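
To make that second point concrete, plenty of perfectly reasonable generic code looks like this hypothetical sketch, and a sentinel breaks all of it:

// Generic code is entitled to treat underestimatedCount as a true
// lower bound. With Int.max as a sentinel, this tries to reserve
// space for over 9 quintillion elements and dies on allocation.
func materialize<S: Sequence>(_ source: S) -> [S.Element] {
  var result: [S.Element] = []
  result.reserveCapacity(source.underestimatedCount)
  for element in source {
    result.append(element)
  }
  return result
}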

It works based on combinations and permutations, which are based on factorials, so the counts can get into the quadrillion+ range for even moderate inputs. I've asked how to handle collections that are far too large to count, but whose elements are still addressable, and no one has answered (so far).
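
For a sense of scale, here's a checked binomial-coefficient helper along the lines of what I use (simplified; the names are illustrative):

// Returns C(n, k), or nil if the result overflows Int.
func binomial(_ n: Int, _ k: Int) -> Int? {
  precondition(0 <= k && k <= n)
  var result = 1
  for i in 0 ..< min(k, n - k) {
    // result * (n - i) is always exactly divisible by i + 1 here.
    let (product, overflow) = result.multipliedReportingOverflow(by: n - i)
    guard !overflow else { return nil }
    result = product / (i + 1)
  }
  return result
}

binomial(5, 2)    // 10
binomial(60, 30)  // 118_264_581_564_861_424: already past a quadrillion
binomial(70, 35)  // nil: overflows 64-bit Int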

OK, well yeah - that's not technically a valid conformance, but you know that already.

As for "how much of a deal-breaker" it is, that's really impossible to say. Generic code is free to use any part of the spec, in any way that is consistent with its documentation. If you go outside of those bounds, things might accidentally work, or they might fail in unpredictable ways (now or at any time in the future).

If you're writing your code in terms of generic Collections, but specifically try not to touch count due to issues with a particular conformance, your code isn't really generic at all. At that point you have to ask if it's even worth bothering to conform to Collection. Perhaps it's worth making your own Collection-like protocol instead, whose guarantees are loose enough to encompass both your type and actual Collections. For example:

protocol BigCollection {
  associatedtype IndexDistance // Let's say we wanted to bring this back.
}

extension RangeCombinations: BigCollection {
  // Now a valid conformance, meeting all the semantic guarantees
  // of the looser 'BigCollection' protocol.
}

struct BigCollectionAdapter<Base: Collection>: BigCollection {
  typealias IndexDistance = Int
  var base: Base
  // Implement the looser 'BigCollection' requirements in terms of 'Collection'.
}

Not every collection will go into the quintillions; the code works just fine when I've got, say, C(5, 2) == 10 elements. I don't want the extreme cases to make the normal ones inconvenient.