Add an `adjacentPairs` algorithm to Sequence

@nnnnnnnn I agree that AdjacentPairs would be a good candidate for conditional conformances to Collection, BidirectionalCollection, and RandomAccessCollection. But shouldn't the same be true for other types from the standard library such as Zip2Sequence, JoinedSequence, and DropFirstSequence?

1 Like

It appears that the motivation for adding this type is to support single-pass Sequences, yet experience shows that such sequences are very rare. Is it really worth introducing a whole new type to the standard library for that very rare case? There is a cost associated with that, and with managing the compatibility concerns (even with the preview library - that just foists the concerns down to app/library developers).

Bear in mind that other algorithms on Sequence (such as “split”) work by copying in to an Array and doing the operation on that instead. I’m not saying that’s an ideal solution, but splitting is probably used more often than adjacent pairs, and even that didn’t get its own type.

I would suggest adding the “zip” version to Collection, and add a version on Sequence which returns a Zip2Sequence<Array<Element>, ArraySlice<Element>>

Sounds great, Michael! Let me know if you run into any questions.

It should indeed! That will be possible once the language supports availability for conformances; right now adding those additional conformances would be an ABI breaking change.

Hm, I haven't thought about ABI breakage. But if that's a standard library feature we'd like to have in the long run, would it be possible to ship it in the Standard Library Preview package until we get availability of conformances? Or would that violate the "don't conform types you don't own to protocols you don't own" rule?

The policy for the preview package is that it should only contain things that are able to be landed in the standard library as well (i.e. if we know that there's a problem that prevents inclusion in the standard library, the feature also won't go into the preview package until it is resolved). The preview package should be source-stable, so we don't want to be in the situation where we put something there, and then discover a few months later that we have to make changes in order to add it to the standard library.

2 Likes

I would argue the other way; it’s a bug that split doesn’t properly support single-pass and/or infinite sequences because it copies to an Array.

@Karl I think the example of split in specific eagerly returns an array, falls into a rule of eager vs lazy in the stdlib where all things should be lazy if there isn't a big downside for doing that, in this case, the downside I think is the fact that split takes a closure which has to be stored in the specific lazy type ... so that's maybe the why it returns an eager array :)) I remember @Ben_Cohen mention this a long time ago here

Well, any version of split that supported single-pass sequences would require buffering anyway (with its Element type being that buffer). There is a bigger issue though:

The more you look at it, the more it becomes clear that the language's current definition of Sequence as single-pass is actually a design mistake, and that any single-pass data sources you have would be much better modeled as an Iterator. The current design exists to support for loops over a data source which destructively consumes its source as you iterate - something which I also think is of dubious utility, and better expressed by while-looping calls to iterator's mutating next() method.

There have been many, many, many, many discussions about this.

So bringing it back to adjacentPairs: for the reasons listed above, I really don't see the point in adding a new generic, lazy, wrapper type just to support this one kind of iteration on this one kind of rare sequence, which everybody seems to agree is a poor API in the first place!

Engineering is all about trade-offs. Consider how simple this pitch would be if it was limited to Collection: A one-line convenience function, easily back-deployable, with no new types or other massive complications. Is it really justified to add so much extra infrastructure just to support single-pass sequences? I think it isn't.

1 Like

As soon as I saw this I thought of n-grams. Is there any reason to restrict this to only pairs? It seems like the pitfalls would be the same regardless of the size of n.

2 Likes

Would the code I mention in another thread work for this? (And I just noticed that I messed up the spacing in the upload.)

Current implementation fetches first element from the base sequence during initialization. This can be changed to only start fetching from the base sequence when the first element is requested upstream. I believe this is better behaviour because it is 'more lazy'. It can be useful in practice when base sequence represents some non-free resource – for example a network call.

1 Like

Maybe the iterator type should be separated from the containing type. The iterator type technically should have a different "arity" in its generic parameter; if two Sequences use the same Iterator type, then the wrapping sequences should share the same wrapping iterator type. Right now, each Sequence has an independent wrapped-iterator type. I think the Standard library makes (made?) the same mistake in at least one of their wrappers.

I just tried implementing a generic

struct AdjacentPairs<Base> where Base: Sequence {
    fileprivate let base: Base
}
extension AdjacentPairs: Collection where Base: Collection {}

where the Sequence and Collection conformances use a different func makeIterator() implementation: more precisely, I would like to use an IndexingIterator for the Collection conformance, but I can't use that type of iterator for the Sequence conformance; see this gist.

However, when passing an instance of this type to a generic function that expects "only" a Sequence, the call to makeIterator() seems not to be dynamically dispatched as it uses the less specialized version:

private func sequence<T>(_ x: T) where T: Sequence {
    _ = x.makeIterator()
}

private func unidirectional<T>(_ x: T) where T: Collection {
    _ = x.makeIterator()
}

let x = AdjacentPairs(base: [1,2,3,4])
sequence(x) // prints "Sequence makeIterator"
unidirectional(x) // prints "Sequence makeIterator" (unexpected!)
_ = x.makeIterator() // prints "Collection makeIterator"

The only potential reason that comes to mind is implied conditional conformances, but I can't quite seem to make the connection and explain the behavior. Could you maybe shed some light on this? How would I go about using different makeIterator() implementations for different conditional conformances?

There is only one requirement named makeIterator(), which is the one defined on Sequence. A type can only conform to a protocol in a single way. That is, AdjacentPairs can only provide one implementation of makeIterator(), which is the one that applies unconditionally. You may overload that function as many times as you want on the concrete type, but only the one makeIterator() is the one that satisfies the protocol requirement.

That’s a good point, but it would require variadic generics (it’s the same issue as our lazy zip wrapper).

That’s another good reason why it’s better to hold off introducing another wrapper for this operation, IMO. Variadic generics are supposed to be in scope for Swift 6 IIRC.

The proposal has been implemented as a Swift package, with a PR opened on the Swift Evolution Staging repo: AdjacentPairs by mpangburn · Pull Request #2 · apple/swift-evolution-staging · GitHub

You can read the latest version of the proposal via the original Swift Evolution PR: Add an `adjacentPairs` algorithm to Sequence by mpangburn · Pull Request #901 · apple/swift-evolution · GitHub

Thanks @nnnnnnnn for your guidance!

6 Likes

@mpangburn Would you consider submitting this to GitHub - apple/swift-algorithms: Commonly used sequence and collection algorithms for Swift?

2 Likes

At the moment, I'm not quite certain as to how to gauge which Collection algorithms belong in the Standard Library versus Swift Algorithms. I think some guidance from @nnnnnnnn or from the Core Team would be valuable before I retract the existing proposal, since I was last directed to use Swift Evolution Staging.

2 Likes

Thanks all for your patience. Nate and I have been in touch regarding this proposal, and we've agreed the appropriate home for adjacentPairs is Swift Algorithms.

I've opened a patch here in support of this change: Introduce `adjacentPairs` by mpangburn · Pull Request #119 · apple/swift-algorithms · GitHub

3 Likes

I wonder if this can be done generally in terms of a fold or scan type operation. If so that might be an interesting thing to do both lazily as well as asynchronously.