Add an `adjacentPairs` algorithm to Sequence

Oh, and we also have to check if @mpangburn thinks that the looping version is a useful addition to his pitch :-)

@Loooop @gwendal.roue I can see the utility of a version containing a pair of the last and first elements for working with polygons; do you have any other use cases in mind?
Without too much thought, the number of occasions on which there exists a meaningful relationship between the last and first elements of a sequence seems much fewer to me.

That's a very natural question. I personally only see graphical uses. Cycling data structures are not so common, are they? I'm not familiar enough with rings et al. May I summon the mathematical culture of @yxckjhasdkjh for more ideas?

Rings don't really have anything to do with cyclical structure, the name is an oddity as are most in mathematics.

Some ideas where "cycles" appear in mathematics, though:

  • modular arithmetic: this includes clock time and weekdays—24 hours from now the clock will show the same number
  • rotations and reflections: if you rotate something 360 degrees, you're back where you started; if you take the reflection of the reflection, you get the original, etc. All this is closely related to symmetries of objects.
  • permutation groups: Let's say you have three cards. If you move the first card to the second position, the second to the third and the third to the first, then repeat that same exchange twice, you're back where you started.
  • complex roots of unity: best shown by this image
  • graph theory: Cycles are pretty important here for a number of reasons (e.g. trees are connected acyclic graphs, which means, if you find a cycle, it's not a tree)

Of course there's also periodic functions like sine, cosine, but those are probably less relevant since they are not discrete .

I'm not sure if that helps though. :)

edit: Maybe someone more versed in computer science should chime in, though. There are certainly cyclic data structures, but I couldn't say too much about how often and where they are commonly used. Off the top of my head, maybe in relation to process scheduling or something?

2 Likes

I've opened a PR on apple/swift-evolution with the proposal along with a link to the corresponding implementation PR on apple/swift.

Thanks to all for your feedback during the pitch!

Thanks for writing it up. You probably don't need to address this in the proposal, but just a heads up that the discussion around opaque result types is probably relevant here, since AdjacentPairsSequence seems like a good candidate for hiding in this way.

4 Likes

Thanks for creating this implementation, @mpangburn! AdjacentPairsSequence is the kind of type that should conditionally conform to each of the Collection, BidirectionalCollection, and RandomAccessCollection protocols when its Base type does. (And should therefore probably just be called AdjacentPairs.)

If you'd like to discuss the design or implementation of that, feel free to do that here or in the standard library development forum channel.

1 Like

Thanks for the guidance, @nnnnnnnn. Now that the Standard Library Preview is live, I'd like to get this cleaned up to push it forward again.

I'm planning on following your package organization precedent via SE-0270; if you have any suggestions for following that process, I'd appreciate them.

2 Likes

@nnnnnnnn I agree that AdjacentPairs would be a good candidate for conditional conformances to Collection, BidirectionalCollection, and RandomAccessCollection. But shouldn't the same be true for other types from the standard library such as Zip2Sequence, JoinedSequence, and DropFirstSequence?

1 Like

It appears that the motivation for adding this type is to support single-pass Sequences, yet experience shows that such sequences are very rare. Is it really worth introducing a whole new type to the standard library for that very rare case? There is a cost associated with that, and with managing the compatibility concerns (even with the preview library - that just foists the concerns down to app/library developers).

Bear in mind that other algorithms on Sequence (such as “split”) work by copying in to an Array and doing the operation on that instead. I’m not saying that’s an ideal solution, but splitting is probably used more often than adjacent pairs, and even that didn’t get its own type.

I would suggest adding the “zip” version to Collection, and add a version on Sequence which returns a Zip2Sequence<Array<Element>, ArraySlice<Element>>

Sounds great, Michael! Let me know if you run into any questions.

It should indeed! That will be possible once the language supports availability for conformances; right now adding those additional conformances would be an ABI breaking change.

Hm, I haven't thought about ABI breakage. But if that's a standard library feature we'd like to have in the long run, would it be possible to ship it in the Standard Library Preview package until we get availability of conformances? Or would that violate the "don't conform types you don't own to protocols you don't own" rule?

The policy for the preview package is that it should only contain things that are able to be landed in the standard library as well (i.e. if we know that there's a problem that prevents inclusion in the standard library, the feature also won't go into the preview package until it is resolved). The preview package should be source-stable, so we don't want to be in the situation where we put something there, and then discover a few months later that we have to make changes in order to add it to the standard library.

2 Likes

I would argue the other way; it’s a bug that split doesn’t properly support single-pass and/or infinite sequences because it copies to an Array.

@Karl I think the example of split in specific eagerly returns an array, falls into a rule of eager vs lazy in the stdlib where all things should be lazy if there isn't a big downside for doing that, in this case, the downside I think is the fact that split takes a closure which has to be stored in the specific lazy type ... so that's maybe the why it returns an eager array :)) I remember @Ben_Cohen mention this a long time ago here

Well, any version of split that supported single-pass sequences would require buffering anyway (with its Element type being that buffer). There is a bigger issue though:

The more you look at it, the more it becomes clear that the language's current definition of Sequence as single-pass is actually a design mistake, and that any single-pass data sources you have would be much better modeled as an Iterator. The current design exists to support for loops over a data source which destructively consumes its source as you iterate - something which I also think is of dubious utility, and better expressed by while-looping calls to iterator's mutating next() method.

There have been many, many, many, many discussions about this.

So bringing it back to adjacentPairs: for the reasons listed above, I really don't see the point in adding a new generic, lazy, wrapper type just to support this one kind of iteration on this one kind of rare sequence, which everybody seems to agree is a poor API in the first place!

Engineering is all about trade-offs. Consider how simple this pitch would be if it was limited to Collection: A one-line convenience function, easily back-deployable, with no new types or other massive complications. Is it really justified to add so much extra infrastructure just to support single-pass sequences? I think it isn't.

1 Like

As soon as I saw this I thought of n-grams. Is there any reason to restrict this to only pairs? It seems like the pitfalls would be the same regardless of the size of n.

2 Likes

Would the code I mention in another thread work for this? (And I just noticed that I messed up the spacing in the upload.)

Current implementation fetches first element from the base sequence during initialization. This can be changed to only start fetching from the base sequence when the first element is requested upstream. I believe this is better behaviour because it is 'more lazy'. It can be useful in practice when base sequence represents some non-free resource – for example a network call.

1 Like

Maybe the iterator type should be separated from the containing type. The iterator type technically should have a different "arity" in its generic parameter; if two Sequences use the same Iterator type, then the wrapping sequences should share the same wrapping iterator type. Right now, each Sequence has an independent wrapped-iterator type. I think the Standard library makes (made?) the same mistake in at least one of their wrappers.