Pitch: Sequence enhancements: chaining, repeating, batching

bitjammer · April 3, 2018, 9:57pm

Hey all,

I'm going through the list of common sequence APIs that I've had to implement multiple times in several projects and I'd like to pitch a couple of APIs on sequence, modulo naming.

Chaining

/// Presents a sequence of the elements in `firstSequence` followed by those in `secondSequence`.
firstSequence.followed(by: secondSequence)

Repeating

/// An infinite sequence of containing the elements of `sequence` repeated forever.
sequence.repeated(.forever)

/// An sequence containing the elements of `sequence` repeated twice.
sequence.repeated(.times(2))

Batching

/// The elements of `sequence` batched by subsequences of `count == 2`, except for remainder elements, which an have a `count < 2`.
sequence.batched(maxBatchSize: 2)

You've probably seen some of these in Python's itertools. These are the ones that I find come up the most often.

I have implementations for these that I can share if you'd like to try it out but they're pretty simple to implement; I'm sure a lot of you have implemented them yourselves.

nnnnnnnn · April 3, 2018, 10:30pm

Would love to have these additions built in! I recreate them too when I need them. The repeated one has come up a couple times before, plus batched in a sideways sort of way. Linking those threads for reference:

[cycle/repeated] Proposal: CollectionType.cycle property for an infinite sequence
[cycle/repeated again] [Starter Pitch] Introducing a `cycled` method to `Sequence`
[chunked/batched] What makes a good lazy procedure? And how to do it? - #2 by Ben_Cohen

bitjammer · April 4, 2018, 4:17pm

Thanks for the references! I've been away for some time. Any ideas why this hasn't gotten traction? Seems like a no-brainer to me.

anthonylatsis · April 4, 2018, 4:34pm

This one would be more ergonomic with an addition operator, don't you think? Or a variadic parameter instead, since operators aren't really welcome at such high hierarchy levels.

bitjammer · April 4, 2018, 5:39pm

Indeed,I think being able to concatenate more than two sequences together in a single use site is worth exploring.

Let's look at a hypothetical signature for the more limited case where you just want to join two sequences together:

extension Sequence {
    func followed<OtherSequence: Sequence>(by other: OtherSequence) -> ChainSequence<Self, OtherSequence> where Element == OtherSequence.Element {
        return ChainSequence(self, other)
    }
}

I would say the main benefit of this particular signature is that the first and second sequence can be completely different concrete sequences so long as they have the same element, like this:

Set([7,8,9]).followed(by: [1,2,3])

would work as well as:

[1,2,3].followed(by: [4,5,6])

So, for a variadic/multi-sequence signature, in order to maximize its benefit, we would want to be able to pass any group of sequences together so long as their Element is the same. The only way I know to do that today is to involve AnySequence in some way:

first.followed(by: AnySequence(second), AnySequence(third))

// or, perhaps something using `AnySequence` and `reduce`

Which I don't quite like as well as:

first
    .followed(by: second)
    .followed(by: third)

mpangburn · April 4, 2018, 11:41pm

As you've described, the most ergonomic solution won't be available until we have variadic generics. At the moment, the only workaround to the AnySequence problem you describe is to provide multiple implementations of followed(by:) for taking different quantities of sequences to chain, i.e. followed(by:), followed(by:_:), followed(by:_:_:), etc.

While functional, this is pretty ugly—I think the single followed(by:) is worth proposing now, and the variadic version is worth pursuing once Swift can support it more elegantly.

It's also worth noting that followed(by:) could be made variadic over a single OtherSequence where OtherSequence.Element == Element, which would (for example) allow a Set<Int> to be chained with several Array<Int> in a single call—though this may introduce confusion for users who do not understand why multiple different sequence types cannot be used.

Karl · April 5, 2018, 1:29am

We already have chaining:

let chain = [[1,2,3], [4,5,6]].joined()
print(Array(chain)) // [1, 2, 3, 4, 5, 6]

We could make it more direct and obvious, though. Currently you only see it when you have a collection of collections or something.

bitjammer · April 5, 2018, 2:34am

Hm, I think you're thinking of extension Sequence where Element: Sequence. I don't think that's quite the same thing and I don't think it's great API, not a very discoverable conditional extension. For the pattern [x, y].joined(), x and y need to be the same concrete type, so it's back to the same variadic/AnySequence problem as before.

allevato · April 5, 2018, 2:40am

Another downside to the [seq1, seq2, seq3].joined() syntax is that it requires heap allocation of an array—I imagine that wouldn't get optimized away. That ought to be completely unnecessary to lazily concatenate existing sequences.

Karl · April 5, 2018, 2:47am

It's not variadic generics, but generalised existentials. For example, this works:

let a = [AnyCollection<Any>([1,2,3]), 
         AnyCollection<Any>("hello".lazy.map { $0 })].joined()
print(Array(a)) // [1, 2, 3, "h", "e", "l", "l", "o"]

The problem is that Array needs one, single Element type and the existential Any<Collection where Element == X> is not a valid type for those contexts, so we need to box it.

The API is definitely not very discoverable, though, you're right. An ad-hoc join(with other:) would be great.

Also, to avoid heap allocations, you could use something like a Pair<T> which is just a wrapper around a tuple. Perhaps an ad-hoc method could use something like that internally.

bitjammer · April 5, 2018, 3:24am

Gotcha! Yep, the sketch I made a while back used a struct of the two, effectively your Pair<T>, although it was a little more specific to the chaining. I will try to dig up a gist so you all can get a feel for the ergonomics.

nnnnnnnn · April 5, 2018, 3:30am

I think you're absolutely right that the primary use is for joining just one with another. If we accept a variadic list, then it's not far off from writing:

// if they're the same type:
[first, second, third].joined()
// if they're different types:
[AnySequence(first), AnySequence(second), AnySequence(third)].joined()

(edit: I apparently missed that several other people made this point.)

The big advantage of adding a custom method and type for joining exactly two sequences/collections is that it can model the attributes of the underlying types. For example, if two ranges are joined, the result can still be a RandomAccessCollection. There's a Concatenation type in the Swift test suite that fills that role.

I double-checked, and @Erica_Sadun has a PR for the cycled proposal waiting, so that's somewhere in the pipeline.

bitjammer · April 5, 2018, 3:38am

Great, great point.

reitzig · April 9, 2018, 10:06am

Just a language nitpick:

Set([7,8,9]).followed(by: [1,2,3])

This doesn't make much sense, semantically. Sets are not sequences; the union of A and B is not accurately described by "A followed by B".

saagarjha · April 9, 2018, 10:07am

Well, Set is a Sequence in the sense that Set: Sequence.

reitzig · April 9, 2018, 10:11am

That's an unfortunate design choice, then, that won't make sense to many mathematically inclined readers.

But I'll play along. What's the result of Set([1,3,2]).followed(by: [5,3,0])? [1,3,2,5,0] or [1,2,5,3,0]? Or [0,1,2,3,5]? Imho, the only sanity-preserving answer is, "order doesn't matter for sets" -- yup, : Sequence doesn't make sense.

Letan · April 9, 2018, 11:50am

This is assuming that Set.followed(by:) returns a Set which I don't think it should or will? In which case the most obvious answer would be [1,3,2,5,3,0]

reitzig · April 9, 2018, 12:00pm

That wouldn't make any sense to me, both from an API standpoint (Sequence would not be a very useful return type) and from the semantic perspective ("joining" two sets shouldn't give me duplicates).

But well, someone apparently decided that, in Swift, sets are sequential. Woe them!

Letan · April 9, 2018, 12:04pm

If you need to join sets why not use union. followed(by:) reads as though it should just have the elements of one sequence 'follow' the other.

Avi · April 9, 2018, 4:23pm

I think you should check out Is a set a sequence? - #38 by jawbroken.