Generic Sequence Concatenation

I'm trying to come up with something that has as little overhead as possible to concatenate arbitrary sequences of similar types.

For example, I'd like to concatenate two sequences of Int; one is an Array the other is a Set.

Writing the code I wish I had would look something like this:

let first = [1, 2, 3]
let second: Set = [4, 5, 6]

for el in first.concatenate(second) {
  print(el) // Prints 1, 2, 3, 4, 5, 6 on each successive loop
}

Where concatenate is the function I wish I had.

I have come up with:

extension Sequence {
  func concatenate<S: Sequence>(_ sequence: S) -> AnySequence<Self.Element> where Self.Element == S.Element {
    var (first, second) = (self.makeIterator(), sequence.makeIterator())
    return AnySequence(AnyIterator { first.next() ?? second.next() })
  }
}

My questions are:

  1. I would not think that this creates any extra copies of the Array or Set am I right?
  2. Are there any better suggestions (e.g., something better than AnySequence return, a better where clause, etc...)?
  3. Is there something in the standard library that already covers this and I missed it?

As far as I know, stdlib has joined() that is available on sequences of sequences.

You could thus write:

let first = [1, 2, 3]
let second: Set = [4, 5, 6]

for el in [AnySequence(first), AnySequence(second)].joined() {
  print(el) // Prints 1, 2, 3, 4, 5, 6 on each successive loop
}

But I guess this is not exactly what you're after. And I'm not sure about the performance cost.

Thank you for the suggestion @gwendal.roue.

My concern is that it looks like it is explicitly making a new Array/Collection. I'm worried about the overhead of that.

@Erica_Sadun has a PR open with a proposal that includes this functionality: swift-evolution/XXXX-cycled.md at 2520b5b3333fd74411a099ba24bbf9dc31af27af · apple/swift-evolution · GitHub

She creates a specialized ConcatenatedSequence generic over the two concatenated sequence types, which is the most performant option here. However, rather than conform ConcatenatedSequence to IteratorProtocol and store the base iterators, it's preferable to store the original sequences and have an inner Iterator type to make the sequence multi-pass when its underlying sequences are.

4 Likes

The trouble with this and @gwendal.roue's joined suggestion is they will have poor performance, due to the use of type erasure (unless the optimizer pulls a big rabbit out of its hat).

The good news is you can push the generic approach a bit further and eliminate the type erasure:

public struct Spliced<S1: Sequence, S2: Sequence> where S1.Element == S2.Element {
  private let _s1: S1, _s2: S2
  fileprivate init(_ s1: S1, _ s2: S2) {
    _s1 = s1; _s2 = s2
  }
}

extension Spliced {
  public struct Iterator {
    private var _i1: S1.Iterator, _i2: S2.Iterator
    fileprivate init(_ s1: S1, _ s2: S2) {
      _i1 = s1.makeIterator()
      _i2 = s2.makeIterator()
    }
  }
}

extension Spliced.Iterator: IteratorProtocol {
  public typealias Element = S1.Element
  public mutating func next() -> Element? {
    return _i1.next() ?? _i2.next()
  }
}

extension Spliced: Sequence {
  public typealias Element = S1.Element
  public func makeIterator() -> Iterator {
    return Iterator(_s1, _s2)
  }
}

public func splice<S1: Sequence, S2: Sequence>(
  _ s1: S1, _ s2: S2
) -> Spliced<S1, S2> where S1.Element == S2.Element {
  return Spliced(s1, s2)
}

This ought to produce pretty optimal code (I'm a bit nervous about the implementation of next function, a stateful boolean indicating the first entry was already consumed might be faster if S1's next implementation has some overhead).

This allows heterogenous pairings:

let a: Array = [1,2,3]
let s: Set = [4,5,6]

for x in splice(a,s) {
  print(x)
}

NB that Set is unordered. You will get random ordering of the second half on each run due to Swift's randomized hash seeding.

You can use the same trick as with zip to splice multiple entries:

let r = 7...9

// prints 45
print(splice(a, splice(s, r)).reduce(0, +))
12 Likes