Implementation of Collection Versions of Zip

Or try something like the design I suggested that tracks the starting index of the longer collection's unmatched suffix.

The current index of the shortest Collection starts decrementing once the distance from the startIndex to the current index of the longest Collection is less than or equal to the distance from the startIndex to the current index of the longest Collection.

This should get handled by the final case in the switch posted above.

The other issue here is one of performance — those calls to distance(from:to:) are O(n) on bidirectional collections, which turn index(before:) into an O(n) operation, when it really should be constant time. That means that a variety of collection operations could end up being quadratic instead of linear.

Have you considered skipping bidirectional conformance, and just making the zip collections random access when the underlying collections are random access?

Exactly! distance is linear for BidirectionalCollection: https://github.com/apple/swift/blob/master/stdlib/public/core/BidirectionalCollection.swift#L293-L318

Yes, I know. The conditional conformance syntax might be deceiving here.

This is how it's currently implemented:

extension ZipLongest2Collection: BidirectionalCollection, RandomAccessCollection
    where Collection1: RandomAccessCollection, Collection2: RandomAccessCollection {
  <#...#>
}

All intermediate protocols must be specified, separated by commas, when using conditional conformance(s). Otherwise the compiler will complain.

Here's another example from the Standard Library:

extension ClosedRange: Collection, BidirectionalCollection, RandomAccessCollection
    where Bound: Strideable, Bound.Stride: SignedInteger {
  <#...#>
}

My apologies, I did miss that! Sounds like the right approach in that case.

1 Like

Thoughts and further points for discussion:

  • Is this the right model for Index? Does it scale to higher arities?

  • Are the conformances to LazySequenceProtocol and LazyCollectionProtocol correctly implemented?

  • What are the possibilities regarding conformance to RangeReplaceableCollection and MutableCollection?

  • Are there any additional performance gains to be made? Should we override certain default implementations?

  • On the topic of naming the global functions.

    The consensus in the pitch phase was that longest should be part of the name. It is highly descriptive and has good precedence in Python.

    But which ones of these two are more Swifty?

    1. zipLongest(_:_:):

      zipLongest([0, 1, 2], [0])
      
    2. zip(longest:_:):

      zip(longest: [0, 1, 2], [0])
      

    Taking existing symbols from the Standard Library into account, e.g., prefix(_:), prefix(upto:), prefix(through:) and prefix(while:), the second one seems to fall more into line with current naming conventions.

  • Now for the fun part. Which ZipLongest2Collection-specific members should we add? Given:

    let xs = zipLongest([0, 1, 2], [0])
    
    1. A view on the shortest prefix:

      print(Array(xs.prefix))
      // Prints "[(.some(0), .some(0))]"
      
      print(xs.prefix.count)
      // Prints "1"
      

      (Up for bikeshedding.)

    2. A view on the longest suffix:

      print(Array(xs.suffix))
      // Prints "[(.some(1), .none), (.some(2), .none)]"
      
      print(xs.suffix.count)
      // Prints "2"
      

      (Up for bikeshedding.)

    3. Convenience subscript(s) using one (or two) Int offset(s):

      print(xs[0]))
      // Prints "(.some(0), .some(0))"
      
      print(xs[1, 0]))
      // Prints "(.some(1), .some(0))"
      

      In my view, we should provide at least the first version since Index.init is internal.

      It makes sense to put these on Collection. (Since the existing subscript(_:) is O(n) for Collections too.)

    4. (This one might be controversial. A binary version of map(_:) named map(_:_:). It would look something like this:

      let f: (Int?) -> Int = { $0 == .none ? 0 : $0! + 1 }
      let g: (Int?) -> Int = { $0 == .none ? 0 : $0! + 2 }
      
      print(xs.map(f, g)))
      // Prints "[(1, 2), (2, 0), (3, 0)]"
      

      This method will increase expressivity and composibility.)

  • What can we implement similarly in Zip2Collection? How do we make sure the API between both variants is uniform?

Things to do on my end:

  • Finalise the implementation (feedback from the community on the points above would be very helpful).
  • Increase test coverage.
  • Create benchmarks.
  • Write missing documentation.
  • When everything is "done", fold it back into apple/swift#18769.
  • Finalise proposal.

It might be a good practice in general to reverse the order of the conformances to make the intent more clear, e.g., turn:

extension ClosedRange: Collection, BidirectionalCollection, RandomAccessCollection
    where Bound: Strideable, Bound.Stride: SignedInteger {
  <#...#>
}

into:

extension ClosedRange: RandomAccessCollection, BidirectionalCollection, Collection
    where Bound: Strideable, Bound.Stride: SignedInteger {
  <#...#>
}

This version implies to me that the first sequence/collection is or must be the longest of two.

Is this deliberately mismatching the indices?

Good point. Thought about that as well. In this style, something like zipLongest(sequences:_:) / zipLongest(collections:_:) would then probably be better. But then we're better off with zipLongest(_:_:).

The Index would be invalid in the context of Comparable, yes. But since it returns a value we should be safe.

I think (1) is better. Like @jawbroken, I agree that (2) looks like longest-ness is something exclusive to the first argument, although it isn't.

Would names like "zipFully" or "zipToLongest" be better? Or would those only work if we rename the current "zip" to something like "zipToShortest"?

zip(_:_:) can't be renamed. In most other languages the semantics are identical, breaking expectations. An even bigger issue is that it will break existing code. Additive features should be okay.

For instance, I haven't put much thought into zip(_:_:) yet, but one idea I have is the addition of a default or defaults parameter:

print(Array(zip([0, 1, 2], [0], default: nil, 0)))
/// Prints "[(0, 0), (1, 0), (2, 0)]"

This should only be a few lines of code extra for both the Sequence and Collection versions of zip(_:_:) and doesn't break existing code.

Back to naming, zipToLongest(_:_:) could also work, but then zipLongest(_:_:) is more succinct. zip(toLongestOf:or:) / zip(longestOf:or:) is also a possibility that gets around the ambiguities of zip(longest:_:).

I'm looking into conforming Zip2Collection to both ExpressibleByArrayLiteral and ExpressibleByDictionaryLiteral:

extension Zip2Collection: ExpressibleByArrayLiteral where
    Collection1: RangeReplaceableCollection,
    Collection2: RangeReplaceableCollection {
  public typealias ArrayLiteralElement = Element

  public init (arrayLiteral: Element...) {
    self.init()

    let arrayLiteralCount = arrayLiteral.count

    _collection1.reserveCapacity(arrayLiteralCount)
    _collection2.reserveCapacity(arrayLiteralCount)

    for (element1, element2) in arrayLiteral {
      _collection1.append(element1)
      _collection2.append(element2)
    }
  }
}
extension Zip2Collection: ExpressibleByDictionaryLiteral where
    Collection1: RangeReplaceableCollection,
    Collection2: RangeReplaceableCollection {
  public typealias Key = Collection1.Element
  public typealias Value = Collection2.Element

  public init (dictionaryLiteral elements: Element...) {
    self.init()

    let elementsCount = elements.count

    _collection1.reserveCapacity(elementsCount)
    _collection2.reserveCapacity(elementsCount)

    for (element1, element2) in elements {
      _collection1.append(element1)
      _collection2.append(element2)
    }
  }
}

If anybody sees any room for improvement here, let me know. :+1:

We don't typically have those conformances for this kind of helper type — they're only produced as the result of some other operation. For example, Zip2Sequence doesn't have any public initializers at all. You can only create one by using the zip(_:_:) function.

Is there an official consensus from the core team on this subject?

Zip2Sequence's initialiser is public with underscored parameter labels:

print(Zip2Sequence<[Int], [Int]>(_sequence1: [0, 1, 2], _sequence2: [0, 1, 2]))
// Prints "Zip2Sequence<Array<Int>, Array<Int>>(_sequence1: [0, 1, 2], _sequence2: [0, 1, 2])"

But I understand what you're saying. :+1:

What's the consensus on adding support for higher arities?

Each additional arity will add four more global functions and five more types.

This is pretty much a hack that predates @usableFromInline. It might go away.

1 Like

What's more, I don't really see the point of adding them. The point of zipping is to construct the tuple sequence dynamically from two independent sequences. What would be the benefit of creating a zipped collection from a literal at compile time, instead of just creating an array or key/value pair literal?

Eh, it's marked // @testable too. But given how trivial the zip function is, I don't see that it really needs to be.

edit: yeah, that @testable is bogus. I put up a PR to internalize the init.

My main line of reasoning here was to provide an alternative shorthand for creating a default value for when Zip2Collection is Optional:

let xs: Zip2Collection<[Int], [Int]> = nil ?? [(42, 42)]

Or empty:

let xs = { $0.isEmpty ? [(42, 42)] : $0 }(zip([Int](), [Int]()))