Add Various Zip-Related Types and Operations

I've been thinking about this too. I've experimented with AnySequence, but there was a slight performance hit. UnfoldSequence would indeed be a viable alternative. :+1:

Please don't. We shouldn't add a type that competes with a built-in language feature. That will lead to impedance mismatches when you have one, but need the other.

I realize there are limitations to work around in not being able to extend structural types, but the answer should not be to add new named types, which would need to exist forever even after the language limitations are resolved.

3 Likes

I completely agree, hence:

Ideally this would be a variadic Tuple type. (0, true) would then be a literal for Tuple<Int, Bool>, the same way "string" is for String.

I definitely agree that these operations could be useful. zipLongest(_:_:fill:) (with good precedent in Python) and unzip seem perfectly named. However, like others, I'm not sure I see the need for innovating any new types if it can be done without; certainly, a namespace and types such as Pair seem like much too much.

1 Like

Love this proposal. The only problem I have with zip though is the lack of its variadic form, but that is way out of the scope of this proposal.

1 Like

I’m intrigued. What would be good use cases for a partial suffix view? I’m currently looking into using two separate UnfoldSequences as return types for zip(_:_:) and zipLongest(_:_:), but that would rule your suggestion out.

I don't have a concrete use case in mind, but the general idea is to preserve information. Specifically, the non-Optionality of the common prefix. In thinking about this further, the suffix could be represented as an enum with left and right cases (Either if we had it in the standard library). That would allow the elements of the suffix to also be non-Optional. User code would switch to determine the longer sequence and could have different logic depending on which was longer, using non-Optional elements if the suffix was iterated.

I'll put this in the back of my mind and try to think of something more concrete.

Sounds interesting! I'm particularly interested in how information preservation can be done efficiently (for its various use cases).

After doing some experimenting with Either, an Either-like tuple would work really well as a return type for a unified zip:

func zip <Sequence1: Sequence, Sequence2: Sequence> (_ sequence1: Sequence1, _ sequence2: Sequence2) -> (shortest:
    UnfoldSequence<(Sequence1.Element, Sequence2.Element), (Bool, Sequence1.Iterator, Sequence2.Iterator)>,
    longest: UnfoldSequence<(Sequence1.Element?, Sequence2.Element?), (Bool, Sequence1.Iterator,
    Sequence2.Iterator)>) {
  let state = (false, sequence1.makeIterator(), sequence2.makeIterator())

  return (
    shortest: sequence(state: state, next: {
      if $0.0 {
        return .none
      }

      guard let element1 = $0.1.next(), let element2 = $0.2.next() else {
        $0.0.toggle()

        return .none
      }

      return (element1, element2)
    }),
    longest: sequence(state: state, next: {
      if $0.0 {
        return .none
      }

      let element1 = $0.1.next()
      let element2 = $0.2.next()

      if element1 == nil && element2 == nil {
        $0.0.toggle()

        return .none
      }

      return (element1, element2)
    })
  )
}

let shortestZip = Array(zip([0, 1], [0, 1, 2]).shortest)

assert(shortestZip.count == 2)
assert(shortestZip[0] == (0, 0))
assert(shortestZip[1] == (1, 1))

let longestZip = Array(zip([0, 1], [0, 1, 2]).longest)

assert(longestZip.count == 3)
assert(longestZip[0] == (.some(0), .some(0)))
assert(longestZip[1] == (.some(1), .some(1)))
assert(longestZip[2] == (.none, .some(2)))

This is a rehash of a previous idea I had, but would break existing code that uses zip(_:_:). So this implementation is out of the question.

Zip2Sequence can also be extended with a longest view:

extension Zip2Sequence {
  var longest: UnfoldSequence<(Sequence1.Element?, Sequence2.Element?), (Bool, Sequence1.Iterator, Sequence2.Iterator)> {
    return sequence(state: (false, self._sequence1.makeIterator(), self._sequence2.makeIterator()), next: {
      if $0.0 {
        return .none
      }

      let element1 = $0.1.next()
      let element2 = $0.2.next()

      if element1 == nil && element2 == nil {
        $0.0.toggle()

        return .none
      }

      return (element1, element2)
    })
  }
}

Using it:

let shortestZip = Array(zip([0, 1], [0, 1, 2]))

assert(shortestZip.count == 2)
assert(shortestZip[0] == (0, 0))
assert(shortestZip[1] == (1, 1))

let longestZip = Array(zip([0, 1], [0, 1, 2]).longest)

assert(longestZip.count == 3)
assert(longestZip[0] == (.some(0), .some(0)))
assert(longestZip[1] == (.some(1), .some(1)))
assert(longestZip[2] == (.none, .some(2)))

Just for completion sake:

extension Zip2Sequence {
  public var shortest: Zip2Sequence {
    return self
  }
}

This does make the inner workings of the algorithm more clear, but makes this possible:

zip([0, 1], [0, 1, 2])).shortest.longest
zip([0, 1], [0, 1, 2])).shortest.shortest.longest
// Etc.

Which makes no sense.

I've pretty much exhausted all possible directions (other than zips that use right-alignment, but let's not open that can of worms) and am currently working on an updated proposal that includes all of your feedback.

The updated proposal will feature the following changes:

  1. Rename the title to “Improve Zip API”.
  2. Expand the Introduction section, briefly naming all new additions.
  3. Rewrite and add use cases in the Motivation section (@Erica_Sadun, @anandabits).
  4. Update the Proposed Solution, Detailed Design and Source Compatibility sections with the updated implementation (see below).
  5. Incorporate all alternatives from this thread (and previous discussions) in the Alternatives Considered section.

The updated implementation will feature the following modifications:

  1. zip(_:_:) and zipLongest(_:_:) will be (re)implemented using UnfoldSequences (@palimondo).
  2. For clarity and ease-of-use, typealiases called Zip2ShortestSequence<T, U> and Zip2LongestSequence<T, U> will be introduced pointing to UnfoldSequence<(T.Element, U.Element)>, (Bool, T.Iterator, U.Iterator)> and UnfoldSequence<(T.Element?, U.Element?)>, (Bool, T.Iterator, U.Iterator)>, respectively.
  3. Zip2Sequence<T, U> will be deprecated and redefined as a typealias pointing to Zip2ShortestSequence<T, U>.
  4. Sequence and LazySequence will be extended with eager and lazy variations of unzip(), respectively (@anandabits).
  5. Some types conforming to Collection will be extended with more performant variants of unzip() (@Ben_Cohen).

All, more intricate, alternatives should be postponed until the language has implemented both variadic generics and extendable non-nominal types, specifically tuples, so that both zip variants can become extensions on this new type. The global functions zip(_:_:) and zipLongest(_:_:) should then be deprecated.

The state for UnfoldSequence doesn’t need the Bool in the first position in the tuple.

You can verify yourself that once you return nil (or .none as you do), the branches that check for $0.0 being true will never be called. It’s just dead code.

If you want I can go over the proposal before you make it official. It might also be good to add benchmarks to verify the rewritten version performs on par with current implementation.

1 Like

Interesting, thanks.

(For posterity, it's because UnfoldSequence includes a Boolean check of its own. When the first occurrence of nil has been encountered, next won't be called again.)

:+1:

Considering the amount of work this proposal requires, I'm going to break it up in two parts, the first for zipLongest(_:_:) (including changes to zip(_:_:)) and the second for unzip(_:_:).

I read the discussion but am unsure of the state of the proposal (there doesn't seem to be a Github link to the latest version).

In any case, I think unzip should just be a reabstraction (no copying). Here's what I use:

The thing about UnfoldSequence is that it's easier for the optimiser to reason about the members of nominal types (e.g. for generic specialisation).

The current state of the proposal is as follows. The original proposal has been cut into two parts: the first focuses on zip using longest Sequence exhaustion and the second focuses on unzip.

I'm currently writing the proposal, implementation, documentation, tests and benchmarks for the first part of the proposal, tentatively titled "Add zipLongest(_:_:)". It contains no changes to zip(_:_:) and Zip2Sequence. It adds a new global function named zipLongest(_:_:) that returns a new type called Zip2LongestSequence (or ZipLongest2Sequence, still on the fence about that one).

The use of UnfoldSequence has been dropped for the following reasons: (1) to stay in line with the look and feel of the rest of the Standard Library, (2) for performance reasons (a new type has to be constructed every time zipLongest(_:_:) gets called) and (3) for type distinction reasons (a typealias doesn't guarantee that a type is unique).

This should be a straight forward addition. The exciting stuff will, hopefully, come in the future when we get variadic generics and extendable tuples.

The second part is tentatively titled "Add unzip()". In addition to the original extension on Sequence, a lazy variant will be added to LazySequenceProtocol. Some types conforming to Collection will be extended with more performant variants.

2 Likes

It is an interesting approach, but will probably be rejected on the basis that it's a reimplemention of a 2-tuple type. See @Ben_Cohen's thoughts about that here.

I don't see it: Unzip2Collection (or I guess it should really be lowered to Sequence) is fundamentally different from Pair.

For consistency, I believe zip and unzip should live in the same place, and since the existing zip function does not copy the contents of its constituent sequences, unzip shouldn't either.

I suppose a .lazy.unzip could work, but then I'd want us to move the current zip functionality there, too:

// Something like this, I think...
extension LazySequenceProtocol {
  func zip<S>(with other: S) where S: Sequence -> Zip2Sequence<Self, S> { ... }
}

[1, 2, 3].lazy.zip(with: ["a", "b", "c"]) // [(1, "a"), (2, "b"), (3, "c")]

Things should go into lazy when there is some risk or significant performance downside to them being lazy instead of eager. For example, lazy.map performs the mapping every time you iterate, which could be very costly on multiple iteration, and also captures and stores the mapping closure, which can be risky if it is stateful.

On the other hand, reversed on a bidirectional collection, for example, does not have a significant laziness downside: the reversing operation is only a simple adaptor over index manipulation (hopefully compiling away to nothing), so if all you want to do is reverse an array, there is no need to do it eagerly.

Since there is no significant downside to zip being lazy, there is no need for it to be restricted only to lazy sequences. It is already available for lazy sequences because those are also just sequences. So it shouldn't need "moving". It is already there. To hide it from users unless they type lazy first seems like harming usability purely for the sake of scratching a consistency itch – an insufficiently good reason, even without the issue that this would also be source breaking and therefore need to clear a very high bar.

Unzip, on the other hand, does have a performance downside to being lazy. If what you actually want is to create two arrays, you would have to take two passes over the collection instead of one. That's the argument from my perspective that it should be eager by default.

On the other hand, if all you want is to unzip one side of the pair and ignore the other (lazily or eagerly), that isn't unzip, it's just map { $0.0 }. This means that your gist can also be written with map, avoiding the creation of additional unnecessary types:

func unzip<C: LazyCollectionProtocol,T,U>(
  _ c: C
) -> (first: LazyMapCollection<C,T>, second: LazyMapCollection<C,U>) 
where C.Element == (T,U) {
    return (c.lazy.map({$0.0}),c.lazy.map({$0.1}))
}
4 Likes

I'm currently rounding up the implementation of this proposal for inclusion in Swift 5.0.

I've tried, to the best of my ability, to take all of your needs and feedback into consideration.

This post contains an overview of the latest public API and is my final request for comments from the Swift community before I submit the proposal.

API

Global Functions

zip(_:default:_:default:)

show / hide

Declaration

@inlinable public func zip <Sequence1: Sequence, Sequence2: Sequence> (
  _ sequence1: Sequence1, default element1: @autoclosure @escaping () -> Sequence1.Element? = nil,
  _ sequence2: Sequence2, default element2: @autoclosure @escaping () -> Sequence2.Element? = nil
) -> Zip2Sequence<Sequence1, Sequence2>

Examples

let xs = sequence(first: 0, next: { $0 + 1 }).prefix(3)
let ys = sequence(first: 0, next: { $0 + 1 }).prefix(2)

print(Array(zip(xs, ys)))
// Prints "[(0, 0), (1, 1)]"

print(Array(zip(xs, default: nil, ys, default: 42)))
// Prints "[(0, 0), (1, 1), (2, 42)]"

print(Array(zip(xs, default: nil, ys, default: { Int.random(in: 0...42) })))
// Prints "[(0, 0), (1, 1), (2, 7)]"

zip(_:default:_:default:)

show / hide

Declaration

@inlinable public func zip <Collection1: Collection, Collection2: Collection> (
  _ collection1: Collection1, default element1: @autoclosure @escaping () -> Collection1.Element? = nil,
  _ collection2: Collection2, default element2: @autoclosure @escaping () -> Collection2.Element? = nil
) -> Zip2Collection<Collection1, Collection2>

Examples

let xs = [0, 1, 2]
let ys = [0, 1]

print(Array(zip(xs, ys)))
// Prints "[(0, 0), (1, 1)]"

print(Array(zip(xs, default: nil, ys, default: 42)))
// Prints "[(0, 0), (1, 1), (2, 42)]"

print(Array(zip(xs, default: nil, ys, default: { Int.random(in: 0...42) })))
// Prints "[(0, 0), (1, 1), (2, 7)]"

zipLongest(_:_:)

show / hide

Declaration

@inlinable public func zipLongest <Sequence1: Sequence, Sequence2: Sequence> (
  _ sequence1: Sequence1,
  _ sequence2: Sequence2
) -> ZipLongest2Sequence<Sequence1, Sequence2>

Examples

let xs = sequence(first: 0, next: { $0 + 1 }).prefix(3)
let ys = sequence(first: 0, next: { $0 + 1 }).prefix(2)

print(Array(zipLongest(xs, ys)))
// Prints "[(Optional(0), Optional(0)), (Optional(1), Optional(1)), (Optional(2), nil)]"

zipLongest(_:_:)

show / hide

Declaration

@inlinable public func zipLongest <Collection1: Collection, Collection2: Collection> (
  _ collection1: Collection1,
  _ collection2: Collection2
) -> ZipLongest2Collection<Collection1, Collection2>

Examples

let xs = [0, 1, 2]
let ys = [0, 1]

print(Array(zipLongest(xs, ys)))
// Prints "[(Optional(0), Optional(0)), (Optional(1), Optional(1)), (Optional(2), nil)]"

extensions On Sequence

compactUnzip()

show / hide

Declaration

extension Sequence {
  @inlinable public func compactUnzip <Element1, Element2> () -> ([Element1], [Element2]) where
    Element == (Optional<Element1>, Optional<Element2>)
}

Example

let xs = sequence(first: 0, next: { $0 + 1 }).prefix(3)
let ys = sequence(first: 0, next: { $0 + 1 }).prefix(2)

print(zipLongest(xs, ys).compactUnzip())
// Prints "([0, 1, 2], [0, 1])"

unzip()

show / hide

Declaration

extension Sequence {
  @inlinable public func unzip <Element1, Element2> () -> ([Element1], [Element2]) where
    Element == (Element1, Element2)
}

Example

let xs = sequence(first: 0, next: { $0 + 1 }).prefix(3)
let ys = sequence(first: 0, next: { $0 + 1 }).prefix(2)

print(zip(xs, ys).unzip())
// Prints "([0, 1], [0, 1])"

extensions On Collection

compactUnzip()

show / hide

Declaration

extension Collection {
  @inlinable public func compactUnzip <Element1, Element2> () -> ([Element1], [Element2]) where
    Element == (Optional<Element1>, Optional<Element2>)
}

Example

let xs = [0, 1, 2]
let ys = [0, 1]

print(zipLongest(xs, ys).compactUnzip())
// Prints "([0, 1, 2], [0, 1])"

unzip()

show / hide

Declaration

extension Collection {
  @inlinable public func unzip <Element1, Element2> () -> ([Element1], [Element2]) where
    Element == (Element1, Element2)
}

Example

let xs = [0, 1, 2]
let ys = [0, 1]

print(zip(xs, ys).unzip())
// Prints "([0, 1], [0, 1])"

extensions On LazyCollectionProtocol

compactUnzip()

show / hide

Declaration

extension LazyCollectionProtocol {
  @inlinable public func compactUnzip <Element1, Element2> () -> (
    LazyMapCollection<LazyFilterCollection<LazyMapCollection<Elements, Element1?>>, Element1>,
    LazyMapCollection<LazyFilterCollection<LazyMapCollection<Elements, Element2?>>, Element2>
  ) where Elements.Element == (Optional<Element1>, Optional<Element2>)
}

Example

let xs = [0, 1, 2].lazy
let ys = [0, 1].lazy

print(zipLongest(xs, ys).compactUnzip())
// Prints "([0, 1, 2], [0, 1])"

unzip()

show / hide

Declaration

extension LazyCollectionProtocol {
  @inlinable public func unzip <Element1, Element2> () -> (
    LazyMapCollection<Elements, Element1>,
    LazyMapCollection<Elements, Element2>
  ) where Elements.Element == (Element1, Element2)
}

Example

let xs = [0, 1, 2].lazy
let ys = [0, 1].lazy

print(zip(xs, ys).unzip())
// Prints "([0, 1], [0, 1])"

Types

Zip2Sequence

show / hide

Declaration

@_fixed_layout public struct Zip2Sequence <Sequence1: Sequence, Sequence2: Sequence> {
  @inlinable public init (
    _ sequence1: Sequence1, default element1: @escaping () -> Sequence1.Element? = { nil },
    _ sequence2: Sequence2, default element2: @escaping () -> Sequence2.Element? = { nil }
  )
}

extension Zip2Sequence {
  @_fixed_layout public struct Iterator

  @inlinable public var underestimatedCount: Int { get }
}

extension Zip2Sequence.Iterator: IteratorProtocol {
  public typealias Element = (Sequence1.Element, Sequence2.Element)

  @inlinable public mutating func next () -> Element?
}

extension Zip2Sequence: Sequence {
  public typealias Element = (Sequence1.Element, Sequence2.Element)

  @inlinable public func makeIterator () -> Iterator
}

extension Zip2Sequence: LazySequenceProtocol where
    Sequence1: LazySequenceProtocol, Sequence2: LazySequenceProtocol

ZipLongest2Sequence

show / hide

Declaration

@_fixed_layout public struct ZipLongest2Sequence <Sequence1: Sequence, Sequence2: Sequence> {
  @inlinable public init (_ sequence1: Sequence1, _ sequence2: Sequence2)
}

extension ZipLongest2Sequence {
  @_fixed_layout public struct Iterator {}

  @inlinable public var underestimatedCount: Int { get }
}

extension ZipLongest2Sequence.Iterator: IteratorProtocol {
  public typealias Element = (Sequence1.Element?, Sequence2.Element?)

  @inlinable public mutating func next () -> Element?
}

extension ZipLongest2Sequence: Sequence {
  public typealias Element = (Sequence1.Element?, Sequence2.Element?)

  @inlinable public func makeIterator () -> Iterator
}

extension ZipLongest2Sequence: LazySequenceProtocol where
    Sequence1: LazySequenceProtocol, Sequence2: LazySequenceProtocol

Zip2Index

show / hide

Declaration

@_fixed_layout public struct Zip2Index <Collection1: Collection, Collection2: Collection>

extension Zip2Index: Comparable {
  @inlinable public static func < (lhs: Zip2Index, rhs: Zip2Index) -> Bool
}

Zip2Collection

show / hide

Declaration

@_fixed_layout public struct Zip2Collection <Collection1: Collection, Collection2: Collection> {
  @inlinable public init (
    _ collection1: Collection1, default element1: @escaping () -> Collection1.Element? = { nil },
    _ collection2: Collection2, default element2: @escaping () -> Collection2.Element? = { nil }
  )
}

extension Zip2Collection: Sequence {
  public typealias Element = (Collection1.Element, Collection2.Element)
  public typealias Iterator = Zip2Sequence<Collection1, Collection2>.Iterator

  @inlinable public func makeIterator () -> Iterator
}

extension Zip2Collection: Collection {
  public typealias Index = Zip2Index<Collection1, Collection2>
  public typealias SubSequence = Slice<Zip2Collection>

  @inlinable public var startIndex: Index { get }
  @inlinable public var endIndex: Index { get }

  @inlinable public subscript (position: Index) -> Element { get }

  @inlinable public func index (after i: Index) -> Index
}

extension Zip2Collection: BidirectionalCollection where
    Collection1: RandomAccessCollection, Collection2: RandomAccessCollection {
  @inlinable public func index (before i: Index) -> Index
}

extension Zip2Collection: RandomAccessCollection where
    Collection1: RandomAccessCollection, Collection2: RandomAccessCollection

extension Zip2Collection {
  @inlinable public func distance (from start: Index, to end: Index) -> Int
  @inlinable public func index (_ i: Index, offsetBy distance: Int) -> Index
  @inlinable public func index (_ i: Index, offsetBy distance: Int, limitedBy limit: Index) -> Index?
}

extension Zip2Collection: LazySequenceProtocol, LazyCollectionProtocol where
    Collection1: LazyCollectionProtocol, Collection2: LazyCollectionProtocol

ZipLongest2Collection

show / hide

Declaration

@_fixed_layout public struct ZipLongest2Collection <Collection1: Collection, Collection2: Collection> {
  @inlinable public init (_ collection1: Collection1, _ collection2: Collection2)
}

extension ZipLongest2Collection: Sequence {
  public typealias Element = (Collection1.Element?, Collection2.Element?)
  public typealias Iterator = ZipLongest2Sequence<Collection1, Collection2>.Iterator

  @inlinable public func makeIterator () -> Iterator
}

extension ZipLongest2Collection: Collection {
  public typealias Index = Zip2Index<Collection1, Collection2>

  @inlinable public var startIndex: Index { get }
  @inlinable public var endIndex: Index { get }

  @inlinable public subscript (index: Index) -> Element { get }

  @inlinable public func index (after index: Index) -> Index
}

extension ZipLongest2Collection: BidirectionalCollection where
    Collection1: RandomAccessCollection, Collection2: RandomAccessCollection {
  @inlinable public func index (before index: Index) -> Index
}

extension ZipLongest2Collection: RandomAccessCollection where
    Collection1: RandomAccessCollection, Collection2: RandomAccessCollection

extension ZipLongest2Collection {
  @inlinable public func distance (from start: Index, to end: Index) -> Int
  @inlinable public func index (_ i: Index, offsetBy distance: Int) -> Index
  @inlinable public func index (_ i: Index, offsetBy distance: Int, limitedBy limit: Index) -> Index?
}

extension ZipLongest2Collection: LazySequenceProtocol, LazyCollectionProtocol where
    Collection1: LazyCollectionProtocol, Collection2: LazyCollectionProtocol
1 Like