I've been thinking about this too. I've experimented with AnySequence
, but there was a slight performance hit. UnfoldSequence
would indeed be a viable alternative.
Please don't. We shouldn't add a type that competes with a built-in language feature. That will lead to impedance mismatches when you have one, but need the other.
I realize there are limitations to work around in not being able to extend structural types, but the answer should not be to add new named types, which would need to exist forever even after the language limitations are resolved.
I completely agree, hence:
Ideally this would be a variadic Tuple
type. (0, true)
would then be a literal for Tuple<Int, Bool>
, the same way "string"
is for String
.
I definitely agree that these operations could be useful. zipLongest(_:_:fill:)
(with good precedent in Python) and unzip
seem perfectly named. However, like others, I'm not sure I see the need for innovating any new types if it can be done without; certainly, a namespace and types such as Pair
seem like much too much.
Love this proposal. The only problem I have with zip
though is the lack of its variadic form, but that is way out of the scope of this proposal.
I’m intrigued. What would be good use cases for a partial suffix view? I’m currently looking into using two separate UnfoldSequence
s as return
types for zip(_:_:)
and zipLongest(_:_:)
, but that would rule your suggestion out.
I don't have a concrete use case in mind, but the general idea is to preserve information. Specifically, the non-Optionality of the common prefix. In thinking about this further, the suffix could be represented as an enum with left
and right
cases (Either
if we had it in the standard library). That would allow the elements of the suffix to also be non-Optional. User code would switch to determine the longer sequence and could have different logic depending on which was longer, using non-Optional elements if the suffix was iterated.
I'll put this in the back of my mind and try to think of something more concrete.
Sounds interesting! I'm particularly interested in how information preservation can be done efficiently (for its various use cases).
After doing some experimenting with Either
, an Either
-like tuple would work really well as a return
type for a unified zip
:
func zip <Sequence1: Sequence, Sequence2: Sequence> (_ sequence1: Sequence1, _ sequence2: Sequence2) -> (shortest:
UnfoldSequence<(Sequence1.Element, Sequence2.Element), (Bool, Sequence1.Iterator, Sequence2.Iterator)>,
longest: UnfoldSequence<(Sequence1.Element?, Sequence2.Element?), (Bool, Sequence1.Iterator,
Sequence2.Iterator)>) {
let state = (false, sequence1.makeIterator(), sequence2.makeIterator())
return (
shortest: sequence(state: state, next: {
if $0.0 {
return .none
}
guard let element1 = $0.1.next(), let element2 = $0.2.next() else {
$0.0.toggle()
return .none
}
return (element1, element2)
}),
longest: sequence(state: state, next: {
if $0.0 {
return .none
}
let element1 = $0.1.next()
let element2 = $0.2.next()
if element1 == nil && element2 == nil {
$0.0.toggle()
return .none
}
return (element1, element2)
})
)
}
let shortestZip = Array(zip([0, 1], [0, 1, 2]).shortest)
assert(shortestZip.count == 2)
assert(shortestZip[0] == (0, 0))
assert(shortestZip[1] == (1, 1))
let longestZip = Array(zip([0, 1], [0, 1, 2]).longest)
assert(longestZip.count == 3)
assert(longestZip[0] == (.some(0), .some(0)))
assert(longestZip[1] == (.some(1), .some(1)))
assert(longestZip[2] == (.none, .some(2)))
This is a rehash of a previous idea I had, but would break existing code that uses zip(_:_:)
. So this implementation is out of the question.
Zip2Sequence
can also be extended with a longest
view:
extension Zip2Sequence {
var longest: UnfoldSequence<(Sequence1.Element?, Sequence2.Element?), (Bool, Sequence1.Iterator, Sequence2.Iterator)> {
return sequence(state: (false, self._sequence1.makeIterator(), self._sequence2.makeIterator()), next: {
if $0.0 {
return .none
}
let element1 = $0.1.next()
let element2 = $0.2.next()
if element1 == nil && element2 == nil {
$0.0.toggle()
return .none
}
return (element1, element2)
})
}
}
Using it:
let shortestZip = Array(zip([0, 1], [0, 1, 2]))
assert(shortestZip.count == 2)
assert(shortestZip[0] == (0, 0))
assert(shortestZip[1] == (1, 1))
let longestZip = Array(zip([0, 1], [0, 1, 2]).longest)
assert(longestZip.count == 3)
assert(longestZip[0] == (.some(0), .some(0)))
assert(longestZip[1] == (.some(1), .some(1)))
assert(longestZip[2] == (.none, .some(2)))
Just for completion sake:
extension Zip2Sequence {
public var shortest: Zip2Sequence {
return self
}
}
This does make the inner workings of the algorithm more clear, but makes this possible:
zip([0, 1], [0, 1, 2])).shortest.longest
zip([0, 1], [0, 1, 2])).shortest.shortest.longest
// Etc.
Which makes no sense.
I've pretty much exhausted all possible directions (other than zips that use right-alignment, but let's not open that can of worms) and am currently working on an updated proposal that includes all of your feedback.
The updated proposal will feature the following changes:
Rename the title to “Improve Zip API”.Expand the Introduction section, briefly naming all new additions.Rewrite and add use cases in the Motivation section (@Erica_Sadun, @anandabits).Update the Proposed Solution, Detailed Design and Source Compatibility sections with the updated implementation (see below).Incorporate all alternatives from this thread (and previous discussions) in the Alternatives Considered section.
The updated implementation will feature the following modifications:
zip(_:_:)
andzipLongest(_:_:)
will be (re)implemented usingUnfoldSequence
s (@palimondo).For clarity and ease-of-use,typealias
es calledZip2ShortestSequence<T, U>
andZip2LongestSequence<T, U>
will be introduced pointing toUnfoldSequence<(T.Element, U.Element)>, (Bool, T.Iterator, U.Iterator)>
andUnfoldSequence<(T.Element?, U.Element?)>, (Bool, T.Iterator, U.Iterator)>
, respectively.Zip2Sequence<T, U>
will be deprecated and redefined as atypealias
pointing toZip2ShortestSequence<T, U>
.Sequence
andLazySequence
will be extended with eager and lazy variations ofunzip()
, respectively (@anandabits).Some types conforming toCollection
will be extended with more performant variants ofunzip()
(@Ben_Cohen).
All, more intricate, alternatives should be postponed until the language has implemented both variadic generics and extendable non-nominal types, specifically tuples, so that both zip variants can become extension
s on this new type. The global functions zip(_:_:)
and zipLongest(_:_:)
should then be deprecated.
The state for UnfoldSequence
doesn’t need the Bool
in the first position in the tuple.
You can verify yourself that once you return nil
(or .none
as you do), the branches that check for $0.0
being true will never be called. It’s just dead code.
If you want I can go over the proposal before you make it official. It might also be good to add benchmarks to verify the rewritten version performs on par with current implementation.
Interesting, thanks.
(For posterity, it's because UnfoldSequence
includes a Boolean check of its own. When the first occurrence of nil
has been encountered, next
won't be called again.)
Considering the amount of work this proposal requires, I'm going to break it up in two parts, the first for zipLongest(_:_:)
(including changes to zip(_:_:)
) and the second for unzip(_:_:)
.
I read the discussion but am unsure of the state of the proposal (there doesn't seem to be a Github link to the latest version).
In any case, I think unzip
should just be a reabstraction (no copying). Here's what I use:
The thing about UnfoldSequence
is that it's easier for the optimiser to reason about the members of nominal types (e.g. for generic specialisation).
The current state of the proposal is as follows. The original proposal has been cut into two parts: the first focuses on zip using longest Sequence
exhaustion and the second focuses on unzip.
I'm currently writing the proposal, implementation, documentation, tests and benchmarks for the first part of the proposal, tentatively titled "Add zipLongest(_:_:)
". It contains no changes to zip(_:_:)
and Zip2Sequence
. It adds a new global function named zipLongest(_:_:)
that return
s a new type called Zip2LongestSequence
(or ZipLongest2Sequence
, still on the fence about that one).
The use of UnfoldSequence
has been dropped for the following reasons: (1) to stay in line with the look and feel of the rest of the Standard Library, (2) for performance reasons (a new type has to be constructed every time zipLongest(_:_:)
gets called) and (3) for type distinction reasons (a typealias
doesn't guarantee that a type is
unique).
This should be a straight forward addition. The exciting stuff will, hopefully, come in the future when we get variadic generics and extendable tuples.
The second part is tentatively titled "Add unzip()
". In addition to the original extension
on Sequence
, a lazy variant will be added to LazySequenceProtocol
. Some types conforming to Collection
will be extended with more performant variants.
It is an interesting approach, but will probably be rejected on the basis that it's a reimplemention of a 2-tuple type. See @Ben_Cohen's thoughts about that here.
I don't see it: Unzip2Collection (or I guess it should really be lowered to Sequence) is fundamentally different from Pair.
For consistency, I believe zip
and unzip
should live in the same place, and since the existing zip
function does not copy the contents of its constituent sequences, unzip
shouldn't either.
I suppose a .lazy.unzip
could work, but then I'd want us to move the current zip
functionality there, too:
// Something like this, I think...
extension LazySequenceProtocol {
func zip<S>(with other: S) where S: Sequence -> Zip2Sequence<Self, S> { ... }
}
[1, 2, 3].lazy.zip(with: ["a", "b", "c"]) // [(1, "a"), (2, "b"), (3, "c")]
Things should go into lazy
when there is some risk or significant performance downside to them being lazy instead of eager. For example, lazy.map
performs the mapping every time you iterate, which could be very costly on multiple iteration, and also captures and stores the mapping closure, which can be risky if it is stateful.
On the other hand, reversed
on a bidirectional collection, for example, does not have a significant laziness downside: the reversing operation is only a simple adaptor over index manipulation (hopefully compiling away to nothing), so if all you want to do is reverse an array, there is no need to do it eagerly.
Since there is no significant downside to zip
being lazy, there is no need for it to be restricted only to lazy sequences. It is already available for lazy sequences because those are also just sequences. So it shouldn't need "moving". It is already there. To hide it from users unless they type lazy
first seems like harming usability purely for the sake of scratching a consistency itch – an insufficiently good reason, even without the issue that this would also be source breaking and therefore need to clear a very high bar.
Unzip, on the other hand, does have a performance downside to being lazy. If what you actually want is to create two arrays, you would have to take two passes over the collection instead of one. That's the argument from my perspective that it should be eager by default.
On the other hand, if all you want is to unzip one side of the pair and ignore the other (lazily or eagerly), that isn't unzip, it's just map { $0.0 }
. This means that your gist can also be written with map
, avoiding the creation of additional unnecessary types:
func unzip<C: LazyCollectionProtocol,T,U>(
_ c: C
) -> (first: LazyMapCollection<C,T>, second: LazyMapCollection<C,U>)
where C.Element == (T,U) {
return (c.lazy.map({$0.0}),c.lazy.map({$0.1}))
}
I'm currently rounding up the implementation of this proposal for inclusion in Swift 5.0.
I've tried, to the best of my ability, to take all of your needs and feedback into consideration.
This post contains an overview of the latest public API and is my final request for comments from the Swift community before I submit the proposal.
API
Global Functions
zip(_:default:_:default:)
show / hide
Declaration
@inlinable public func zip <Sequence1: Sequence, Sequence2: Sequence> (
_ sequence1: Sequence1, default element1: @autoclosure @escaping () -> Sequence1.Element? = nil,
_ sequence2: Sequence2, default element2: @autoclosure @escaping () -> Sequence2.Element? = nil
) -> Zip2Sequence<Sequence1, Sequence2>
Examples
let xs = sequence(first: 0, next: { $0 + 1 }).prefix(3)
let ys = sequence(first: 0, next: { $0 + 1 }).prefix(2)
print(Array(zip(xs, ys)))
// Prints "[(0, 0), (1, 1)]"
print(Array(zip(xs, default: nil, ys, default: 42)))
// Prints "[(0, 0), (1, 1), (2, 42)]"
print(Array(zip(xs, default: nil, ys, default: { Int.random(in: 0...42) })))
// Prints "[(0, 0), (1, 1), (2, 7)]"
zip(_:default:_:default:)
show / hide
Declaration
@inlinable public func zip <Collection1: Collection, Collection2: Collection> (
_ collection1: Collection1, default element1: @autoclosure @escaping () -> Collection1.Element? = nil,
_ collection2: Collection2, default element2: @autoclosure @escaping () -> Collection2.Element? = nil
) -> Zip2Collection<Collection1, Collection2>
Examples
let xs = [0, 1, 2]
let ys = [0, 1]
print(Array(zip(xs, ys)))
// Prints "[(0, 0), (1, 1)]"
print(Array(zip(xs, default: nil, ys, default: 42)))
// Prints "[(0, 0), (1, 1), (2, 42)]"
print(Array(zip(xs, default: nil, ys, default: { Int.random(in: 0...42) })))
// Prints "[(0, 0), (1, 1), (2, 7)]"
zipLongest(_:_:)
show / hide
Declaration
@inlinable public func zipLongest <Sequence1: Sequence, Sequence2: Sequence> (
_ sequence1: Sequence1,
_ sequence2: Sequence2
) -> ZipLongest2Sequence<Sequence1, Sequence2>
Examples
let xs = sequence(first: 0, next: { $0 + 1 }).prefix(3)
let ys = sequence(first: 0, next: { $0 + 1 }).prefix(2)
print(Array(zipLongest(xs, ys)))
// Prints "[(Optional(0), Optional(0)), (Optional(1), Optional(1)), (Optional(2), nil)]"
zipLongest(_:_:)
show / hide
Declaration
@inlinable public func zipLongest <Collection1: Collection, Collection2: Collection> (
_ collection1: Collection1,
_ collection2: Collection2
) -> ZipLongest2Collection<Collection1, Collection2>
Examples
let xs = [0, 1, 2]
let ys = [0, 1]
print(Array(zipLongest(xs, ys)))
// Prints "[(Optional(0), Optional(0)), (Optional(1), Optional(1)), (Optional(2), nil)]"
extension
s On Sequence
compactUnzip()
show / hide
Declaration
extension Sequence {
@inlinable public func compactUnzip <Element1, Element2> () -> ([Element1], [Element2]) where
Element == (Optional<Element1>, Optional<Element2>)
}
Example
let xs = sequence(first: 0, next: { $0 + 1 }).prefix(3)
let ys = sequence(first: 0, next: { $0 + 1 }).prefix(2)
print(zipLongest(xs, ys).compactUnzip())
// Prints "([0, 1, 2], [0, 1])"
unzip()
show / hide
Declaration
extension Sequence {
@inlinable public func unzip <Element1, Element2> () -> ([Element1], [Element2]) where
Element == (Element1, Element2)
}
Example
let xs = sequence(first: 0, next: { $0 + 1 }).prefix(3)
let ys = sequence(first: 0, next: { $0 + 1 }).prefix(2)
print(zip(xs, ys).unzip())
// Prints "([0, 1], [0, 1])"
extension
s On Collection
compactUnzip()
show / hide
Declaration
extension Collection {
@inlinable public func compactUnzip <Element1, Element2> () -> ([Element1], [Element2]) where
Element == (Optional<Element1>, Optional<Element2>)
}
Example
let xs = [0, 1, 2]
let ys = [0, 1]
print(zipLongest(xs, ys).compactUnzip())
// Prints "([0, 1, 2], [0, 1])"
unzip()
show / hide
Declaration
extension Collection {
@inlinable public func unzip <Element1, Element2> () -> ([Element1], [Element2]) where
Element == (Element1, Element2)
}
Example
let xs = [0, 1, 2]
let ys = [0, 1]
print(zip(xs, ys).unzip())
// Prints "([0, 1], [0, 1])"
extension
s On LazyCollectionProtocol
compactUnzip()
show / hide
Declaration
extension LazyCollectionProtocol {
@inlinable public func compactUnzip <Element1, Element2> () -> (
LazyMapCollection<LazyFilterCollection<LazyMapCollection<Elements, Element1?>>, Element1>,
LazyMapCollection<LazyFilterCollection<LazyMapCollection<Elements, Element2?>>, Element2>
) where Elements.Element == (Optional<Element1>, Optional<Element2>)
}
Example
let xs = [0, 1, 2].lazy
let ys = [0, 1].lazy
print(zipLongest(xs, ys).compactUnzip())
// Prints "([0, 1, 2], [0, 1])"
unzip()
show / hide
Declaration
extension LazyCollectionProtocol {
@inlinable public func unzip <Element1, Element2> () -> (
LazyMapCollection<Elements, Element1>,
LazyMapCollection<Elements, Element2>
) where Elements.Element == (Element1, Element2)
}
Example
let xs = [0, 1, 2].lazy
let ys = [0, 1].lazy
print(zip(xs, ys).unzip())
// Prints "([0, 1], [0, 1])"
Types
Zip2Sequence
show / hide
Declaration
@_fixed_layout public struct Zip2Sequence <Sequence1: Sequence, Sequence2: Sequence> {
@inlinable public init (
_ sequence1: Sequence1, default element1: @escaping () -> Sequence1.Element? = { nil },
_ sequence2: Sequence2, default element2: @escaping () -> Sequence2.Element? = { nil }
)
}
extension Zip2Sequence {
@_fixed_layout public struct Iterator
@inlinable public var underestimatedCount: Int { get }
}
extension Zip2Sequence.Iterator: IteratorProtocol {
public typealias Element = (Sequence1.Element, Sequence2.Element)
@inlinable public mutating func next () -> Element?
}
extension Zip2Sequence: Sequence {
public typealias Element = (Sequence1.Element, Sequence2.Element)
@inlinable public func makeIterator () -> Iterator
}
extension Zip2Sequence: LazySequenceProtocol where
Sequence1: LazySequenceProtocol, Sequence2: LazySequenceProtocol
ZipLongest2Sequence
show / hide
Declaration
@_fixed_layout public struct ZipLongest2Sequence <Sequence1: Sequence, Sequence2: Sequence> {
@inlinable public init (_ sequence1: Sequence1, _ sequence2: Sequence2)
}
extension ZipLongest2Sequence {
@_fixed_layout public struct Iterator {}
@inlinable public var underestimatedCount: Int { get }
}
extension ZipLongest2Sequence.Iterator: IteratorProtocol {
public typealias Element = (Sequence1.Element?, Sequence2.Element?)
@inlinable public mutating func next () -> Element?
}
extension ZipLongest2Sequence: Sequence {
public typealias Element = (Sequence1.Element?, Sequence2.Element?)
@inlinable public func makeIterator () -> Iterator
}
extension ZipLongest2Sequence: LazySequenceProtocol where
Sequence1: LazySequenceProtocol, Sequence2: LazySequenceProtocol
Zip2Index
show / hide
Declaration
@_fixed_layout public struct Zip2Index <Collection1: Collection, Collection2: Collection>
extension Zip2Index: Comparable {
@inlinable public static func < (lhs: Zip2Index, rhs: Zip2Index) -> Bool
}
Zip2Collection
show / hide
Declaration
@_fixed_layout public struct Zip2Collection <Collection1: Collection, Collection2: Collection> {
@inlinable public init (
_ collection1: Collection1, default element1: @escaping () -> Collection1.Element? = { nil },
_ collection2: Collection2, default element2: @escaping () -> Collection2.Element? = { nil }
)
}
extension Zip2Collection: Sequence {
public typealias Element = (Collection1.Element, Collection2.Element)
public typealias Iterator = Zip2Sequence<Collection1, Collection2>.Iterator
@inlinable public func makeIterator () -> Iterator
}
extension Zip2Collection: Collection {
public typealias Index = Zip2Index<Collection1, Collection2>
public typealias SubSequence = Slice<Zip2Collection>
@inlinable public var startIndex: Index { get }
@inlinable public var endIndex: Index { get }
@inlinable public subscript (position: Index) -> Element { get }
@inlinable public func index (after i: Index) -> Index
}
extension Zip2Collection: BidirectionalCollection where
Collection1: RandomAccessCollection, Collection2: RandomAccessCollection {
@inlinable public func index (before i: Index) -> Index
}
extension Zip2Collection: RandomAccessCollection where
Collection1: RandomAccessCollection, Collection2: RandomAccessCollection
extension Zip2Collection {
@inlinable public func distance (from start: Index, to end: Index) -> Int
@inlinable public func index (_ i: Index, offsetBy distance: Int) -> Index
@inlinable public func index (_ i: Index, offsetBy distance: Int, limitedBy limit: Index) -> Index?
}
extension Zip2Collection: LazySequenceProtocol, LazyCollectionProtocol where
Collection1: LazyCollectionProtocol, Collection2: LazyCollectionProtocol
ZipLongest2Collection
show / hide
Declaration
@_fixed_layout public struct ZipLongest2Collection <Collection1: Collection, Collection2: Collection> {
@inlinable public init (_ collection1: Collection1, _ collection2: Collection2)
}
extension ZipLongest2Collection: Sequence {
public typealias Element = (Collection1.Element?, Collection2.Element?)
public typealias Iterator = ZipLongest2Sequence<Collection1, Collection2>.Iterator
@inlinable public func makeIterator () -> Iterator
}
extension ZipLongest2Collection: Collection {
public typealias Index = Zip2Index<Collection1, Collection2>
@inlinable public var startIndex: Index { get }
@inlinable public var endIndex: Index { get }
@inlinable public subscript (index: Index) -> Element { get }
@inlinable public func index (after index: Index) -> Index
}
extension ZipLongest2Collection: BidirectionalCollection where
Collection1: RandomAccessCollection, Collection2: RandomAccessCollection {
@inlinable public func index (before index: Index) -> Index
}
extension ZipLongest2Collection: RandomAccessCollection where
Collection1: RandomAccessCollection, Collection2: RandomAccessCollection
extension ZipLongest2Collection {
@inlinable public func distance (from start: Index, to end: Index) -> Int
@inlinable public func index (_ i: Index, offsetBy distance: Int) -> Index
@inlinable public func index (_ i: Index, offsetBy distance: Int, limitedBy limit: Index) -> Index?
}
extension ZipLongest2Collection: LazySequenceProtocol, LazyCollectionProtocol where
Collection1: LazyCollectionProtocol, Collection2: LazyCollectionProtocol