Or try something like the design I suggested that tracks the starting index of the longer collection's unmatched suffix.
The current index of the shortest Collection
starts decrementing once the distance from the startIndex
to the current index of the longest Collection
is less than or equal to the distance from the startIndex
to the current index of the longest Collection
.
This should get handled by the final case
in the switch
posted above.
The other issue here is one of performance — those calls to distance(from:to:)
are O(n) on bidirectional collections, which turn index(before:)
into an O(n) operation, when it really should be constant time. That means that a variety of collection operations could end up being quadratic instead of linear.
Have you considered skipping bidirectional conformance, and just making the zip collections random access when the underlying collections are random access?
Exactly! distance
is linear for BidirectionalCollection
: https://github.com/apple/swift/blob/master/stdlib/public/core/BidirectionalCollection.swift#L293-L318
Yes, I know. The conditional conformance syntax might be deceiving here.
This is how it's currently implemented:
extension ZipLongest2Collection: BidirectionalCollection, RandomAccessCollection
where Collection1: RandomAccessCollection, Collection2: RandomAccessCollection {
<#...#>
}
All intermediate protocol
s must be specified, separated by commas, when using conditional conformance(s). Otherwise the compiler will complain.
Here's another example from the Standard Library:
extension ClosedRange: Collection, BidirectionalCollection, RandomAccessCollection
where Bound: Strideable, Bound.Stride: SignedInteger {
<#...#>
}
My apologies, I did miss that! Sounds like the right approach in that case.
Thoughts and further points for discussion:
-
Is this the right model for
Index
? Does it scale to higher arities? -
Are the conformances to
LazySequenceProtocol
andLazyCollectionProtocol
correctly implemented? -
What are the possibilities regarding conformance to
RangeReplaceableCollection
andMutableCollection
? -
Are there any additional performance gains to be made? Should we override certain default implementations?
-
On the topic of naming the global functions.
The consensus in the pitch phase was that
longest
should be part of the name. It is highly descriptive and has good precedence in Python.But which ones of these two are more Swifty?
-
zipLongest(_:_:)
:zipLongest([0, 1, 2], [0])
-
zip(longest:_:)
:zip(longest: [0, 1, 2], [0])
Taking existing symbols from the Standard Library into account, e.g.,
prefix(_:)
,prefix(upto:)
,prefix(through:)
andprefix(while:)
, the second one seems to fall more into line with current naming conventions. -
-
Now for the fun part. Which
ZipLongest2Collection
-specific members should we add? Given:let xs = zipLongest([0, 1, 2], [0])
-
A view on the shortest
prefix
:print(Array(xs.prefix)) // Prints "[(.some(0), .some(0))]" print(xs.prefix.count) // Prints "1"
(Up for bikeshedding.)
-
A view on the longest
suffix
:print(Array(xs.suffix)) // Prints "[(.some(1), .none), (.some(2), .none)]" print(xs.suffix.count) // Prints "2"
(Up for bikeshedding.)
-
Convenience
subscript
(s) using one (or two)Int
offset(s):print(xs[0])) // Prints "(.some(0), .some(0))" print(xs[1, 0])) // Prints "(.some(1), .some(0))"
In my view, we should provide at least the first version since
Index.init
is internal.It makes sense to put these on
Collection
. (Since the existingsubscript(_:)
is O(n) forCollection
s too.) -
(This one might be controversial. A binary version of
map(_:)
namedmap(_:_:)
. It would look something like this:let f: (Int?) -> Int = { $0 == .none ? 0 : $0! + 1 } let g: (Int?) -> Int = { $0 == .none ? 0 : $0! + 2 } print(xs.map(f, g))) // Prints "[(1, 2), (2, 0), (3, 0)]"
This method will increase expressivity and composibility.)
-
-
What can we implement similarly in
Zip2Collection
? How do we make sure the API between both variants is uniform?
Things to do on my end:
- Finalise the implementation (feedback from the community on the points above would be very helpful).
- Increase test coverage.
- Create benchmarks.
- Write missing documentation.
- When everything is "done", fold it back into apple/swift#18769.
- Finalise proposal.
It might be a good practice in general to reverse the order of the conformances to make the intent more clear, e.g., turn:
extension ClosedRange: Collection, BidirectionalCollection, RandomAccessCollection
where Bound: Strideable, Bound.Stride: SignedInteger {
<#...#>
}
into:
extension ClosedRange: RandomAccessCollection, BidirectionalCollection, Collection
where Bound: Strideable, Bound.Stride: SignedInteger {
<#...#>
}
This version implies to me that the first sequence/collection is or must be the longest of two.
Is this deliberately mismatching the indices?
Good point. Thought about that as well. In this style, something like zipLongest(sequences:_:)
/ zipLongest(collections:_:)
would then probably be better. But then we're better off with zipLongest(_:_:)
.
The Index
would be invalid in the context of Comparable
, yes. But since it returns a value we should be safe.
I think (1) is better. Like @jawbroken, I agree that (2) looks like longest-ness is something exclusive to the first argument, although it isn't.
Would names like "zipFully" or "zipToLongest" be better? Or would those only work if we rename the current "zip" to something like "zipToShortest"?
zip(_:_:)
can't be renamed. In most other languages the semantics are identical, breaking expectations. An even bigger issue is that it will break existing code. Additive features should be okay.
For instance, I haven't put much thought into zip(_:_:)
yet, but one idea I have is the addition of a default
or defaults
parameter:
print(Array(zip([0, 1, 2], [0], default: nil, 0)))
/// Prints "[(0, 0), (1, 0), (2, 0)]"
This should only be a few lines of code extra for both the Sequence
and Collection
versions of zip(_:_:)
and doesn't break existing code.
Back to naming, zipToLongest(_:_:)
could also work, but then zipLongest(_:_:)
is more succinct. zip(toLongestOf:or:)
/ zip(longestOf:or:)
is also a possibility that gets around the ambiguities of zip(longest:_:)
.
I'm looking into conforming Zip2Collection
to both ExpressibleByArrayLiteral
and ExpressibleByDictionaryLiteral
:
extension Zip2Collection: ExpressibleByArrayLiteral where
Collection1: RangeReplaceableCollection,
Collection2: RangeReplaceableCollection {
public typealias ArrayLiteralElement = Element
public init (arrayLiteral: Element...) {
self.init()
let arrayLiteralCount = arrayLiteral.count
_collection1.reserveCapacity(arrayLiteralCount)
_collection2.reserveCapacity(arrayLiteralCount)
for (element1, element2) in arrayLiteral {
_collection1.append(element1)
_collection2.append(element2)
}
}
}
extension Zip2Collection: ExpressibleByDictionaryLiteral where
Collection1: RangeReplaceableCollection,
Collection2: RangeReplaceableCollection {
public typealias Key = Collection1.Element
public typealias Value = Collection2.Element
public init (dictionaryLiteral elements: Element...) {
self.init()
let elementsCount = elements.count
_collection1.reserveCapacity(elementsCount)
_collection2.reserveCapacity(elementsCount)
for (element1, element2) in elements {
_collection1.append(element1)
_collection2.append(element2)
}
}
}
If anybody sees any room for improvement here, let me know.
We don't typically have those conformances for this kind of helper type — they're only produced as the result of some other operation. For example, Zip2Sequence
doesn't have any public initializers at all. You can only create one by using the zip(_:_:)
function.
Is there an official consensus from the core team on this subject?
Zip2Sequence
's init
ialiser is public
with underscored parameter labels:
print(Zip2Sequence<[Int], [Int]>(_sequence1: [0, 1, 2], _sequence2: [0, 1, 2]))
// Prints "Zip2Sequence<Array<Int>, Array<Int>>(_sequence1: [0, 1, 2], _sequence2: [0, 1, 2])"
But I understand what you're saying.
What's the consensus on adding support for higher arities?
Each additional arity will add four more global functions and five more types.
This is pretty much a hack that predates @usableFromInline
. It might go away.
What's more, I don't really see the point of adding them. The point of zipping is to construct the tuple sequence dynamically from two independent sequences. What would be the benefit of creating a zipped collection from a literal at compile time, instead of just creating an array or key/value pair literal?
Eh, it's marked // @testable
too. But given how trivial the zip
function is, I don't see that it really needs to be.
edit: yeah, that @testable
is bogus. I put up a PR to internalize the init
.
My main line of reasoning here was to provide an alternative shorthand for creating a default value for when Zip2Collection
is Optional
:
let xs: Zip2Collection<[Int], [Int]> = nil ?? [(42, 42)]
Or empty:
let xs = { $0.isEmpty ? [(42, 42)] : $0 }(zip([Int](), [Int]()))