Add a padded version of zip to complement the current zip

saagarjha · April 8, 2018, 1:42am

(Spun off from SE-0203 — Rename Sequence.elementsEqual - #69 by Ben_Cohen)

The way that zip(_:_:) works currently is to take pairs of elements from its two Sequence parameters (until one of these runs out) and provide a new Sequence that provides these pairs as a tuple. Unfortunately, sometimes this isn't quite what we need: it's be great to have an operation that went through all of the elements in both Sequences, padding the shorter Sequence so that it matches the length of the longer one. I'd like to kick around an idea to this end.

The first thing that we need to consider is what "padding" actually means. An option pitched previously on the list, as well as a comparable function in Python's itertools, zip_longest, suggests using nil as a default for padding. This does have the side effect of wrapping every element in an Optional:

let numbers = [1, 2, 3]
let letters = ["a", "b"]
Array(zipExtendingShorterWithNil(numbers, letters))
// [(Optional(1), Optional("a")),
//  (Optional(2), Optional("b")),
//  (Optional(3), nil)]

Another option would be to allow for a user-definded padding value:

Array(zip(numbers, letters, paddingIfNecessaryWith: (0, "")))
// [(1, "a"), (2, "b"), (3, "")]

These can both coexist; we might even consider making their signatures match so it looks like the one that pads with nil is a "default" of the other one (though, this is impossible in to completely emulate because their return type would differ).

Finally, we'd could also have another function that allowed for generating elements dynamically:

Array(zip(numbers, letters, paddingIfNecessaryWith: ({
	return Int(arc4random())
}, {
	return String(repeating: "", count: Int(arc4random())) // Don't run this, by the way
})))
// [(1, "a"), (2, "b"), (3, "")]

Regarding naming, @Ben_Cohen has mentioned that he would keeping a zip prefix:

This has the benefit of enhancing discovery and underlining the relationship with the current zip(_:_:). I'm not great at naming things, but here's a couple other alternative (and pretty ugly, IMO) names, just for reference, including some that don't start with zip:

Function that pads with nil

nilPaddedZip(_:_:)
zipExtendingWithNil(_:_:)
zipPaddingWithNil(_:_:)
zipExtendingShorterWithNil(_:_:)
zipPaddingShorterWithNil(_:_:)

Functions that allow for user-defined padding:

zip(_:_:withPadding:)
zip(_:_:extendingWith:)
zip(_:_:paddingShorterWith:)
zip(_:_:extendingShorterWith:)
zip(_:_:paddingShorterIfNecessaryWith:)
zip(_:_:extendingShorterIfNecessaryWith:)

Bikeshedding is possible here on whether we should have one 2-element tuple parameter for the user's choice in padding (as shown here), or whether we should have two separate parameters.

And one that's similar for both:

zipLonger(_:_:); zipLonger(_:_:paddingWith:)

Please suggest your own if you have one that's better!

Architecturally, we could have this zip also return a Zip2Sequence as zip(_:_:) does. I'll have to think more about how to implement this, but I was thinking about making some sort of PaddedSequence that wraps around a given Sequence and infinitely pads it with a given padding value once the wrapped Sequence runs out. We might even want to consider exposing this publicly, since it be useful in and of itself.

davedelong · April 8, 2018, 1:57am

@Erica_Sadun and I have tossed this idea around as well; we think there's a need for it.

However, I'll point out that you need two default parameters, because zip() has two generic parameters. There's no way to provide a single default/padding value that will work with both. For example:

let numbers = [1, 2, 3]
let letters = ["a", "b", "c", "d"]
let zipped = zip(numbers, letters) // [(1, "a"), (2, "b"), (3, "c")]

We could provide a default value for the "missing" numbers array, but it would have to be an Int. However, if letters was the shorter of the two, then the default value would have to be a String.

So, you really need:

func zip<T, U>(_ left: T, _ right: U, paddingLeft: T.Element, paddingRight: U.Element) // where T and U are sequences or whatever

You could even modify the existing function in a source-compatible way, like so:

func zip<T, U>(_ left: T, _ right: U, paddingLeft: T.Element? = nil, paddingRight: U.Element? = nil) // pads if the corresponding value is non-nil

Also, you'd probably want to consider the a second version where the padding values are generated and not statically provided:

func zip<T, U>(_ left: T, _ right: U, paddingLeft: () → T.Element?, paddingRight: () → U.Element?)

That way you could do something like "pad this array of UIViews with a whole bunch more UIViews as necessary".

saagarjha · April 8, 2018, 2:01am

There are two parameters, they're just packed in a tuple:

I considered adding two parameters, but I thought it would be overly verbose because then we'd need argument labels like "paddingRight" and "paddingLeft". Another reason why I chose a tuple is that it makes extending this to many sequences a lot nicer, if we ever get variadic generics.

That's a good point; I should probably add it as well.

davedelong · April 8, 2018, 2:27am

Ah, of course! I didn't pick that up in your original description. Thanks for clarifying.

soroush · April 8, 2018, 3:57am

The last time I wrote this method, I used the word default and parameter counts to figure which implementation to use, which I think looks the cleanest from an interface perspective. I also added a version for when both sequences use the same type:

let zipped = zip([1,2,3], default: 0, ["a"], default: "-")

let zippedSameType = zip([1,2,3], [1], default: 0)

If i were to update this, I'd probably make a version that accepted closures as well.

Playgroundable gist with implementations.

jawbroken · April 8, 2018, 12:13pm

All the complexity and API surface created by supporting the various versions that take user-defined padding values and closures suggest to me that they don't really carry their weight. These forms seem to be easily emulated by just using standard Optional sugar with the .none padding version. e.g. using ?? to provide default values, force unwrapping when you know one Sequence is longer than the other (which avoids providing a “fake” default value for the longer Sequence and hiding logical errors), etc. As such, I think I would prefer only adding the .none padding version, and I like zipLonger(_:_:) and zipPaddingWithNil(_:_:) best of the naming options presented.

gwendal.roue · April 8, 2018, 12:35pm

Just like @jawbroken, I'd be keen at searching for nil padding first. This sounds natural, since the other versions (with default values or closures) can be derived from it.

// Sequence of (Int?, String?)
let pairs = zipLongest([1], ["a", "b"])

But the devil is in the detail. The existing zip function returns a Zip2Sequence which is lazy: it does not use much memory and does not consume the base sequences.

This mean that the hypothetical future zipLongest with nil padding would need do to the same (returning, say, Zip2LongestSequence):

func zipLongest<Sequence1, Sequence2>(_ sequence1: Sequence1, _ sequence2: Sequence2) -> Zip2LongestSequence<Sequence1, Sequence2> where Sequence1 : Sequence, Sequence2 : Sequence

In order to add default values, while preserving the lazy behavior, one should then use lazy:

// Sequence of (Int, String)
let pairsWithDefaultValues = pairs.lazy.map { ($0 ?? 0, $1 ?? "") }

Would it be an acceptable trade-off?

Edit: We could make Zip2LongestSequence lazy right from the start, so that one could use map without losing the lazy behavior that we expect from the standard Zip2Sequence.

But Zip2Sequence is not really a lazy sequence, as I wrote above: its map method returns an Array. It's only "pseudo-lazy". Making Zip2LongestSequence lazy while keeping a "pseudo-lazy" Zip2Sequence would be weird.

jawbroken · April 8, 2018, 12:53pm

Well, I can't say that I understand exactly what the guiding principles are behind lazy operations in Swift, but that is just standard behaviour for map in similar circumstances, so anyone interested in Swift performance already needs to know to sprinkle .lazy around if required. As you say, technically you could make map on Zip2LongestSequence lazy, but that would be inconsistent and violate various reasonable user assumptions. In other words, I don't see a problem with the behaviour you describe, or at least no new problem that isn't present in greedy map throughout Swift.

gwendal.roue · April 8, 2018, 1:00pm

This is true. Yet we have people in this thread who currently use padded zip methods that use default values. This means that there is an actual use case here. If the stdlib would only provide a nil-padded version, they could miss their convenience methods. I'd like this thread to eventually turn into a proposal, and that's why I expose as many possible reception issues as soon as I can.

davbeck · April 8, 2018, 3:11pm

Seems like this would be better handled by a dedicated collection method. Something like:

zip(a.rightPad(b.count, “”), b.rightPad(a.count, 0))

That would give you quit a bit of control while keeping the api surface area fairly small. For instance, you might want to rightPad in some cases, or only pad one argument.

Such a method could be used in a lot of other places as well.

saagarjha · April 8, 2018, 7:02pm

Unfortunately this would only work on Collection, since Sequence doesn't have a count property. I'm not saying what you've suggested is not potentially useful (and it might make a good pitch itself), but it doesn't completely solve the problem here because zip operates without knowledge of the size of the Sequences it's given.

xwu · April 8, 2018, 11:55pm

This is an interesting idea. I would consider how the proposal interacts with other discussions on the list. For example, how would the feature proposed here interact with those proposed by @bitjammer here?

saagarjha · April 9, 2018, 12:13am

I think that proposal might make it easier to implement such a padded zip: I could create a Sequence that joined the wrapped Sequence with one that infinitely produced the padding value.

dennisvennink · June 15, 2018, 2:42am

If we could rethink zip(_:_:) and Zip2Sequence from scratch, we could unify everything into a single free function and type. Its declaration:

func zip <Sequence1: Sequence, Sequence2: Sequence> (_ sequence1: Sequence1, _ sequence2: Sequence2, _ exhaustion: Exhaustion = .shortest) -> Zip2Sequence<Sequence1, Sequence2>

Exhaustion is an enum containing two cases, .shortest and .longest. Zip2Sequence.Element is of type (Sequence1.Element?, Sequence2.Element?).

As previously mentioned by other, all possible use cases can be derived from this:

Shortest, wrapped

print(Array(zip([0, 1, 2], [0, 1])))
// [(Optional(0), Optional(0)), (Optional(1), Optional(1))]

Shortest, unwrapped

print(Array(zip([0, 1, 2], [0, 1]).map({($0.0!, $0.1!)})))
// [(0, 0), (1, 1)]

Here, force unwrapping is possible since all of the tuple's values are guaranteed to have a value.

Longest, wrapped

print(Array(zip([0, 1, 2], [0, 1], .longest)))
// [(Optional(0), Optional(0)), (Optional(1), Optional(1)), (Optional(2), nil)]

Longest, with default values, wrapped

print(Array(zip([0, 1, 2], [0, 1], .longest).map({($0.0 ?? Optional(2), $0.1 ?? Optional(2))})))
// [(Optional(0), Optional(0)), (Optional(1), Optional(1)), (Optional(2), Optional(2))]

Longest, with default values, unwrapped

print(Array(zip([0, 1, 2], [0, 1], .longest).map({($0.0 ?? 2, $0.1 ?? 2)})))
// [(0, 0), (1, 1), (2, 2)]

jawbroken · June 15, 2018, 3:20am

That unification seems undesirable because nobody wants 1. (or 4. similarly) and 2. is very verbose comparatively.

I would still be interested in a proposal that introduces a padded zip, particularly the version that pads with nil.

dennisvennink · June 15, 2018, 5:22am

I completely agree. Just trying to think outside of the box.

dennisvennink · June 15, 2018, 5:23am

What about a Zip type that constructs zips?

public struct Zip <Sequence1: Sequence, Sequence2: Sequence> {
  private let sequence1: Sequence1
  private let sequence2: Sequence2

  public init (_ sequence1: Sequence1, _ sequence2: Sequence2) {
    (self.sequence1, self.sequence2) = (sequence1, sequence2)
  }
}

Namely, Zip.ShortestSequence:

extension Zip {
  public struct ShortestSequence: Sequence {
    internal let sequence1: Sequence1
    internal let sequence2: Sequence2

    internal init (_ sequence1: Sequence1, _ sequence2: Sequence2) {
      (self.sequence1, self.sequence2) = (sequence1, sequence2)
    }

    public struct Iterator: IteratorProtocol {
      public typealias Element = (Sequence1.Iterator.Element, Sequence2.Iterator.Element)

      internal var iterator1: Sequence1.Iterator
      internal var iterator2: Sequence2.Iterator
      internal var reachedEnd = false

      internal init (_ iterator1: Sequence1.Iterator, _ iterator2: Sequence2.Iterator) {
        (self.iterator1, self.iterator2) = (iterator1, iterator2)
      }

      public mutating func next() -> Element? {
        if self.reachedEnd {
          return nil
        }

        guard let element1 = iterator1.next(), let element2 = iterator2.next() else {
          self.reachedEnd.toggle()

          return nil
        }

        return (element1, element2)
      }
    }

    public func makeIterator () -> Iterator {
      return Iterator(self.sequence1.makeIterator(), self.sequence2.makeIterator())
    }
  }
}

Using the shortest property:

extension Zip {
  public var shortest: ShortestSequence {
    return ShortestSequence(sequence1, sequence2)
  }
}

And Zip.LongestSequence:

extension Zip {
  public struct LongestSequence: Sequence {
    internal let sequence1: Sequence1
    internal let sequence2: Sequence2

    internal init (_ sequence1: Sequence1, _ sequence2: Sequence2) {
      (self.sequence1, self.sequence2) = (sequence1, sequence2)
    }

    public struct Iterator: IteratorProtocol {
      public typealias Element = (Sequence1.Iterator.Element?, Sequence2.Iterator.Element?)

      internal var iterator1: Sequence1.Iterator
      internal var iterator2: Sequence2.Iterator
      internal var reachedEnd = false

      internal init (_ iterator1: Sequence1.Iterator, _ iterator2: Sequence2.Iterator) {
        (self.iterator1, self.iterator2) = (iterator1, iterator2)
      }

      public mutating func next() -> Element? {
        if self.reachedEnd {
          return nil
        }

        let element1 = iterator1.next()
        let element2 = iterator2.next()

        if element1 == nil && element2 == nil {
          reachedEnd.toggle()

          return nil
        }

        return (element1, element2)
      }
    }

    public func makeIterator () -> Iterator {
      return Iterator(self.sequence1.makeIterator(), self.sequence2.makeIterator())
    }
  }
}

Using the longest property:

extension Zip {
  public var longest: LongestSequence {
    return LongestSequence(sequence1, sequence2)
  }
}

Putting it together:

let zip = Zip([0, 1, 2], [0, 1])

print(Array(zip.shortest))
// [(0, 0), (1, 1)]

print(Array(zip.longest))
// [(Optional(0), Optional(0)), (Optional(1), Optional(1)), (Optional(2), nil)]

Both Sequences are implemented lazily. So this will all work efficiently:

print(Array(zip.shortest.lazy.prefix(1)))
// [(0, 0)]

print(Array(zip.longest.lazy.prefix(1)))
// [(Optional(0), Optional(0))]

(Though, having said that, SR-5754 is still not fixed.)

Out of everything I feel this might be the most elegant solution so far.

Thoughts?

jawbroken · June 15, 2018, 5:38am

I personally think it's a little strange to have a type that just constructs either of two other types, rather than just constructing either one directly or having the two different top level functions. If you found an independently useful context where you might want to use a SequencePair type then maybe it would make sense to be able to ask it to zip itself into a single sequence.

You might want to bump that bug report now that conditional conformances have been implemented, since the comments suggest it was waiting on that.

jeremyp · June 15, 2018, 10:46am

Interesting idea but I would modify it as follows:

Forget about the optional wrapping. If zip([1, 2, 3], ["one", "two"]) produces anything different to what it produces today, you will encounter massive resistance from the community and is a huge source breaking change. So no version of zip should wrap the elements of the sequences to be zipped.
modify the enum so that .longest takes associated padding values. The signature of your function remains the same.
```
enum ZipExhaustion<S1: Sequence, S2: Sequence>
{
    case shortest
    case longest(S1.Element, S2.Element)
}
```

So

zip([0, 1, 2], ["one", "two"])                   // -> [(0, "one"), (1, "two")]
zip([0, 1, 2], ["one", "two"], .shortest)        // -> [(0, "one"), (1, "two")]
zip([0, 1, 2], ["one", "two"], .longest(-1, "")) // -> [(0, "one"), (1, "two"), (2,"")]

If you want optionals and nil padding, you have to wrap the elements of the sequences in advance

let p1 = [0, 1, 2].map{ Optional<Int>.some($0) }
let p2 = ["one", "two"].map{ Optional<String>.some($0) }
zip(p1, p2, .longest(nil, nil))

bzamayo · June 15, 2018, 11:28am

IMO, anything that includes a way to specify replacement values for the different sequence types is overkill, and an enum is an even fancier strand of that diversion. Fancy, but unwieldy and not justifying the API surface. A simple zipLongest(a, b) covers 99% of the problem space, elegantly and simply.