Best way to write "list comprehension"

yxckjhasdkjh · May 25, 2018, 9:25am

I have the following code in my project:

// typealias B = SomeType
// let partitions: [[B]] = ...
let pairsAcrossPartitions = Array(partitions.enumerated()).flatMap { (p1: (Int, [B])) -> [(Int, Int, B, B)] in
    let (i, partition1) = p1
    return Array(partitions.enumerated()).flatMap { (p2: (Int, [B])) -> [(Int, Int, B, B)] in
        let (j, partition2) = p2
        return partition1.flatMap { (el1: B) -> [(Int, Int, B, B)] in
            return partition2.map { (el2: B) -> (Int, Int, B, B) in
                return (i, j, el1, el2)
            }
         }
     }
}

This returns a cartesian product of sorts: It returns all pairs (i, j, el1, el2) where e1 is from partition i and e2 is from partition j.

The nested flatMap -> flatMap -> ... -> map is kind of ugly, and I wonder if there's a better way to do this. In other languages, there are constructs for this, e.g. in Scala there are for-expressions (which do actually de-sugar into the same kind of code), Haskell has list comprehensions and do-notation, etc.

I've seen that there is some way to emulate list comprehensions with the Array initializer, but I don't think this would work in this case because the later steps in the comprehension depend on the result of the previous ones.

Torust · May 25, 2018, 11:53am

This might be a little unhelpful, but why are you trying to do it that way? I ran the code and tried to understand what it was doing, but found that I needed to convert it into an imperative style to comprehend it. Something like:

var pairsAcrossPartitions = [(Int, Int, B, B)]()
for (i, partition1) in partitions.enumerated() {
    for (j, partition2) in partitions.enumerated() {
        for el1 in partition1 {
            for el2 in partition2 {
                pairsAcrossPartitions.append( (i, j, el1, el2) )
            }
        }
    }
}

is much easier to understand to me, and will also be faster if the compiler can't figure out what you're trying to do (e.g. in a debug build, where all those intermediate arrays won't be optimised out).

A couple other small things about your code snippet: firstly, a note that the Array() constructors around the return values of flatMap are unnecessary. Secondly, you don't have to create the p1 and p2 temporaries; you can simply do this:

let pairsAcrossPartitions = partitions.enumerated().flatMap { (i, partition1) -> [(Int, Int, B, B)] in
    return partitions.enumerated().flatMap { (j, partition2) -> [(Int, Int, B, B)] in
        return partition1.flatMap { (el1: B) -> [(Int, Int, B, B)] in
            return partition2.map { (el2: B) -> (Int, Int, B, B) in
                return (i, j, el1, el2)
            }
        }
    }
}

Now to your actual question. If you're tied to a functional style, that inner loop to me looks like the result of zipping partition1 and partition2; you could go something like:

zip(partition1.lazy.map { repeatElement($0, count: partition2.count) }, repeatCollection(partition2) }.map { (i, j, $0, $1) }

using a hypothetical repeatCollection primitive (an example can be seen here) that infinitely loops through the source sequence. You could apply a similar approach to the outer sequence. Again, though, I personally think an imperative style is the way to go with this particular problem.

yxckjhasdkjh · May 25, 2018, 12:42pm

Functional vs. imperative is a philosophical difference and what you find more readable often depends on what you're used to. I tend to find imperative code harder to parse (and to review), because I always have to look at every step of the "how", and can focus less on expressions and transformations - in particular, imperative code never really tells me whether there are any side effects happening inside the loop; if I look at the functional version, I can somewhat assume that "ah, this is just a transformation, I don't actually have to look inside of it because it won't really modify anything else" (in theory you could do side effects in a map, but I would consider that to be bad style).

In particular, the flatMap -> flatMap -> ... -> map pattern is a very common one; I agree that it's weird when you don't know it, but when you do, you know what this is about. (That's why Haskell and Scala have syntactic sugar exactly for this use case which is very readable). It also generalises seamlessly to areas where imperative versions cannot be used, e.g. with SwiftCheck generators.

But, yes, maybe this slightly goes against common Swift usage, so I'll keep my mind open about falling back to a more imperative style in some cases.

The Array() constructors are around partitions.enumerated(), since that returns an EnumeratedSequence which apparently doesn't have flatMap.

Right, good point!

I'll think about your zipping approach, so thanks for that pointer!

Torust · May 25, 2018, 12:51pm

The Array() constructors are around partitions.enumerated(), since that returns an EnumeratedSequence which apparently doesn't have flatMap.

That’s odd. It might be a toolchain difference preventing it from working for you, but flatMap is defined on Sequence and EnumeratedSequence conforms to Sequence. I can say for sure that it works without the Array wrapper when running it locally in a playground.

yxckjhasdkjh · May 25, 2018, 12:56pm

Hm, it seems you're correct, but something weird is going on. I was trying to do this in XCode without flatMap and XCode was showing me errors. The other thing is that when I go into the Swift REPL and enter

[1,2,3].enumerated().flatMap

I get an error, but

[1,2,3].flatMap
// or: Array([1,2,3].enumerated()).flatMap

return something.

saagarjha · May 25, 2018, 1:15pm

I think that's because Swift can't figure out the type of this expression. If you add a type hint it might work out.

yxckjhasdkjh · May 25, 2018, 1:19pm

True, this works:

let fm: (((Int, Int)) throws -> [Int]) throws -> [Int] = [1,2,3].enumerated().flatMap

Thanks :)

anthonylatsis · May 25, 2018, 2:01pm

This works because the compiler resolves this as a reference to the deprecated flatMap((Self.Element) -> String?), which doesn't depend on any generic parameter other than Element (which can be inferred). It is defined on Collection.

enumerated(), however, returns a Sequence, where flapMap methods have an unbound generic parameter which can't be inferred in your expression. Additionally, the reference to flatMap here is ambiguous while the old flatMap (which is now warned to be used as compactMap instead) still exists.
This, for instance, will work: [1,2,3].enumerated().flatMap { _ in []}

The real problem here is the diagnostic engine that apparently fails to produce a decent error message. It might be picking the obsoleted flatMap, the @available attribute of which can be influencing this diagnostic instead of producing an ambiguity error. SR-7776

Torust · May 25, 2018, 11:11pm

I thought of another functional approach to this if the exact ordering doesn't matter:

let numberedElements = partitions.enumerated().flatMap { (partitionIndex, partition) in
    partition.map { (partitionIndex, $0) }
}
let pairsAcrossPartitions = numberedElements.flatMap { element in
    numberedElements.map { (element.0, $0.0, element.1, $0.1) }
}

It still doesn't solve the list comprehension issue, but it's certainly a nicer way of writing it in a functional style in Swift. I realise it's just a small variation to your original approach, but I personally think separating out the elements and flow of data this way makes the intent clearer.

yxckjhasdkjh · May 28, 2018, 9:41am

That looks more readable, thanks.