Proposal: Pattern Matching Partial Function (#111)


(Thorsten Seitz) #1

You are right, it seems that Scala is doing automatic unsplatting now. Why it doesn’t work for reduce I can’t say.

-Thorsten

···

On 5 February 2016 at 07:01, Craig Cruden ccruden@novafore.com wrote:

No, that cannot be the reason.

— this works

val r2 = a.foldLeft(0) {
  case (acc, x) if x > 4 => println(x); acc + x
  case (acc, x) if x <= 4 => acc - x
}

— this does not.

val r3 = a.reduce {
  case (acc, x) if x > 4 => println(x); acc + x
  case (acc, x) if x <= 4 => acc - x
}

On 2016-02-05, at 12:59:06, Thorsten Seitz tseitz42@icloud.com wrote:

The reason is simply that partial functions in Scala are unary functions, i.e. they take only one argument, while reduce takes a binary function.

I already wrote in another mail why I think that the reduce example in the proposal is problematic, because it does argument splatting.
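The arity mismatch is easy to demonstrate outside Scala as well. A minimal Python sketch (toy lambdas of my own, not from the proposal), using functools.reduce as the stand-in binary combinator:

```python
from functools import reduce

xs = [1, 2, 3, 4]

# reduce expects a binary function, called as f(acc, x).
binary = lambda acc, x: acc + x
assert reduce(binary, xs) == 10

# A unary function over a pair -- the shape of a partial function
# matching on a tuple. reduce still calls it with two arguments, so
# it fails unless the pair is packed ("splatted") into a tuple first.
unary = lambda pair: pair[0] + pair[1]
try:
    reduce(unary, xs)
    raised = False
except TypeError:
    raised = True
assert raised
```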

-Thorsten

On 4 February 2016 at 22:57, Craig Cruden via swift-evolution swift-evolution@swift.org wrote:

Off Topic a little.

I was playing around with the equivalent case - partial functions in Scala ….

I have known, for whatever reason, that they would not work within Scala’s reduce but would work in other functions… and it bugged me why it would work in most but not all (I was wondering if this was some weird implementation limitation).

As far as I can tell, there is a small difference that prevents it. Swift’s reduce is probably the same as Scala’s foldLeft. I had thought Scala’s reduce was a specialized form of fold, but that is not the case.

The difference has to be that Scala’s reduce takes a “commutative monoid” — and case partial functions by their very nature [conditions on accumulator or value prevent it] are not commutative in nature — or at least not guaranteed to be so.
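To make that concrete, here is a small Python sketch (a toy guard of my own, mirroring the case-with-if shape above) showing that a condition on the arguments breaks both commutativity and associativity — the properties a parallel reduce would need:

```python
# A fold step with a guard on the element, mirroring the Scala
# case-with-if example (toy values, for illustration only).
op = lambda acc, x: acc + x if x > 4 else acc - x

# Not commutative: swapping the arguments changes the result.
assert op(2, 7) == 9
assert op(7, 2) == 5

# Not associative: regrouping (as a parallel reduce would) changes it too.
assert op(op(1, 2), 3) == -4
assert op(1, op(2, 3)) == 2
```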

Scala’s fold is either left-to-right or right-to-left. reduce, on the other hand, provides no ordering guarantee, because it may be parallelized [I am wondering if this has something in common with the concept of big-data mapReduce].

——

Paul, I will look at updating the proposal to make that clear.

On 2016-02-05, at 4:17:03, Paul Ossenbruggen via swift-evolution swift-evolution@swift.org wrote:

I haven’t thought of doing it this way. This is why these reviews are great. I agree with Craig that we would not allow commas before the colon. The proposal should probably be updated to make sure that is clear.

Thanks

  • Paul

Sent from my iPhone

On Feb 4, 2016, at 10:31 AM, Maximilian Hünenberger m.huenenberger@me.com wrote:

I definitely see the point against dictionaries but I’m afraid of the actual pattern matching in “cases”.

match(1) {
    case 1, 3: 10
    case 2, 4: 20
    default: 30
}

// with “cases”

match(1) { cases
    1, 3: 10,
    2, 4: 20,
    default: 30
}

// it is very hard to disambiguate between pattern and return value even though for the compiler it could be doable

match(1) { cases
    1, 3: 10, 2, 4: 20, default: 30
}

There should be a more visible separator between two “case-item-maps”, such as “|”. But we have to consider “expression patterns”, where operators (like “|”) are ambiguous.

  • Maximilian

On 4 February 2016 at 18:48, Paul Ossenbruggen possen@gmail.com wrote:

That is a good point and I forgot about that. There are frequent cases where what I am dealing with is not hashable, and it is generally a lot of work to make it hashable, including adding a hashing function which, if done wrong, can lead to errors.

Sent from my iPhone

On Feb 4, 2016, at 8:38 AM, Charles Constant via swift-evolution swift-evolution@swift.org wrote:

I still vote for keeping “cases”

I see it as a replacement for dictionary literal “pattern matching”:

[1 : "one", 2 : "two", 3 : "three"][1] ?? "undefined"

A dictionary needs keys that are Hashable. A dictionary produces an optional. We’ve discussed this, and more, earlier in the thread.
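For comparison, the same lookup-with-fallback shape in Python (a neutral sketch, not Swift) is just dict.get with a default:

```python
words = {1: "one", 2: "two", 3: "three"}

# Lookup with a fallback, like `[1: "one", ...][1] ?? "undefined"` in Swift.
assert words.get(1, "undefined") == "one"
assert words.get(4, "undefined") == "undefined"
```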

It would be nice to have, but I don’t think I would use it frequently.

Granted, it’s a bit ugly, but given the choice, I would pick “cases” over “case case case case …” every time.

In addition, to be more consistent with “case”, “cases” would introduce pattern matching, which doesn’t seem right with this concise syntax.

I haven’t thought this through. It just bums me out a little to replace the switch with something that still has imho unnecessary verbosity.


swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution




(Craig Cruden) #2

At a theoretical level — if you add ‘if/where’ conditions to either acc or x then the function being performed can no longer be guaranteed to be commutative (especially if the ‘if’ uses ‘acc’ in it).

If the function is not commutative, then the reduce cannot be broken down and processed in parallel.

‘map’ doesn’t have this issue, since each element is a single value independent of all other values (so it can be processed in parallel with no problem).

‘fold’ by its definition processes things sequentially, either left to right or right to left — not in parallel.
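The ordering point is easy to check with a non-associative operation. A minimal Python sketch using subtraction as the toy operation (Python has no built-in right fold, so it is written out by hand):

```python
from functools import reduce

xs = [1, 2, 3]
sub = lambda a, b: a - b

# Left fold: ((0 - 1) - 2) - 3
assert reduce(sub, xs, 0) == -6

# Right fold, written out by hand: 1 - (2 - (3 - 0))
assert 1 - (2 - (3 - 0)) == 2
```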

Swift’s reduce has a starting value and is likely just a fold, not a reduce (at least when compared to Scala; not sure about when it comes to big data).

Just a guess really though, at least it puts my mind to rest.



(Dave Abrahams) #3

IME fold and reduce have always been synonyms. Did I miss some memo?

···

on Fri Feb 05 2016, Craig Cruden <swift-evolution@swift.org> wrote:

Swift reduce has a starting value and likely is just a fold, not a
reduce (at least when it is compared to Scala; not sure about when it
comes to big data).

--
-Dave


(Craig Cruden) #4

Actually with Scala (and from what I gather from Big Data) — reduce and fold from the outside look like synonyms — BUT they are not.

inline from http://stackoverflow.com/questions/25158780/difference-between-reduce-and-foldleft-fold-in-functional-programming-particula

reduce vs foldLeft

A big big difference, not mentioned in any other stackoverflow answer relating to this topic clearly, is that reduce should be given a commutative monoid, i.e. an operation that is both commutative and associative. This means the operation can be parallelized.

This distinction is very important for Big Data / MPP / distributed computing, and the entire reason why reduce even exists. The collection can be chopped up and the reduce can operate on each chunk, then the reduce can operate on the results of each chunk - in fact the level of chunking need not stop one level deep. We could chop up each chunk too. This is why summing integers in a list is O(log N) if given an infinite number of CPUs.

If you just look at the signatures there is no reason for reduce to exist, because you can achieve everything you can with reduce with a foldLeft. The functionality of foldLeft is greater than the functionality of reduce.

But you cannot parallelize a foldLeft, so its runtime is always O(N) (even if you feed in a commutative monoid). This is because it's assumed the operation is not a commutative monoid and so the cumulated value will be computed by a series of sequential aggregations.

foldLeft does not assume commutativity nor associativity. It's associativity that gives the ability to chop up the collection, and it's commutativity that makes cumulating easy because order is not important (so it doesn't matter in which order to aggregate the results from each of the chunks). Strictly speaking, commutativity is not necessary for parallelization (for example distributed sorting algorithms); it just makes the logic easier because you don't need to give your chunks an ordering.

If you have a look at the Spark documentation for reduce it specifically says "... commutative and associative binary operator"

http://spark.apache.org/docs/1.0.0/api/scala/index.html#org.apache.spark.rdd.RDD
Here is proof that reduce is NOT just a special case of foldLeft

val intParList: ParSeq[Int] = (1 to 100000).map(_ => scala.util.Random.nextInt()).par

timeMany(1000, intParList.reduce(_ + _))
// Took 462.395867 milliseconds

timeMany(1000, intParList.foldLeft(0)(_ + _))
// Took 2589.363031 milliseconds

reduce vs fold

Now this is where it gets a little closer to the FP Category Theory roots, and a little trickier to explain.

There is no fold method in Scalding because under the (strict) Map Reduce programming model we cannot define fold because chunks do not have an ordering and fold only requires associativity, not commutativity. Spark does have fold because its framework is a super-set of the Map Reduce programming model and can order its chunks. Well you can actually do this in Hadoop too, but Scalding doesn't seem to expose this functionality in the version I'm familiar with.

Put simply, reduce works without an order of cumulation; fold requires an order of cumulation, and it is that order of cumulation that necessitates a zero value, NOT the existence of the zero value, that distinguishes them. Strictly speaking, reduce should work on an empty collection, because its zero value can be deduced by taking an arbitrary value x and then solving x op y = x, but that doesn't work with a non-commutative operation, as there can exist left and right zero values that are distinct (i.e. x op y != y op x). Of course Scala doesn't bother to work out what this zero value is, as that would require doing some mathematics (which are probably uncomputable), so it just throws an exception.

This difference between reduce* and fold* is a FP historical convention and has its roots in Category Theory. I hope now this deep difference relating to parallelization will no longer go unnoticed 🙂
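The chunking argument in the quoted answer can be sketched in a few lines of Python (chunk size chosen arbitrarily): because integer addition is a commutative monoid, reducing per-chunk partial results gives the same answer as a sequential reduce:

```python
from functools import reduce
from operator import add

xs = list(range(1, 101))

# Sequential reduce.
seq = reduce(add, xs)

# "Parallel" reduce: chop the list into chunks, reduce each chunk,
# then reduce the per-chunk results. For a commutative monoid the
# chunking cannot change the answer.
chunks = [xs[i:i + 7] for i in range(0, len(xs), 7)]
par = reduce(add, (reduce(add, c) for c in chunks))

assert seq == par == 5050
```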
