Inefficiency in recursive value types


(Dimitri Racordon) #1

Hello fellow evolutionists,

I’m not sure this list is the best place to talk about this, so please redirect me if it should be discussed elsewhere.

While trying to implement algebraic data types in Swift, I stumble upon an interesting problem. Let’s consider the following example:

protocol Term {}

struct Zero: Term {}
struct Succ: Term {
    let pred: Term
}

var term: Term = Zero()
for _ in 0 ..< 20000 {
    term = Succ(pred: term)
}

This code will take forever to execute (about 1 minute on my mbp late 2013), whereas if I replace the structures with reference types it will finish almost instantly, which is what one would expect from such a simple data structure. Most likely, the problem lays in the fact that a deep copy is produced every time a new level of recursion is created. However, I guess this case could be detected by the compiler, as the `term` variable passed as parameter is immediately affected to the rvalue of the expression using it.

An even more powerful approach would be to keep track of the reference on the original `term` value under the hood, and update it if necessary in a copy-on-write fashion. As I understand it, this would enable this code to be efficiently executed as well:

protocol Term {}

struct Leaf: Term {}
struct Tree: Term {
    let left: Term
    let right: Term
}

var tree: Term = Leaf()
for _ in 0 ..< 20000 {
    tree = Tree(left: tree, right: tree)
}

Interestingly, while enumerations are value types, indirect enumerations don’t suffer the performance issue of the above examples. Note however that the issue will reappear if the enum isn’t marked `indirect`.

protocol Term {}

indirect enum Nat: Term {
    case zero
    case succ(Term)
}

var term = Nat.zero
for _ in 0 ..< 20000 {
    term = .succ(term)
}

Thanks,
Dimitri Racordon


(John McCall) #2

Hello fellow evolutionists,

I’m not sure this list is the best place to talk about this, so please redirect me if it should be discussed elsewhere.

More of a swift-dev topic. CC'ing there, BCC'ing evolution.

While trying to implement algebraic data types in Swift, I stumble upon an interesting problem. Let’s consider the following example:

protocol Term {}

struct Zero: Term {}
struct Succ: Term {
    let pred: Term
}

var term: Term = Zero()
for _ in 0 ..< 20000 {
    term = Succ(pred: term)
}

This code will take forever to execute (about 1 minute on my mbp late 2013), whereas if I replace the structures with reference types it will finish almost instantly, which is what one would expect from such a simple data structure. Most likely, the problem lays in the fact that a deep copy is produced every time a new level of recursion is created. However, I guess this case could be detected by the compiler, as the `term` variable passed as parameter is immediately affected to the rvalue of the expression using it.

An even more powerful approach would be to keep track of the reference on the original `term` value under the hood, and update it if necessary in a copy-on-write fashion. As I understand it, this would enable this code to be efficiently executed as well:

Yes, this is something we're already looking at doing in general for this kind of "existential" use of a protocol, precisely because of this kind of nested-type problem.

protocol Term {}

struct Leaf: Term {}
struct Tree: Term {
    let left: Term
    let right: Term
}

var tree: Term = Leaf()
for _ in 0 ..< 20000 {
    tree = Tree(left: tree, right: tree)
}

Interestingly, while enumerations are value types, indirect enumerations don’t suffer the performance issue of the above examples. Note however that the issue will reappear if the enum isn’t marked `indirect`.

protocol Term {}

indirect enum Nat: Term {
    case zero
    case succ(Term)
}

I do have to note that this is a very strange of writing Nat. Why recurse through a protocol type instead of recursing concretely?

John.

···

On Mar 13, 2017, at 8:55 AM, Dimitri Racordon via swift-evolution <swift-evolution@swift.org> wrote:

var term = Nat.zero
for _ in 0 ..< 20000 {
    term = .succ(term)
}

Thanks,
Dimitri Racordon

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


(Jean-Daniel) #3

If you want to understand the performance implication of using protocol for value type, you should watch this session.

https://developer.apple.com/videos/play/wwdc2016/416/

It is very instructive.

···

Le 13 mars 2017 à 13:55, Dimitri Racordon via swift-evolution <swift-evolution@swift.org> a écrit :

Hello fellow evolutionists,

I’m not sure this list is the best place to talk about this, so please redirect me if it should be discussed elsewhere.

While trying to implement algebraic data types in Swift, I stumble upon an interesting problem. Let’s consider the following example:

protocol Term {}

struct Zero: Term {}
struct Succ: Term {
    let pred: Term
}

var term: Term = Zero()
for _ in 0 ..< 20000 {
    term = Succ(pred: term)
}

This code will take forever to execute (about 1 minute on my mbp late 2013), whereas if I replace the structures with reference types it will finish almost instantly, which is what one would expect from such a simple data structure. Most likely, the problem lays in the fact that a deep copy is produced every time a new level of recursion is created. However, I guess this case could be detected by the compiler, as the `term` variable passed as parameter is immediately affected to the rvalue of the expression using it.

An even more powerful approach would be to keep track of the reference on the original `term` value under the hood, and update it if necessary in a copy-on-write fashion. As I understand it, this would enable this code to be efficiently executed as well:

protocol Term {}

struct Leaf: Term {}
struct Tree: Term {
    let left: Term
    let right: Term
}

var tree: Term = Leaf()
for _ in 0 ..< 20000 {
    tree = Tree(left: tree, right: tree)
}

Interestingly, while enumerations are value types, indirect enumerations don’t suffer the performance issue of the above examples. Note however that the issue will reappear if the enum isn’t marked `indirect`.

protocol Term {}

indirect enum Nat: Term {
    case zero
    case succ(Term)
}

var term = Nat.zero
for _ in 0 ..< 20000 {
    term = .succ(term)
}

Thanks,
Dimitri Racordon

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


(Dimitri Racordon) #4

More of a swift-dev topic. CC'ing there, BCC'ing evolution.

Thanks!
I’ll track the issue over there.

I do have to note that this is a very strange of writing Nat. Why recurse through a protocol type instead of recursing concretely?

My examples are extracted from a more complex codebase that requires such intricacies (https://github.com/kyouko-taiga/LogicKit for those who might be interested). The protocol is there to represent multiple kind of ADTs that might coexist in a substitution map.