Efficiency of enum-based tree?

Consider a simple tree of integers (I will use a class, rather than a struct, because this example is deliberately simplified from the real problem):

clase Node {
    enum Contents {
       case leaf(Int)
       case nonleaf([Node])
   }
   let contents: Contents
}

So here's the question: whenever I want to traverse the children of a node, we end up doing something like this:

    switch(node.contents) {
    case let .leaf(n):
         ...
    case let .nonleaf(children):
       for child in children {
           ...
       }
    }

I worry that the case let statement is always having to make a local copy of the array, just to iterate through it. Yes, COW should be doing its thing and all, but still, it bothers me a bit I can't directly get an interator to the underlying array of nodes being held without first declaring a local copy (i.e. the variable children).

Is there a better way? Should I just stop worrying because under the hood this is being made efficient anyway?

1 Like

This is happening all the time, whether inside a case let block or not — but this copy is not the contents of the array, it's essentially just a pointer to the contents. The contents themselves will not be copied unless you mutate the array, but given that you use a case let instead of case var, this will not happen.

FYI, you can get the iterator and iterate manually by doing this, if you wish:

case let .nonleaf(children):
       var iterator = children.makeIterator()
       while let next = iterator.next() {
           // ...
       }
    }
1 Like

Not that it matters in my case, but in general I just realized that if I wished to mutate the children array after Node has been created, I have no choice but to do so by explicitly reassigning to the content variable of the node.

There is in general no way of doing

case var .nonleaf(children):
    // operation which modifies children is allowed, but will not affect
    // the content variable.
    children.append( ) // not very useful

Instead, I would need to do:

    case var .nonleaf(children):
        // modify children
        content = .nonleaf(children)

Right?

Yes, because there's currently no modify syntax for enums with associated values.

And here, I believe, you are out of luck with avoiding copies: because the enum is a separate entitiy storing the array, at the moment when you mutate children, it will need to copy it into a local variable, retain it and thus unavoidably trigger COW, even though you will reassign this array back some moments later. The optimizer seems to not catch these cases as of now.

So, to be clearer: iterating over let children is OK since there is no "write" as in Copy-on-Write, but modifying var children will trigger a copy.

Given that we’ve had to make a copy, I don’t think the optimizer even could be made to catch the case. You’d need to add explicit “modifying” syntax (the “ownership manifesto”?) to enums somehow.

The instant you make a new local variable that refers to the original and then make a modification, you have no choice but to copy, no matter how smart the optimizer might be, because of multithreading concerns.

So there’s no easy way to avoid quadratic behavior if you’re, saying, appending one child at a time to your tree.
(Well, not that you want to have a tree with a gazillion child nodes, but still.)

Yikes!! Would the mythical “modify” stuff (part of the ownership manifesto?) possibly give us a way to mutate in place for enums?

Perhaps this actually is a bad design. I’ve just said to my users: “appending a child element is an O(n) operation.”

This is actually a much bigger concern that just the efficiency of reading back.

It looks like you can avoid triggering a copy by writing mutating methods directly on the enum like this:


enum Contents {
    case leaf(Int)
    case nonleaf([Node])

    mutating
    func appendIfAble(node: Node) -> Bool {
        switch self {
        case .leaf:
            return false
        case .nonleaf(var children):
            children.append(node)
            self = .nonleaf(children)
            return true
        }
    }
}

— granted, it's not very convenient to rewrite all array methods that you need, but this seems to solve the problem.

Please correct me if I'm wrong, but you can avoid the copy entirely by setting the content to a placeholder like .leaf(-1) (or just .nonleaf([]) if you can't afford one of the integer values for this) before modifying children and then set it back on content.

Taking @davidbaraff `s example:

case var .nonleaf(children):
        content = .leaf(-1) // or .nonleaf([]) since it should be stack allocated anyway
        // modify children any way you want
        content = .nonleaf(children)

To make your life a little easier you could even encapsulate that into an internal function that takes a closure to modify the value without triggering COW.

Indeed! One can verify by running

enum Contents {
    case trivial(Int)
    case nontrivial(Test)
}

class Test {
    var contents: Contents = .trivial(1)
    
    func foo() {
        switch contents {
        case .trivial:
            break
        case .nontrivial(var test):
            contents = .trivial(1)
            print(isKnownUniquelyReferenced(&test)) // will print `true`
            contents = .nontrivial(test)
        }
    }
    
    func bar() {
        switch contents {
        case .trivial:
            break
        case .nontrivial(var test):
            print(isKnownUniquelyReferenced(&test)) // will print `false`
        }
    }
}


let test = Test()
test.contents = .nontrivial(Test())
test.foo() // prints `true`
test.bar() // prints `false`

Might be slightly less robust, for example, if there's no enum case that is cheap enough to construct and throw away or if there are property observers on var contents, since they will be triggered twice — but works too!


It perhaps should be noted for the sake of completeness that this is not really enum-specific, as one can write much more ordinary code that unnecessarily triggers COW:

class Test {
    
    var arr = [1,2,3]
    
    // will trigger COW
    func foo() {
        var copy = arr
        copy.append(4)
        arr = copy
    }
    
    // will not trigger COW
    func bar() {
        arr.append(4)
    }
}

Thanks very much for the suggested work around, in case I need it.

I would argue strongly however that the need for this (and indeed, how easy it would be to fall into quadratic behavior here) argues that direct mutating access to the underlying contents is required in some form.

These work arounds point to there being something missing.

While I dearly love Swift and can’t imagine going back to other languages, the newness of the language and the rough edges of things we take for granted in older languages still surprises me. (On the other hand, I was around when C++ was born. 5 years after it was born, it was still a toddler, while swift is most assuredly not.)

1 Like

@wtedst Good news: there appears to be progress on this front! I filed this as SR-10605 almost two years ago and we've been pretty consistently working around it ever since. However, the discussion on the issue notes that the OSSA functionality added to Swift has made it possible to safely add the optimisation needed to make this code faster.

While Swift 5.3 is missing this optimisation, the current nightly build contains it. You can grab a copy of the nightly toolchain from swift.org or, if you're familiar with Docker, you can grab it that way (docker pull swiftlang/swift:nightly-focal). Running your example with the nightly toolchain prints:

true
true

The same is true for my (slightly more complex) example from the upstream issue.

11 Likes

@lukasa good to hear!

For what it's worth, you can also wrap your array in a class to avoid triggering COW. That way, only a single instance of the class will ever have a reference to the array. Something like:

public class MutableList<Element> {
    public var array: [Element]
    // ...
}

class Node {
    enum Contents {
       case leaf(Int)
       case nonleaf(MutableList<Node>)
   }
   let contents: Contents
}

// ...
case var .nonleaf(children):
    children.append( ) // No COW

I have a working implementation of this class here in case anyone's interested.

2 Likes
Terms of Service

Privacy Policy

Cookie Policy