Improving linked list performance (swift_release and swift_retain overhead)

I'm in the process of porting a C++ library that relies heavily on a doubly-linked list to Swift, and I've run into a performance issue involving swift_release(swift::HeapObject*) and swift_retain(swift::HeapObject*) overhead. Specifically, Time Profiler reports around 770ms spent in my main library function, of which 561ms goes to swift_release/swift_retain.

A few notes on the code included below: the library in question performs polygon tessellation, and the port tries to stay as faithful to the original library as possible so the two are easy to compare, which means it isn't particularly idiomatic Swift.

Here's my linked list Node class:

final class Node : Equatable {
    // vertex index in coord array
    var i:Int
    
    // vertex coordinates
    var x:Scalar
    var y:Scalar
    
    // z-order curve value
    var z:Scalar? = nil
    
    // previous and next nodes in z-order
    weak var prevZ:Node? = nil
    var nextZ:Node? = nil
    
    // previous and next vertex nodes in a polygon ring
    weak var prev:Node? = nil
    var next:Node? = nil
    
    // indicates whether this is a steiner point
    var steiner:Bool = false
    
    init(i:Int, x:Scalar, y:Scalar) {
        self.i = i
        self.x = x
        self.y = y
    }
    
    static func == (lhs: Node, rhs: Node) -> Bool {
        return lhs.x == rhs.x && lhs.y == rhs.y
    }
}

And here's an example of a library function that incurs heavy retain/release overhead. The code is fairly dense for demonstration purposes. From watching a few Swift performance-related WWDC sessions, it appears that much of this overhead comes from the reference semantics of the Optional next/prev variables (and from iterating over them), but I'm not sure how that can be avoided in Swift.

// Simon Tatham's linked list merge sort algorithm
// http://www.chiark.greenend.org.uk/~sgtatham/algorithms/listsort.html
private func sortLinked(_ _list:Node) -> Node {
    
    var list:Node? = _list
    var tail:Node?
    var e:Node?
    var p:Node?
    var q:Node?
    var qSize:Int = 0
    var pSize:Int = 0
    var inSize:Int = 1
    var numMerges:Int = 0
    
    repeat {
        p = list
        list = nil
        tail = nil
        numMerges = 0
        
        while p != nil {
            numMerges += 1
            q = p
            pSize = 0
            for _ in 0 ..< inSize {
                pSize += 1
                q = q!.nextZ
                if q == nil { break }
            }
            qSize = inSize
            
            while pSize > 0 || (qSize > 0 && q != nil) {
                
                if pSize != 0 && (qSize == 0 || q == nil || p!.z! <= q!.z!) {
                    e = p
                    p = p!.nextZ
                    pSize -= 1
                } else {
                    e = q
                    q = q!.nextZ
                    qSize -= 1
                }
                
                if tail != nil {
                    tail!.nextZ = e
                } else {
                    list = e
                }
                
                e!.prevZ = tail
                tail = e!
            }
            
            p = q
        }
        
        tail?.nextZ = nil
        inSize *= 2
    } while numMerges > 1
    
    return list!
}

Hopefully someone here can point me in the right direction on reducing retain/release traffic when iterating over the linked list. Were this C or C++, such pointer manipulation would be essentially free (and admittedly more dangerous). How can I achieve similar performance in Swift?

Have you tried implementing it with a struct and UnsafePointer?
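For concreteness, here's a minimal sketch of that direction (not the library's actual code: the names are made up and Scalar is swapped for Double). The nodes become plain values in a caller-owned buffer, linked by raw pointers, so traversal involves no reference counting at all.

// Hypothetical sketch: Node as a value type in a caller-owned buffer,
// linked with raw pointers instead of optional class references.
struct ValueNode {
    var i: Int
    var x: Double
    var y: Double
    var nextZ: UnsafeMutablePointer<ValueNode>?
}

// Allocate and link a chain; the caller is responsible for deinitializing
// and deallocating the buffer once tessellation is finished.
func makeChain(count: Int) -> UnsafeMutablePointer<ValueNode> {
    let buffer = UnsafeMutablePointer<ValueNode>.allocate(capacity: count)
    for k in 0 ..< count {
        let next: UnsafeMutablePointer<ValueNode>? = k + 1 < count ? buffer + (k + 1) : nil
        (buffer + k).initialize(to: ValueNode(i: k, x: Double(k), y: 0, nextZ: next))
    }
    return buffer
}

// Walking the list is plain pointer chasing; no retain/release is emitted.
func walk(_ head: UnsafeMutablePointer<ValueNode>) -> Int {
    var node: UnsafeMutablePointer<ValueNode>? = head
    var visited = 0
    while let n = node {
        visited += 1
        node = n.pointee.nextZ
    }
    return visited
}

let head = makeChain(count: 16)
print(walk(head))            // 16
head.deinitialize(count: 16)
head.deallocate()

The obvious trade-off is that memory management becomes entirely manual, much as it would be in the original C++.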

I need to take a more in-depth look at your algorithm itself (I may have a few ideas). Keep in mind that I am finishing up semantic SIL right now, which should help with ARC performance in many cases.

That being said, for immutable iteration over a linked list (or any non-cyclic graph, really), there is an optimization we could do based on the shape of the data structure (I wanted to implement this a number of years ago but never had the time). As a simple example, consider:

final class Node {
    var next: Node? = nil
    var value: Int

    init(value: Int) {
        self.value = value
    }
}

func findValue(target: Int, list: Node) -> Node? {
    var h = list
    while let h2 = h.next {
        if h2.value == target {
            return h2
        }
        h = h2
    }
    return nil
}

The key thing to notice here is that, given the shape of Node, we know that if we retain the first element of the iteration, the rest of the list will stay alive. So we can insert a single retain before the loop and a single release afterwards.
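Until the compiler can do that for you, you can approximate the same shape by hand. Here's a sketch (reusing the simplified Node above, with a hypothetical findValuePinned name): hold one strong reference to the head for the duration of the loop and advance an unowned(unsafe) cursor, which carries no reference counting of its own.

// A hand-written approximation of the shape optimization (a sketch, not a
// guaranteed win): `withExtendedLifetime` keeps the head alive across the
// whole loop, and the cursor is unowned(unsafe), so it isn't reference
// counted. The temporary strong binding `h2` may still cost a
// retain/release pair unless the optimizer removes it.
func findValuePinned(target: Int, list: Node) -> Node? {
    return withExtendedLifetime(list) { () -> Node? in
        unowned(unsafe) var h = list
        while let h2 = h.next {
            if h2.value == target {
                return h2
            }
            h = h2
        }
        return nil
    }
}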

I've posted the library to GitHub in its current state. It includes a number of unit tests if you'd like to reproduce a profiling run. As mentioned in the initial post, I've tried to port the library with as few changes as possible, so it's a fairly dense chunk of code.

That said, a Time Profiler run should highlight the issue fairly quickly. On my machine I see roughly 75-80% retain/release overhead in most of the functions responsible for walking the linked list. I'd be interested in any suggestions for reducing that ARC overhead without a massive restructuring.

Since posting I've also stumbled across this optimization tips document, which suggests using Unmanaged to avoid this kind of overhead. One thing worth noting (which is actually an advantage for the Unmanaged approach) is that the original library doesn't appear to clean up the linked list structures it creates. In fact, the official C++ port of the library (and my Swift port) maintain a collection of every Node instance allocated so that their references can be nil'd out on completion. While that's ugly, it does mean that any use of Unmanaged wouldn't have to worry about object lifetime, since the collection keeps a strong reference to every Node until tessellation is complete.
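For reference, this is roughly the shape I have in mind, as a simplified, self-contained sketch (hypothetical ZNode name, Double instead of Scalar), with the pool array playing the role of that collection of Node instances:

// Sketch of the Unmanaged approach: the pool owns every node strongly until
// tessellation finishes, so the z-order links can be stored as Unmanaged
// references rather than strong optionals.
final class ZNode {
    var z: Double
    var nextZ: Unmanaged<ZNode>? = nil
    init(z: Double) { self.z = z }
}

// The pool keeps all nodes alive; the links do not.
let pool: [ZNode] = (0 ..< 8).map { ZNode(z: Double($0)) }
for k in 0 ..< (pool.count - 1) {
    pool[k].nextZ = Unmanaged.passUnretained(pool[k + 1])
}

// Traversal copies Unmanaged values (plain pointer wrappers), so copying the
// links involves no reference counting; lifetime is guaranteed by `pool`.
var cursor: Unmanaged<ZNode>? = Unmanaged.passUnretained(pool[0])
var sum = 0.0
while let c = cursor {
    let node = c.takeUnretainedValue()
    sum += node.z
    cursor = node.nextZ
}
print(sum) // 0 + 1 + ... + 7 = 28.0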

@Michael_Gottesman I'd be interested in hearing your thoughts, as I'm currently stumped as to how best to proceed (short of a large scale refactor, or use of another language).

I thought you had found your solution using Unmanaged et al. I have some other things on my plate right now, so it will take me a little while to get to this.

My suggestion: can you include your benchmark as a single-source benchmark in the Swift benchmark suite via a PR? Then we can track the workload. Once it is there, we can both look at it together and perhaps come up with an unsafe variant that works better.
