What is the benefit of swaps over assignment

aliali · June 19, 2024, 7:50am

There is some code such as "swap(&immediateTasksCopy, &self._immediateTasks)" and "swap(&decoder, &self.decoder)".

What are the benefits of using swap over assignment? How did you know to use it instead of assignment? Is there some all-knowing book?

lukasa · June 19, 2024, 8:57am

Each of these cases uses swap for a different reason. It's easiest to explain these by looking at the specific cases.

First, let's consider swap(&immediateTasksCopy, &self._immediateTasks) in its wider context:

defer {
    var iterations = 0
    var drained = false
    var scheduledTasksCopy = ContiguousArray<ScheduledTask>()
    var immediateTasksCopy = Deque<UnderlyingTask>()
    repeat { // We may need to do multiple rounds of this because failing tasks may lead to more work.
        self._tasksLock.withLock {
            // In this state we never want the selector to be woken again, so we pretend we're permanently running.
            self._pendingTaskPop = true

            // reserve the correct capacity so we don't need to realloc later on.
            scheduledTasksCopy.reserveCapacity(self._scheduledTasks.count)
            while let sched = self._scheduledTasks.pop() {
                scheduledTasksCopy.append(sched)
            }
            swap(&immediateTasksCopy, &self._immediateTasks)
        }

        // Run all the immediate tasks. They're all "expired" and don't have failFn,
        // therefore the best course of action is to run them.
        for task in immediateTasksCopy {
            self.run(task)
        }
        // Fail all the scheduled tasks.
        for task in scheduledTasksCopy {
            task.fail(EventLoopError.shutdown)
        }

        iterations += 1
        drained = immediateTasksCopy.count == 0 && scheduledTasksCopy.count == 0
        immediateTasksCopy.removeAll(keepingCapacity: true)
        scheduledTasksCopy.removeAll(keepingCapacity: true)
    } while !drained && iterations < 1000
    precondition(drained, "EventLoop \(self) didn't quiesce after 1000 ticks.")

    assert(self.internalState == .noLongerRunning, "illegal state: \(self.internalState)")
    self.internalState = .exitingThread
}

In this context, we are in a defer block inside our event loop run function. This means we're doing cleanup: we're stopping the event loop. The swap occurs in a loop over our pending tasks, where we are pulling items out of our _immediateTasks and our _scheduledTasks.

The loop exists because our cleanup of these tasks may cause the user code to enqueue more tasks. We want to ensure we don't leak them, so we need to keep handling them until no further tasks exist.

Additionally, the two task lists are protected by a lock. We don't want to repeatedly take-and-drop that lock in order to pop from the tasks, so we take it and copy the elements out.

Our goal here is to minimise how costly this is, so we want to avoid allocating new task arrays all the time. If all we did was a literal shallow copy (let immediateTasksCopy = self._immediateTasks), both variables would share one buffer, so the next attempt to enqueue a new task would trigger a copy-on-write and therefore another heap allocation.

Instead, we create a single new tasks array, and swap it with the regular one. This is essentially free, and ensures that at all times each buffer has exactly one reference, whether it is currently serving as the copy or as the original. When we're done with the copy, we clear it out (keepingCapacity), and then swap it back with the original. This is the cheapest possible way to achieve this goal.
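As a rough illustration of the pattern described above, here is a minimal sketch. TaskQueue, pending, batch, and drain() are hypothetical names chosen for this example and are not NIO's actual types; a plain Array and NSLock stand in for the Deque and the event loop's lock.

```swift
import Foundation

// Hypothetical stand-in for an event loop's immediate-task queue (not NIO code).
final class TaskQueue {
    private let lock = NSLock()
    private var pending: [() -> Void] = []

    func enqueue(_ task: @escaping () -> Void) {
        lock.lock()
        defer { lock.unlock() }
        pending.append(task)
    }

    /// Runs tasks until none remain. Returns the number of rounds that ran at least one task.
    func drain() -> Int {
        var batch: [() -> Void] = []
        var rounds = 0
        while true {
            lock.lock()
            // After the swap, `batch` owns the queued tasks and `pending` owns the
            // emptied buffer left over from the previous round.
            swap(&batch, &pending)
            lock.unlock()

            if batch.isEmpty { return rounds }
            // The lock is not held here, so a task may call enqueue(_:).
            for task in batch { task() }
            rounds += 1
            // Keep the allocation so it can be swapped back in on the next round.
            batch.removeAll(keepingCapacity: true)
        }
    }
}
```

Each round takes the lock once, and the two buffers simply trade places, so neither is ever shared between two variables.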

Why don't we do this with scheduledTasksCopy? Frankly, we probably should!

Next, let's look at swap(&decoder, &self.decoder). Again, in context:

var possiblyReclaimBytes = false
var decoder: Decoder? = nil 
swap(&decoder, &self.decoder)
assert(decoder != nil) // self.decoder only `nil` if we're being re-entered, but .available means we're not 
defer {
    swap(&decoder, &self.decoder)
    if buffer.readableBytes > 0 && possiblyReclaimBytes {
        // we asserted above that the decoder we just swapped back in was non-nil so now `self.decoder` must
        // be non-nil.
        if self.decoder!.shouldReclaimBytes(buffer: buffer) {
            buffer.discardReadBytes()
        }
    }
    self.buffer.finishProcessing(remainder: &buffer)
}
let decodeResult = try body(&decoder!, &buffer)

// If we .continue, there's no point in trying to reclaim bytes because we'll loop again. If we need more
// data on the other hand, we should try to reclaim some of those bytes.
possiblyReclaimBytes = decodeResult == .needMoreData
return .didProcess(decodeResult)

The goal here is to defend against re-entrant code. It is possible for us to end up calling into this function recursively. That's a problem, as the law of exclusivity forbids us from touching the decoder again while the outer call still has access to it. To avoid that, we need to nil it out for the duration of the call. The easiest way for us to do that is to swap a nil value into the existing storage, and pull the current value out into a temporary.
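The same idea can be sketched in isolation. Parser, Processor, and withParser are hypothetical names for this example only. A re-entrant call finds nil in the stored property and backs off, instead of overlapping with the outer call's access.

```swift
// Hypothetical types for illustration; these are not NIO's decoder types.
struct Parser {
    var consumed = 0
    mutating func feed(_ n: Int) { consumed += n }
}

final class Processor {
    private var parser: Parser? = Parser()

    var consumed: Int? { parser?.consumed }

    /// Returns false if called re-entrantly, i.e. while the parser is checked out.
    @discardableResult
    func withParser(_ body: (inout Parser) -> Void) -> Bool {
        var local: Parser? = nil
        swap(&local, &self.parser)      // self.parser is nil from here on
        guard local != nil else {
            return false                // an outer call already holds the parser
        }
        defer {
            swap(&local, &self.parser)  // put it back on every exit path
        }
        body(&local!)
        return true
    }
}
```

The access to self.parser lasts only for the duration of each swap, so the body can call back into withParser without triggering an exclusivity violation.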

aliali · June 19, 2024, 9:09am

Thanks for the explanation. Honoured to get such an in-depth answer. Learnt a lot from this.

aliali · June 19, 2024, 9:16am

Good to learn more about the "Law of Exclusivity" and what "Re-entrancy" is.

I want to discuss this part a bit more. How is a swap "essentially free"?

lukasa · June 19, 2024, 10:09am

A swap produces no long-lived temporary values, so the refcounts are stable before and after the operation. The operation itself also logically doesn't change the refcount of either value: it just replaces the bytes stored in the first location with those in the second location, and vice-versa.

As an example, consider the optimized compilation of the swap of two arrays:

func rearrange(_ first: inout [Int], _ second: inout [Int]) {
    swap(&first, &second)
}

generates the following x86-64 assembly:

output.rearrange(inout [Swift.Int], inout [Swift.Int]) -> ():
        mov     rax, qword ptr [rdi]
        mov     rcx, qword ptr [rsi]
        mov     qword ptr [rdi], rcx
        mov     qword ptr [rsi], rax
        ret

Here we have no refcount operations and no allocations. All we do is copy to two registers, then write the registers back to the opposite locations. This is as cheap as it gets, and given that the values are going to be in cache, we can consider it to be essentially free.

This is quite different to actually creating a temporary, which must emit at least a refcount operation.
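One way to observe this from Swift itself, as a small sketch: record where each array's element storage lives, swap, and check that the two arrays traded buffers rather than copying elements. storageAddress(of:) and swapTradesStorage() are hypothetical helpers written for this example, and the pointers are only compared, never dereferenced.

```swift
/// Address of the array's element storage, used here only as an identity check.
func storageAddress(of array: [Int]) -> UnsafePointer<Int>? {
    array.withUnsafeBufferPointer { $0.baseAddress }
}

/// Swaps two arrays and reports whether they simply traded storage buffers.
func swapTradesStorage() -> Bool {
    var first = [1, 2, 3]
    var second = [10, 20, 30, 40]
    let firstBefore = storageAddress(of: first)
    let secondBefore = storageAddress(of: second)

    swap(&first, &second)

    // No elements were copied: each variable now points at the other's old buffer.
    return storageAddress(of: first) == secondBefore
        && storageAddress(of: second) == firstBefore
}
```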

aliali · June 19, 2024, 10:10am

Beautiful. Thank you for this wisdom. And this compiler link!

wadetregaskis · June 19, 2024, 3:44pm

As others noted, swap is usually faster than anything involving temporary variables because the compiler treats it as "atomic" from an ARC perspective, and thus avoids inserting pointless retains and releases (among a few other optimisations).

Note though that swap isn't always the fastest way to swap two values. Sometimes it's faster to do:

(a, b) = (b, a)

Apparently the compiler recognises this specially and knows how to avoid unnecessary work (like retain-release activity) even for complex types. This method may also be more broadly applicable as complex ownership semantics develop in the Swift language.

From what I've seen so far the difference is small, so not something I suggest you worry about, generally. But if you do see swap taking a non-trivial amount of time in a hot path, you can try the tuple swap method instead and see if it happens to help in that specific situation.
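For reference, the two spellings side by side, as a minimal sketch. The helper names are hypothetical, and which one is faster in a given hot path is something to measure rather than assume.

```swift
// Hypothetical helper names, used only to put the two spellings side by side.
func swapWithStdlib(_ a: inout [String], _ b: inout [String]) {
    swap(&a, &b)
}

func swapWithTuple(_ a: inout [String], _ b: inout [String]) {
    // The right-hand tuple is evaluated first, then assigned element-wise.
    (a, b) = (b, a)
}
```

Both leave the arguments in the same final state, so one can be substituted for the other when profiling.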

Nobody1707 · June 19, 2024, 5:06pm

You also get the same result with an exchange function using the new ownership features.

func exchange<T: ~Copyable>(_ lhs: inout T, _ newValue: consuming T) -> T {
    let old = consume lhs
    lhs = newValue
    return old
}

func rearrange(_ first: inout [Int], _ second: inout [Int]) {
    second = exchange(&first, consume second)
}

This produces the same machine code as if you had done (first, second) = (second, first), and as a slight bonus produces no retains at -Onone, whereas the (first, second) = (second, first) approach has two retain/release pairs at -Onone.

Slava_Pestov · June 19, 2024, 5:33pm

How about this?

func exchange<T: ~Copyable>(_ lhs: inout T, _ rhs: inout T) {
    let old = consume lhs
    lhs = consume rhs
    rhs = old
}

Nobody1707 · June 19, 2024, 5:39pm

Well, that would just be swap. But yes, I do believe that is a valid implementation.

Also, I only just thought of it, but since T is ~Copyable, you probably don't need the explicit consumes at all.

func swap<T: ~Copyable>(_ lhs: inout T, _ rhs: inout T) {
    let old = lhs
    lhs = rhs
    rhs = old
}

Should be sufficient.
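To see such a function work with a value that cannot be copied at all, here is a small sketch. It assumes a Swift 6 toolchain, since generic ~Copyable parameters need one. Token and exchangeTokens() are hypothetical names for this example, and the explicit consumes are kept as in the earlier version.

```swift
// Hypothetical noncopyable type for illustration.
struct Token: ~Copyable {
    let id: Int
}

func exchange<T: ~Copyable>(_ lhs: inout T, _ rhs: inout T) {
    let old = consume lhs   // lhs is now uninitialized
    lhs = consume rhs       // lhs reinitialized, rhs now uninitialized
    rhs = old               // rhs reinitialized before the function returns
}

/// Exchanges two tokens and returns their ids afterwards.
func exchangeTokens() -> (Int, Int) {
    var a = Token(id: 1)
    var b = Token(id: 2)
    exchange(&a, &b)
    return (a.id, b.id)
}
```

Because Token cannot be copied, the only way the values can end up exchanged is by being moved, which is the behaviour the thread is after.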