Why is swapAt not using swap?

wadetregaskis · May 15, 2024, 12:21am

I noticed that swapAt uses the naive approach with a named temporary variable. It doesn't even do a tuple swap ((a, b) = (b, a)). Usually the naive approach is dangerous for performance as it relies on the optimiser recognising the swap semantics, which it sometimes does not do. Is there something special about swapAt that somehow ensures the optimiser will work on it?

My questioning is somewhat academic right now - I stumbled onto this while debugging performance issues involving swap - but it seems like there's been problems in the past with swapAt not being properly performant.

I assume it can't just use swap currently because of the limitations in exclusivity checking, as mentioned in SE-0176: Enforce Exclusive Access to Memory, which oddly weren't addressed in SE-0173: Add MutableCollection.swapAt(_:_:). But there are methods known involving unsafe pointers and swapping the bytes manually; is there a reason that's not used in the canonical implementation?

jrose · May 15, 2024, 1:23am

A fully general MutableCollection might do a get and set for each subscript operation, so using a tuple isn’t going to be any better in practice, even if it looks tidier. Anywhere where that’s not true because of inlining, the compiler would be smart enough to treat the two forms equivalently.

Swapping bytes isn’t good enough because Swift supports types with non-trivial move operations (C++ types, but also structs containing ObjC weak references).

Something involving explicit move operations (with consume or with something else) would require being able to get mutable access to both parts of the collection at once, which may not even be possible—consider a MutableCollection whose elements are bits in a UInt64.

I’m not saying having a fast path here wouldn’t be worth it, but it would still be conditional, and in most such cases the hope is that the naive swap would be inlined and optimized instead. I’d want to see a specific use case that would be sped up that couldn’t be done just as well by making the collection’s subscript inlinable.

wadetregaskis · May 15, 2024, 1:56am

Ah, I'd forgotten about the types that don't support trivial moves.

Perhaps BitwiseCopyable could help somewhat by permitting more specialised implementations?

How does swap handle all that, then? It's just using some compiler built-ins, and nothing about those (nor how it uses them) suggests to me it's special-casing all these things. And nothing in its signature appears to prevent it being used in all those cases…?

I don't have a test case like that handy, but given that the compiler generally can't seem to optimise the naive swap method (named temporary variable) correctly, in most non-trivial code, I don't have a lot of confidence in that.

Not to undermine that statement too much, but I did try to quantify this a bit with some benchmarks earlier today, and was annoyed to find that the optimiser mostly did work well in trivial benchmarks.

(a subsequent line of inquiry is why the optimiser can't seem to ever optimise 'swaps' out entirely, using a toggle pointer instead where applicable?)

Joe_Groff · May 15, 2024, 2:27am

It seems to do so to me:

As one would expect, this optimizes to allocating one stack slot for c (alloc_stack), and then doing three moves to shift the values around (the copy_addr [take] to [init] instructions), which is about as good as you can expect generically. It looks like the use of builtins by the standard library to get this code generation is no longer necessary, and if anything might be confusing the optimizer at this point.

If there's anything special about swap, it's that it takes both of its arguments inout, which ensures that the function has exclusive access to both operands. The externalities Jordan talks about (computed property get/sets, class ivar aliasing, etc.) get externalized because it's the caller's responsibility to deal with those in order to set up the inout parameter. If you're only dealing with local variables, though, there shouldn't be anything preventing the compiler from optimizing the same thing if you write it inline.

tera · May 15, 2024, 2:31am

Could this break?

class C {}

struct S {
    weak var ref: C?
}
var c: C? = C()
var a = S(ref: c)
var b = S(ref: nil)

// swap the two values non atomically
func swap(_ a: UnsafeMutableRawPointer, _ b: UnsafeMutableRawPointer, _ count: Int) {
    let a = a.assumingMemoryBound(to: UInt8.self)
    let b = b.assumingMemoryBound(to: UInt8.self)
    for i in 0 ..< count {
        let temp = a[i]
        a[i] = b[i]
        b[i] = temp
    }
}

print(a, b)   // a: S(ref: Optional(C)), b: S(ref: nil)
swap(&a, &b, MemoryLayout<S>.size)
print(a, b)   // a: S(ref: nil), b: S(ref: Optional(C))

This seems to be working alright. Could you come up with an example that won't work?

Joe_Groff · May 15, 2024, 2:33am

Changing it to class C: NSObject, or the type of ref from C to AnyObject, would break it, because ObjC weak references are not bitwise-movable. You might not notice the brokenness right away with a swap, because there will happen to be weak references to objects in the same places at the end of it, but they will be associated with the wrong objects and may get nulled out unexpectedly when the wrong object expires.

jrose · May 15, 2024, 2:39am

It does have to be ObjC weak, meaning a class that uses ObjC refcounting. And the zeroing part of weak has to actually kick in!

import Foundation  // NEW

class C: NSObject {}  // CHANGED

struct S {
    weak var ref: AnyObject?  // CHANGED
}
var c: C? = C()
var a = S(ref: c)
var b = S(ref: nil)

// swap the two values non atomically
func swap(_ a: UnsafeMutableRawPointer, _ b: UnsafeMutableRawPointer, _ count: Int) {
    let a = a.assumingMemoryBound(to: UInt8.self)
    let b = b.assumingMemoryBound(to: UInt8.self)
    for i in 0 ..< count {
        let temp = a[i]
        a[i] = b[i]
        b[i] = temp
    }
}

print(a, b)   // okay
swap(&a, &b, MemoryLayout<S>.size)
print(a, b)   // "okay"
c = nil
print(a, b)   // crash!

tera · May 15, 2024, 2:43am

Wow, thanks, it crashes indeed. I wonder how does that happens at the bit level as one would naively assume that moving bytes from one place to another should not change anything meaningfully (unless there is, say, a memory address is getting stored somewhere (in the object itself or in an external table) or some such).

Joe_Groff · May 15, 2024, 2:45am

That's exactly how ObjC weak references are implemented. There's no room in the ObjC ABI for additional storage to track weak references, since they were retrofitted later when ARC was introduced, so every time one is created, copied, or moved, a table in the ObjC runtime needs to be updated which tracks the locations of all of the weak references and their referents so that it can zero them out when the referent is deallocated.

hassila · May 15, 2024, 3:25am

Related, another issue when we got performance issues with swapat in a trivial setup with an array of ints:

github.com/apple/swift

[SR-14778] Standard Library swapAt: creates transient memory allocations for small value types

opened 09:43AM - 15 Jun 21 UTC

hassila

bug

| | | |------------------|-----------------|… |Previous ID | SR-14778 | |Radar | rdar://problem/79351572 | |Original Reporter | @hassila | |Type | Bug | <details> <summary>Environment</summary> `swift-driver version: 1.26 Apple Swift version 5.5 (swiftlang-1300.0.19.104 clang-1300.0.18.4)``Target: x86_64-apple-macosx11.0` </details> <details> <summary>Additional Detail from JIRA</summary> | | | |------------------|-----------------| |Votes | 0 | |Component/s | | |Labels | Bug | |Assignee | None | |Priority | Medium | md5: e7f45b6162fac2aa35f8ffcfb2ac117b </details> **Issue Description:** When analysing runtime performance of the new PriorityQueue implementation being proposed for Swift Collections, we could see that we had a massive amount of unexpected mallocs when running one of the benchmarks, which could be traced down to swapAt: in an array of Int:s. I forked the relevant repo and have created an additional branch that fixes it by just doing the swap manually as an inlined function: <https://github.com/hassila/swift-collections/commit/b5cc85f26f8b4d79f1d0e8d490480457c18f9c90> This was also discussed at the Swift forums as it caused issues for reference types (excessive ARC traffic - <https://forums.swift.org/t/unexpected-swapat-free-malloc-overhead/41109/13),> but I think this case when small value types creates two memory allocations per swap is more problematic. To reproduce, just clone: <https://github.com/hassila/swift-collections> checkout the 'priority-queue' branch and run the swift-collections-benchmark scheme with the arguments "run benchmark --cycles 1 -f "PriorityQueue\<Int\> insert"" run it in instruments with the allocations profiling template and look at the transient allocations (several millions) generated by swapAt:. `then checkout the 'priority-queue-optimizations' branch which has the trivial fix and rerun the profiling session and the transient allocations from swapAt are gone.` You can find more information here also in a couple of comments: <https://github.com/apple/swift-collections/pull/51#issuecomment-860469370> These unexpected mallocs basically doubled the runtime.

(Referenced from [Heap] Figure out what exactly makes `Array.swapAt` slow · Issue #72 · apple/swift-collections · GitHub )

tera · May 15, 2024, 2:57pm

Using a quick test I can see that when the referred object is deallocated, Obj-C weak references are zeroed right away, while Swift weak references are left having their original non-zero bit pattern, just treated as if they were nil (e.g. when binding with if let, etc). Swift way of doing weak references looks more solid to me. It's quite unfortunate Obj-C weak references were done differently initially and weren't able to be retrospectively changed to match Swift weak references implementation.

John_McCall · May 15, 2024, 4:30pm

I believe the hard constraint is that there's a significant amount of code that reads Objective-C properties reflectively without paying attention to ownership, so a simple load of the ivar needs to return the actual object. For weak references, this is obviously racy with the deallocation of the object, but it's the sort of thing you can get away with in specific patterns as long as you're just reading from the property. Writes are a different matter entirely — even without weak, writes in Objective-C required knowledge about ownership because of the prevalence of assign properties.

grynspan · May 31, 2024, 12:33pm

Just a reminder that here be dragons. You don't want assumingMemoryBound(to:), you want withMemoryRebound(to:capacity:_:).