Swift tanking performance by excessively copying enums when pattern matching?

mattcurtis · January 26, 2025, 9:40pm

I've been profiling some Swift code and noticed what seemed like excessive calls to outlined init with take of ... in Time Profiler.

I've tried to reduce my code to something that easily demonstrates the issue:

enum FruitCount {
    
    case apples(Int)
    case oranges(Int)
    
}

struct FruitComparator {
    
    let fruitCount: FruitCount
    // Padding only serves to push FruitComparator past the size at which the extra copies appear.
    let padding1 = 0, padding2 = 0, padding3 = 0

    func isApplesFast() -> Bool {
        switch fruitCount {
            case .apples: true
            default: false
        }
    }

    func isApplesFast2() -> Bool {
        if case .apples = fruitCount { true }
        else { false }
    }

    // These all generate "outlined init with take of FruitCount" calls, not once, but twice:
    
    func isApplesSlow() -> Bool {
        switch fruitCount {
            case .apples where padding1 == 1: true
            default: false
        }
    }

    func isApplesSlow2() -> Bool {
        switch fruitCount {
            case .apples: padding1 == 1
            default: false
        }
    }

    func isApplesSlow3() -> Bool {
        switch fruitCount {
            case .apples: print("hello!"); return true
            default: return false
        }
    }
    
}

In the code above, when:

- FruitComparator exceeds a certain size (hence the padding properties), and
- a switch ... case statement over fruitCount has a non-trivial case body or a where clause,

Swift copies fruitCount twice. You can see this in the generated assembly as the two outlined init with take of FruitCount calls.

The fast variants, on the other hand, appear to be optimized as expected: the compiler recognizes that it only needs to check fruitCount's enum tag.

I'm not an expert here by any means, so I'd love clarification on what might be happening, and on whether this is a bug (it seems like one to me) and, if so, whether it's a known one. As far as I can tell, Swift is incorrectly taking the route of preemptively and excessively creating local copies of fruitCount, even when no associated values are bound, to guard against whatever might happen to fruitCount in the body of the case statement. (Even then, I'm not sure why two outlined init with take calls get generated instead of one.) It's also unclear to me why the size of FruitComparator matters.
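Since the tag-only matches are the ones that optimize well, one possible workaround is to keep the pattern match itself trivial and move the extra work outside the switch. This is a sketch only: isApples, isApplesRestructured() and isApplesRestructured3() are hypothetical names, and whether this shape actually avoids the copies on an affected toolchain would need to be confirmed in Time Profiler or the generated assembly.

```swift
enum FruitCount {
    case apples(Int)
    case oranges(Int)

    // Hypothetical helper: a tag-only check with a trivial body,
    // the same shape as isApplesFast2() in the post above.
    var isApples: Bool {
        if case .apples = self { return true }
        return false
    }
}

struct FruitComparator {
    let fruitCount: FruitCount
    let padding1 = 0, padding2 = 0, padding3 = 0

    // Same result as isApplesSlow() and isApplesSlow2(): the match only
    // inspects the enum tag, and the extra condition is evaluated afterwards.
    func isApplesRestructured() -> Bool {
        fruitCount.isApples && padding1 == 1
    }

    // Same observable behavior as isApplesSlow3(): the side effect runs
    // after the tag check instead of inside the case body.
    func isApplesRestructured3() -> Bool {
        guard fruitCount.isApples else { return false }
        print("hello!")
        return true
    }
}
```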

vanvoorden · January 26, 2025, 9:56pm

There might be some clues in here… but it looks like that landed after @Karl was specifically seeing performance problems with an optional enum.

mattcurtis · January 26, 2025, 10:11pm

Thank you — that does seem to be a similar issue! Maybe the optionality matters in that case due to how it influences the size of the containing struct.
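To test the size theory, the layouts involved can be printed with MemoryLayout. This sketch reuses the types from the original post; FruitComparatorWithOptional and describeLayout are hypothetical names introduced only for the comparison. Sizes can vary by platform and compiler version, so no output is listed here; the idea is to compare the printed sizes against the point at which the extra copies start showing up.

```swift
enum FruitCount {
    case apples(Int)
    case oranges(Int)
}

struct FruitComparator {
    let fruitCount: FruitCount
    let padding1 = 0, padding2 = 0, padding3 = 0
}

// Hypothetical variant storing an optional enum instead.
struct FruitComparatorWithOptional {
    let fruitCount: FruitCount?
    let padding1 = 0, padding2 = 0, padding3 = 0
}

// Prints size (bytes actually used), stride (distance between
// consecutive array elements) and alignment for a type.
func describeLayout<T>(_ type: T.Type) {
    print("\(T.self): size \(MemoryLayout<T>.size), stride \(MemoryLayout<T>.stride), alignment \(MemoryLayout<T>.alignment)")
}

describeLayout(FruitCount.self)
describeLayout(Optional<FruitCount>.self)
describeLayout(FruitComparator.self)
describeLayout(FruitComparatorWithOptional.self)

// Byte offset of a stored property, or nil if it is not directly addressable.
print(MemoryLayout<FruitComparator>.offset(of: \FruitComparator.padding1) as Any)
```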

Looking at the pull request, though, it seems to make the copy cheaper rather than eliminate it in the cases where the copy could be avoided entirely.

RandomHashTags · January 26, 2025, 10:58pm

That looks like a bug. It also looks fixed in the Swift nightly build (I rarely work with assembly, so I could be wrong).

mattcurtis · January 26, 2025, 11:07pm

You're right, it does seem to be! I feel silly now that I didn't think to check that.

vanvoorden · January 26, 2025, 11:26pm

Not directly related to this specific situation… but I do believe it is a known performance tradeoff with passing and copying large value types. This is why engineers might choose reference semantics (making FruitComparator a class) or a copy-on-write data structure (making FruitComparator wrap a class reference).
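A minimal copy-on-write version of the second option might look like this, assuming the enum and the padding move into a private final class (Storage is a hypothetical name). Copying the struct then copies a single reference, and the standard library's isKnownUniquelyReferenced(_:) is used to clone the storage before a mutation only when it is shared. Whether this helps with the pattern-matching copies from the original post would still need to be measured.

```swift
enum FruitCount {
    case apples(Int)
    case oranges(Int)
}

struct FruitComparator {
    // Hypothetical heap-allocated storage holding the "large" state.
    private final class Storage {
        var fruitCount: FruitCount
        let padding1 = 0, padding2 = 0, padding3 = 0

        init(fruitCount: FruitCount) { self.fruitCount = fruitCount }

        // The padding fields are constants, so copying fruitCount
        // is enough to produce an independent copy.
        func clone() -> Storage { Storage(fruitCount: fruitCount) }
    }

    private var storage: Storage

    init(fruitCount: FruitCount) {
        storage = Storage(fruitCount: fruitCount)
    }

    var fruitCount: FruitCount {
        // Reads go through the shared reference.
        get { storage.fruitCount }
        set {
            // Copy-on-write: clone only if another value shares this storage.
            if !isKnownUniquelyReferenced(&storage) {
                storage = storage.clone()
            }
            storage.fruitCount = newValue
        }
    }

    func isApples() -> Bool {
        if case .apples = fruitCount { return true }
        return false
    }
}
```

Copying a FruitComparator shares its storage until one of the copies is mutated, at which point only that copy gets a fresh Storage.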

mattcurtis · January 27, 2025, 12:20am

This does make me wonder — enums (and structs) are such a common and valuable data type, and it feels like a bit of a hole that it's so easy for read-only operations on them and their associated values to introduce unanticipated copy overhead.

vanvoorden · January 27, 2025, 6:46pm

yeah… this is true… there's always a tradeoff when shipping infra: to what extent should the infra engineer expose these implementation details? should a product engineer know this implementation performs extra copy-by-value… should a product engineer optimize their product on the expectation this implementation performs extra copy-by-value? is the infra engineer then blocked on optimizing the infra because product engineers have already hard-coded their products to expect the legacy implementation?

i'm pretty new to swift… didn't start focusing here until 2020… my impression is for the most part that product engineers don't optimize for infra implementation details until they are way past the "v1 mvp" stage of things.

for the most part the impression i get from engineering in swift is that product engineers should choose semantics first and performance second. if a product engineer wants value semantics they should choose value types. if at some point in the future that engineer needs to optimize performance, that's when they can weigh the tradeoffs of moving to reference semantics or a copy-on-write value type structure.

That being said… if you do discover something that looks like an unambiguous performance win… I think the community would be happy if you helped ship a diff to help fix that. The tricky part is when a performance win is actually a performance tradeoff… and the community worked through that situation and decided that the current solution was the "least bad" solution for now.

vanvoorden · January 27, 2025, 6:48pm

If you were really interested in diving deeper into this specific topic… you could try to find out whether this problem was fixed explicitly, by a diff targeting it, or implicitly, by a diff targeting a different problem. If it was fixed implicitly, there may be no automated test that would catch a regression if a future change brings the problem back.

mattcurtis · February 14, 2025, 5:30pm

Based on a solution discovered by @Karl, I wrote a property wrapper that works around this issue: the compiler is able to optimize it into a faster set of instructions. This is useful if, like me, you're stuck on the current release of Swift, which has this issue, rather than the nightly, which seems to have largely fixed it.

@propertyWrapper
struct ExpensiveValue<Value> {
    
    var wrappedValue: Value
    
    var projectedValue: Value { optimizableFastGet() }
    
    func optimizableFastGet() -> Value {
        //  Find the byte offset of wrappedValue inside a local copy of self, then load a copy of the value from there
        
        var copy = self
        
        return withUnsafeMutablePointer(to: &copy) {
            copyPtr in
            
            let rawCopyPtr = UnsafeRawPointer(copyPtr)
            
            //  MemoryLayout.offset(of:) would give us the value's byte offset too.
            //  The downside of that method is that it doesn't work well with generic types,
            //  when the compiler can't directly "see" what property the key path references
            //  and instead dynamically generates a key path at runtime, which is _expensive_.
            //
            //  Honestly Swift's algorithm for the memory layout of structs means that
            //  wrappedValue will always be stored at byte offset 0,
            //  but that could change and therefore can't be relied upon.
            
            let byteOffset = withUnsafeMutablePointer(to: &copyPtr.pointee.wrappedValue) {
                rawCopyPtr.distance(to: $0)
            }
            
            //  Safety checks
            
            assert(
                MemoryLayout<Self>.offset(of: \.wrappedValue) == byteOffset,
                "wrappedValue byte offset disagrees with MemoryLayout.offset(of:)!"
            )
            
            precondition(
                (0..<MemoryLayout<Self>.size).contains(byteOffset),
                "wrappedValue stored outside of self!"
            )
            
            return rawCopyPtr.load(fromByteOffset: byteOffset, as: Value.self)
        }
    }
    
}
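As a usage sketch, the wrapper would be applied to the stored enum, with the switch reading through the projected value ($fruitCount) so that it goes through optimizableFastGet(). The wrapper is repeated here in condensed form (without the debug-only assert) so the block compiles on its own, and isApplesWrapped() is a hypothetical name. As with the original, any speedup should be confirmed in Time Profiler on the toolchain actually in use.

```swift
@propertyWrapper
struct ExpensiveValue<Value> {
    var wrappedValue: Value
    var projectedValue: Value { optimizableFastGet() }

    init(wrappedValue: Value) { self.wrappedValue = wrappedValue }

    func optimizableFastGet() -> Value {
        var copy = self
        return withUnsafeMutablePointer(to: &copy) { copyPtr in
            let rawCopyPtr = UnsafeRawPointer(copyPtr)
            // Offset of wrappedValue relative to the start of the copy.
            let byteOffset = withUnsafeMutablePointer(to: &copyPtr.pointee.wrappedValue) {
                rawCopyPtr.distance(to: UnsafeRawPointer($0))
            }
            precondition((0..<MemoryLayout<Self>.size).contains(byteOffset),
                         "wrappedValue stored outside of self!")
            return rawCopyPtr.load(fromByteOffset: byteOffset, as: Value.self)
        }
    }
}

enum FruitCount {
    case apples(Int)
    case oranges(Int)
}

struct FruitComparator {
    @ExpensiveValue var fruitCount: FruitCount
    let padding1 = 0, padding2 = 0, padding3 = 0

    init(fruitCount: FruitCount) {
        _fruitCount = ExpensiveValue(wrappedValue: fruitCount)
    }

    // Same logic as isApplesSlow(), but matching on the projected value.
    func isApplesWrapped() -> Bool {
        switch $fruitCount {
        case .apples where padding1 == 1: return true
        default: return false
        }
    }
}
```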