I've been profiling some Swift code and noticed what seemed like excessive calls to outlined init with take of ... in Time Profiler.
I've tried to reduce my code to something that easily demonstrates the issue:
enum FruitCount {
    case apples(Int)
    case oranges(Int)
}

struct FruitComparator {
    let fruitCount: FruitCount
    let padding1 = 0, padding2 = 0, padding3 = 0

    func isApplesFast() -> Bool {
        switch fruitCount {
        case .apples: true
        default: false
        }
    }

    func isApplesFast2() -> Bool {
        if case .apples = fruitCount { true }
        else { false }
    }

    // These all generate "outlined init with take of FruitCount" calls, not once, but twice:
    func isApplesSlow() -> Bool {
        switch fruitCount {
        case .apples where padding1 == 1: true
        default: false
        }
    }

    func isApplesSlow2() -> Bool {
        switch fruitCount {
        case .apples: padding1 == 1
        default: false
        }
    }

    func isApplesSlow3() -> Bool {
        switch fruitCount {
        case .apples: print("hello!"); return true
        default: return false
        }
    }
}
In the code above, when FruitComparator exceeds a certain size (hence the padding properties) and any switch ... case statement over fruitCount has a body that does more than return a constant (or has a where clause), Swift makes a copy of fruitCount twice. You can see this in the generated assembly as the outlined init with take of FruitCount calls.
On the other hand, the fast variations seem to be correctly optimized, and the compiler correctly observes it can just check which enum tag fruitCount is.
I'm not an expert here by any means, so I'd love clarification on whether this is a bug (to me it seems to be), whether it's already known, and what might be happening here. It seems(...?) like Swift is preemptively and excessively creating local copies of fruitCount, even when no associated values are bound, to guard against whatever might happen to fruitCount in the body of the case statement. Honestly, even then I'm not sure why two outlined init with take calls get generated instead of just one. It's also unclear to me why the size of FruitComparator matters.
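One thing that might be worth trying, given that the tag-only checks above optimize well, is to keep the enum check in its own tag-only computed property and run the extra logic outside the switch. This is only a sketch: the names HoistedFruitComparator, isApples, and isApplesWithPadding are made up for illustration, and I have not confirmed that this removes the outlined init with take calls, so check the generated assembly for your compiler version before relying on it.

```swift
enum FruitCount {
    case apples(Int)
    case oranges(Int)
}

// Hypothetical variant of FruitComparator, used only for this sketch.
struct HoistedFruitComparator {
    let fruitCount: FruitCount
    let padding1 = 0, padding2 = 0, padding3 = 0

    // Tag-only check with a constant body, same shape as isApplesFast2.
    var isApples: Bool {
        if case .apples = fruitCount { return true }
        return false
    }

    // Same result as isApplesSlow, but the padding1 comparison
    // happens after the switch over fruitCount has finished.
    // Assumption: this shape may let the compiler skip the copy; unverified.
    func isApplesWithPadding() -> Bool {
        isApples && padding1 == 1
    }
}
```

Because padding1 is fixed at 0 here, isApplesWithPadding() always returns false, exactly like isApplesSlow in the original; the point is only the shape of the code, not the result.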
Thank you — that does seem to be a similar issue! Maybe the optionality matters in that case due to how it influences the size of the containing struct.
Checking out the pull request though... it seems like it improves the performance of the copy, but it doesn't seem to eliminate the copy in the cases where it could be avoided entirely.
Not directly related to this specific situation… but I do believe it is a known performance tradeoff with passing and copying large value types. This is why engineers might choose reference semantics (making FruitComparator a class) or a copy-on-write data-structure (making FruitComparator wrap a class reference).
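For anyone unfamiliar with the copy-on-write option mentioned above, here is a minimal sketch of what it could look like. The names FruitComparatorStorage and FruitComparatorCOW are hypothetical and not from the original code. Note the limits: this only makes copying the wrapper struct cheap (a single reference), and I'm not claiming it changes how the compiler handles the switch over fruitCount itself.

```swift
enum FruitCount {
    case apples(Int)
    case oranges(Int)
}

// Hypothetical heap-allocated storage holding the large payload.
final class FruitComparatorStorage {
    var fruitCount: FruitCount
    var padding1 = 0, padding2 = 0, padding3 = 0

    init(fruitCount: FruitCount) {
        self.fruitCount = fruitCount
    }

    // Makes an independent copy of all stored fields.
    func copy() -> FruitComparatorStorage {
        let other = FruitComparatorStorage(fruitCount: fruitCount)
        other.padding1 = padding1
        other.padding2 = padding2
        other.padding3 = padding3
        return other
    }
}

// Hypothetical value-type wrapper: copying it copies one reference.
struct FruitComparatorCOW {
    private var storage: FruitComparatorStorage

    init(fruitCount: FruitCount) {
        storage = FruitComparatorStorage(fruitCount: fruitCount)
    }

    var fruitCount: FruitCount {
        get { storage.fruitCount }
        set {
            // Clone the storage only if another value still shares it.
            if !isKnownUniquelyReferenced(&storage) {
                storage = storage.copy()
            }
            storage.fruitCount = newValue
        }
    }

    var isApples: Bool {
        if case .apples = storage.fruitCount { return true }
        return false
    }
}
```

Value semantics are preserved: after `var a = FruitComparatorCOW(fruitCount: .apples(3))` and `let b = a`, assigning `a.fruitCount = .oranges(1)` leaves `b` still holding apples. The tradeoff is a heap allocation and reference counting in exchange for cheap copies of the wrapper.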
This does make me wonder — enums (and structs) are such a common and valuable data type, and it feels like a bit of a hole that it's so easy for read-only operations on them and their associated values to introduce unanticipated copy overhead.
yeah… this is true… there's always a trade off when shipping infra: to what extent should the infra engineer expose these implementation details? should a product engineer know this implementation performs extra copy-by-value… should a product engineer optimize their product on the expectation this implementation performs extra copy-by-value? is the infra engineer then blocked on optimizing the infra because product engineers have already hard-coded their products expecting the legacy implementation?
i'm pretty new to swift… didn't start focusing here until 2020… my impression is for the most part that product engineers don't optimize for infra implementation details until they are way past the "v1 mvp" stage of things.
for the most part the impression i get from engineering in swift is that product engineers should choose semantics first and performance second. if a product engineer wants value semantics they should choose value types. if at some point in the future that engineer needs to optimize performance, that's when they can weigh the tradeoffs of moving to reference semantics or a copy-on-write value type structure.
That being said… if you do discover something that looks like an unambiguous performance win… I think the community would be happy if you helped ship a diff to help fix that. The tricky part is when a performance win is actually a performance tradeoff… and the community worked through that situation and decided that the current solution was the "least bad" solution for now.
If you were really interested in diving deeper and researching into this specific topic… you could try and find out if this problem was explicitly fixed in a diff targeted on this problem or was implicitly fixed in a diff targeted on a different problem. This might mean that automated tests are missing which could catch a regression if some code changes in the future and brings this problem back.
Assuming that, like me, you're stuck on a current version of Swift that has this issue rather than the nightly (which seems to have largely fixed it), I wrote a property wrapper that works around it, based on a solution discovered by @Karl. With the wrapper, the compiler is able to optimize the code into a faster set of instructions.