I have a performance regression in my code after updating from Xcode 13.4 to Xcode 14. In this case performance is reduced by 90%, compared to pre Swift 5.7.
The code is a test project I set up to better understand the usage of async bytes handling (like in Foundation's FileHandle.AsyncBytes or SwiftAlgorithm's AsyncBufferedByteIterator).
In the project I'm parsing JSON elements from a UTF8 byte stream. There is some amount of nested async calls that are used in a tight loop. A unit test that parses a 25MB JSON file takes 0.521 s on Xcode 13.4 and 4.808 s on Xcode 14. In Instruments I'm now seeing a lot of time spent in swift_task_switch. This does not happen in Xcode 13.4.
I believe the regression is a result of SE-0338: Execution of Non-Actor-Isolated Async Functions. As far as my understanding goes, as part of the proposal, each function, when returning, has to switch back to the executor from which it was called. To test the theory that the regression stems from executor switching I added the @_unsafeInheritExecutor attribute on each nested function. With this change in place the code runs as fast as before.
Here's the test project. Just run the unit test named testAsyncCountLargeFile (in release mode) and look at the console to see the execution time.
To be quite honest; FileHandle.AsyncBytes and AsyncBufferedByteIterator are nearly identical - the only difference is that AsyncBytes uses an actor to coordinate the read calls. It seems like you are using an actor as well, so the perf comparison being the same is not surprising. But what is surprising is that we would have some sort of massive regression - is that true in release mode?
In off-forum discussions with @nikolai.ruhe and profiling in Instruments, we made these observations:
When run in Swift 5.7 (Xcode 14.0b2), the stack traces contain a lot of swift_task_switch calls, and also completeTaskWithClosure calls deep within the call stacks (i.e. I'm not talking about the top-most completeTaskWithClosure call that kicks of the work).
When run in Swift 5.6 (Xcode 13.4.1), these don't exist.
This is our rough hypothesis (totally not sure if this is correct):
SE-0338 stipulates that every return from a non-actor isolated async function is a potential executor hop. So the compiler must insert a swift_task_switch call on every return of an async function (is this true?).
Since SE-0338 is new in Swift 5.7, this would explain why Swift 5.6 is unaffected. Another piece of evidence is that that @_unsafeInheritExecutor also fixes the performance issue.
Apparently the optimizer isn't able to eliminate the dynamic executor checks, even though the relevant code is all in-module and mostly fileprivate. I'm not sure if the optimizer can be improved or if this is a fundamental limitation of the semantics of SE-0338.
The optimizer certainly ought to be able to remove the switch from that code. Maybe it doesn't have the ability to reason that some operations don't require it to be on some specific executor. In general, it's supposed to be conservative about executor assumptions, but obviously a single multiplication is beyond "conservative" and well into "not actually recognizing anything at all".