Swift 5.5 has serious stack corruption bugs!

hi all, I’ve discovered several stack corruption bugs related to async/await which can be reproduced in simple test programs compiled with recent nightly toolchains. i have confirmed that four of these bugs are present in the 5.5-RELEASE toolchain.


  1. stack corruption when using values returned by async closures

    the return values of async closures are clobbered at a chaotic, but deterministic memory offset.

    present in 5.5-RELEASE: yes
    present in debug builds: no
    present in release builds: yes

    reproduction:

// async-return-value-corruption.swift

@main 
enum Main 
{
    actor A
    {
        init() 
        {
        }
        
        func a(_ f:@Sendable () async -> (Int, (Int, Int, Int, Int))?) 
            async -> Void
        {
            guard let (head, tail):(Int, (Int, Int, Int, Int)) = await f()
            else 
            {
                return 
            }
            
            print((head, tail))
            return 
        }
    }
    
    static 
    func p() async -> Bool
    {
        true 
    }
    
    static 
    func main() async
    {
        while true 
        {
            let a:A     = .init()
            
            async let task:Void = a.a
            {
                if await Self.p()
                {
                    return (0, (0, 0, 0, 0))
                }
                else 
                {
                    return nil 
                }
            }
            await task
        }
    }
}

$ swiftc --version
Swift version 5.5 (swift-5.5-RELEASE)
Target: x86_64-unknown-linux-gnu

$ swiftc -parse-as-library -O async-return-value-corruption.swift
$ ./async-return-value-corruption 

(139787763716080, (0, 0, 0, 0)) 
(139787629498352, (0, 0, 0, 0)) 
(139787965042672, (0, 0, 0, 0)) 
(139787763716080, (0, 0, 0, 0)) 
(139787629498352, (0, 0, 0, 0))
...

  2. segmentation faults when using async let

    relatively simple usages of async let suffer from segmentation faults in both debug and release builds. while i originally believed this was limited to usage of async let in the main function, i have since also observed this issue in a variety of other contexts.

    present in 5.5-RELEASE: yes
    present in debug builds: yes
    present in release builds: yes

    reproduction:

// async-let-segfault.swift 
@main 
enum Main 
{
    static 
    func foo() async -> [Void]
    {
        try? await Task.sleep(nanoseconds: 1)
        return []
    }
    static 
    func main() async
    {
        async let task:Void = 
        {
            () async -> () in 
            try? await Task.sleep(nanoseconds: 1)
        }()
        while true 
        {
            let _:[Void] = await Self.foo()
        }
        await task
    }
}

$ swiftc --version
Swift version 5.5 (swift-5.5-RELEASE)
Target: x86_64-unknown-linux-gnu

$ swiftc -O -parse-as-library async-let-segfault.swift 
async-let-segfault.swift:23:15: warning: will never be executed
        await task
              ^
async-let-segfault.swift:19:15: note: condition always evaluates to true
        while true 
              ^
$ ./async-let-segfault 
Segmentation fault (core dumped)


  3. stack corruption when passing enums to actor-isolated methods

    the enum values received by an actor-isolated method are not the same values that were passed by the caller. i originally observed this issue in both debug and release builds, though it was vastly more common and reproducible in debug builds. at first, i believed that it did not affect the 5.5-RELEASE toolchain, only the recent nightlies, including DEVELOPMENT-SNAPSHOT-2021-09-23-a.

    (update) after further testing, i’ve found that the 5.5-RELEASE toolchain is indeed affected by this bug. it occurs in both debug and release builds. see this follow-up for details.

    present in 5.5-RELEASE: yes (originally reported as no)
    present in debug builds: yes
    present in release builds: yes (originally reported as sometimes)

    reproduction:

// async-stack-corruption.swift 

struct Users
{
    enum Access 
    {
        case guest
        case admin(Int)
        case developer(Int, Int, Int, Int)
    }
    actor State  
    {
        init()
        {
        }
        func set(permissions:(user:Int, access:Access?)) 
        {
            print(permissions)
        }
    }
    
    let state:State = .init()
    
    func set(permissions:(user:Int, access:Access?)) async 
    {
        await self.state.set(permissions: permissions)
    }
}
@main 
enum Main 
{
    static 
    func main() async
    {
        let users:Users             = .init()
        let stream:AsyncStream<Int> = .init 
        {
            for i in 0 ..< 10
            {
                $0.yield(i) 
            }
            $0.finish()
        }
        for await i:Int in stream 
        {
            await users.set(permissions: (i, .guest))
        }
    }
}

$ swiftc --version
Swift version 5.6-dev (LLVM ae102eaadf2d38c, Swift be2d00b32742678)
Target: x86_64-unknown-linux-gnu
$ swiftc -parse-as-library async-stack-corruption.swift 
$ ./async-stack-corruption 
(user: 0, access: Optional(main.Users.Access.admin(0)))
(user: 1, access: Optional(main.Users.Access.admin(0)))
(user: 2, access: Optional(main.Users.Access.admin(0)))
(user: 3, access: Optional(main.Users.Access.admin(0)))
(user: 4, access: Optional(main.Users.Access.admin(0)))
(user: 5, access: Optional(main.Users.Access.admin(0)))
(user: 6, access: Optional(main.Users.Access.admin(0)))
(user: 7, access: Optional(main.Users.Access.admin(0)))
(user: 8, access: Optional(main.Users.Access.admin(0)))
(user: 9, access: Optional(main.Users.Access.admin(0)))

i have encountered additional memory corruption bugs, including some weirdness with calling instance methods on actor-isolated properties (thread), but these 3 are the issues i had time to isolate and reproduce this week. i have filed them as:

  1. SR-15225
  2. SR-15241
  3. SR-15240

in my view, bug (1) is by far the most dangerous, as it occurs silently, and affects the 5.5 release toolchain. like bug (3), bug (1) also represents a potential security vulnerability, though it is probably not easily exploitable in naturally occurring code.

(update) now that i’ve found that bug (3) does occur in binaries built with the 5.5 release toolchain, i am now even more concerned about bug (3) than i was about bug (1). because both of them turn into segfaults depending on minor code changes, it is possible that bug (2), which i have so far only been able to reproduce as a segfault, could also manifest as a silent vulnerability.

i would advise that people using concurrency features should not ship anything compiled with the 5.5 toolchain, until these issues are fixed and a patch is released.


(update) it looks like there is yet another issue with task groups and SIMD types with wide alignments. see this followup. i have confirmed this issue affects the 5.5-RELEASE toolchain as well. i was able to reproduce it on both debug and release builds.

  4. segmentation faults with task groups and SIMD types with wide alignments

    present in 5.5-RELEASE: yes
    present in debug builds: yes
    present in release builds: yes

    reproduction:

@main
enum Main
{
    static
    func main() async
    {
        await withTaskGroup(of: SIMD4<Int32>.self) 
        { 
            (group:inout TaskGroup) in
            group.addTask 
            {
                return SIMD4<Int32>.init(repeating: 0)
            }
        }
    }
}

$ swiftc --version
Swift version 5.5 (swift-5.5-RELEASE)
Target: x86_64-unknown-linux-gnu
$ swiftc -O -parse-as-library async-stack-corruption-simd.swift 
$ ./async-stack-corruption-simd 
Segmentation fault (core dumped)

36 Likes

Thanks for investigating. These look even more scary than the crashes due to memory management I have seen.

3 Likes

after investigating further, it looks like bug #3, or at least a variant of it, does in fact affect the 5.5-RELEASE toolchain.

a modified example program to reproduce this problem is given below:

struct Users
{
    enum Access 
    {
        case guest
        case admin(Int)
        case developer(Int, Int, Int, Int)
    }
    
    private 
    actor User 
    {
        init()
        {
        }
        
        func set(permissions:(Int, Access?))
        {
            print(permissions)
        }
    }
    
    private 
    let users:[Int: User] = [0: .init()]
    
    func set(permissions:(Int, Access?)) async 
    {
        print(permissions)
        guard let user:User = self.users[permissions.0]
        else 
        {
            print(" \(permissions.0) ")
            return  
        }
        await user.set(permissions: permissions)
    }
}

@main 
enum Main 
{
    static 
    func main() async
    {
        let coordinator:Users = .init()
        let stream:AsyncStream<Int> = .init 
        {
            for i in 0 ..< 10
            {
                $0.yield(i) 
            }
            $0.finish()
        }

        for await i:Int in stream
        {
            if i != 0 
            {
                continue 
            }
            await coordinator.set(permissions: (i, .guest))
        }
    }
}

$ swiftc --version
Swift version 5.5 (swift-5.5-RELEASE)
Target: x86_64-unknown-linux-gnu
$ swiftc -O -parse-as-library async-stack-corruption-5.5.swift
$ ./async-stack-corruption-5.5 
(0, Optional(main.Users.Access.admin(144)))
(0, Optional(main.Users.Access.admin(144)))

moreover, while investigating this bug, i also noticed that i could influence the specific enum cases that the real values are overwritten with based on user input. this means that (alongside SR-15225) SR-15240 represents a major security vulnerability in Swift 5.5. it is also highly chaotic — removing or reordering any of the print statements in this program completely changes the behavior — so it is extremely likely that a minor change in your codebase before a release could silently introduce a vulnerability.

again, i would urge anyone using the 5.5-RELEASE toolchain to either avoid concurrency features, or assume that all binaries compiled with it are compromised.

9 Likes

Hi, I am experiencing a similar problem and do not know how to fix it. Have you found a workaround?

2 Likes

I don't know if this is related, but I get a crash with simd types and task groups on macOS with the Xcode 13 release (13A233) and beta 5 (13A5212g) on Intel (i9), while it's fine on Arm. It smells like an alignment issue to me (it works with simd_int2, for example). I filed a bug in Apple's Feedback app (FB9531735).

The crash is at: thunk for @escaping @callee_guaranteed @Sendable @async () -> (@unowned SIMD4<Int32>):

Here's some code:

import simd

@main
enum CrashApp
{
    static
    func main() async
    {
        await withTaskGroup(of: simd_int4.self) { group in
            for i in 0..<10 {
                group.addTask {
                    return simd_int4(repeating: Int32(i))
                }
            }
         
            for await test in group {
                print( test.x )
            }
        }
    }
}
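
The alignment hunch can be checked directly with `MemoryLayout`. The following is a minimal sketch, not part of the original report: it only prints layout facts for `SIMD4<Int32>` and `SIMD2<Int32>`, the standard-library types that `simd_int4` and `simd_int2` alias, and the constant names are made up for illustration.

```swift
// Minimal sketch: compare the layout of the SIMD type that crashes with
// the one reported to work. `SIMD4<Int32>` and `SIMD2<Int32>` are the
// standard-library types behind `simd_int4` and `simd_int2`.
// The constant names below are illustrative only.
let wideAlignment = MemoryLayout<SIMD4<Int32>>.alignment
let narrowAlignment = MemoryLayout<SIMD2<Int32>>.alignment
let wordAlignment = MemoryLayout<Int>.alignment

print("SIMD4<Int32> alignment:", wideAlignment)
print("SIMD2<Int32> alignment:", narrowAlignment)
print("Int alignment:", wordAlignment)

// A type whose alignment exceeds the machine word's is the kind of value
// the alignment hypothesis singles out.
print("SIMD4<Int32> is over-aligned relative to Int:", wideAlignment > wordAlignment)
```

On the 64-bit x86 targets discussed here, I would expect `SIMD4<Int32>` to be 16-byte aligned while `SIMD2<Int32>` and `Int` are 8-byte aligned. That would be consistent with the alignment theory, though it does not prove it.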
3 Likes

Indeed, lots of strange things are happening with 5.5, particularly with concurrency.

This code here:

/// A stream-based serial command queue
public actor StreamCommandQueue: NSObject {

    let input: InputStream
    let output: OutputStream
    private var pendingCommands: [StreamCommand] = [] //NOTE: Unused, but dare to remove it and we get a stack corruption!
    private var activeCommand: StreamCommand? {
        didSet {
            logger.debug("active command now \(self.activeCommand)")
        }
    }

crashes as soon as I remove the unused pendingCommands array property, with the following backtrace:

(lldb) bt
* thread #2, queue = 'com.apple.root.user-initiated-qos.cooperative', stop reason = EXC_BAD_ACCESS (code=1, address=0xea5b7c95)
  * frame #0: 0x00000000ea5b7c95
    frame #1: 0x0000000105736aee libswift_Concurrency.dylib`swift::runJobInEstablishedExecutorContext(swift::Job*) + 158
    frame #2: 0x00000001057371ef libswift_Concurrency.dylib`swift_job_runImpl(swift::Job*, swift::ExecutorRef) + 63
    frame #3: 0x000000010851a3a0 libdispatch.dylib`_dispatch_continuation_pop + 93
    frame #4: 0x0000000108519964 libdispatch.dylib`_dispatch_async_redirect_invoke + 778
    frame #5: 0x0000000108529b44 libdispatch.dylib`_dispatch_root_queue_drain + 403
    frame #6: 0x000000010852a5ec libdispatch.dylib`_dispatch_worker_thread2 + 196
    frame #7: 0x00007fff6bfeb417 libsystem_pthread.dylib`_pthread_wqthread + 244
    frame #8: 0x00007fff6bfea42f libsystem_pthread.dylib`start_wqthread + 15

3 Likes

i am able to reproduce on linux, using the builtin SIMD4<T> type. the issue also affects the 5.5-RELEASE toolchain. moreover, it occurs even with a single child task, returning a single constant SIMD value.

@main
enum Main
{
    static
    func main() async
    {
        await withTaskGroup(of: SIMD4<Int32>.self) 
        { 
            (group:inout TaskGroup) in
            group.addTask 
            {
                return SIMD4<Int32>.init(repeating: 0)
            }
        }
    }
}

$ swiftc --version
Swift version 5.5 (swift-5.5-RELEASE)
Target: x86_64-unknown-linux-gnu
$ swiftc -O -parse-as-library async-stack-corruption-simd.swift 
$ ./async-stack-corruption-simd 
Segmentation fault (core dumped)

have you filed a bug on bugs.swift.org? it might help to link it to the other issues here

3 Likes

not really. i tried wrapping some of these return values in “padding” structs in hopes that the memory offset being corrupted would correspond to padding, but it just made the memory corruption more irregular and harder to identify.
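
for illustration, the kind of “padding” wrapper described above might look like the following. this is a hypothetical sketch, not the code that was actually tried: the `Padded` type, its field names, and the sentinel value are all assumptions. as noted above, this approach did not make the corruption go away.

```swift
// Hypothetical sketch of a "padding" wrapper: sentinel words surround the
// payload so that a stray write at a nearby offset might land on a
// sentinel instead of the payload. Names and sentinel value are assumptions.
struct Padded<Value> {
    private var before: UInt = 0xA5A5_A5A5
    var value: Value
    private var after: UInt = 0xA5A5_A5A5

    init(_ value: Value) {
        self.value = value
    }

    // True while both sentinel words still hold their original pattern.
    var isIntact: Bool {
        before == 0xA5A5_A5A5 && after == 0xA5A5_A5A5
    }
}

// Wrap the same tuple shape used in the first reproduction.
let wrapped = Padded((0, (0, 0, 0, 0)))
print(wrapped.value, wrapped.isIntact)
```

an async closure would then return `Padded<...>` instead of the bare tuple, and the caller could check `isIntact` before trusting `value`. a corrupted sentinel only tells you that something was overwritten; an intact one does not guarantee the payload is safe.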

3 Likes

I have spent countless hours investigating so many problems that were, in truth, compiler bugs. Why were these problems not found in testing? Why did Apple publish it as a Swift RELEASE? How is Swift a "safe language" if we cannot trust something as fundamental as a return value?

10 Likes

You’re right, these are very serious problems. I don’t remember issues of comparable severity in any Swift release, and the fact they weren’t discovered during testing is very troubling. It probably warrants some sort of post-mortem analysis, including a plan to improve on the project’s apparent deficiencies in testing, but that can only happen once the issues are identified and fixed.

I don’t know if that discussion will happen and the results shared with the community, but I hope they will be.

13 Likes

i have been in contact with one of the compiler devs, and they seem to be aware of the issue, and i’ve gotten assurances that this is being worked on. however, even without knowing the exact technical cause of the problem, we need to recognize that bugs happen, and the more important structural issue is why we did not have systems in place to prevent bugs like this from making it into official release binaries.

i think we do need to be having discussions about just what level of stability a Swift release should imply, and whether we should be providing some sort of “long-term-support” releases with stronger stability guarantees than the regularly-cycled releases. async/await in particular tends to be used in real-time safety-critical applications where bugs like these could cause significant real-world harm, and the fact that we don’t even provide “safety-grade” toolchains for people who need these guarantees is a deficiency in the language.

13 Likes

update: i’ve kicked off discussions for “long-term releases” here:

8 Likes

I am sorry, but I do not believe that this would solve the real cause of the problem, which is that Apple decided to release the new compiler without, I think, testing it at all. The problems I am finding are the type that make me think whoever gave the OK for this compiler release would also give the same OK for your "long term support release".

4 Likes

I think that tying the Swift runtime to OS releases is a risk unless the language stops evolving in large jumps in features that require a runtime update. The evolution of Kotlin and Jetpack Compose on Android benefits from being decoupled from Android OS releases.

11 Likes

The compiler does not ship with the OS, unlike the Swift runtime and stdlibs. This ship has sailed. All the hard work around ABI stability on Apple platforms had this unique goal in mind: ship the stdlib and the runtime in the OS, so that apps do not have to bundle it.

I actually don't know if bugs spotted by @taylorswift and friends live only in the compiler, or in the runtime, or in both.

If they only live in the compiler, then only a future toolchain, or a Xcode update, will fix those bugs.

If they live partially in the runtime that ships with the OS, then, regardless of compiler releases, some devices are irremediably subject to bugs, unless the application code contains workarounds for impacted system versions.

3 Likes

I am aware that the compiler does not ship with the OS and that only the runtime and stdlibs do. Hopefully, as you said, the issues have workarounds that need only code changes and compiler changes, rather than iOS 15.1 or 15.0.1 as a minimum requirement.

I understand that sparing third-party apps from bundling the stdlib and Swift runtime was one of the goals behind this decision, and I understand the implications you explained afterwards (fingers crossed for the backdeployment to iOS 15.0 being possible thanks to @Douglas_Gregor's work :)). But that is orthogonal to what I was saying: it is a balancing act of pros and cons, and this is one of the cons. For many third-party apps, being able to bundle the Swift runtime and standard libraries as an option would help. App binary size is not the most pressing concern many devs have (it is dwarfed by content/data size in many cases), and it is not more important than app correctness and safety.

There are cons to Kotlin being JVM-based on Android, but there are pros too (it is mostly independent of Android Studio updates and Android OS updates). The same goes for other components like Compose or WebView, which are packaged with the apps or updated by Google Play Services and backdeploy to Android 5+.

2 Likes

This is not the main one.
Without ABI stability, Apple would not be able to expose Swift API from system frameworks.
If you want to release a "stand-alone" app, you should not only embed runtime and stdlib, but also all Swift frameworks that your app is using.

Most Android apps already do that (embedding multiple megabytes of system libraries to work around fragmentation issues), the drawback being that you can't fix a bug system-wide. Every single app has to be recompiled and updated when there is a bug in one support library.

If Kotlin is independent of Android Studio and Android, it is because the JVM guarantees ABI stability, but that also means Kotlin can't use advanced code optimisation techniques, as it is limited by what the JVM supports.

In this regard, Swift async/await is far more advanced than Kotlin coroutines.

2 Likes

I appreciate that it is an option with costs involved, but as an option it would allow an easier path out of critical bugs such as this one, and it would allow the language and frameworks to iterate faster, as they would get real-world use sooner. I see these benefits as more important than the binary size and performance differences, which are not trivial, but would you say they are that massive?

Not disputing that it has pros, but it has its cons.

Every serious bug that arises cannot devolve into a discussion about Apple’s decision regarding Swift-in-the-OS, which has never been the community’s call to make in the first place. Let’s keep this topic focused on the problem at hand.

25 Likes

OK, I will cut the derailment here; sorry for that. I think that, regardless of what is or is not the community's call, you revise decisions based on new evidence. If not, it becomes a religious matter.