`withTaskGroup(of:)` leaks huge amounts of memory, but only on linux

i’ve spent a couple of days debugging a memory leak in my application, which leaks over 300 MB of memory per second. i was able to isolate the issue to a call to `withTaskGroup(of:)`, which looks pretty much like this:

    private 
    func peg<Client>( ... , using client:Client, as:Client.Type,
        whenever instructions:
        @Sendable () async -> (price:Decimal<Int64>, indicators:T)?) 
        async -> [Fill<T>]
        where Client:Executor
    {
        let pollster:Task = .init
        {
            while self.closed < size 
            {
                await self.poll(using: client)
            }
        }
        while self.closed < size
        {
            ...
        }
        ...
        let _:Void          = await pollster.value
        ...
    }
    private 
    func poll<Client>(using client:Client) async 
        where Client:Executor
    {
        await withTaskGroup(of: Order.self) 
        { 
            (group:inout TaskGroup<Order>) in
            for identifier:Int64 in self.orders.keys
            {
                group.addTask 
                {
                    await client.status(order: identifier)
                }
            } 
            for await order:Order in group 
            {
                ...
            } 
        }  
        await Task.yield()
    }

originally, i thought the issue was in the tasks being added to the task group, but i found that the memory leaks (at the same rate!) even when no tasks are added at all, and even when the return type is set to Void:

    private 
    func poll<Client>(using client:Client) async 
        where Client:Executor
    {
        // still leaks!!!
        await withTaskGroup(of: Void.self) 
        { 
            (group:inout TaskGroup<Void>) in
        }
        await Task.yield()
    }

the only thing that stops the memory leak is removing the call to `withTaskGroup(of:)` entirely.
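since removing the task group is the only thing that stops the leak, one possible stopgap is to await the status requests one at a time. this is only a sketch under stated assumptions: `OrderStatus`, `StatusClient`, and `pollSequentially` are hypothetical stand-ins (not the real `Order` and `Executor` types from the application above), and the requests no longer run concurrently.

```swift
// hypothetical stand-in for the `Order` type in the post above.
struct OrderStatus: Equatable {
    let identifier: Int64
    let closed: Bool
}

// hypothetical stand-in for the client protocol (`Executor` in the post above).
protocol StatusClient {
    func status(order identifier: Int64) async -> OrderStatus
}

// polls each order in turn. no task group is created, so the
// `withTaskGroup(of:)` code path is never entered; the cost is that
// the requests are issued serially instead of concurrently.
func pollSequentially<Client>(orders: [Int64], using client: Client) async -> [OrderStatus]
    where Client: StatusClient
{
    var statuses: [OrderStatus] = []
    statuses.reserveCapacity(orders.count)
    for identifier in orders {
        let status = await client.status(order: identifier)
        statuses.append(status)
    }
    return statuses
}
```

whether the serial version is acceptable depends on how slow each status request is; it trades throughput for not touching the task group machinery at all.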

i have also tried profiling the application on macOS with Instruments, but the memory leak does not seem to occur on macOS, only on Linux builds.

anyone have any ideas what’s going on here?

i’ve got the issue reduced to the following repro:

    // taskgroup.swift

    @main
    enum Main
    {
        static
        func main() async
        {
            while true
            {
                await withTaskGroup(of: Void.self)
                {
                    _ in
                }
            }
        }
    }

    $ swiftenv local DEVELOPMENT-SNAPSHOT-2021-09-07-a
    $ swiftc -parse-as-library -O taskgroup.swift
    $ ./taskgroup

on my machine, this test program leaks over 3.5 GB of memory per second…
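to put a number on the growth without an external profiler, a bounded variant of the repro can read its own resident set size from `/proc/self/statm` before and after the loop. this is a rough sketch, not a precise measurement: the helper names are made up for illustration, the file only exists on Linux (so the functions return `nil` elsewhere), and the second field of `statm` counts pages rather than bytes.

```swift
import Foundation

// parses the resident page count out of the contents of `/proc/self/statm`.
// the second whitespace-separated field is the resident set size in pages.
func residentPages(fromStatm contents: String) -> Int? {
    let fields = contents.split(whereSeparator: { $0.isWhitespace })
    guard fields.count >= 2 else { return nil }
    return Int(fields[1])
}

// reads the current resident page count; returns nil where
// `/proc/self/statm` is unavailable (for example on macOS).
func currentResidentPages() -> Int? {
    guard let contents = try? String(contentsOfFile: "/proc/self/statm", encoding: .utf8)
    else { return nil }
    return residentPages(fromStatm: contents)
}

// runs the empty task group from the repro a fixed number of times and
// returns how many resident pages the process gained (or lost) meanwhile.
func residentPageGrowth(iterations: Int) async -> Int? {
    guard let before = currentResidentPages() else { return nil }
    for _ in 0 ..< iterations {
        await withTaskGroup(of: Void.self) { _ in }
    }
    guard let after = currentResidentPages() else { return nil }
    return after - before
}
```

calling `residentPageGrowth(iterations:)` from an async `main` and multiplying the result by the page size (commonly 4096 bytes, but worth checking on the target machine) gives an approximate byte count that can be compared across toolchains.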

Thanks for the report. We had a bug about child tasks leaking, which we fixed here: https://github.com/apple/swift/pull/39158

The fact that there’s a leak with no tasks added at all seems super weird… thanks for reporting, we’ll have to have a look.

update: i just tested with yesterday’s nightly (8e4054f), which includes PR #39158, and the issue still appears to be present.

Thanks for confirming, I’ll have a look on Monday after vacation. Thanks again for spotting this.

Hi there, I confirmed the issue on Linux and am looking into it.

Terribly silly bug... thanks for reporting, we were able to track it down and fix it in apple/swift pull request #39295, “[Concurrency] Don't leak groups - call the destructor explicitly”: https://github.com/apple/swift/pull/39295

This will be fixed in upcoming releases.

Thanks again for the report!

perfect! this was a showstopper for one of my projects… thank you so much!

any idea where I can get my hands on a toolchain containing the fix? the nightlies on swift.org haven’t been updated since last thursday.

Sadly I don’t know when there will be a nightly toolchain with this, hopefully soon.

In the meantime, there is a toolchain for Linux attached to the PR (#39295, “[Concurrency] Don't leak groups - call the destructor explicitly”): https://github.com/apple/swift/pull/39295

that looks like an Ubuntu 16.04 toolchain, is there a toolchain for 20.04?

Sorry, no. You’ll have to wait for the nightlies.

i’m trying to build the toolchain locally in the meantime, but i’m running into SR-14710 :/ (symlinking `.swiftenv` to the toolchain didn’t work). the error output suggests the flag `--skip-early-swift-driver`, but this doesn’t seem to be a valid option for `utils/build-toolchain`. do you know of a workaround?

it is also complaining that `.distcc/zeroconf/hosts contained no hosts; can't distribute work`, but i am not even using the `--distcc` flag…