iCMDdev
(CMD)
September 11, 2024, 12:30pm
1
Hello!
I read in the Swift Embedded manual that Swift Concurrency is under development.
This got me wondering... How is this going to be implemented? For example, will we provide some sort of callbacks representing each thread to the concurrency API?
1 Like
rauhul
(Rauhul Varma)
September 12, 2024, 3:58am
2
We are currently working through "How is this going to be implemented", so the answer for now is to stay posted.
One active area of work is stabilizing the interface between libSwiftConcurrency and the "Executor", which actually runs the Swift Concurrency Tasks. On desktop platforms this is handled by libDispatch, but on embedded systems it will need a different library specific to the platform.
It's likely we will need to create adapters for popular runtimes, e.g. Zephyr, FreeRTOS, etc.
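To make that "Executor" seam concrete, here is a rough sketch using today's custom actor executor hooks (SE-0392), with DispatchQueue standing in for whatever the platform runtime provides. The type and queue names are made up for illustration; this is not a proposed embedded API:
```swift
import Dispatch // stand-in; an RTOS adapter would use e.g. a Zephyr work queue or a FreeRTOS task

// Hypothetical executor that funnels all enqueued jobs onto one queue.
final class SingleQueueExecutor: SerialExecutor {
    private let queue = DispatchQueue(label: "custom-executor")

    func enqueue(_ job: UnownedJob) {
        let unowned = asUnownedSerialExecutor()
        queue.async {
            // Run the partial task on this executor.
            job.runSynchronously(on: unowned)
        }
    }

    func asUnownedSerialExecutor() -> UnownedSerialExecutor {
        UnownedSerialExecutor(ordinary: self)
    }
}

// An actor can opt into the custom executor for all of its isolated work.
actor DeviceDriver {
    static let sharedExecutor = SingleQueueExecutor()

    nonisolated var unownedExecutor: UnownedSerialExecutor {
        Self.sharedExecutor.asUnownedSerialExecutor()
    }

    func poll() {
        // Runs on SingleQueueExecutor's queue rather than the default pool.
    }
}
```
Swapping out the global default executor itself, rather than a per-actor one, has no stable interface today; that is part of what still needs stabilizing for embedded platforms.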
4 Likes
rvsrvs
(rvsrvs)
November 25, 2024, 1:12pm
3
Just curious, anything happening here in nightly builds or branches that I could play with?
ktoso
(Konrad 'ktoso' Malawski)
November 25, 2024, 1:29pm
4
Not yet, no. We'll have to propose some APIs through Swift Evolution, and once we have a direction in mind it will be shared through the usual evolution pitches and process. There's nothing early to check out yet; you can probably keep an eye on this thread for any movement. It's just a bit too early to share the ideation phases.
2 Likes
hassila
(Joakim Hassila)
November 25, 2024, 5:30pm
5
Does this also relate to a potential different backend for normal Linux, as mentioned here by @ktoso / @John_McCall?
GitHub issue (opened 23 May 2022, closed 8 Jan 2024, label: Linux):
We have a workload pipeline which is chaining several thousand Actors to each other via an AsyncStream processing pipeline.
There is a multiplication effect: a single event at the start of the processing pipeline is amplified, as it is delivered to several Tasks processing the events concurrently. The processing time of each wakeup is currently quite small, in the range of a few microseconds.
Under Linux, when stressing this processing pipeline, ~45% of the sampled stacks show `__DISPATCH_ROOT_QUEUE_CONTENDED_WAIT__()`, which leads to lock contention in [glibc rand()](https://elixir.bootlin.com/glibc/glibc-2.35/source/stdlib/random.c#L291), as ~60 threads are created and they all contend here:
```
7f193794a2db futex_wait+0x2b (inlined)
7f193794a2db __GI___lll_lock_wait_private+0x2b (inlined)
7f19378ff29b __random+0x6b (/usr/lib/x86_64-linux-gnu/libc.so.6)
7f19378ff76c rand+0xc (/usr/lib/x86_64-linux-gnu/libc.so.6)
7f1937bac612 __DISPATCH_ROOT_QUEUE_CONTENDED_WAIT__+0x12 (/usr/lib/swift/linux/libdispatch.so)
```
This occurs on every entry into [__DISPATCH_ROOT_QUEUE_CONTENDED_WAIT__()](https://github.com/apple/swift-corelibs-libdispatch/blob/main/src/queue.c#L5847-L5858), which uses the macro [_dispatch_contention_wait_until()](https://github.com/compnerd/swift-corelibs-libdispatch/blob/master/src/shims/yield.h#L118-120), which in turn uses [_dispatch_contention_spins()](https://github.com/compnerd/swift-corelibs-libdispatch/blob/master/src/shims/yield.h#L115); this is where the rand() call comes in, and the macro produces just these 4 values for how many pause/yield instructions to execute: 31, 63, 95 and 127.
The following example reproduces the issue; when sampling, ~28% of the time is spent in the code path mentioned.
The example creates 5000 tasks which each work for between 1 μs and 3 μs and then sleep for a random 6-10 milliseconds. The point of the test is to create contention and illustrate the issue with rand():
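For illustration only (this is not the libdispatch source, which is the C macro linked above), here is a Swift re-expression of how a random value can be reduced to exactly those four spin counts:
```swift
// Illustration only: only two bits of the random value survive,
// so the spin count is always 31, 63, 95 or 127.
func contentionSpins(from random: Int) -> Int {
    ((random & 0x3) + 1) * 32 - 1
}
```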
```swift
// $ swift package init --type executable --name RandomTasks
// $ cat Sources/RandomTasks/main.swift && swift run -c release
import Foundation

let numberOfTasks = 5000
let randomSleepRangeMs: ClosedRange<UInt64> = 6 ... 10
// correlates closely to processing amount in micros
let randomWorkRange: ClosedRange<UInt32> = 1 ... 3

@available(macOS 10.15, *)
func smallInfinitiveTask() async {
    let randomWork = UInt32.random(in: randomWorkRange)
    let randomSleepNs = UInt64.random(in: randomSleepRangeMs) * 1_000_000
    print("Task start; sleep: \(randomSleepNs) ns, randomWork: \(randomWork)")

    while true {
        do {
            var x2: String = ""
            x2.reserveCapacity(2000)
            for _ in 1 ... 50 * randomWork {
                x2 += "hi"
            }
            // Thread.sleep(forTimeInterval: 0.001) // 1ms
            try await Task.sleep(nanoseconds: randomSleepNs)
        } catch {}
    }
}

@available(macOS 10.15, *)
func startLotsOfTasks(_ tasks: Int) {
    for _ in 1 ... tasks {
        Task {
            await smallInfinitiveTask()
        }
    }
}

if #available(macOS 10.15, *) {
    startLotsOfTasks(numberOfTasks)
} else {
    // Fallback on earlier versions
    print("Unsupported")
}

sleep(600)
```
When run on a Ryzen 5950X system, 18-19 HT cores are spent processing the workload, while on an M1 Pro it is just ~4.

If you have an actually concurrent embedded environment, it would probably be easier to port a new thread pool to your environment's thread system than to port all of libdispatch. A lot of embedded environments are not concurrent in the threads sense, though: they might take asynchronous signals/interrupts, which is a form of concurrency, but there's no actual parallel execution.
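As a rough illustration of that second category (hypothetical names, not a real or proposed API): a single-threaded environment could satisfy the executor contract with nothing more than a job list drained from the main loop, with interrupt handlers only ever enqueueing work.
```swift
// Sketch of a cooperative, single-threaded executor: no parallelism, just
// interleaving. In real firmware the pending list would need protection
// against interrupt-context enqueues (e.g. masking interrupts or a lock-free
// queue); that is omitted here.
final class MainLoopExecutor: SerialExecutor {
    private var pending: [UnownedJob] = []

    func enqueue(_ job: UnownedJob) {
        pending.append(job)
    }

    func asUnownedSerialExecutor() -> UnownedSerialExecutor {
        UnownedSerialExecutor(ordinary: self)
    }

    /// Called from the firmware's main loop whenever there is work to do.
    func drain() {
        while !pending.isEmpty {
            let job = pending.removeFirst()
            job.runSynchronously(on: asUnownedSerialExecutor())
        }
    }
}
```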
hassila
(Joakim Hassila)
November 25, 2024, 6:17pm
7
Right, let me clarify: I was more curious whether this work dovetails with the possible replacement of libdispatch on Linux with a custom thread pool. We see a number of libdispatch-related performance issues on Linux that we'd like to help sort out, but since libdispatch isn't the long-term solution, we're curious whether the embedded work would define the APIs we'd need to more easily roll our own runtime if necessary.
Ah. I don't think the possible Dispatch replacement has much to do with it, but yes, if you want to completely swap out the thread pool, that is something embedded developers also want to do and should become easier as part of this work.
2 Likes
rvsrvs
(rvsrvs)
November 25, 2024, 7:20pm
9
I'm completely resigned to rolling my own support on top of something like Zephyr, and I'm actually quite interested in seeing what that looks like. I'm just not sure which of the concurrency abstractions I should be paying attention to beyond Executor (which seems a bit light as an abstraction for this).
lukeh
(Luke Howard)
November 25, 2024, 9:19pm
10
Chiming in to follow this, as I want to port IORingSwift to Zephyr. And maybe Swift to XMOS, although that might be a bit too ambitious a distraction.