iCMDdev
(CMD)
September 11, 2024, 12:30pm
1
Hello!
I read in the Swift Embedded manual that Swift Concurrency is under development.
This got me wondering... How is this going to be implemented? For example, will we provide some sort of callbacks representing each thread to the concurrency API?
1 Like
rauhul
(Rauhul Varma)
September 12, 2024, 3:58am
2
We are currently working through "How is this going to be implemented", so the answer for now is to stay posted.
One active area of work is stabilizing the interface between libSwiftConcurrency and the "Executor", which actually runs the Swift Concurrency Tasks. On desktop platforms this is handled by libDispatch, but on embedded systems it will need a different library specific to the platform.
It's likely we will need to create adapters for popular runtimes, e.g. Zephyr, FreeRTOS, etc.
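To make that "Executor" seam concrete, here is a rough sketch using today's custom actor executor hooks (SE-0392), with DispatchQueue standing in for whatever the platform runtime provides. The type and queue names are made up for illustration; this is not a proposed embedded API:
```swift
import Dispatch // stand-in; an RTOS adapter would use e.g. a Zephyr work queue or a FreeRTOS task

// Hypothetical executor that funnels all enqueued jobs onto one queue.
final class SingleQueueExecutor: SerialExecutor {
    private let queue = DispatchQueue(label: "custom-executor")

    func enqueue(_ job: UnownedJob) {
        let unowned = asUnownedSerialExecutor()
        queue.async {
            // Run the partial task on this executor.
            job.runSynchronously(on: unowned)
        }
    }

    func asUnownedSerialExecutor() -> UnownedSerialExecutor {
        UnownedSerialExecutor(ordinary: self)
    }
}

// An actor can opt into the custom executor for all of its isolated work.
actor DeviceDriver {
    static let sharedExecutor = SingleQueueExecutor()

    nonisolated var unownedExecutor: UnownedSerialExecutor {
        Self.sharedExecutor.asUnownedSerialExecutor()
    }

    func poll() {
        // Runs on SingleQueueExecutor's queue rather than the default pool.
    }
}
```
Swapping out the global default executor itself, rather than a per-actor one, has no stable interface today; that is part of what still needs stabilizing for embedded platforms.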
4 Likes
rvsrvs
(rvsrvs)
November 25, 2024, 1:12pm
3
Just curious, anything happening here in nightly builds or branches that I could play with?
ktoso
(Konrad 'ktoso' Malawski)
November 25, 2024, 1:29pm
4
Not yet, no. We'll have to propose some APIs through Swift Evolution, and once we have a direction in mind it will be shared through the usual evolution pitches and process. There's nothing early to check out yet; you can probably keep an eye on this thread for any movement. It's just a bit too early to share the ideation phases.
2 Likes
hassila
(Joakim Hassila)
November 25, 2024, 5:30pm
5
Does this also relate to a potential different backend for normal Linux, as mentioned here by @ktoso / @John_McCall?
GitHub issue (opened 23 May 2022, closed 8 Jan 2024, label: Linux):
We have a workload pipeline which is chaining several thousand Actors to each other via an AsyncStream processing pipeline.
There is a multiplication effect: a single event at the start of the processing pipeline is amplified, as it is delivered to several Tasks processing the events concurrently. The processing time of each wakeup is currently quite small, in the range of a few microseconds.
Under Linux, when stressing this processing pipeline, ~45% of the sampled stacks show `__DISPATCH_ROOT_QUEUE_CONTENDED_WAIT__()`, which leads to lock contention in [glibc rand()](https://elixir.bootlin.com/glibc/glibc-2.35/source/stdlib/random.c#L291), as ~60 threads are created and they all contend here:
```
7f193794a2db futex_wait+0x2b (inlined)
7f193794a2db __GI___lll_lock_wait_private+0x2b (inlined)
7f19378ff29b __random+0x6b (/usr/lib/x86_64-linux-gnu/libc.so.6)
7f19378ff76c rand+0xc (/usr/lib/x86_64-linux-gnu/libc.so.6)
7f1937bac612 __DISPATCH_ROOT_QUEUE_CONTENDED_WAIT__+0x12 (/usr/lib/swift/linux/libdispatch.so)
```
This occurs on every entry into [__DISPATCH_ROOT_QUEUE_CONTENDED_WAIT__()](https://github.com/apple/swift-corelibs-libdispatch/blob/main/src/queue.c#L5847-L5858), which uses the macro [_dispatch_contention_wait_until()](https://github.com/compnerd/swift-corelibs-libdispatch/blob/master/src/shims/yield.h#L118-120), which in turn uses [_dispatch_contention_spins()](https://github.com/compnerd/swift-corelibs-libdispatch/blob/master/src/shims/yield.h#L115); this is where the rand() call comes in, and the macro produces just these 4 values for how many pause/yield instructions to execute: 31, 63, 95 and 127.
The following example reproduces the issue; when sampling, ~28% of the time is spent in the code path mentioned.
The example creates 5000 tasks which each work for between 1 μs and 3 μs and then sleep for a random 6-10 milliseconds. The point of the test is to create contention and illustrate the issue with rand():
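For illustration only (this is not the libdispatch source, which is the C macro linked above), here is a Swift re-expression of how a random value can be reduced to exactly those four spin counts:
```swift
// Illustration only: only two bits of the random value survive,
// so the spin count is always 31, 63, 95 or 127.
func contentionSpins(from random: Int) -> Int {
    ((random & 0x3) + 1) * 32 - 1
}
```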
```swift
// $ swift package init --type executable --name RandomTasks
// $ cat Sources/RandomTasks/main.swift && swift run -c release
import Foundation

let numberOfTasks = 5000
let randomSleepRangeMs: ClosedRange<UInt64> = 6 ... 10
// correlates closely to processing amount in micros
let randomWorkRange: ClosedRange<UInt32> = 1 ... 3

@available(macOS 10.15, *)
func smallInfinitiveTask() async {
    let randomWork = UInt32.random(in: randomWorkRange)
    let randomSleepNs = UInt64.random(in: randomSleepRangeMs) * 1_000_000
    print("Task start; sleep: \(randomSleepNs) ns, randomWork: \(randomWork)")

    while true {
        do {
            var x2: String = ""
            x2.reserveCapacity(2000)
            for _ in 1 ... 50 * randomWork {
                x2 += "hi"
            }
            // Thread.sleep(forTimeInterval: 0.001) // 1ms
            try await Task.sleep(nanoseconds: randomSleepNs)
        } catch {}
    }
}

@available(macOS 10.15, *)
func startLotsOfTasks(_ tasks: Int) {
    for _ in 1 ... tasks {
        Task {
            await smallInfinitiveTask()
        }
    }
}

if #available(macOS 10.15, *) {
    startLotsOfTasks(numberOfTasks)
} else {
    // Fallback on earlier versions
    print("Unsupported")
}

sleep(600)
```
When run on a Ryzen 5950X system, 18-19 HT cores are spent processing the workload, while on an M1 Pro it is just ~4.

If you have an actually concurrent embedded environment, it would probably be easier to port a new thread pool to your environment's thread system than to port all of libdispatch. A lot of embedded environments are not concurrent in the threads sense, though: they might take asynchronous signals/interrupts, which is a form of concurrency, but there's no actual parallel execution.
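As a rough illustration of that second category (hypothetical names, not a real or proposed API): a single-threaded environment could satisfy the executor contract with nothing more than a job list drained from the main loop, with interrupt handlers only ever enqueueing work.
```swift
// Sketch of a cooperative, single-threaded executor: no parallelism, just
// interleaving. In real firmware the pending list would need protection
// against interrupt-context enqueues (e.g. masking interrupts or a lock-free
// queue); that is omitted here.
final class MainLoopExecutor: SerialExecutor {
    private var pending: [UnownedJob] = []

    func enqueue(_ job: UnownedJob) {
        pending.append(job)
    }

    func asUnownedSerialExecutor() -> UnownedSerialExecutor {
        UnownedSerialExecutor(ordinary: self)
    }

    /// Called from the firmware's main loop whenever there is work to do.
    func drain() {
        while !pending.isEmpty {
            let job = pending.removeFirst()
            job.runSynchronously(on: asUnownedSerialExecutor())
        }
    }
}
```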
hassila
(Joakim Hassila)
November 25, 2024, 6:17pm
7
Right, let me clarify: I was more curious whether this work dovetails with the possible replacement of libdispatch on Linux with a custom thread pool. We see a number of libdispatch-related performance issues on Linux that we'd like to help sort out, but since libdispatch isn't the long-term solution, we're curious whether the embedded work would define the APIs we'd need to more easily roll our own runtime if necessary.
Ah. I don't think the possible Dispatch replacement has much to do with it, but yes, if you want to completely swap out the thread pool, that is something embedded developers also want to do and should become easier as part of this work.
2 Likes
rvsrvs
(rvsrvs)
November 25, 2024, 7:20pm
9
I'm completely resigned to rolling my own support on top of something like Zephyr, and I'm actually quite interested in seeing what that looks like. I'm just not sure which of the concurrency abstractions I should be paying attention to beyond Executor (which seems a bit light as an abstraction for this).
lukeh
(Luke Howard)
November 25, 2024, 9:19pm
10
Chiming in to follow this, as I want to port IORingSwift to Zephyr. And maybe Swift to XMOS, although that might be a bit too ambitious a distraction.