Enable MultiThreading

Is it possible to set/increase the total number of threads instead of using the default, i.e. the number of cores the system has? :thinking: The '-num-threads' flag doesn't seem to work :pensive:.

The number of cores is a feature of the hardware. Aside from the "just download more RAM" meme, that isn't a thing you can just change through software.

What exactly were you trying to change?

Okay, I'm writing a tool that enumerates the files on a system, and running it with only 8 threads (i.e. my CPU core count) is very slow. I need to increase the thread count to maximize its efficiency.
Any help will be highly appreciated :pray:

I don't quite understand your question. Is this about async/await, or DispatchQueue or what?

There are several APIs in the system that let you control the number of "execution paths".
For example, OperationQueue has maxConcurrentOperationCount. To be absolutely sure you have N real threads for the task at hand, don't use a higher-level abstraction like async/await or Dispatch/Operation queues; use the Thread API directly.
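As a minimal sketch of the OperationQueue knob, assuming placeholder work (a 1 ms sleep) and hypothetical helper names (`ConcurrencyStats`, `runWithOperationQueue`), this caps the queue's width and records the peak number of operations observed running at once:

```swift
import Foundation

/// Lock-protected counters shared by the operations. Marked @unchecked Sendable
/// because every mutable field is only touched while holding `lock`.
final class ConcurrencyStats: @unchecked Sendable {
    private let lock = NSLock()
    private var running = 0
    private(set) var peak = 0
    private(set) var completed = 0

    func enter() {
        lock.lock(); defer { lock.unlock() }
        running += 1
        peak = max(peak, running)
    }

    func leave() {
        lock.lock(); defer { lock.unlock() }
        running -= 1
        completed += 1
    }
}

/// Runs `jobs` placeholder operations, at most `width` of them at a time.
func runWithOperationQueue(jobs: Int, width: Int) -> ConcurrencyStats {
    let stats = ConcurrencyStats()
    let queue = OperationQueue()
    queue.maxConcurrentOperationCount = width  // the bound on concurrent operations
    for _ in 0..<jobs {
        queue.addOperation {
            stats.enter()
            Thread.sleep(forTimeInterval: 0.001)  // placeholder for real work
            stats.leave()
        }
    }
    queue.waitUntilAllOperationsAreFinished()
    return stats
}
```

Note that this bounds how many operations run concurrently; it does not guarantee which or how many threads the queue uses underneath, which is why the Thread API is the way to pin down an exact thread count.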

But you are talking about an I/O task, which usually can't benefit from more threads. In theory, if the task is I/O-bound (and the OS API allows for it), a single thread would be as fast as N threads, because the task spends most of its time suspended, waiting for I/O to finish, anyway. Only CPU-bound tasks benefit from increasing the number of threads (typically up to the number of cores). Do you do any heavy processing on the file contents?
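For the CPU-bound case, a minimal sketch: `DispatchQueue.concurrentPerform` spreads iterations across the available cores, so you get core-count parallelism without managing threads yourself. The prime counting is a hypothetical stand-in for "heavy processing on the file contents":

```swift
import Foundation

/// Hypothetical CPU-heavy stand-in: counts primes below `limit` by trial division.
func countPrimes(below limit: Int) -> Int {
    guard limit > 2 else { return 0 }
    var count = 0
    for n in 2..<limit {
        var isPrime = true
        var d = 2
        while d * d <= n {
            if n % d == 0 { isPrime = false; break }
            d += 1
        }
        if isPrime { count += 1 }
    }
    return count
}

/// Runs one CPU-bound job per element; GCD decides how many run in parallel.
/// Each iteration writes only its own slot, so no lock is needed.
func parallelPrimeCounts(limits: [Int]) -> [Int] {
    var results = [Int](repeating: 0, count: limits.count)
    results.withUnsafeMutableBufferPointer { buffer in
        let out = buffer  // copy of the pointer; every iteration writes a distinct index
        DispatchQueue.concurrentPerform(iterations: limits.count) { i in
            out[i] = countPrimes(below: limits[i])
        }
    }
    return results
}
```

Adding more threads than cores to work like this mostly adds context switching; an I/O-bound walk of the file system would see little benefit either way.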


You can only increase the number of running threads by creating more of them, but then you will run the risk of thread explosion if you create more threads than the number of cores available.


That's a risk, but only if you have an unbounded number of threads. Simply making a fixed number of threads that's greater than your core count is totally fine.
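A minimal sketch of "a fixed number of threads that's greater than your core count", using the Thread API directly as suggested earlier. The names `WorkQueue` and `runOnFixedThreads` are hypothetical; each worker pulls job indices from a shared, lock-protected counter until none are left:

```swift
import Foundation

/// Shared job counter and results; @unchecked Sendable because all mutable
/// fields are accessed under `lock`.
final class WorkQueue: @unchecked Sendable {
    private let lock = NSLock()
    private var next = 0
    private let count: Int
    private(set) var results: [Int]

    init(count: Int) {
        self.count = count
        self.results = Array(repeating: 0, count: count)
    }

    /// Hands out the next job index, or nil when every job has been taken.
    func take() -> Int? {
        lock.lock(); defer { lock.unlock() }
        guard next < count else { return nil }
        defer { next += 1 }  // runs before the unlock above
        return next
    }

    func store(_ value: Int, at index: Int) {
        lock.lock(); defer { lock.unlock() }
        results[index] = value
    }
}

/// Runs `work` for each index in 0..<jobs on exactly `threadCount` threads.
func runOnFixedThreads(jobs: Int, threadCount: Int,
                       work: @escaping @Sendable (Int) -> Int) -> [Int] {
    let queue = WorkQueue(count: jobs)
    let group = DispatchGroup()
    for _ in 0..<threadCount {
        group.enter()
        let worker = Thread {
            while let i = queue.take() {
                queue.store(work(i), at: i)
            }
            group.leave()
        }
        worker.start()
    }
    group.wait()
    return queue.results
}
```

Because `threadCount` is fixed up front, the number of threads cannot grow with the amount of work, which is the property that avoids thread explosion.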

Only if the IO is non-blocking, but yeah.

We need more input from OP about how many files he's reading, their size, and what kind of processing he's doing on each one.

Is there an optimal value for the fixed number of threads? Also, would you happen to know if there is some good-quality material to read about this?

I currently use async/await, and I assume that with more threads the program will run faster and more efficiently. Can't the swiftc compiler flag '-num-threads' do the work for me?
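With async/await you don't pick a thread count: Swift's default executor runs tasks on a cooperative pool that is normally limited to roughly one thread per core. (swiftc's `-num-threads` controls how many threads the compiler itself uses for code generation, not the program it produces.) What you can control is how many child tasks are in flight. A minimal sketch, where `process(_:)` is a hypothetical stand-in for per-file work and `BoundedResult` is a hypothetical result type:

```swift
import Foundation

/// Hypothetical stand-in for per-file work: sums the UTF-8 bytes of a name.
func process(_ name: String) async -> Int {
    name.utf8.reduce(0) { $0 + Int($1) }
}

struct BoundedResult {
    var total: Int
    var peak: Int  // most child tasks added but not yet collected at once
}

/// Processes `items` with at most `limit` child tasks in flight at a time.
func processBounded(_ items: [String], limit: Int) async -> BoundedResult {
    precondition(limit > 0)
    return await withTaskGroup(of: Int.self, returning: BoundedResult.self) { group in
        var iterator = items.makeIterator()
        var inFlight = 0
        var peak = 0
        var total = 0
        // Prime the group with up to `limit` tasks.
        while inFlight < limit, let item = iterator.next() {
            group.addTask { await process(item) }
            inFlight += 1
        }
        peak = inFlight
        // Each time one task finishes, start the next item.
        while let value = await group.next() {
            total += value
            inFlight -= 1
            if let item = iterator.next() {
                group.addTask { await process(item) }
                inFlight += 1
            }
            peak = max(peak, inFlight)
        }
        return BoundedResult(total: total, peak: peak)
    }
}
```

`peak` is bookkeeping in the parent task, so it is an upper bound on how many children were running, not a measurement of threads.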

Your disks don’t multitask, and are almost certainly much slower than your CPUs, so unless you are reading and analyzing the file data, the useful number of threads is probably equal to the number of disks you can address simultaneously.

Do you know that multiple threads go faster at all?

How have you measured it?
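One way to answer "how have you measured it?" is a small wall-clock harness. A minimal sketch, assuming `DispatchTime` as a monotonic clock; `measureSeconds` and `medianSeconds` are hypothetical helper names:

```swift
import Foundation

/// Measures the wall-clock time of `body` in seconds using a monotonic clock.
func measureSeconds(_ body: () throws -> Void) rethrows -> Double {
    let start = DispatchTime.now().uptimeNanoseconds
    try body()
    let end = DispatchTime.now().uptimeNanoseconds
    return Double(end - start) / 1_000_000_000
}

/// Runs `body` several times and returns the median, which is less noisy
/// than a single run.
func medianSeconds(runs: Int = 5, _ body: () throws -> Void) rethrows -> Double {
    precondition(runs > 0)
    var samples: [Double] = []
    for _ in 0..<runs {
        samples.append(try measureSeconds(body))
    }
    samples.sort()
    return samples[samples.count / 2]
}
```

For file I/O, keep in mind that the first run usually warms the OS file cache, so later runs of the same workload can look much faster than a cold start.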


Disks do in fact multitask these days, but figuring out the correct amount of IO parallelism is exceedingly difficult. There's a ton of tradeoffs, and optimal values depend heavily on the task, the hardware, the OS version, the priority, and what else is running.


I’ll admit I haven’t studied mass storage internals in a while. Know of any good articles covering the state of the art?

Not off the top of my head, sorry. I learned about it by talking with driver/storage controller folks while working on AsyncBytes.

My initial heuristic for typical iOS apps would be to focus on minimizing memory footprint rather than squeezing every drop of I/O bandwidth out of the SSD, i.e. start at N=1 and see how it goes.
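"Start at N=1 and see how it goes" can be turned into a simple tuning loop. This is a hypothetical sketch, not a method from the thread: it doubles N while each step cuts the measured time by at least `minGain`, and `measure` is whatever benchmark you trust for your workload:

```swift
/// Hypothetical tuning loop: start at N = 1 and keep doubling while each step
/// improves the measured time by at least `minGain` (0.1 means 10%).
/// `measure(n)` should run the real workload with concurrency n and return seconds.
func pickConcurrency(maxN: Int, minGain: Double = 0.1,
                     measure: (Int) -> Double) -> Int {
    var bestN = 1
    var bestTime = measure(1)
    var candidate = 2
    while candidate <= maxN {
        let time = measure(candidate)
        // Stop as soon as doubling no longer pays for its extra memory.
        guard time < bestTime * (1 - minGain) else { break }
        bestN = candidate
        bestTime = time
        candidate *= 2
    }
    return bestN
}
```

Stopping at the first step that fails to pay off keeps N (and the memory it costs) small, in line with favouring footprint over peak bandwidth.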

On desktops it may be more complicated than that due to the possible presence of both larger files and unresponsive network volumes.

Servers of course are their own world where both IO and parallelism are critically important.


The other thing to keep in mind is that the performance characteristics for file system metadata operations, like walking a directory hierarchy, are radically different from those for doing bulk I/O, like reading and writing large files.

Share and Enjoy

Quinn “The Eskimo!” @ DTS @ Apple
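Since the original tool enumerates files, the metadata side of that distinction is the relevant one. A minimal single-threaded baseline sketch using `FileManager`'s enumerator, asking it to prefetch the resource keys it needs; `scanTree(at:)` is a hypothetical name, and counting files and bytes stands in for whatever the tool actually collects:

```swift
import Foundation

/// Walks a directory tree on one thread and returns the number of regular
/// files and their total size. The keys passed to the enumerator are ones it
/// may prefetch while walking.
func scanTree(at root: URL) -> (files: Int, bytes: Int) {
    let keys: [URLResourceKey] = [.isRegularFileKey, .fileSizeKey]
    guard let enumerator = FileManager.default.enumerator(
        at: root,
        includingPropertiesForKeys: keys,
        options: [],
        errorHandler: { _, _ in true }  // skip unreadable entries, keep walking
    ) else { return (0, 0) }

    var files = 0
    var bytes = 0
    for case let url as URL in enumerator {
        guard let values = try? url.resourceValues(forKeys: Set(keys)),
              values.isRegularFile == true else { continue }
        files += 1
        bytes += values.fileSize ?? 0
    }
    return (files, bytes)
}
```

Timing a baseline like this before adding any threads tells you whether the walk is limited by metadata operations at all.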


I'm late to this thread, but I hope this reply helps others.

Disks do tend to multitask nowadays. You can allocate multiple threads to handle heavier tasks, as long as you make sure thread explosion doesn't happen. In my view, a better first experiment than passing flags to the compiler is to run the heavy work at an appropriate GCD QoS and test whether that helps.
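A minimal sketch of running heavy work at a GCD QoS while keeping the width bounded, assuming `.utility` is a suitable class and using hypothetical names (`InFlightCounter`, `runBounded`). The submitting thread waits on the semaphore before dispatching, so GCD worker threads never sit blocked waiting for a slot, since blocked workers are one way GCD ends up creating more threads:

```swift
import Foundation

/// Tracks how many work items run at once; @unchecked Sendable because every
/// access happens under `lock`.
final class InFlightCounter: @unchecked Sendable {
    private let lock = NSLock()
    private var running = 0
    private(set) var peak = 0
    private(set) var completed = 0

    func enter() { lock.lock(); running += 1; peak = max(peak, running); lock.unlock() }
    func leave() { lock.lock(); running -= 1; completed += 1; lock.unlock() }
}

/// Submits `jobs` items to a concurrent `.utility` queue, never more than `limit` in flight.
func runBounded(jobs: Int, limit: Int,
                work: @escaping @Sendable (Int) -> Void) -> InFlightCounter {
    let queue = DispatchQueue(label: "example.bounded", qos: .utility, attributes: .concurrent)
    let slots = DispatchSemaphore(value: limit)
    let group = DispatchGroup()
    let counter = InFlightCounter()
    for i in 0..<jobs {
        slots.wait()  // block the submitter, not a GCD worker
        queue.async(group: group) {
            counter.enter()
            work(i)
            counter.leave()
            slots.signal()  // free a slot for the next submission
        }
    }
    group.wait()
    return counter
}
```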

In my experience with about a dozen types of disks (mix of spinning rust, external SSD, and Apple-internal SSDs) a decent rule of thumb is 4 concurrent I/Os per spinning rust volume and 16 per removable SSD / 32 for Apple internal SSDs. That's just a starting point, of course. There are numerous nuances.
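The rule of thumb above, written down as a starting-point table. This is only an encoding of those numbers: the `VolumeKind` enum is hypothetical, and the app still has to determine (or be told) what kind of volume it is working on:

```swift
/// Hypothetical classification of the volume being read.
enum VolumeKind {
    case spinningDisk
    case removableSSD
    case appleInternalSSD
}

/// Starting number of concurrent I/Os per volume, per the rule of thumb
/// above; tune from here by measuring.
func startingConcurrentIOs(for kind: VolumeKind) -> Int {
    switch kind {
    case .spinningDisk: return 4
    case .removableSSD: return 16
    case .appleInternalSSD: return 32
    }
}
```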

In general for large I/O (many megabytes at a time) N=1 is optimal, because you'll be bandwidth-limited. Though if your turn-around time between reads is non-trivial you might benefit from N=2. (if you want maximum performance you should generally have dedicated I/O threads that just shuttle data into your app's memory, with separate threads to actually operate on that data)
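A minimal sketch of the "dedicated I/O thread shuttling data, separate thread operating on it" arrangement for large sequential reads (N=1). One reader thread fills chunks; a serial queue consumes them; a semaphore caps how many chunks are buffered so memory stays bounded. `ByteTotal` and `pipelinedByteSum` are hypothetical names, and summing bytes stands in for real processing:

```swift
import Foundation

/// Running total owned by the processing queue; @unchecked Sendable because it
/// is only mutated on that serial queue and read after `done` is signalled.
final class ByteTotal: @unchecked Sendable {
    var value = 0
}

/// Reads `url` in `chunkSize` pieces on a dedicated reader thread and hands
/// each chunk to a serial processing queue, holding at most `maxBuffered`
/// chunks in memory at once.
func pipelinedByteSum(of url: URL, chunkSize: Int = 1 << 20,
                      maxBuffered: Int = 2) throws -> Int {
    let handle = try FileHandle(forReadingFrom: url)
    defer { handle.closeFile() }
    let slots = DispatchSemaphore(value: maxBuffered)  // bounds buffered chunks
    let done = DispatchSemaphore(value: 0)
    let processing = DispatchQueue(label: "example.processing")  // serial consumer
    let total = ByteTotal()

    let reader = Thread {
        while true {
            slots.wait()  // wait for a free buffer slot before reading more
            let chunk = handle.readData(ofLength: chunkSize)
            if chunk.isEmpty {  // end of file
                slots.signal()
                break
            }
            processing.async {
                total.value += chunk.reduce(0) { $0 + Int($1) }
                slots.signal()  // chunk consumed; the reader may fetch another
            }
        }
        // The queue is serial, so this runs after every chunk enqueued above.
        processing.async { done.signal() }
    }
    reader.start()
    done.wait()
    return total.value
}
```

With `maxBuffered: 2` the reader can fetch the next chunk while the previous one is being processed, which is roughly the N=2 "turn-around time" case described above.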

For very small I/O (single blocks at a time) N≤16 for spinning rust, ≤64 for SSD. You'll be seek / IOPS limited and just need to keep the native command queues full enough.
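For small I/O, one simple way to keep several requests outstanding is N threads each issuing synchronous `pread(2)` calls at their own offsets. A minimal sketch under that assumption; `BlockSumState` and `concurrentBlockSum` are hypothetical names, and the byte sum stands in for real per-block processing:

```swift
#if canImport(Darwin)
import Darwin
#elseif canImport(Glibc)
import Glibc
#endif
import Foundation

/// Shared result; @unchecked Sendable because it is only touched under `lock`.
final class BlockSumState: @unchecked Sendable {
    let lock = NSLock()
    var total = 0
    var failed = false
}

/// Reads `blockCount` blocks of `blockSize` bytes using `workers` threads.
/// Worker w reads blocks w, w + workers, w + 2 * workers, ...
/// Returns the byte sum of everything read, or nil on error.
func concurrentBlockSum(path: String, blockSize: Int, blockCount: Int, workers: Int) -> Int? {
    let fd = open(path, O_RDONLY)
    guard fd >= 0 else { return nil }
    defer { close(fd) }

    let state = BlockSumState()
    let group = DispatchGroup()
    for w in 0..<workers {
        group.enter()
        Thread {
            var buffer = [UInt8](repeating: 0, count: blockSize)
            var localSum = 0
            var ok = true
            for block in stride(from: w, to: blockCount, by: workers) {
                // Positional read: no shared file offset, so threads don't interfere.
                let n = buffer.withUnsafeMutableBytes {
                    pread(fd, $0.baseAddress, blockSize, off_t(block * blockSize))
                }
                if n < 0 { ok = false; break }
                localSum += buffer[0..<n].reduce(0) { $0 + Int($1) }
            }
            state.lock.lock()
            state.total += localSum
            if !ok { state.failed = true }
            state.lock.unlock()
            group.leave()
        }.start()
    }
    group.wait()
    return state.failed ? nil : state.total
}
```

`workers` is the knob the figures above refer to; whether the device's command queue actually stays full depends on the hardware and file system.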

If you have RAID in the mix then of course it's more complicated. RAID0 somewhat lets you raise your concurrency, although not as much as you'd expect, at least when using Apple software RAID. Most other forms of RAID reduce the optimal concurrency level (and performance generally).

For filesystem metadata the picture is murkier, because it depends a lot on the file system in question and the exact workload. APFS generally performs worse than HFS+ in my experience, for example, but that could differ for others. Generally I find that the kernel is the bottleneck for metadata operations anyway, although I'm not sure why (slow / excessive locking of in-memory data structures?).

And the API you're using to perform the I/O matters, of course. Dispatch I/O is fairly fast, especially if you suitably tweak prefetching behaviour (particularly, disabling it if you're doing partial reads). Synchronous APIs (like plain C read) tend to be the lowest latency and therefore often achieve the highest throughput. Anything involving Structured Concurrency tends to be notably slower.
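For comparison with the synchronous `read`/`pread` style, a minimal sketch of streaming a file through a Dispatch I/O channel. `IOResult` and `dispatchIOByteSum` are hypothetical names; `setLimit(highWater:)` caps how much data each handler call receives. The prefetching tweaks mentioned above are not shown here:

```swift
#if canImport(Darwin)
import Darwin
#elseif canImport(Glibc)
import Glibc
#endif
import Foundation

/// Written only on the channel's serial queue and read after `finished` is
/// signalled, hence @unchecked Sendable.
final class IOResult: @unchecked Sendable {
    var total = 0
    var error: Int32 = 0
}

/// Streams the file at absolute `path` through a Dispatch I/O channel and sums
/// its bytes. Returns nil if the channel can't be created or reports an error.
func dispatchIOByteSum(path: String, highWater: Int = 64 * 1024) -> Int? {
    let queue = DispatchQueue(label: "example.dispatch-io")
    guard let channel = DispatchIO(type: .stream, path: path, oflag: O_RDONLY,
                                   mode: 0, queue: queue, cleanupHandler: { _ in }) else {
        return nil
    }
    channel.setLimit(highWater: highWater)  // deliver at most this many bytes per callback
    let result = IOResult()
    let finished = DispatchSemaphore(value: 0)
    // A very large length means "keep reading until EOF".
    channel.read(offset: 0, length: Int.max, queue: queue) { done, data, error in
        if let data = data {
            result.total += data.reduce(0) { $0 + Int($1) }
        }
        if done {
            result.error = error
            finished.signal()
        }
    }
    finished.wait()
    channel.close()
    return result.error == 0 ? result.total : nil
}
```

Timing this against a plain `read` loop on the same file, with a harness like the one discussed earlier in the thread, is the way to see how much the API choice matters for a particular workload.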