[Pitch] Cheap Task identity for high-performance instrumentation

Hi everyone,

I'd like to open a discussion about adding a lightweight, public API for reading the current Swift Task's identity — something analogous to pthread_self() for threads, but for Tasks.

Background

We've been building a high-performance, always-on span capture and structured logging library for Swift (not yet open source, but it will be eventually). The design targets workloads where instrumentation is on the critical path — real-time systems, trading infrastructure, game engines, audio pipelines — and where the overhead of swift-distributed-tracing or swift-log would be measurable. A full span capture (timestamp, thread ID, CPU ID, SBE-encoded structured payload, lock-free ring-buffer write) completes in ~25 ns on Apple Silicon with zero allocations and zero contention, enabling always-on, detailed production instrumentation for both spans and logs.

To be clear — swift-log and swift-distributed-tracing are the right choice for most Swift applications. We specifically needed always-on instrumentation for production workloads where even a NoOpTracer at ~243 ns per withSpan is measurable, and where StreamLogHandler with two metadata fields at ~1,700 ns per call would perturb the workload. Different constraints, different trade-offs.

As part of this, we capture thread ID and CPU ID on every emit. We also capture Task identity — which Swift Task produced a given span or log entry. In concurrent Swift applications this is essential for correlating spans and logs.

We achieve this today by calling swift_task_getCurrent via @_silgen_name and using the returned pointer as an opaque identity token. The pointer is never dereferenced — it's purely a cache key for a per-thread TLS lookup. On a cache hit (steady state), the cost is ~2 ns. On a cache miss (a task transition on a thread), we read Task.name (SE-0469) once and register the name in a compact table, after which all subsequent emits from that task use a cached slot ID.
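A minimal sketch of this approach (the declarations and helper names are ours; the symbol is the reserved runtime entry point, and the pointer is only ever compared, never dereferenced):

```swift
// Sketch of the current, unsupported approach. The @_silgen_name declaration
// binds directly to the reserved runtime symbol; the returned pointer is used
// only as an opaque equality token (a TLS cache key).
@_silgen_name("swift_task_getCurrent")
func _swiftTaskGetCurrent() -> UnsafeRawPointer?

@inline(__always)
func currentTaskToken() -> UInt {
    // 0 means "not currently running inside a Task".
    UInt(bitPattern: _swiftTaskGetCurrent())
}
```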

This approach works well and the Task identity feature has been very valuable for production diagnostics. But we're relying on a reserved runtime symbol, and this is clearly unsupported and will eventually become an error:

  warning: symbol name 'swift_task_getCurrent' is reserved for the Swift runtime and cannot be directly referenced without causing unpredictable behavior; this will become an error

Why the public APIs don't work here

The existing public surface for accessing the current Task is withUnsafeCurrentTask { }, which uses a closure. We benchmarked every plausible approach on an M4 Max (using package-benchmark):

  Approach                                                    p50 (ns)   Retired instructions
  swift_task_getCurrent (raw runtime call)                           2                     31
  withUnsafeCurrentTask { } (no-op closure)                          6                    120
  withUnsafeCurrentTask + unsafeBitCast to extract identity         39                    382
  withUnsafeCurrentTask + hashValue                                 60                    534
  Full miss: hashValue + Task.name                                 109                  1,170

The cheapest public-API path to get a comparable identity token is ~20x more expensive than the raw runtime call. When the entire span capture budget is ~25 ns, adding 40–60 ns for task identity alone is not viable — it would more than triple the cost of every emit.

For context, reading a hardware timestamp is a few ns, and reading the thread ID costs ~1 ns. Task identity should preferably be in the same ballpark.
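For reference, the public-API variants in the table were measured with helpers roughly along these lines (the helper names are ours; the unsafeBitCast relies on UnsafeCurrentTask being a single pointer-sized value, which is an internal layout detail, not a guarantee):

```swift
// Extracting an identity token through the closure-based public API, two ways.
func identityViaBitCast() -> UInt {
    withUnsafeCurrentTask { task in
        guard let task else { return 0 }
        // Layout assumption: UnsafeCurrentTask wraps exactly one task pointer.
        return unsafeBitCast(task, to: UInt.self)
    }
}

func identityViaHash() -> Int {
    // UnsafeCurrentTask is Hashable; hashing adds further cost on top of
    // the closure overhead itself.
    withUnsafeCurrentTask { $0?.hashValue ?? 0 }
}
```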

What would help

A non-closure API that returns a cheap, stable task identifier. Something like:

extension Task {
    /// A lightweight identifier for the current task, or nil if not
    /// in a task context. Unique for the task's lifetime. O(1), no allocation.
    public static var currentIdentifier: Task.Identifier? { get }
}

Requirements

  • ~1–5 ns — comparable to a TLS read
  • Non-allocating — no ARC traffic
  • Stable — same value for the task's lifetime
  • @inlinable-friendly — so libraries can inline it into client call sites

It does not need to be globally unique (pointer reuse after task deallocation is fine — same as thread IDs), and it does not need to carry semantic meaning beyond equality comparison.

Broader applicability

While our specific use case is high-performance tracing, I think a cheap task identifier would be useful more broadly:

  • Profilers and performance tools that need to attribute work to tasks
  • Lock-free data structures that use task identity for ownership tracking
  • Diagnostic logging that wants to tag output with task identity without the overhead of the closure-based API
  • Custom executors that need to track which task is running

The underlying runtime primitive (swift_task_getCurrent) already exists and does exactly the right thing. The ask is really about making it — or something equivalent — part of the supported public API surface.

I've filed a GitHub issue (Provide a cheap, stable Task identity primitive for high-performance instrumentation · Issue #89030 · swiftlang/swift · GitHub) to track this as well. Happy to provide more data or discuss trade-offs.

Joakim


Thanks for the chat earlier and the great writeup! I think that's a valid use case, and I can see more high-performance use cases like this appearing that want to do similar things to what Instruments does, etc.

I'm sad the withUnsafeCurrentTask closure API is costing us this much here :thinking: My preferred way to expose this would have been on that…

I'll experiment a bit, but overall I think I'd be supportive of finding a way to get this exposed.

For what it's worth, the existing @_silgen_name approach will likely continue to work… we've not really set a hard deadline for when any of those calls will start getting blocked, and even then you could access it through a C header. Nasty, but not really "wrong", tbh… since that method is ABI and won't be going away…

Thank you for the thoughtful writeup!


For the record, we did the C header first, but IIRC we then had a performance hit of ~5–10 ns extra due to not being able to inline properly (a library-evolution issue, perhaps; we support building with library evolution since we need it too, but it's opt-in).

Have you explored whether the task_id — a monotonically increasing counter across every task — would be sufficient for your identity requirements? Some time back I was thinking about whether we could/should expose this as a property, Task.id, potentially.

No, haven't tried that -- sounds like it would also fulfill the requirements just fine. Basically anything that is stable for the task's lifetime would be OK (and it sounds like task_id would even be stable across the process lifetime) - as long as we are able to inline it properly on the client side…

You can try it out by doing this:

@_silgen_name("swift_task_getJobTaskId")
internal func _getJobTaskId(_ job: UnownedJob) -> UInt64

That has the same problem as you mentioned above, namely that it is not stable public API, but I think it is reasonable to think about exposing this as a real public API.
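A hedged sketch of trying that out (the declarations and names are ours; both bindings are unsupported, and passing the raw current-task pointer where an UnownedJob is expected relies on UnownedJob being a single job pointer, which is an ABI-layout assumption):

```swift
// Combine the reserved current-task accessor with the job-id accessor.
// Purely for experimentation; neither symbol is supported public API.
@_silgen_name("swift_task_getCurrent")
func _currentTaskPointer() -> OpaquePointer?

@_silgen_name("swift_task_getJobTaskId")
func _jobTaskId(_ job: OpaquePointer) -> UInt64

@inline(__always)
func currentTaskID() -> UInt64 {
    // 0 means "not in a task"; the runtime skips all-zero job ids.
    guard let task = _currentTaskPointer() else { return 0 }
    return _jobTaskId(task)
}
```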


Quick test, and it looks viable: same performance envelope, but presumably stable over the process lifetime - so exposing that (in an inlinable way…) would certainly be an option.

The id would be fine to expose; we show them in output already in swift-inspect and Instruments etc. There's no real reason to hide them. They wrap around on overflow AFAIR, which would be the only additional consideration, but I'll have to double-check that.

AFAIK the id is backed by a UInt64; I would be very surprised if anyone ever gets this to overflow.

Yeah, it is a UInt64 that is returned - I hope my future M9 Ultra will be able to get them to wrap, that would be awesome :slight_smile:


I'd be somewhat interested in the nature of the overhead for the slower approaches as well. Is the unsafeBitCast cost due to the metadata lookup for the target type or something?

It's a bit nuanced, so I had to double-check what we currently have. The job id is 32-bit, but indeed we moved the task id to be "64-bit" at some point -- internals-wise it is actually two 32-bit fields, which is what I was remembering here and wanted to double-check.

Luckily, though, it is just a single counter stored into two fields like that for layout reasons -- so yeah, effectively 64-bit, with the exception that we skip some values to avoid a job id being entirely 0.

I would be happy enough to expose this as is, especially because those IDs already surface in other tools like Instruments or swift-inspect.

I suspect @Mike_Ash also wouldn't have anything against that.


Yeah, I think those fields could reasonably be made ABI. I would like to clean up the layout on the targets that aren’t yet stable ABI, though (e.g. to make them not split), so it’d be nice if there wasn’t too much code hard-coding them right now. If this were a stdlib operation, which I think would be reasonable, we could at least keep the hard-coding within the Swift implementation.


I had Claude analyse the generated code:

withUnsafeCurrentTask + unsafeBitCast (38-42 ns, ~60+ instrs in closure body)

The closure body reveals the real cost:

  bl   ___swift_instantiateConcreteTypeFromMangledNameV2  ; ← type metadata lookup!
  bl   ___chkstk_darwin            ; ← stack probe for dynamic alloca
  bl   _$sSctSgWOc                 ; ← value witness: copy UnsafeCurrentTask?
  bl   _$sSctMa                   ; ← UnsafeCurrentTask metadata accessor again
  blr  x8                         ; ← VWT: tag check (is it .some or .none?)
  ; ... branch on tag ...         
  ldr  x25, [x22]                 ; ← extracting payload
  blr  x8                         ; ← VWT: destroy (consumes the witness)

The unsafeBitCast itself is free (it's just ldr x25, [x22] — a register load). The entire 38 ns comes from:

  1. __swift_instantiateConcreteTypeFromMangledNameV2 — resolves UnsafeCurrentTask? type metadata from its mangled name string. This does a cache lookup (fast on the second call) but still costs ~5-8 ns per invocation.
  2. __chkstk_darwin — dynamic stack probe. The closure has a dynamically-sized stack frame because UnsafeCurrentTask? is accessed through value witness tables (variable-size type layout), so the compiler can't statically size the frame.
  3. Value witness copy (WOc) — copies the UnsafeCurrentTask? value into the closure's stack frame. Goes through metadata → VWT → indirect call. Another ~5-8 ns.
  4. Tag check via VWT — .some/.none discrimination goes through an indirect call (blr x8) into the value witness table's getEnumTag instead of a direct cbz.
  5. Value witness destroy (WOh) — cleanup after the closure returns. Another indirect call.

So the answer to the question is: no, unsafeBitCast costs nothing — the overhead is entirely from withUnsafeCurrentTask's generic, closure-based signature. The closure forces the compiler through the value witness table for UnsafeCurrentTask?, which means:

  • Mangled-name type metadata resolution
  • Dynamic stack allocation with stack probe
  • Indirect calls for copy/tag-check/destroy through the VWT

I guess the main point is:

withUnsafeCurrentTask is a generic function — body: (UnsafeCurrentTask?) throws -> T — so the compiler emits an existential box for the closure.


Updating the thread: I'm happy to take on the implementation here, including benchmarks, making sure it's the best it can be, dealing with availability woes, etc.

If you or someone on your team would like to help out writing a proposal, that would be neat @hassila :slight_smile:


Thanks! Yes of course, I'll write it up. :slight_smile:


What happens when you have a cache miss, get the task identity and name, and cache that name so that further lookups are cache hits — and then, at some point, the task is deallocated and a new task with a different name is created in its place (same identity)?

It seems the actual implementation will now use a monotonically increasing integer that is stable over the process runtime, so it won't be an issue.
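The reuse hazard being asked about is easy to illustrate outside of concurrency entirely, with plain object identity (a small demonstration, not Task-specific; the class and variable names are ours):

```swift
// An identity derived from an address can repeat after deallocation: the
// second allocation may land at the first one's freed address.
final class Dummy {}

var firstID: ObjectIdentifier?
do {
    let a = Dummy()
    firstID = ObjectIdentifier(a)
}               // `a` is deallocated here; its address can be reused

let b = Dummy()
let secondID = ObjectIdentifier(b)

// firstID == secondID is possible, so an address-style Task identity could
// attribute a new task's work to a dead task's cached name. A process-wide
// monotonic counter never repeats, so that stale-cache problem cannot occur.
```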


So, "pointer reuse" semantics – if Task identity followed that – would not be suitable for your purposes, right?

Right, it could lead to incorrect annotations, so a process-stable identifier is preferred.

1 Like