[Pitch] Cheap Task identity for high-performance instrumentation

Hi everyone,

I'd like to open a discussion about adding a lightweight, public API for reading the current Swift Task's identity — something analogous to pthread_self() for threads, but for Tasks.

Background

We've been building a high-performance, always-on span capture and structured logging library for Swift (not yet open source, but it will be eventually). The design targets workloads where instrumentation is on the critical path — real-time systems, trading infrastructure, game engines, audio pipelines — and where the overhead of swift-distributed-tracing or swift-log would be measurable. A full span capture (timestamp, thread ID, CPU ID, SBE-encoded structured payload, lock-free ring-buffer write) completes in ~25 ns on Apple Silicon with zero allocations and zero contention, enabling always-on, detailed production instrumentation for both spans and logs.

To be clear — swift-log and swift-distributed-tracing are the right choice for most Swift applications. We specifically needed always-on instrumentation for production workloads where even a NoOpTracer at ~243 ns per withSpan is measurable, and where StreamLogHandler with two metadata fields at ~1,700 ns per call would perturb the workload. Different constraints, different trade-offs.

As part of this, we capture thread ID and CPU ID on every emit. We also capture Task identity — which Swift Task produced a given span or log entry. In concurrent Swift applications this is essential for correlating spans and logs.

We achieve this today by calling swift_task_getCurrent via @_silgen_name and using the returned pointer as an opaque identity token. The pointer is never dereferenced — it's purely a cache key for a per-thread TLS lookup. On a cache hit (steady state), the cost is ~2 ns. On a cache miss (a task transition on a thread), we read Task.name (SE-0469) once and register the name in a compact table, after which all subsequent emits from that task use a cached slot ID.
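A minimal sketch of this approach (the declarations and helper names are ours; the symbol is the reserved runtime entry point, and the pointer is only ever compared, never dereferenced):

```swift
// Sketch of the current, unsupported approach. The @_silgen_name declaration
// binds directly to the reserved runtime symbol; the returned pointer is used
// only as an opaque equality token (a TLS cache key).
@_silgen_name("swift_task_getCurrent")
func _swiftTaskGetCurrent() -> UnsafeRawPointer?

@inline(__always)
func currentTaskToken() -> UInt {
    // 0 means "not currently running inside a Task".
    UInt(bitPattern: _swiftTaskGetCurrent())
}
```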

This approach works well and the Task identity feature has been very valuable for production diagnostics. But we're relying on a reserved runtime symbol, and this is clearly unsupported and will eventually become an error:

  warning: symbol name 'swift_task_getCurrent' is reserved for the Swift runtime and cannot be directly referenced without causing unpredictable behavior; this will become an error

Why the public APIs don't work here

The existing public surface for accessing the current Task is withUnsafeCurrentTask { }, which uses a closure. We benchmarked every plausible approach on an M4 Max (using package-benchmark):

  Approach                                                    p50 (ns)   Retired instructions
  swift_task_getCurrent (raw runtime call)                           2                     31
  withUnsafeCurrentTask { } (no-op closure)                          6                    120
  withUnsafeCurrentTask + unsafeBitCast to extract identity         39                    382
  withUnsafeCurrentTask + hashValue                                 60                    534
  Full miss: hashValue + Task.name                                 109                  1,170

The cheapest public-API path to get a comparable identity token is ~20x more expensive than the raw runtime call. When the entire span capture budget is ~25 ns, adding 40–60 ns for task identity alone is not viable — it would more than triple the cost of every emit.

For context, reading a hardware timestamp is a few ns, and reading the thread ID costs ~1 ns. Task identity should preferably be in the same ballpark.
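For reference, the public-API variants in the table were measured with helpers roughly along these lines (the helper names are ours; the unsafeBitCast relies on UnsafeCurrentTask being a single pointer-sized value, which is an internal layout detail, not a guarantee):

```swift
// Extracting an identity token through the closure-based public API, two ways.
func identityViaBitCast() -> UInt {
    withUnsafeCurrentTask { task in
        guard let task else { return 0 }
        // Layout assumption: UnsafeCurrentTask wraps exactly one task pointer.
        return unsafeBitCast(task, to: UInt.self)
    }
}

func identityViaHash() -> Int {
    // UnsafeCurrentTask is Hashable; hashing adds further cost on top of
    // the closure overhead itself.
    withUnsafeCurrentTask { $0?.hashValue ?? 0 }
}
```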

What would help

A non-closure API that returns a cheap, stable task identifier. Something like:

extension Task {
    /// A lightweight identifier for the current task, or nil if not
    /// in a task context. Unique for the task's lifetime. O(1), no allocation.
    public static var currentIdentifier: Task.Identifier? { get }
}

Requirements

  • ~1–5 ns — comparable to a TLS read
  • Non-allocating — no ARC traffic
  • Stable — same value for the task's lifetime
  • @inlinable-friendly — so libraries can inline it into client call sites

It does not need to be globally unique (pointer reuse after task deallocation is fine — same as thread IDs), and it does not need to carry semantic meaning beyond equality comparison.

Broader applicability

While our specific use case is high-performance tracing, I think a cheap task identifier would be useful more broadly:

  • Profilers and performance tools that need to attribute work to tasks
  • Lock-free data structures that use task identity for ownership tracking
  • Diagnostic logging that wants to tag output with task identity without the overhead of the closure-based API
  • Custom executors that need to track which task is running

The underlying runtime primitive (swift_task_getCurrent) already exists and does exactly the right thing. The ask is really about making it — or something equivalent — part of the supported public API surface.

I've filed a GitHub issue (Provide a cheap, stable Task identity primitive for high-performance instrumentation · Issue #89030 · swiftlang/swift · GitHub) to track this as well. Happy to provide more data or discuss trade-offs.

Joakim


Thanks for the chat earlier and the great writeup! I think that's a valid use case, and I can see more high-performance use cases like this appearing that want to do similar things to what Instruments does, etc.

I'm sad the withUnsafeCurrentTask closure API is costing us this much here :thinking: My preferred way to expose this would have been on that…

I'll experiment a bit, but overall I think I'd be supportive of finding a way to get this exposed.

For what it's worth, the existing @_silgen_name approach will likely continue to work… we've not really set a hard deadline for when any of those calls will start getting blocked, and even then you could access it through a C header. Nasty, but not really "wrong", tbh… since that method is ABI and won't be going away…

Thank you for the thoughtful writeup!


For the record, we did the C header first, but IIRC we then had a performance hit of ~5–10 ns extra due to not being able to inline properly (a library-evolution issue, perhaps; we support building with library evolution since we need it too, but it's opt-in).

Have you explored whether the task_id — a monotonically increasing counter across every task — would be sufficient for your identity requirements? Some time back I was thinking about whether we could/should expose this as a property, Task.id, potentially.

No, haven't tried that -- sounds like it would also fulfill the requirements just fine. Basically anything that is stable for the task's lifetime would be OK (and it sounds like task_id would even be stable across the process lifetime) - as long as we are able to inline it properly on the client side…

You can try it out by doing this:

@_silgen_name("swift_task_getJobTaskId")
internal func _getJobTaskId(_ job: UnownedJob) -> UInt64

That has the same problem as you mentioned above, namely that it is not stable public API, but I think it is reasonable to think about exposing this as a real public API.
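A hedged sketch of trying that out (the declarations and names are ours; both bindings are unsupported, and passing the raw current-task pointer where an UnownedJob is expected relies on UnownedJob being a single job pointer, which is an ABI-layout assumption):

```swift
// Combine the reserved current-task accessor with the job-id accessor.
// Purely for experimentation; neither symbol is supported public API.
@_silgen_name("swift_task_getCurrent")
func _currentTaskPointer() -> OpaquePointer?

@_silgen_name("swift_task_getJobTaskId")
func _jobTaskId(_ job: OpaquePointer) -> UInt64

@inline(__always)
func currentTaskID() -> UInt64 {
    // 0 means "not in a task"; the runtime skips all-zero job ids.
    guard let task = _currentTaskPointer() else { return 0 }
    return _jobTaskId(task)
}
```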


Quick test, and it looks viable: same performance envelope, but presumably stable over the process lifetime - so exposing that (in an inlinable way…) would certainly be an option.

The id would be fine to expose; we show them in output already in swift-inspect and Instruments etc. There's no real reason to hide them. They wrap around on overflow AFAIR, which would be the only additional consideration, but I'll have to double-check that.

AFAIK the id is backed by a UInt64; I would be very surprised if anyone ever gets this to overflow.

Yeah, it is a UInt64 that is returned - I hope my future M9 Ultra will be able to get them to wrap, that would be awesome :slight_smile:


I'd be somewhat interested in the nature of the overhead for the slower approaches as well. Is the unsafeBitCast cost due to the metadata lookup for the target type or something?

It's a bit nuanced, so I had to double-check what we currently have. The job id is 32-bit, but indeed we moved the task id to be "64-bit" at some point -- internals-wise it is actually two 32-bit fields, which is what I was remembering here and wanted to double-check.

Luckily, though, it is just a single counter stored into two fields like that for layout reasons -- so yeah, effectively 64-bit, with the exception that we skip some values to avoid a job id being entirely 0.

I would be happy enough to expose this as is, especially because those IDs already surface in other tools like Instruments or swift-inspect.

I suspect @Mike_Ash also wouldn't have anything against that.


Yeah, I think those fields could reasonably be made ABI. I would like to clean up the layout on the targets that aren’t yet stable ABI, though (e.g. to make them not split), so it’d be nice if there wasn’t too much code hard-coding them right now. If this were a stdlib operation, which I think would be reasonable, we could at least keep the hard-coding within the Swift implementation.


I had Claude analyse the generated code:

withUnsafeCurrentTask + unsafeBitCast (38-42 ns, ~60+ instrs in closure body)

The closure body reveals the real cost:

  bl   ___swift_instantiateConcreteTypeFromMangledNameV2  ; ← type metadata lookup!
  bl   ___chkstk_darwin            ; ← stack probe for dynamic alloca
  bl   _$sSctSgWOc                 ; ← value witness: copy UnsafeCurrentTask?
  bl   _$sSctMa                   ; ← UnsafeCurrentTask metadata accessor again
  blr  x8                         ; ← VWT: tag check (is it .some or .none?)
  ; ... branch on tag ...         
  ldr  x25, [x22]                 ; ← extracting payload
  blr  x8                         ; ← VWT: destroy (consumes the witness)

The unsafeBitCast itself is free (it's just ldr x25, [x22] — a register load). The entire 38 ns comes from:

  1. __swift_instantiateConcreteTypeFromMangledNameV2 — resolves UnsafeCurrentTask? type metadata from its mangled name string. This does a cache lookup (fast on the second call) but still costs ~5-8 ns per invocation.
  2. __chkstk_darwin — dynamic stack probe. The closure has a dynamically-sized stack frame because UnsafeCurrentTask? is accessed through value witness tables (variable-size type layout), so the compiler can't statically size the frame.
  3. Value witness copy (WOc) — copies the UnsafeCurrentTask? value into the closure's stack frame. Goes through metadata → VWT → indirect call. Another ~5-8 ns.
  4. Tag check via VWT — .some/.none discrimination goes through an indirect call (blr x8) into the value witness table's getEnumTag instead of a direct cbz.
  5. Value witness destroy (WOh) — cleanup after the closure returns. Another indirect call.

So the answer to the question is: no, unsafeBitCast costs nothing — the overhead is entirely from withUnsafeCurrentTask's generic, closure-based signature. The closure forces the compiler through the value witness table for UnsafeCurrentTask?, which means:

  • Mangled-name type metadata resolution
  • Dynamic stack allocation with stack probe
  • Indirect calls for copy/tag-check/destroy through the VWT

I guess the main point is:

withUnsafeCurrentTask is a generic function — body: (UnsafeCurrentTask?) throws -> T — so the compiler emits an existential box for the closure.


Updating the thread: I'm happy to take on the implementation here, including benchmarks, making sure it's the best it can be, dealing with availability woes, etc.

If you or someone on your team would like to help out writing a proposal, that would be neat @hassila :slight_smile:


Thanks! Yes of course, I'll write it up. :slight_smile:


What happens when you have a cache miss, get the task identity and name, and cache that name so that further lookups are cache hits — and then, at some point, the task is deallocated and a new task with a different name is created in its place (same identity)?

It seems the actual implementation will now use a monotonically increasing integer that is stable over the process runtime, so it won't be an issue.
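The reuse hazard being asked about is easy to illustrate outside of concurrency entirely, with plain object identity (a small demonstration, not Task-specific; the class and variable names are ours):

```swift
// An identity derived from an address can repeat after deallocation: the
// second allocation may land at the first one's freed address.
final class Dummy {}

var firstID: ObjectIdentifier?
do {
    let a = Dummy()
    firstID = ObjectIdentifier(a)
}               // `a` is deallocated here; its address can be reused

let b = Dummy()
let secondID = ObjectIdentifier(b)

// firstID == secondID is possible, so an address-style Task identity could
// attribute a new task's work to a dead task's cached name. A process-wide
// monotonic counter never repeats, so that stale-cache problem cannot occur.
```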


So, "pointer reuse" semantics – if Task identity followed that – would not be suitable for your purposes, right?

Right, it could lead to incorrect annotations, so a process-stable identifier is preferred.

1 Like