I'm working on instrumenting some code with the Swift implementation of OpenTelemetry.
OTel-Swift works great with async/await - but it doesn't work great with NIO.
Under the hood, OTel-Swift does some magic to figure out the parent span of a particular function call based on the current thread. This paradigm doesn't work with NIO since a single thread can execute separate call stacks.
I'm wondering if anyone in the community has had any luck instrumenting NIO code (for a production environment). Ideally, I'd like to use OTel-Swift - but I'm open to alternatives.
swift-distributed-tracing might be a viable option if anyone is aware of a way it can integrate with NIO.
EDIT: I'm trying to avoid having to drill spans through every function call I want to trace. I could get all this to work fine if I did that though.
NOTE: swift-distributed-tracing is worth a look, for the primary reason that it separates the library that provides the API from the implementation of the API, which the library at GitHub (open-telemetry/opentelemetry-swift: OpenTelemetry API for Swift) doesn't provide. If you're cobbling together library code and use that library, it forces its dependencies down on any consumers.
The hummingbird project has that built in, especially as NIO has been embracing more and more async/await first APIs, so in terms of examples of how to attack some of the complicated bits, that might be a place to look for ideas. That said, I've not tried applying it directly to raw NIO code APIs, instead leveraging libraries such as Vapor or Hummingbird - "standing on the shoulders" as it were, and getting the advantage from it.
@Joseph_Heck's suggestion is great. What exactly are you doing with NIO? It may be easiest to "escape" to the structured concurrency world with NIOAsyncChannel and then use swift-distributed-tracing.
Currently, I've got a lot of signatures returning EventLoopFutures and I'm map/flatMapping them together.
The implicit tracing doesn't work well across the calls to map since the provided closure isn't executed synchronously and isn't isolated from other submissions to the EventLoop.
For example:
func foo() -> EventLoopFuture<Void> {
    // Implicit parent span of `bar` is `foo`'s span
    bar().flatMap {
        bar() // Implicit parent span of this `bar` is unknown.
    }
}

func bar() -> EventLoopFuture<Void> { /* ... */ }
Both times bar is called, its parent span should be foo's span. However, that is only possible with the first call to bar(), because the second call runs later on the EventLoop, after foo has already returned.
Perhaps the answer here is just to stop using map/flatMap and just convert all this to async/await with get()?
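As a rough sketch of what that conversion might look like, assuming swift-distributed-tracing's withSpan and NIO's EventLoopFuture.get(): the names legacyBar and the on: parameter are made up for illustration, and the future body is a placeholder rather than real work.

```swift
import NIOCore
import Tracing

// Hypothetical stand-in for existing future-returning code.
func legacyBar(on eventLoop: EventLoop) -> EventLoopFuture<Void> {
    eventLoop.makeSucceededVoidFuture()
}

// Async wrapper: the span is created inside the current task, so its
// parent is whatever context is active in that task.
func bar(on eventLoop: EventLoop) async throws {
    try await withSpan("bar") { _ in
        try await legacyBar(on: eventLoop).get()
    }
}

func foo(on eventLoop: EventLoop) async throws {
    try await withSpan("foo") { _ in
        // Both calls run inside foo's span scope, so both "bar" spans
        // should be parented to "foo".
        try await bar(on: eventLoop)
        try await bar(on: eventLoop)
    }
}
```

The trade-off is that every caller of foo has to become async as well, so this tends to work best when converted from the outermost entry point inward.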
When I say "implicit span/tracing" I'm referring to this activeSpan magic that OTel-Swift does to enable automatic span creation without having to pass one into every traced function. See: Instrumentation | OpenTelemetry
(Looks like swift-distributed-tracing has something very similar - they both leverage TaskLocal to manage context lookups)
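To see why a TaskLocal-based lookup loses the parent in a deferred callback, here is a small standard-library-only sketch. TraceContext and activeSpanName are invented names standing in for the real "active span" storage, and the array of closures stands in for work queued on an EventLoop; neither tracing library is involved.

```swift
enum TraceContext {
    // Hypothetical stand-in for the library's "active span" task-local.
    @TaskLocal static var activeSpanName: String?
}

func runDemo() -> (inside: String?, deferred: String?) {
    // Stand-in for callbacks queued on an EventLoop to run later.
    var deferredCallbacks: [() -> String?] = []
    var inside: String?

    TraceContext.$activeSpanName.withValue("foo") {
        // Read synchronously, while the binding is still in scope.
        inside = TraceContext.activeSpanName
        // Queue a read that will only happen after the scope has ended.
        deferredCallbacks.append { TraceContext.activeSpanName }
    }

    // The binding was popped when withValue returned, so the deferred
    // read falls back to the default value (nil).
    let deferred = deferredCallbacks[0]()
    return (inside, deferred)
}
```

The synchronous read sees "foo", while the deferred read sees nil, which is the same shape as the second bar() call in the flatMap example.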
The distributed-tracing way to handle "non swift-async" code is to get the ServiceContext from the span and carry it around manually. You can then start a new span by passing the context to a new withSpan, and it'll attach to the "parent" identified by the service context. We deliberately recommend passing around the context rather than the span itself.
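A minimal sketch of carrying the context by hand through future-based code, assuming swift-distributed-tracing's startSpan and the span's context property. The on: and context: parameters are hypothetical additions to the foo/bar example from earlier in the thread, and the future body is a placeholder.

```swift
import NIOCore
import ServiceContextModule
import Tracing

func bar(on eventLoop: EventLoop, context: ServiceContext) -> EventLoopFuture<Void> {
    // Passing the context explicitly attaches this span to the parent
    // identified by that context, regardless of which thread runs it.
    let span = startSpan("bar", context: context)
    return eventLoop.makeSucceededVoidFuture().always { _ in
        span.end()
    }
}

func foo(on eventLoop: EventLoop) -> EventLoopFuture<Void> {
    let span = startSpan("foo")
    // Carry the context, not the span, into the callbacks.
    let context = span.context
    return bar(on: eventLoop, context: context)
        .flatMap {
            bar(on: eventLoop, context: context)
        }
        .always { _ in
            span.end()
        }
}
```

Because startSpan does not end the span for you, each span is ended in an always callback so it closes on both success and failure.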
You could do that and use swift-otel.
We're interested in better collaboration with opentelemetry-swift, but we've not yet figured out how to engage. In general, in Swift libraries it's best to stick to the "most general API", which is distributed-tracing. If you're in an app, or making extension points somewhere, you could use opentelemetry-swift directly if you truly needed to.
Maybe this helps to steer in the right direction a bit.
In general you'll have to "get the context, carry it around using some other means, and then restore it". Or indeed, just go all-in on async/await, where things should just work™.
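One way the "get, carry, restore" idea could be packaged, so the context doesn't have to be threaded through every signature, is a small wrapper around flatMap. This is only a sketch: flatMapRestoringContext is a made-up helper, not part of NIO or swift-distributed-tracing, and it assumes ServiceContext.current and ServiceContext.withValue behave as documented in swift-service-context.

```swift
import NIOCore
import ServiceContextModule

extension EventLoopFuture {
    /// Hypothetical helper: captures the ServiceContext that is current
    /// when the chain is built, then restores it around the callback.
    func flatMapRestoringContext<NewValue: Sendable>(
        _ callback: @escaping @Sendable (Value) -> EventLoopFuture<NewValue>
    ) -> EventLoopFuture<NewValue> {
        // "Get": read the context while the caller's scope is active.
        let captured = ServiceContext.current
        // "Carry": the closure holds on to the captured context.
        return self.flatMap { value in
            // "Restore": rebind it for the synchronous part of the callback.
            ServiceContext.withValue(captured) {
                callback(value)
            }
        }
    }
}
```

The restored context only covers code that runs synchronously inside the callback, so any span has to be started there (or the context captured again) before the next hop. The caller also needs a context bound when the chain is built, for example by wrapping the chain in ServiceContext.withValue(span.context) { ... }.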