Should libraries use logs or span events?

Scott · September 30, 2022, 2:22pm

swift-distributed-tracing is planning to add swift-log support.

Part of the interesting thing about this is that distributed tracing already has span events. Conceptually, span events are the same thing as structured logs that are tied to spans/traces.

As @ktoso pointed out, adding swift-log to distributed tracing is, at the very least, important for getting logs from libraries that do not support distributed tracing but do support swift-log.

If I'm building a library to support a wide range of consumers, should I use span events or logs?

Examples are hard to come up with so bear with me here.

Let's say I'm building a Circuit Breaker library and you hit your success count threshold to half-open the circuit. You might want to log an event that the half-open threshold has been reached, and upon completion of the executed code, another event for whether the circuit breaker will be fully opened or closed again. Should these be span events or logs?

ktoso · October 1, 2022, 12:51am

Thanks for moving the thread over to forums! Now I can give it the detailed reply the topic deserves.

Let's start out with the obligatory "it depends", but let's immediately clarify what that means in this context and what options are available:

Span events vs. logs

For this discussion let's use the otel definition of span events (Manual Instrumentation | OpenTelemetry):

An event is a human-readable message on a span that represents “something happening” during it’s lifetime. For example, imagine a function that requires exclusive access to a resource that is under a mutex.

which you very rightfully identify as "hey, isn't that just logging?".

Let's use your circuit breaker example (great library idea btw, would be great to have), and one could imagine the following events/logs:

...("breaker open!")
// ... 
...("breaker half-open")

Of course we'd want to know for which request the breaker was tripped, and include some more information about it... We can do this either with swift-log:

log.info("breaker open!", metadata: ["trace-id": ..., "error-rate-limit": ...])
// ... 
log.info("breaker half-open", metadata: ["trace-id": ..., "error-rate-limit": ...])

or with tracing events:

span.addEvent(.init(name: "breaker open", attributes: ...))
// ...
span.addEvent(.init(name: "breaker half-open", attributes: ...))

and that's honestly almost the same.

So where is the difference? In how end users consume this information.

End users

When developing a library that does logging or events, you have to think about how people will be using it. So in this example... I suspect we want the library to be useful even if end users are not using distributed tracing -- for many users it might be good enough to have the library log "info [error-rate-limit: ...] breaker [name] open!".

But when more advanced users, who DO use tracing, use this library, they'd most likely want those to be events, such that they show up in their trace spans and UIs when they browse them.

So there are different kinds of users, and a library author must decide which of them to care about. Most likely the answer is "everyone". So how do we make our library optimally usable for all kinds of users?

Instrumenting your libraries

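The approach is to have the library report interesting events through an instrumentation protocol rather than logging or tracing directly, so that users can plug in whatever should happen for each event.

And in order to prove that we don't just talk the talk, but also walk the walk, here is how this is implemented in the distributed actors cluster: https://github.com/apple/swift-distributed-actors/blob/main/Sources/DistributedActors/Instrumentation/ReceptionistInstrumentation.swift#L18-L28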

protocol _ReceptionistInstrumentation: Sendable {
    init()

    func actorSubscribed(key: AnyReceptionKey, id: ActorID)

    func actorRegistered(key: AnyReceptionKey, id: ActorID)
    func actorRemoved(key: AnyReceptionKey, id: ActorID)

    func listingPublished(key: AnyReceptionKey, subscribers: Int, registrations: Int)
}

where the default implementation does nothing; another implementation emits os_signposts, and yet another could just emit swift-metrics rather than log. Or if you wanted to log those, you could - by passing an implementation that does that.

Looping this pattern back to the circuit breaker:

protocol CircuitBreakerInstrumentation: Sendable { 
  func breakerOpened(...)
  func breakerHalfOpened(...)
  func breakerClosed(...)
}

// ----

// when the breaker becomes open, call the instrumentation:
self.instrument.breakerOpened(... any interesting data ...)
// ...
self.instrument.breakerHalfOpened(... any interesting data ...)
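To make the call sites above concrete, here is a minimal, self-contained sketch of a breaker that reports its state changes through such a protocol. Everything beyond the three hook names is an assumption made for illustration: the parameters (name, failures), the NoopInstrumentation and RecordingInstrumentation types, and the deliberately tiny state machine are hypothetical, and the Sendable requirement is left out to keep the sketch single-threaded.

```swift
// Hypothetical hook protocol; the parameter lists are illustrative only.
protocol CircuitBreakerInstrumentation {
    func breakerOpened(name: String, failures: Int)
    func breakerHalfOpened(name: String)
    func breakerClosed(name: String)
}

// Default implementation: do nothing.
struct NoopInstrumentation: CircuitBreakerInstrumentation {
    func breakerOpened(name: String, failures: Int) {}
    func breakerHalfOpened(name: String) {}
    func breakerClosed(name: String) {}
}

// Collects events in memory, which is handy in tests.
final class RecordingInstrumentation: CircuitBreakerInstrumentation {
    private(set) var events: [String] = []
    func breakerOpened(name: String, failures: Int) {
        events.append("\(name) opened after \(failures) failures")
    }
    func breakerHalfOpened(name: String) { events.append("\(name) half-opened") }
    func breakerClosed(name: String) { events.append("\(name) closed") }
}

// Deliberately tiny breaker: opens after `failureThreshold` failures in a row.
final class CircuitBreaker {
    enum State { case closed, open, halfOpen }

    let name: String
    let failureThreshold: Int
    var instrument: any CircuitBreakerInstrumentation
    private(set) var state: State = .closed
    private var failures = 0

    init(name: String, failureThreshold: Int,
         instrument: any CircuitBreakerInstrumentation = NoopInstrumentation()) {
        self.name = name
        self.failureThreshold = failureThreshold
        self.instrument = instrument
    }

    func recordFailure() {
        failures += 1
        // A failure while half-open, or too many failures while closed, opens the breaker.
        if state == .halfOpen || (state == .closed && failures >= failureThreshold) {
            state = .open
            instrument.breakerOpened(name: name, failures: failures)
        }
    }

    // Would be called when a (hypothetical) cool-down period has elapsed.
    func attemptReset() {
        guard state == .open else { return }
        state = .halfOpen
        instrument.breakerHalfOpened(name: name)
    }

    func recordSuccess() {
        failures = 0
        if state == .halfOpen {
            state = .closed
            instrument.breakerClosed(name: name)
        }
    }
}

// Usage: the breaker never knows whether it is being logged, traced, or recorded.
let recorder = RecordingInstrumentation()
let breaker = CircuitBreaker(name: "payments", failureThreshold: 2, instrument: recorder)
breaker.recordFailure()
breaker.recordFailure()  // threshold reached: breakerOpened fires
breaker.attemptReset()   // breakerHalfOpened fires
breaker.recordSuccess()  // breakerClosed fires
```

The breaker itself contains no logging or tracing code; what happens with each event is decided entirely by whichever implementation is passed in.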

And you could of course by default have the library configure a logging instrument, and make available a tracing one as well:

myBreakerLib.settings.instrumentation = .log
// or
myBreakerLib.settings.instrumentation = .trace
// or
myBreakerLib.settings.instrumentation = MyFancyInstrumentation()

This way, every user can do exactly what they want with those events.
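As a rough sketch of what the logging and tracing instruments might look like behind such a setting, the following uses stand-in types so that it runs without any packages. LineSink and FakeSpan are hypothetical placeholders for a swift-log Logger and a tracing span, and BreakerSettings, the hook parameters, and the attribute names are assumptions as well. A real tracing instrument would most likely look up the current span on each call instead of holding on to a single one.

```swift
// Hypothetical hook protocol; the parameter lists are illustrative only.
protocol CircuitBreakerInstrumentation {
    func breakerOpened(name: String, errorRate: Double)
    func breakerHalfOpened(name: String)
}

// Stand-in for a swift-log Logger: collects formatted lines instead of printing them.
final class LineSink {
    private(set) var lines: [String] = []
    func info(_ message: String, metadata: [String: String]) {
        let meta = metadata.sorted { $0.key < $1.key }
            .map { "\($0.key): \($0.value)" }
            .joined(separator: ", ")
        lines.append("info [\(meta)] \(message)")
    }
}

// Stand-in for a tracing span: collects named events with attributes.
final class FakeSpan {
    private(set) var events: [(name: String, attributes: [String: String])] = []
    func addEvent(_ name: String, attributes: [String: String]) {
        events.append((name: name, attributes: attributes))
    }
}

// Turns breaker events into log lines.
struct LoggingInstrumentation: CircuitBreakerInstrumentation {
    let log: LineSink
    func breakerOpened(name: String, errorRate: Double) {
        log.info("breaker \(name) open!", metadata: ["error-rate": "\(errorRate)"])
    }
    func breakerHalfOpened(name: String) {
        log.info("breaker \(name) half-open", metadata: [:])
    }
}

// Turns the same breaker events into span events.
struct TracingInstrumentation: CircuitBreakerInstrumentation {
    let span: FakeSpan
    func breakerOpened(name: String, errorRate: Double) {
        span.addEvent("breaker open", attributes: ["breaker": name, "error-rate": "\(errorRate)"])
    }
    func breakerHalfOpened(name: String) {
        span.addEvent("breaker half-open", attributes: ["breaker": name])
    }
}

// Hypothetical settings type holding whichever instrument the user picked.
struct BreakerSettings {
    var instrumentation: any CircuitBreakerInstrumentation
}

// Usage: the same call ends up as a log line or as a span event.
let sink = LineSink()
let span = FakeSpan()
var settings = BreakerSettings(instrumentation: LoggingInstrumentation(log: sink))
settings.instrumentation.breakerOpened(name: "payments", errorRate: 0.5)

settings.instrumentation = TracingInstrumentation(span: span)
settings.instrumentation.breakerOpened(name: "payments", errorRate: 0.5)
```

Switching instruments changes where the event goes without touching the breaker's own code.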

What to do by default?

Arguably, logging is the best default, because it is the simplest and does not require any additional tracing collectors to view.

Note: a lot of libraries are probably perfectly well off with just logging, without these advanced patterns. But since we're talking about a library like a circuit breaker, it's probably worth making the extra effort.

So you could offer this instrumentation, and enable logging by default.

When is it worth it?

So there is a bit of overhead in designing and maintaining this infra, but it actually isn't so hard once you get used to it.

It definitely is more work than just logging in-line though. So when should one do this? Most likely in very low-level and very reusable libraries -- for a circuit breaker library I'd probably make the extra effort and make it configurable like this.

Other libraries can probably just stick to logging. And note also that, thanks to the distributed tracing and swift-log integration, log messages can be correlated to traces automatically as well -- they will include the trace-id!

If an end user wanted to have all log messages as trace events, they could build a LogHandler which does just that: take the message, and make a SpanEvent for it. Though arguably we might get into efficiency debates then -- it might not be the most efficient way to do this (the instrumentation approach is).
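The idea of such a handler can be sketched without any dependencies. Note that this is not the swift-log LogHandler protocol or the swift-distributed-tracing span API: MiniLogHandler, FakeSpan, FakeSpanEvent, SpanContext, and the "log.level" attribute name are all hypothetical stand-ins, and a real handler would have to satisfy the full LogHandler requirements and find the current span through the tracing library.

```swift
// Stand-ins for a span event and a span (not the swift-distributed-tracing types).
struct FakeSpanEvent: Equatable {
    var name: String
    var attributes: [String: String]
}

final class FakeSpan {
    private(set) var events: [FakeSpanEvent] = []
    func addEvent(_ event: FakeSpanEvent) { events.append(event) }
}

// Hypothetical holder for "the span that is active right now".
final class SpanContext {
    var current: FakeSpan?
}

// Stand-in for the part of a log handler that receives each log record.
protocol MiniLogHandler {
    func log(level: String, message: String, metadata: [String: String])
}

// Forwards every log record to the currently active span as an event.
struct SpanEventLogHandler: MiniLogHandler {
    let context: SpanContext

    func log(level: String, message: String, metadata: [String: String]) {
        // No active span: drop the record (a real handler might fall back to plain logging).
        guard let span = context.current else { return }
        var attributes = metadata
        attributes["log.level"] = level
        span.addEvent(FakeSpanEvent(name: message, attributes: attributes))
    }
}

// Usage: the library keeps logging as usual; the handler decides where records go.
let context = SpanContext()
let handler = SpanEventLogHandler(context: context)
handler.log(level: "info", message: "outside any span", metadata: [:])

let span = FakeSpan()
context.current = span
handler.log(level: "info", message: "breaker open!", metadata: ["error-rate-limit": "0.5"])
```

Every record goes through a generic message-plus-metadata path here, which is part of why the dedicated instrumentation approach can be the more efficient one.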

So that'd be my 2c on the topic. It really depends on the library, but the more a library anticipates such uses, the better prepared it can be.

Scott · October 1, 2022, 2:19pm

Thanks for this very detailed response! This answers all of my questions, in a much more complete way than I expected.

Of course the answer is "why not both?" (sometimes).