Server Distributed Tracing

Thanks everyone for chiming in!

I'll fill in some additional info on a few of the questions that were asked so far:


@slashmo already answered this well I think, but to expand on our thought process here a bit more:

The choice of words here is not accidental and implies a few "layers" at which tools and implementations exist. The work we're doing with baggage here is at the very "bottom" of all possible cross-cutting tool implementations. I could not phrase it better myself, so I'll quote the tracing plane paper:

Despite their demonstrated usefulness, organizations report that they struggle to deploy cross-cutting tools. One fundamental reason is that they require modifications to source code that touch nearly every component of distributed systems. But this is not the only reason. To see why, we can break up cross-cutting tools into two orthogonal components:

  • first (1) is the instrumentation of the system components to propagate context;
  • second (2) is the tool logic itself.

... what the metadata is, such as IDs, tags, or timings, and when it changes – depends on the tool (2),
while the context propagation – through threads, RPCs, queues, etc. – only depends on the structure of the instrumented system and its concurrency (1)

In other words, a Tracer is a specific tool, while instrumentation is the tool-agnostic "carry these values please" part of the system.

We specifically chose to talk about "cross-cutting tools" at this layer, but perhaps it's too abstract without more examples of what we actually mean, so here are a few things that could be implemented as baggage instruments but are not "directly" distributed tracing:

  • deadlines
    • in a multi-service system, where a request has to go "through" n services from the edge, and the edge has a strict SLA of "must reply within 200ms", we may want to carry the deadline value along with the requests as they are propagated to downstream systems. If a system receives a request whose deadline has already been exceeded (wallclock time makes this tricky in dist-systems of course, but bear with me), we know that the upstream has already replied with "well, request timed out", so there is no reason to even start working on the request in the downstream service, and we can drop it.
    • This is a pattern built-in to Go's Context as well as gRPC Deadlines [1]
    • I've also heard about some developers wanting to do a TTL in terms of "how many hops a request makes before we abort it"; such an instrument would carry a ttl-hops counter and decrement it on each remote hop a workload causes.
  • resource utilization / analysis / management
    • in multi-tenant environments it may be useful to capture congestion of resources and answer "who is responsible for this overload".
    • Again, this is not exactly tracing, but it is very similar and needs the same kind of baggage context propagation. Say we want to group all "work" caused by a "request" made by a Client and give it some quota; if that Client exceeds its quota, we want to de-prioritize serving it, because it's badly behaved and starving well-behaved clients of resources. A simple example would be Client calls [A calls B calls C] and exceeds its allocated quota (however we measure that...) on service C; since Client always enters the system through A, we'll want to tell A "hey, that Client is not well behaved, throttle it a bit". But we can only do this if we can track which Client's requests caused the work in the first place.
    • one example of such system is Retro [2]
  • authentication, delegation / access control / auditing
    • This is not an area I'm an expert in, but it does come up as another use-case for such instruments. It feels right, since these usually also mean carrying some identity information along with the execution of a task. I do not encourage building ad-hoc security if anyone ever gets to this, there's plenty of literature about it; our only hope is that if such a system needs to carry metadata, it should be able to use the same instrumentation "points" as tracers would. :wink:
    • Baggage can be used to carry around information on "whose behalf" we are performing actions and similar, which can be used for auditing etc.
    • The Universal Context Propagation for Distributed System Instrumentation [3] paper lists a number of such use cases, but I'm not familiar enough to say much about them.
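The ttl-hops idea above can be sketched in a few lines. This is a toy illustration, not the actual swift-tracing API: `Baggage`, `BaggageKey`, `HopLimitKey`, and `onRemoteHop` are all hypothetical names, and a real baggage type would look different.

```swift
// A minimal type-keyed baggage container, purely illustrative.
protocol BaggageKey { associatedtype Value }

struct Baggage {
    private var storage: [ObjectIdentifier: Any] = [:]
    subscript<Key: BaggageKey>(_ key: Key.Type) -> Key.Value? {
        get { storage[ObjectIdentifier(key)] as? Key.Value }
        set { storage[ObjectIdentifier(key)] = newValue }
    }
}

// Hypothetical key for the "remaining remote hops" counter.
enum HopLimitKey: BaggageKey { typealias Value = Int }

// Called by instrumentation on every remote hop: decrement the counter,
// and refuse to continue once the limit is exhausted.
func onRemoteHop(_ baggage: inout Baggage) -> Bool {
    guard let remaining = baggage[HopLimitKey.self] else { return true } // no limit set
    guard remaining > 0 else { return false } // limit exhausted, drop the request
    baggage[HopLimitKey.self] = remaining - 1
    return true
}
```

With a limit of 2, the first two hops proceed and the third is dropped; the important part is that the counter travels in the same baggage the tracer would use, so no extra propagation machinery is needed.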

The word "Tracer" is bound to appear in implementations, but it is slightly different than the instruments I believe.

[1] gRPC and Deadlines | gRPC
[2] https://www.usenix.org/conference/nsdi15/technical-sessions/presentation/mace
[3] https://people.mpi-sws.org/~jcmace/papers/mace2018universal.pdf

Yes, I think it's quite likely we'll end up with a similar shape to these.

Same as with metrics, most libs will be fine just calling "whatever the current tracer/instrument is", while some tracers may (very likely) need to offer user-level APIs for folks who want to explicitly start/stop spans in their code. In that case they'd get and specifically call CoolTracer.[...] things (another reason we don't necessarily want to use up the word "tracer" just yet).

We've already seen this pattern with e.g. SwiftPrometheus [4], where for most things the generic "just use the global one" approach is fine (esp. in frameworks, since you can't assume which metrics backend users will run with). But users may want to utilize some extra features Prometheus offers, in this example "help texts" [4], which are Prometheus-specific; if one wants to use those, one cannot use the generic instrument but has to reach for the PrometheusClient metric factories.

We're expecting the same to happen with tracer implementations where some impls may have some specific features which don't fully fit into the instruments' APIs.

[4] GitHub - swift-server/swift-prometheus: Prometheus client library for Swift
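The generic-vs-specific split above can be sketched like this. All names here are made up for illustration (this is not the SwiftMetrics or SwiftPrometheus API): a `Recorder` protocol stands in for the generic instrument, and a `PromStyleRecorder` for a backend whose extra features (help texts) are only reachable through the concrete type.

```swift
// The generic instrument surface most libraries code against.
protocol Recorder {
    func increment(_ label: String)
}

// A framework or library only talks to the protocol; it cannot
// (and should not) assume which backend is behind it.
final class RequestHandler {
    let recorder: Recorder
    init(recorder: Recorder) { self.recorder = recorder }
    func handleRequest() { recorder.increment("requests_total") }
}

// A hypothetical Prometheus-flavored backend: it satisfies the generic
// protocol, but also exposes a backend-specific "help text" feature
// that only exists if you hold the concrete type.
final class PromStyleRecorder: Recorder {
    private(set) var counts: [String: Int] = [:]
    private(set) var helpTexts: [String: String] = [:]
    func increment(_ label: String) { counts[label, default: 0] += 1 }
    func setHelp(_ label: String, _ text: String) { helpTexts[label] = text }
}
```

An end-user who wants the Prometheus-only feature constructs the concrete recorder, calls `setHelp(...)` on it directly, and then hands it to the framework as a plain `Recorder`.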

Great catch, thanks: ticketified Rename BaggageContext module to Baggage · Issue #26 · slashmo/gsoc-swift-tracing · GitHub

I think it's very important we do not tie ourselves to HTTP as it opens up very interesting future development potential.

There are two main categories here, I suppose:

First, "some binary protocol that is not HTTP" for whatever reason those exist, and we also want to carry metadata in them. It could be local XPC services, some other IPC mechanisms, or it could be some custom binary wire protocols (e.g. RSocket [5]) that also have metadata fields but they are not specifically "HTTPHeaders".

[5] http://rsocket.io/docs/Protocol#frame-header-format We're not specifically looking at rsocket, but it's a good example for making the case :slight_smile:

Second, databases! We might be able to trace "into" Cassandra requests [6] (ancient post, but it has nice pictures :wink:), meaning that you would not only get the trace of your HTTP calls and futures, but also where the time is being spent inside the database to serve your request. This can allow noticing problems with your schemas, indexes etc. It's a bit of a power-user feature, but again -- we want this BaggageContext to be ubiquitous, meaning that a database client could expose APIs which allow this use case by doing fetch(query, context: context). There the inject() implementation would take some form of statement.setOutgoingPayload("zipkin", ...) or similar; a ZipkinCassandraInstrument would do just that, some other tracer would set other things there, while the Cassandra client library simply knows "there is some Instrument<CassandraStatement, ...> I will call here".

[6] Replacing Cassandra's tracing with Zipkin
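To make the statement-payload idea concrete, here is a toy sketch. `FakeStatement`, `StatementInstrument`, and `ZipkinStyleInstrument` are all invented for illustration; the only real-ish touchpoint is the `setOutgoingPayload("zipkin", ...)` shape mentioned above.

```swift
// A stand-in for a database statement with a custom outgoing payload,
// i.e. a metadata carrier that is not HTTPHeaders.
struct FakeStatement {
    private(set) var outgoingPayload: [String: String] = [:]
    mutating func setOutgoingPayload(_ key: String, _ value: String) {
        outgoingPayload[key] = value
    }
}

// The client library only knows "there is some instrument I will call here".
protocol StatementInstrument {
    func inject(traceID: String, into statement: inout FakeStatement)
}

// A Zipkin-flavored instrument writes its own key; another tracer would
// write different keys, and the database client doesn't care which.
struct ZipkinStyleInstrument: StatementInstrument {
    func inject(traceID: String, into statement: inout FakeStatement) {
        statement.setOutgoingPayload("zipkin", traceID)
    }
}
```

The point of the protocol boundary is that the client library can call `inject` on whatever instrument is configured, without depending on any particular tracer.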

We could do the same with any client library just "on the client", though that would not really require the inject/extract calls... We'll get to these use cases soon enough, I think; it is likely this may need a generic tracer type. There are folks who instrument Redis but it's all "on the client", which is also valuable.

Yup that's the same usage-style we'd envision here (and is possible today).

Yeah, that's one of those "looks great in a small snippet, completely breaks down in a complex system" things... That said, it definitely is very helpful sometimes.

My personal hope is that we can aim for explicit passing, as it's less error prone, and make it pleasant enough. Yet for cases where a framework really would like the .current style, there should be some way to do so. I'm not sure on what type those would exist, but there is some prior art for this in Rust's tracing crate (where a tracing provider can implement .current, but all APIs require passing a context in; thus if a user uses implicit passing, they can always summon it when they hit an API they should propagate the context to: request(url, context: .current /* summon */)).
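The "explicit API, optional summon" shape can be sketched like so. Everything here is hypothetical (`TraceContext`, the global slot, `request`): the global variable merely stands in for whatever implicit storage a provider might offer, while the API itself stays explicit.

```swift
struct TraceContext {
    let traceID: String
    // Toy stand-in for provider-managed implicit storage (a real
    // implementation would use a thread-local or similar, not a global).
    static var current: TraceContext?
}

// APIs always take the context explicitly; there is no hidden lookup inside.
func request(_ url: String, context: TraceContext) -> String {
    "GET \(url) [trace: \(context.traceID)]"
}
```

A user relying on implicit storage "summons" it at the boundary: `request("https://example.com", context: TraceContext.current!)`. Because the summon is written out at the call site, the hand-off from implicit to explicit passing stays visible in the code.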

I want to be very cautious about it though, because implicit passing using TLs is notoriously difficult to get right with highly async frameworks and hard to debug (resulting in "dropped traces", which are a nightmare), since it's not "visible" where a baggage context was "dropped" ("forgot to pass it along"). So I don't think it should be the default way, but it can be an optional way for end-users perhaps...

For ThreadLocals (TLs) to work well, all async participants have to be aware that they must store/restore the context when they are about to fire off async work, and again when they're about to actually execute it (by storing the current TL "somewhere", and then getting it from "somewhere" into the current thread's TL again). This sadly breaks down the moment a library is not aware of it, and can break in subtle ways. TLs also have the problem of not being "scoped": if you keep setting stuff on a TL but forget to clear it when you're "done", you might have "polluted" the storage -- and suddenly a request without a traceID attached shows up as if it was traced and as part of the previous trace :scream: These are annoying to debug and fix... but yes, it's possible to make it pleasant if you control all the threading of an application.
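The store/restore dance described above looks roughly like this. A global variable stands in for the thread-local slot, and `runLater` is a hypothetical "schedule this work" boundary; the point is how much ceremony every such boundary must perform, and how a single missing `defer` "pollutes" the slot.

```swift
// Toy stand-in for a thread-local baggage slot.
var currentBaggage: String?

// Every async boundary must: (1) capture the current value when the work
// is created, (2) restore it into the slot right before running the work,
// and (3) put the old value back afterwards so the slot isn't polluted.
func runLater(_ work: @escaping () -> Void) -> () -> Void {
    let captured = currentBaggage          // (1) store
    return {
        let saved = currentBaggage
        currentBaggage = captured          // (2) restore
        defer { currentBaggage = saved }   // (3) clean up -- forget this and
                                           // the next unrelated task inherits
                                           // the previous trace!
        work()
    }
}
```

Every library that hops threads has to play along with this protocol; one that doesn't silently drops (or leaks) the context, which is exactly the failure mode described above.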

So... Thread Locals have known and very annoying limitations, and also incur an annoying performance hit to access. But there are ideas for how a structured concurrency runtime could improve on the state of the art here, and there's some ongoing work in Java's Project Loom called Scope Variables [7] which is pretty promising. If we were in a position to build something like that, it would be much more interesting for baggage than jumping onto the TL train right away. It's an interesting read, give it a look :slight_smile:

[7] State of Loom: Part 2


Whoo~... that ended up much longer than I expected, but I hope it's interesting and shines some light on how we're looking at the problem space :slight_smile:

Please keep the feedback coming and stay involved on forums and the repository! :pray:
