Both Application and Request have Storage as a stored property. This allows developers and third-party packages to easily extend these types. This works particularly well with Swift's extensions, for example:
import Vapor
extension Request {
    private struct Foo: StorageKey {
        typealias Value = Bool
    }
    var foo: Bool? {
        get { self.storage[Foo.self] }
        set { self.storage[Foo.self] = newValue }
    }
}
It functions similarly to the userInfo: [AnyHashable: Any] pattern many Apple libraries use.
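To make the StorageKey pattern concrete without depending on Vapor, here is a minimal, self-contained sketch of type-keyed storage. `StorageKey`, `Storage`, and `Request` below are simplified, hypothetical stand-ins, not Vapor's actual implementations; the point is that each extension declares its own private key type, so independent packages can attach typed values without colliding, much like `userInfo` but type-safe.

```swift
// Hypothetical, simplified stand-ins for Vapor's `StorageKey` and `Storage`;
// the real types differ in details.
protocol StorageKey {
    associatedtype Value
}

struct Storage {
    // Values are keyed by the identity of the key *type*, so each extension
    // can declare its own private key without risk of name collisions.
    private var values: [ObjectIdentifier: Any] = [:]

    subscript<Key: StorageKey>(_ key: Key.Type) -> Key.Value? {
        get { values[ObjectIdentifier(key)] as? Key.Value }
        set {
            if let value = newValue {
                values[ObjectIdentifier(key)] = value
            } else {
                values[ObjectIdentifier(key)] = nil // assigning nil removes the entry
            }
        }
    }
}

final class Request {
    var storage = Storage()
}

// A package can add typed state to Request without touching Request itself.
extension Request {
    private struct FooKey: StorageKey {
        typealias Value = Bool
    }

    var foo: Bool? {
        get { storage[FooKey.self] }
        set { storage[FooKey.self] = newValue }
    }
}

let request = Request()
request.foo = true
```

Because the key is a type rather than a string, two packages that both happen to name their property `foo` still end up with distinct keys.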
Ah, ok I see. That makes sense.
Most of these conversations have been IRL. (I found an SSWG meeting note about it here but that's not super helpful).
Static context access might look something like this:
The key benefit of static access is that you don't need to "clutter" your API with context passing. Achieving static context passing in a one-thread-per-request framework is fairly straightforward, since you can use thread-local storage. Doing it in an event-loop design like NIO is more complex. I'm not a Node.js expert, but it seems they allow static context passing in an event-loop system with something called Continuation-Local-Storage. Whether or not doing this is a good idea, I don't know (http://asim-malik.com/the-perils-of-node-continuation-local-storage/).
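As a rough illustration of static access in a thread-per-request setting, the sketch below binds a context to the current thread using Foundation's `Thread.current.threadDictionary` and reads it back without any context parameter. `RequestContext`, `CurrentContext`, and the dictionary key are hypothetical names for this sketch, and it assumes the whole request runs on a single thread.

```swift
import Foundation

// Hypothetical context type; a final class so it can be stored in the thread dictionary as-is.
final class RequestContext {
    let traceID: String
    init(traceID: String) { self.traceID = traceID }
}

enum CurrentContext {
    private static let key = "example.current-request-context" // hypothetical key

    /// The context bound to the *current thread*, if any.
    static var current: RequestContext? {
        Thread.current.threadDictionary[key] as? RequestContext
    }

    /// Binds `context` to this thread while `body` runs, then restores the previous value.
    static func with<T>(_ context: RequestContext, _ body: () throws -> T) rethrows -> T {
        let storage = Thread.current.threadDictionary
        let previous = storage[key]
        storage[key] = context
        defer {
            if let previous = previous {
                storage[key] = previous
            } else {
                storage.removeObject(forKey: key)
            }
        }
        return try body()
    }
}

// Deep inside the request handler: no context parameter in the signature.
func loadUser() -> String {
    "user lookup, trace=\(CurrentContext.current?.traceID ?? "none")"
}

// One thread per request: the framework binds the context around the handler.
let line = CurrentContext.with(RequestContext(traceID: "abc123")) { loadUser() }
```

As soon as `loadUser()` hops to another thread, or an event loop interleaves many requests on one thread, `current` returns the wrong value or nothing, which is why event-loop designs make this harder.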
FWIW, I'm not personally very invested in this type of context passing. Vapor opts for making it easier to pass things explicitly instead. I'm just interested if you planned on addressing it as a part of this proposal or had any thoughts.
I'll fill in some additional info on a few of the questions that were asked so far:
@slashmo already answered this well I think, but to expand on our thought process here a bit more:
The choice of words here is not accidental and implies a few "layers" at which tools and implementations exist. The work we're doing with baggage here is at the very "bottom" of all possible cross-cutting tool implementations. I could not phrase it better myself, so I'll quote the tracing plane paper:
Despite their demonstrated usefulness, organizations report that they struggle to deploy cross-cutting tools. One fundamental reason is that they require modifications to source code that touch nearly every component of distributed systems. But this is not the only reason. To see why, we can break up cross-cutting tools into two orthogonal components:
first (1) is the instrumentation of the system components to propagate context;
second (2) is the tool logic itself.
... what the metadata is, such as IDs, tags, or timings, and when it changes – depends on the tool (2),
while the context propagation – through threads, RPCs, queues, etc. – only depends on the structure of the instrumented system and its concurrency (1)
In other words, a Tracer is a specific tool, while instrumentation is the tool-agnostic "carry these values please" part of the system.
We specifically chose to talk about "cross-cutting tools" on this layer, but perhaps that's too abstract without more examples of what we actually mean, so here are a few examples of things that could be implemented as baggage instruments but are not "directly" distributed tracing:
deadlines
in a multi-service system, where a request has to go "through" n services from the edge and the edge has a strict SLA of "must reply within 200ms", we may want to carry the deadline along with the request as it is propagated to downstream systems. If a system receives a request whose deadline has already passed (wall-clock time makes this tricky in distributed systems, of course, but bear with me), we know that the upstream has already replied with "well, request timed out", so there is no reason to even start working on the request in the downstream service, and we can drop it.
This pattern is built into Go's Context as well as gRPC Deadlines [1].
I've also heard about some developers wanting a TTL in terms of "how many hops a request makes before we abort it"; such an instrument would carry a ttl-hops counter and decrement it at each remote hop a workload causes.
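A minimal sketch of how a downstream service might act on a propagated deadline and hop counter. `RequestBaggage`, `Admission`, and `admit` are hypothetical names; the deadline is an absolute wall-clock value in nanoseconds (assuming roughly synchronized clocks, with the caveats mentioned above), and the current time is passed in so the logic is easy to test.

```swift
// Hypothetical stand-in for the two baggage items discussed above.
struct RequestBaggage: Equatable {
    var deadlineNanos: UInt64?   // absolute wall-clock deadline set at the edge
    var hopsRemaining: Int?      // "TTL in hops", decremented per remote hop
}

enum Admission: Equatable {
    case accept(forward: RequestBaggage) // start work; propagate this baggage downstream
    case dropDeadlineExceeded            // upstream already answered "timed out"
    case dropTooManyHops
}

/// What a downstream service could do on receiving a request.
/// `nowNanos` is passed in (rather than read from a clock) to keep the sketch testable.
func admit(_ incoming: RequestBaggage, nowNanos: UInt64) -> Admission {
    if let deadline = incoming.deadlineNanos, nowNanos >= deadline {
        return .dropDeadlineExceeded
    }
    var outgoing = incoming
    if let hops = incoming.hopsRemaining {
        guard hops > 0 else { return .dropTooManyHops }
        outgoing.hopsRemaining = hops - 1
    }
    return .accept(forward: outgoing)
}

// The edge received the request at t=0 with a 200 ms SLA and allows 2 further hops.
let edgeBaggage = RequestBaggage(deadlineNanos: 200_000_000, hopsRemaining: 2)
```

Dropping early at each hop is what saves the downstream work; the upstream has already given up on the answer.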
resource utilization / analysis / management
in multi-tenant environments it may be useful to capture resource congestion and "who is responsible for this overload".
Again, this is not exactly tracing, but it is very similar and needs the same kind of baggage context propagation. Say we want to group all "work" caused by a "request" made by a Client and give it some quota; if that Client exceeds its quota, we want to de-prioritize serving it, because it's badly behaved and starving well-behaved clients of resources. A simple example would be: Client calls [A calls B calls C] and exceeds its allocated quota (however we measure that...) on service C; since the Client always enters the system through A, we'll want to tell A "hey, that Client is not well behaved, throttle it a bit". But we can only do this if we can track which Client the work on C was done for.
One example of such a system is Retro [2].
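To sketch the Client -> A -> B -> C scenario: A puts the client's identity into the baggage, C charges its work against that identity, and A consults the tally before admitting further requests. `ClientBaggage`, `QuotaLedger`, and the `service*` functions are hypothetical, the cost units are arbitrary, and the ledger is an in-process stand-in for what would really be a shared, distributed accounting mechanism (Retro's actual design is not reproduced here).

```swift
/// Hypothetical baggage carrying only the originating client's ID.
struct ClientBaggage {
    var clientID: String?
}

/// In-process stand-in for what would really be a shared, distributed accounting system.
final class QuotaLedger {
    let limit: Int
    private var used: [String: Int] = [:]
    init(limit: Int) { self.limit = limit }

    /// Charges work to whoever the baggage says caused it.
    func charge(_ cost: Int, for baggage: ClientBaggage) {
        guard let client = baggage.clientID else { return } // unattributed work isn't charged
        used[client, default: 0] += cost
    }

    func isOverQuota(_ client: String) -> Bool {
        used[client, default: 0] > limit
    }
}

// Client -> A -> B -> C: C never sees the client directly, only the propagated baggage.
func serviceC(_ baggage: ClientBaggage, _ ledger: QuotaLedger) { ledger.charge(10, for: baggage) }
func serviceB(_ baggage: ClientBaggage, _ ledger: QuotaLedger) { serviceC(baggage, ledger) }
func serviceA(client: String, _ ledger: QuotaLedger) -> Bool {
    guard !ledger.isOverQuota(client) else { return false } // throttle at the edge
    serviceB(ClientBaggage(clientID: client), ledger)
    return true
}

let ledger = QuotaLedger(limit: 25)
```

Without the client ID in the baggage, C could measure its load but could not say whom to throttle.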
authentication, delegation / access control / auditing
This is not an area I'm an expert in, but it does come up as another use case for such instruments. It feels right, since these use cases usually also mean carrying some identity information along with the execution of a task. I do not encourage building ad-hoc security if anyone ever gets to this; there's plenty of literature about it. Our only hope is that if such a system needs to carry metadata, it should be able to use the same instrumentation "points" as tracers would.
Baggage can be used to carry information about "on whose behalf" we are performing actions and similar, which can be used for auditing etc.
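A small sketch of carrying "on whose behalf" information along a call chain for auditing: each service appends itself to a delegation chain before calling the next one, so an audit entry written deep in the system can still name the original principal. `AuditBaggage`, `forward`, and `auditEntry` are hypothetical names, and, per the caution above, this is audit metadata, not an authentication mechanism.

```swift
/// Hypothetical baggage for auditing. This is metadata for audit trails,
/// not proof of identity: a real system must authenticate it, not just trust it.
struct AuditBaggage {
    var principal: String                // on whose behalf the work is performed
    var delegationChain: [String] = []   // services that acted for the principal so far
}

/// Each service appends itself before calling the next one.
func forward(_ baggage: AuditBaggage, through service: String) -> AuditBaggage {
    var next = baggage
    next.delegationChain.append(service)
    return next
}

/// Formats an audit line from whatever the baggage carried to this point.
func auditEntry(action: String, baggage: AuditBaggage) -> String {
    "\(action) on behalf of \(baggage.principal) via \(baggage.delegationChain.joined(separator: " -> "))"
}

// alice's request passes through A and B before reaching the service that audits it.
let atC = forward(forward(AuditBaggage(principal: "alice"), through: "A"), through: "B")
```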
The Universal Context Propagation for Distributed System Instrumentation [3] paper lists a number of such use cases, but I'm not familiar enough to say much about them.
The word "Tracer" is bound to appear in implementations, but I believe it is slightly different from the instruments.
Yes, I think it's quite likely we'll end up with a similar shape to these.
As with metrics, most libraries will be fine just calling "whatever the current tracer/instrument is", while some tracers will very likely need to offer user-level APIs for folks who want to explicitly start/stop spans in their code. In that case, users would obtain and explicitly call CoolTracer.[...] things (another reason we don't necessarily want to use up the word "tracer" just yet).
We've already seen this pattern with e.g. SwiftPrometheus [4], where for most things the generic "just use the global one" approach is fine (especially in frameworks, since you can't assume which metrics backend users will run with). But users may want to use some extra features Prometheus offers, in this example "help texts" [4], which are Prometheus-specific; to use those, one cannot use the generic instrument and would instead reach for the PrometheusClient metric factories.
We expect the same to happen with tracer implementations, where some implementations have specific features which don't fully fit into the instruments' APIs.
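The "global instrument plus backend-specific API" split might look roughly like this. `Instrument`, `NoOpInstrument`, `InstrumentationSystem`, and `CoolTracer.annotate` are hypothetical names for this sketch, loosely modelled on how swift-metrics' `MetricsSystem.bootstrap` installs a global backend; a real implementation would add locking and bootstrap-once checks.

```swift
// Hypothetical minimal instrument API; not any released library's API.
protocol Instrument: AnyObject {
    func startSpan(named name: String)
}

final class NoOpInstrument: Instrument {
    func startSpan(named name: String) {}
}

/// Global "current instrument", in the spirit of swift-metrics' `MetricsSystem.bootstrap`.
/// (A real implementation would guard this with a lock and allow bootstrapping only once.)
enum InstrumentationSystem {
    private static var _instrument: any Instrument = NoOpInstrument()
    static func bootstrap(_ instrument: any Instrument) { _instrument = instrument }
    static var instrument: any Instrument { _instrument }
}

// A backend with an extra feature the generic API doesn't model.
final class CoolTracer: Instrument {
    private(set) var spans: [String] = []
    private(set) var notes: [String: String] = [:]
    func startSpan(named name: String) { spans.append(name) }
    func annotate(span: String, note: String) { notes[span] = note } // CoolTracer-specific
}

// Framework code only knows "whatever the current instrument is".
func frameworkHandle() {
    InstrumentationSystem.instrument.startSpan(named: "handle-request")
}

let tracer = CoolTracer()
InstrumentationSystem.bootstrap(tracer)
frameworkHandle()
// User code that opted into CoolTracer reaches for its specific API directly.
tracer.annotate(span: "handle-request", note: "cache miss")
```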
I think it's very important that we do not tie ourselves to HTTP, as staying protocol-agnostic opens up very interesting potential for future development.
There are two main categories here, I suppose:
First, "some binary protocol that is not HTTP", for whatever reasons those exist, in which we also want to carry metadata. It could be local XPC services, some other IPC mechanism, or some custom binary wire protocol (e.g. RSocket [5]) that also has metadata fields which are not specifically "HTTPHeaders".
Second, databases! We might be able to trace "into" Cassandra requests [6] (ancient post, but it has nice pictures), meaning that you would not only get the trace of your HTTP calls and futures, but also see where the time is being spent inside the database to serve your request. This can help you notice problems with your schemas, indexes, etc. It's a bit of a power-user feature, but again, we want this BaggageContext to be ubiquitous, meaning that a database client could expose APIs which enable this use case by doing fetch(query, context: context). There, the inject() implementation would take some form of statement.setOutgoingPayload("zipkin", ...) or similar, so a ZipkinCassandraInstrument would do just that, and some other tracer would set some other things there, while the Cassandra client library simply knows "there is some Instrument<CassandraStatement, ...> I will call here".
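To illustrate the carrier-agnostic idea, here is a sketch of an instrument whose `inject`/`extract` operate on a database statement instead of HTTP headers. All names are hypothetical: `CassandraStatement` with an `outgoingPayload` dictionary stands in for a driver's custom-payload facility, and the protocol uses a single `Carrier` associated type rather than the two-parameter `Instrument<CassandraStatement, ...>` shape mentioned above.

```swift
// Hypothetical, minimal baggage: raw string values keyed by name.
struct BaggageContext {
    var values: [String: String] = [:]
}

// Hypothetical instrument abstraction: the tool decides *what* to write,
// the carrier type decides *where* it goes.
protocol Instrument {
    associatedtype Carrier
    func inject(_ context: BaggageContext, into carrier: inout Carrier)
    func extract(_ carrier: Carrier, into context: inout BaggageContext)
}

// Stand-in for a database driver's statement type with a custom-payload slot.
struct CassandraStatement {
    let query: String
    var outgoingPayload: [String: [UInt8]] = [:]
}

struct ZipkinCassandraInstrument: Instrument {
    func inject(_ context: BaggageContext, into carrier: inout CassandraStatement) {
        if let traceID = context.values["zipkin.traceID"] {
            carrier.outgoingPayload["zipkin"] = Array(traceID.utf8)
        }
    }
    // In this stand-in, the receiving side reads the same payload field back.
    func extract(_ carrier: CassandraStatement, into context: inout BaggageContext) {
        if let bytes = carrier.outgoingPayload["zipkin"] {
            context.values["zipkin.traceID"] = String(decoding: bytes, as: UTF8.self)
        }
    }
}

// The client library only knows "some Instrument whose Carrier is CassandraStatement".
struct CassandraClient<I: Instrument> where I.Carrier == CassandraStatement {
    let instrument: I
    func prepare(_ query: String, context: BaggageContext) -> CassandraStatement {
        var statement = CassandraStatement(query: query)
        instrument.inject(context, into: &statement)
        return statement
    }
}

let client = CassandraClient(instrument: ZipkinCassandraInstrument())
let statement = client.prepare("SELECT 1", context: BaggageContext(values: ["zipkin.traceID": "abc123"]))
```

Swapping in another tracer means swapping the instrument; the client library's code does not change.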
We can do the same with any client library just "on the client", though that would not really require the inject/extract calls... We'll get to these use cases soon enough, I think; it is likely this will need a generic tracer type. There are folks who instrument Redis, but it's all "on the client", which is also valuable.
Yup that's the same usage-style we'd envision here (and is possible today).
Yeah, that's one of those "looks great in a small snippet, completely breaks down in a complex system" things... That said, it definitely is very helpful sometimes.
My personal hope is that we can aim for explicit passing, as it's less error-prone, and make it pleasant enough. Yet for cases where a framework really would like to do the .current style, there should be some way to do so. I'm not sure on what type those would exist, but there is some prior art for this in Rust's tracing crate, where a tracing provider can implement .current, but all APIs require passing a context in; thus if a user uses implicit passing, they can always summon it when they hit an API they should propagate the context to: request(url, context: .current /* summon */).
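A sketch of "explicit by default, `.current` as an opt-in": library APIs always take the context as a parameter, while a framework that wants implicit propagation can install a provider behind a static `current`. `BaggageContext`, `currentProvider`, and `request(_:context:)` are hypothetical names; the provider here is a plain closure, which is not thread-safe and only stands in for something like a thread-local lookup.

```swift
/// Stand-in baggage type (hypothetical).
struct BaggageContext {
    var traceID: String?
}

extension BaggageContext {
    /// Hypothetical hook: a framework that wants implicit propagation installs a provider;
    /// by default there is no implicit context.
    static var currentProvider: () -> BaggageContext = { BaggageContext(traceID: nil) }

    /// "Summons" the implicitly propagated context, if a provider was installed.
    static var current: BaggageContext { currentProvider() }
}

/// Library API: the context is always an explicit parameter, never read implicitly.
func request(_ url: String, context: BaggageContext) -> String {
    "GET \(url) trace=\(context.traceID ?? "none")"
}

// Explicit passing: visible at every call site.
let explicitResult = request("https://example.com", context: BaggageContext(traceID: "t-1"))

// Implicit style, opted into by a framework; the call site still has to say `.current`.
BaggageContext.currentProvider = { BaggageContext(traceID: "t-2") }
let summonedResult = request("https://example.com", context: .current /* summon */)
```

Because the parameter is mandatory, a dropped context shows up as a compile error at the call site rather than as a silently missing trace.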
I want to be very cautious about it, though, because getting implicit passing via TLs right in highly async frameworks is notoriously difficult and hard to debug (it results in "dropped traces", which are a nightmare to debug), since it's not "visible" where a baggage context was "dropped" ("forgot to pass it along"). So I don't think it should be the default way, but perhaps it can be an optional way for end users...
For ThreadLocals (TLs) to work well, all async participants have to be aware that they must store and restore the context: when they are about to fire off async work, they store the current TL value "somewhere", and when they're about to actually execute it, they move it from "somewhere" back into the current thread's TL. This sadly breaks down the moment a library is not aware of it, and it can break in subtle ways. TLs also have the problem of not being "scoped": if you keep setting stuff on a TL but forget to clear it when you're "done", you might have "polluted" the storage, and suddenly a request without a traceID attached shows up as if it was traced, as part of the previous trace. These are annoying to debug and fix... but yes, it's possible to make it pleasant if you control all the threading of an application.
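The store/restore dance can be sketched with Dispatch and Foundation's per-thread dictionary: a context-aware `submit` captures the context at submission time, installs it on the worker thread, and clears it afterwards, while a plain `queue.async` silently drops it. `Ctx`, `ThreadLocalContext`, and `submit` are hypothetical names, and this only illustrates the failure mode; it is not a recommended design.

```swift
import Dispatch
import Foundation

// Hypothetical context type (a class, so it can be stored in the thread dictionary directly).
final class Ctx {
    let traceID: String
    init(traceID: String) { self.traceID = traceID }
}

// Hypothetical thread-local slot built on Foundation's per-thread dictionary.
enum ThreadLocalContext {
    private static let key = "example.thread-local-ctx"
    static var current: Ctx? {
        get { Thread.current.threadDictionary[key] as? Ctx }
        set {
            if let ctx = newValue {
                Thread.current.threadDictionary[key] = ctx
            } else {
                Thread.current.threadDictionary.removeObject(forKey: key)
            }
        }
    }
}

/// A context-aware async hop: every participant that schedules work must do this.
func submit(on queue: DispatchQueue, group: DispatchGroup, _ work: @escaping () -> Void) {
    let captured = ThreadLocalContext.current           // 1. store it at submission time
    queue.async(group: group) {
        let previous = ThreadLocalContext.current
        ThreadLocalContext.current = captured           // 2. restore it on the executing thread
        defer { ThreadLocalContext.current = previous } // 3. reset it, so the worker isn't "polluted"
        work()
    }
}

let queue = DispatchQueue(label: "example.worker") // serial: jobs run in submission order
let group = DispatchGroup()
let lock = NSLock()
var seen: [String] = []

func recordCurrentTrace() {
    let id = ThreadLocalContext.current?.traceID ?? "none"
    lock.lock()
    seen.append(id)
    lock.unlock()
}

ThreadLocalContext.current = Ctx(traceID: "trace-1")
submit(on: queue, group: group) { recordCurrentTrace() } // aware hop: context survives
queue.async(group: group) { recordCurrentTrace() }       // unaware hop: context silently lost
group.wait()
ThreadLocalContext.current = nil
```

Nothing fails loudly on the second hop; the trace simply stops, which is what makes these bugs so hard to track down.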
So... thread locals have known and very annoying limitations, and accessing them also incurs an annoying performance hit. But there are ideas for how a structured concurrency runtime could improve on the state of the art here, and there's some ongoing work in Java's Project Loom called Scope Variables [7] which is pretty promising. If we were in a position to build something like that, it would be much more interesting for baggage than jumping onto the TL train right away. It's an interesting read; give it a look.
Looks good, thank you!
One question, though: why BaggageContext and not just Context? It seems like a lot to type, making signatures even longer, and it contrasts with other languages like Go, where it's called just Context. Isn't just Context a widely accepted and understood type? Or am I confusing something here?
Thanks for the question. @ktoso & I discussed this as part of Naming naming naming :-) · Issue #12 · slashmo/gsoc-swift-tracing · GitHub. Basically, we think that Context is too generic and already used in many libraries (ChannelHandlerContext, LambdaContext, and even Context itself), where the argument is often named context. Also, the way I imagine BaggageContext being used is as a property (if possible) of a library's existing Context type, instead of replacing it entirely.
The first two examples do not collide in naming; where is the third one coming from? Is it coming from other context libraries?
:sad panda:
I was hoping for this to be the context library that would be proposed to the SSWG and be widely adopted, like swift-log and swift-metrics. There were no problems using names like Logger and Counter, even though they were probably used in other libraries. Maybe it's actually a good thing to use the same name; it would make migration easier (this is what I would prefer when replacing a library I use).
Can't we disambiguate by using module name?
Sorry for my very silly comments, but I feel like reinventing names is bad, especially if one has to work with projects in different languages...
Just saw that the module is named Baggage; for some very silly reason, I reeally don't like that name :(. Still, I would prefer the module to be named Baggage and the type to be Context (disambiguated as Baggage.Context), since I'd only have to write the module name once, while Context will be everywhere...
One other comment I have (this one is not silly, I promise). Was the dependency on NIO a requirement? Do we not expect this to be used in non-NIO projects? Or, since it's server-oriented, do we expect all projects to use NIO anyway?
Edit: What about naming the module Tracing and type TracingContext or just Context?
Before I dive in, it's important to keep in mind that this is not a "pitch of the final API", and yes, we're still exploring the space. There are different tradeoffs that were considered that led us here, though. So yes, things are up for change, but any change needs to address all the considerations and cases we need to handle.
I do want to push back on "reinventing names"; if anything, we're following prior art here. The Go example is somewhat of an exception, to be honest, since Context is absolutely core to the language, and Go libraries don't often offer a "framework context", which many of ours do (including non-SSWG ones, where we'd also want to "attach the context to"). Go also has some notable rules about Context which I'm not sure will work out in reality for us, including "one must NEVER wrap Context in another type", given that we have existing types that would be well served to "carry a baggage/context" (examples below).
So the baggage naming follows prior art and keeps in mind that we're not likely to pull off the "Go style" of passing context around (something we are also consulting the stdlib folks about).
Examples of prior art terming this type of object as "baggage":
Jaeger tracing
It is useful for manually providing some baggage items for testing purposes, which we can exploit. Alternatively, the baggage can always be explicitly set on the span inside the application by using the Span.SetBaggageItem() API.
2016 Pivot Tracing
is an excellent paper, and all of its wording is framed in terms of baggage propagation.
2018 Universal Context Propagation for Distributed System Instrumentation, which effectively explains the "future of context propagation in distributed systems"
It uses the term (and type) BaggageContext, and we (for now) decided to adopt it.
The linked discussion here specifically introduces and explains baggage as well.
So... things are not set in stone, but we're trying to explore the space and see what reads and works best in various usage patterns. We do, however, find the situation where "everyone already has a context object and will keep it" problematic.
Specifically, we do anticipate frequent use cases (lambda, nio, other libraries we're working on) to already have a "highly framework specific context" which may include objects which are not meant to be propagated across threads etc. I.e. "do not call this from another event loop" style values.
For library composition, however, we do anticipate libraries accepting a (Baggage)Context, since that is the interop type, the one that "carries values across threads". When the two meet, we anticipate the following to happen frequently:
// a framework-specific context exists:
let context: ChannelHandlerContext
// (or Lambda.Context, or Other.Context, or Request in Vapor)
// since the other library does not know about the above framework:
someOtherLib(param1, param2, context.baggage)
// additional sugar to accept the framework context directly would be possible
// if it conformed to some HasBaggage protocol (idea):
//   protocol ???Baggage { var baggage: Baggage??? { get } }
//   someOtherLib(param1, param2, context)
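One way the "HasBaggage" sugar could look, as a hypothetical sketch: a protocol exposing `baggage`, adopted by both the framework context and the bare baggage type, lets a library accept either one without ever writing `context.context`. `Baggage`, `BaggageCarrier`, and `FrameworkContext` are invented names; the framework context keeps its non-propagatable state outside the baggage.

```swift
/// Stand-in for the propagated baggage (hypothetical).
struct Baggage {
    var traceID: String?
}

/// Hypothetical protocol for the "???Baggage" idea; the name is invented for this sketch.
protocol BaggageCarrier {
    var baggage: Baggage { get }
}

// A bare Baggage trivially carries itself, so libraries need only one generic signature.
extension Baggage: BaggageCarrier {
    var baggage: Baggage { self }
}

/// Stand-in for a framework context (ChannelHandlerContext, Lambda.Context, Vapor's Request, ...).
final class FrameworkContext: BaggageCarrier {
    let baggage: Baggage
    let eventLoopName: String // framework-only state that must not be propagated
    init(baggage: Baggage, eventLoopName: String) {
        self.baggage = baggage
        self.eventLoopName = eventLoopName
    }
}

/// A library unaware of the framework accepts anything that carries baggage.
func someOtherLib<Context: BaggageCarrier>(_ param1: Int, _ param2: Int, _ context: Context) -> String {
    "sum=\(param1 + param2) trace=\(context.baggage.traceID ?? "none")"
}

let context = FrameworkContext(baggage: Baggage(traceID: "t-9"), eventLoopName: "loop-0")
let explicitCall = someOtherLib(1, 2, context.baggage) // pass the baggage explicitly
let sugaredCall = someOtherLib(1, 2, context)          // or the framework context itself
```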
Specifically, in this situation (a framework context, plus a library that is unaware of the framework context and accepts a context), how would we avoid the following:
So I think we have room here to wiggle around with the types, including for existing APIs which do not want to break API. (The above example ain't far from reality, as we are likely to introduce/need an "associate some value with a channel" type, so again the ChannelHandlerContext would not "be" the (baggage) context, but it would offer one.)
I would argue (strongly) that BaggageContext should not be a "random bag of any random stuff", such as closures and other values that cannot be carried over process or network boundaries. Specifically, it is unreasonable to now claim that NIO and all libraries have to express all their parameters AS a (Baggage)Context; however, it makes much sense to allow them to "carry the baggage" if they already have context objects.
I.e. in NIO (or Lambda, or similar "framework") use cases, it is frequent that the "framework context object" already is being passed around when necessary, so we want to avoid having to pass "two context objects" (keep in mind we may not be able to force all implementations to IS-A (Baggage)Context).
The second point could perhaps be resolved in other ways, but it comes to mind that we will very likely want to carry W3C TraceContext (not yet final) values (as a type) in a baggage, and it is easier to spell and understand if they are not both called context: accessing a W3C context inside a baggage would be baggage.traceContext rather than context.traceContext, and similarly with other tracers, e.g. baggage.zipkin.traceID.
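To show how `baggage.traceContext` could read, here is a sketch that parses a W3C Trace Context `traceparent` value (format `version-traceid-parentid-flags`, handling version `00` only) out of a baggage's raw items. `Baggage`, its `items` dictionary, and the `traceContext` accessor are hypothetical, and the parser is deliberately incomplete (for example, it does not reject all-zero IDs or uppercase hex, as the specification requires).

```swift
/// Fields of a W3C Trace Context `traceparent` value (version 00 only; not a full validator).
struct TraceContext: Equatable {
    let traceID: String   // 32 hex chars
    let parentID: String  // 16 hex chars
    let sampled: Bool     // bit 0 of trace-flags

    /// Parses e.g. "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01".
    init?(traceparent: String) {
        let parts = traceparent.split(separator: "-", omittingEmptySubsequences: false)
        guard parts.count == 4, parts[0] == "00",
              parts[1].count == 32, parts[1].allSatisfy(\.isHexDigit),
              parts[2].count == 16, parts[2].allSatisfy(\.isHexDigit),
              parts[3].count == 2, parts[3].allSatisfy(\.isHexDigit),
              let flags = UInt8(parts[3], radix: 16)
        else { return nil }
        traceID = String(parts[1])
        parentID = String(parts[2])
        sampled = (flags & 0x01) == 0x01
    }
}

/// Hypothetical baggage holding raw propagated items (e.g. extracted from headers).
struct Baggage {
    var items: [String: String] = [:]
}

extension Baggage {
    /// Reads as `baggage.traceContext` rather than `context.traceContext`.
    var traceContext: TraceContext? {
        items["traceparent"].flatMap { TraceContext(traceparent: $0) }
    }
}

let baggage = Baggage(items: ["traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"])
```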
The name TraceContext is problematic because we want to enable use cases which are not-just-tracing, and the naming would feel quite wrong if we tied it all to tracing using that type name.
So while I agree that we definitely have to keep exploring the naming here (and it is likely to change more than a few times before this gets "stable"), I do strongly believe that we have to do so while implementing specific tracers and use cases in specific frameworks. Currently I think the case for using Context as the type name is problematic, for the adoption and implementation reasons listed above. We could be wrong, though, and as always with common abstractions, we need a few real-ish implementations to really figure out what works and what doesn't; that's the upcoming 2-3 months of the GSoC before us (and we're only in week 2 now).
Rest assured: the "context" project depending on NIO is a strict non-goal... the dependency is only accidental for the time being, in order to get the GSoC running and the use cases implemented as soon as we can.
While it is server-oriented we do envision its use in non-server scenarios as well, thus the hard zero dependency rule here.
You're right that the "context project" will be stand-alone and zero-dependency, because we do indeed want to use it in use cases which do not care about NIO at all.
Summing up: yes, I agree the naming needs to be fleshed out, but I disagree that "just use Context"™ is the obvious answer that we don't even need to think about. We do need to investigate more, and perhaps we'll realize it'd be possible, but currently I see a few roadblocks to achieving that.
How about we open another ticket to "revisit naming" and do so explicitly once we have at least one end-to-end use case as well as more than one "what type of metadata is being carried around" and PoCs of Tracers?
I do strongly recommend reading up on https://people.mpi-sws.org/~jcmace/papers/mace2018universal.pdf (though we can argue if it makes sense or not of course) which is a strong inspiration for this work, and highlights some reasons why "(baggage) context" just™ being a glorified dictionary (even if it is today implemented as such) is not necessarily the end goal.
Thank you! This is waay too many words for my silly concern :) I was just a bit surprised by the clashing part; I did a quick (and not thorough enough) search of Context libraries and didn't find that many. Which libraries are you referring to when you say that other libraries use Context? (This is just curiosity, feel free to ignore it :D)
Vapor's Request and Application, which both have Storage: https://docs.vapor.codes/4.0/client/ (e.g. see how request is used to "contain" client and other information)
All of those are frameworky "I control execution semantics" and are the right/easy place for the framework to "set some headers/context i got from somewhere".
In all those cases it feels natural (IMO) to extend them to also carry baggage, as they are often passed around already anyway to achieve some framework-specific task. It is only when we hit a non-framework function or another framework that we'd need to pass the "generic context", which I hope we can make happen with some CarriesBaggage protocol (name invented on the spot, let's not bikeshed it here yet).
How would we spell extracting things to avoid context.context?
(Perhaps that's possible if we had that CarriesContext protocol; worth looking into.)
Go avoids this by:
Do not store Contexts inside a struct type; instead, pass a Context explicitly to each function that needs it. The Context should be the first parameter, typically named ctx:
I.e. "never wrap", how would we then avoid having to pass function(context, context)?
You're right that the type names would not clash for those (I have another example where you extend a thing and there's a typealias Context that depends on the enclosing type... that would conflict, but that's a minor issue perhaps); it is more about the passing-around sites and parameter naming.
This discussion is pretty helpful, thanks for challenging it the way you do; that's why we're sharing the WorkInProgress of the GSoC, after all.
I rescind this nit. After the explanations and related articles I agree BaggageContext is the way to go.
I agree with your reasoning here.
It does make me curious about why BaggageContext is in its own module though. If BaggageContext is more specifically for instrumentation, why not put it in the Instrumentation module?
@ktoso: Correct me if I’m wrong, but the reason we want the BaggageContext library as small as possible is that e.g. NIO could directly depend on it, without getting the whole Instrumentation lib. That way NIO could gain a built in way of passing BaggageContext through the channel pipeline, but instrumentation would still be done in separate handlers/libraries.
Yeah that was the idea so far, to keep the module as minimal as possible so there's less reason for projects to "oh but we don't want that X bit...!" complain and not adopt it.
It's a good question, though, whether they're close enough together that merging them would be fine... Let's ticketify it and see how things look and feel when we have some end-to-end things to look at :)
Hey everyone, here’s a small progress update from my side.
End-to-end example
We’ve now added a new example that showcases how instrumentation could look when spread across multiple services. This has also led us to start talking about how to integrate with NIO, so that the same BaggageContext can be accessed from all handlers on the same channel.
Context: Bag of random stuff or meta-data only?
In #37, we also started a discussion around whether to use the BaggageContext to store something like a shared Logger. We’re very much looking forward to everyone’s input on this.
I agree with this, having been involved in a project that had exactly this model. It's convenient magic, but it breaks down easily in a complex system, particularly if that system is expected to be extensible.
Typically in Java you need to enforce that everyone uses the same Executor, and that has obvious issues when a third-party library creates asynchronous work. This might be fine for a relatively small ecosystem like server-side Swift currently, where we are all using NIO's event loops, but as the ecosystem gets more complex (hopefully), this will be less likely to hold.
Also, I can confirm that such issues are a nightmare, as @ktoso mentioned, and they have definitely taken up more of my time than I would have liked.
Couldn't agree more. Also, with SwiftNIO Transport Services, SwiftNIO can actually run on top of Network.framework. In this case you'll use a different EventLoopGroup type (NIOTSEventLoopGroup instead of MultiThreadedEventLoopGroup for NIO on BSD sockets), whose event loops execute on DispatchQueues. NIOTSEventLoopGroup gives you all the same synchronisation guarantees as MultiThreadedEventLoopGroup, but the underlying thread may change (as DispatchQueues aren't bound to a thread). Long story short: even considering only SwiftNIO, you cannot universally assume that thread locals always work.