The Context Passing Problem

Quite a lot going on here, good -- it's an important topic we need to find solutions for as a group and share some general style about them so the various libraries compose well :slight_smile:

Sorry for the delayed response, had some days off to recharge a little bit, now then, let's dive in :fish:


Protocol-based context passing / "FrameworkContext" types / protocols "carrying" (baggage) context

I agree that we'll want to allow and encourage protocols that carry around the "common stuff" :+1: There are some details we need to iron out though...

I've been calling this "framework contexts" that carry baggage context, seems the general idea we're thinking about the general same shape :slight_smile: There are some fine details we need to iron out here though.

It's somewhat a of a mix between the Go Context passing style, and a truly Swift unique style which we can do so well thanks to protocols. I hope we can figure out a pattern that'll work for all use-cases.

I think it'd be very useful to review and keep in mind the Holding baggage in framework protocols #9 protocols in baggage context itself as we discuss things going forward. If those are not enough (and I'm more than happy to discuss naming etc), we need to figure out what would be, specifically because all this context passing is relatively "simple" while there is one library and one concurrency model involved, but we don't have that luxury! We need to compose across various libraries which may or may not depend on eachother and may or may not have the same concurrency patterns. (More on that later)

Before diving into the types and relationships, let's highlight some of the pitfalls we can fall into and were brought up already:

Pitfall: passing logger separately from baggage context

The timing of https://github.com/swift-server/async-http-client/pull/227 is a bit unfortunate but in my countless discussions with Johannes on that we're been aware that the current API is not final -- we expected to change this API, of course it would be better without needing to... but here we are.

We should not just add the context parameter though IMHO. We can do better than that, by following a protocol context style, somewhat similar to what Tanner proposes, however using types that the BaggageContext project now contains since a while: ...Carrier protocol types.

First to highlight why the client.get("...", logger: myLogger, context: myBaggageContext()) may be slightly sub optimal.

The first annoying thing about it is the doubling of noise of course. Go and such languages avoid this by NOT passing the logger around, but storing the logger or creating it ad hoc, and always using it with passing the context to the logger (and through all operations/functions). It also means that there is no obvious relation between the logger and the baggage context -- that's bad because we need this to be predictable and boringly well known rather than have to look up the relationship.

Specifically, I think that an API that uses context will want to trace those identifiers as well, thus the logger must be made aware of the baggage context_. It must be done so efficiently, only when log statements are made etc.

This means the client has to always make this dance:

get(..., logger: Logger, context: BaggageContext) { 
    logger.with(context).info() // ok
    logger.warning()  // whoops! lost trace IDs from those logs...
    somewhere(logger: logger) // whoops! lost trace IDs from those logs...
}

Note that logger.with(context:) already exists - as and is backed by an efficient baggage aware LogHandler implemented in Holding baggage in framework protocols by ktoso ยท Pull Request #9 ยท slashmo/gsoc-swift-baggage-context ยท GitHub available via import BaggageLogging

Which is a bit repetitive. Also, it is my understanding that many developers want to avoid "pass both logger and the context," (please confirm or deny :wink:) thus, I agree we should do better than that and use "context protocols'.

Now, since HTTPClient cannot depend on Vapor (yet Vapor depends on HTTPClient), it must accept some context carrier protocol type, thankfully we have such type in BaggageContext:

/// A `BaggageContextLogging` purpose is to be adopted by frameworks which already provide a "FrameworkContext",
/// and to such frameworks to pass their context as `BaggageContextCarrier`.
public protocol LoggingBaggageContextCarrier: BaggageContextCarrier {
     /// The logger associated with this carrier context.
     ///
     /// It should automatically populate the loggers metadata based on the `BaggageContext` associated with this context object.
     ///
     /// ### Implementation note
     ///
     /// Libraries and/or frameworks which conform to this protocol with their "Framework Context" types,
     /// SHOULD implement this logger by wrapping the "raw" logger associated with this context with the `logger.with(BaggageContext:)` function,
     /// which efficiently handles the bridging of baggage to logging metadata values.
     ///
     /// Writes to the `logger` metadata SHOULD NOT be reflected in the `baggage`,
     /// however writes to the underlying `baggage` SHOULD be reflected in the `logger`.
     var logger: Logger { get set }
 }

See Holding baggage in framework protocols by ktoso ยท Pull Request #9 ยท slashmo/gsoc-swift-baggage-context ยท GitHub

:warning: Not in love with the names we found so far... they're a bit long, although descriptive... Open to ideas, please chime in here or in PR 9 / baggage context via issues.

so with those in place, we can in HTTPClient:

func get(_ url: URI, ..., context: LoggingBaggageContextCarrier) -> EventLoopFuture<Response> { 
  context.logger.info() // definitely also logs the trace ids etc
  // it also allows the framework context to hand tune the logger,
  // e.g. I only want to log "trace id, but not the other baggage information"
  // so the control over loggers remains in tact, but it's harder to get wrong, and it is easier to pass around

So users can pass any context there. We could also conform logger to the carrier I suppose... allowing context: logger? a bit weird but it would allow the simplest use case to work still ("just log")?

And Database can:

protocol Database: LoggingBaggageContextCarrier { ... }

and thus this will work, as you wish for in your initial example @tanner0101:

app.get("users") { req in 
    User.query(on: req.db).all() // yay
}

It also composes with other libraries, in case they accepted a raw BaggageContext or better any baggage carrier (e.g. they are not logging, but need to pass the context around for some reason), like so:

extension Request: LoggingBaggageContextCarrier {} 
extension Database: LoggingBaggageContextCarrier {} 

func first(context: BaggageContextCarrier) {}
func second(context: LoggingBaggageContextCarrier) {}

app.get("users") { request in 
  someLibrary(context: request)

Since Vapor is a strongly opinionated framework and wants to cut the parameter passing a lot, I think it's okey if it conforms all of its Request and similar types to context carrying. "Normal" less opinionated libraries I would strongly suggest to follow the general context passing guidelines:

// my library
func take(thing: Thing, context: BaggageContextCarrier) 
func take(thing: Thing, context: BaggageContextCarrier, callback: () -> ()) 

etc.

As for the Client wrapper... wouldn't Vapor be okey with not wrapping it then, if it could pass (optionally) the request context to requests? I assume you would like to move away from the wrapper, so that is a breaking change anyway, so the shape would become <???application???>.client.get(url, context: request) which would free Vapor from having to maintain those API wrappers? WDYT @tanner0101?


Let's visit another API from this thread:

First, I do realize that yes it is annoying to pass context around, no-one loves it, but that's in reality the only way we can get proper end to end tracing in Swift today. What API guarantees are you worried about upholding that adding an optional context parameter would such a nightmare?

(I'll comment below again why thread-locals are a no-go, as that was mentioned as well).

So let's look at:

    @inlinable
    public func zlexcount<Value: CustomStringConvertible>(
        of key: RedisKey,
        withValuesBetween range: (min: RedisZLexBound<Value>, max: RedisZLexBound<Value>)
    ) -> EventLoopFuture<Int> {
        let args: [RESPValue] = [
            .init(from: key),
            .init(bulk: range.min.description),
            .init(bulk: range.max.description)
        ]
        return self.send(command: "ZLEXCOUNT", with: args)
            .tryConverting()
    }

in order to have it be traced, we absolutely need to have a context available here, so it'd become:

    @inlinable
    public func zlexcount<Value: CustomStringConvertible>(
        of key: RedisKey,
        withValuesBetween range: (min: RedisZLexBound<Value>, max: RedisZLexBound<Value>),
        context: BaggageContext? // << added
    ) -> EventLoopFuture<Int> {
        let args: [RESPValue] = [
            .init(from: key),
            .init(bulk: range.min.description),
            .init(bulk: range.max.description)
        ]
        return self.send(command: "ZLEXCOUNT", with: args, context: context)
            .tryConverting()
    }

In other words, as @pokryfka mentions:

(Baggage) Context MUST be per operation. Why is that so? Because in a trace view, it is the operations we need to see and in order to be able to attach them to the right parents in a trace, we need to have that context available, because it contains all the trace parent identifiers etc.

As I'm not sure if everyone is on the same page how these traces look like, I quickly put together a Zipkin tracer over the Swift Tracing APIs, and here's how those end up looking like:

Now, we'd like Swift to give a best-in-class user experience when operating systems in the cloud, and a best in class tracing experience is a pre-requisite for that. And tracing is not only about HTTP calls, but also databases like Redis, for example, here's what we'll gain if all client libraries will participate in this context passing chain:

Screen Shot 2020-08-13 at 12.47.54

(Screenshots from Take OpenTracing for a HotROD ride | by Yuri Shkuro | OpenTracing | Medium -- great article / example app)

So right away, we'll be able to know "aha, during my /dispatch handling I'm hitting redis n times, and here's the "waterfall" and how I can improve my response latency etc. It is tremendously powerful and I hope the vision of all SSWG projects participating in this context passing dance and tracing instrumentation is a vision you can be excited about as well :slight_smile:

But yes, it does mean that we need this context parameter in all operations. I believe it is a small price to pay for the benefits and coherent ecosystem where our users will come to expect "of course it just works!".

Side note: There is also a difference between APIs making network calls (database drivers, http clients) and "just futures" in therms of "weight" and what needs to be instrumented by default, and what not;

For example, we would not want all NIO futures involved in a request to show up in a trace -- that'd be way to much information, and mostly useless information as well! However for futures, we would want to trace specific ones so eventLoop.traced(name, context).makePromise ... - is an excellent case for those "traced(...)." APIs, but for network APIs context propagation should be the default.

In any case, looping back to the APIs... similar to what @tanner0101 and @cory in the Redis PR linked are saying, the following:

public protocol RedisClient {
    var eventLoop: EventLoop { get }
    var logger: Logger { get }

    func send(
        command: String, with arguments: [RESPValue]
    ) -> EventLoopFuture<RESPValue>
    
    func setLogging(to logger: Logger) // not thread safe AFAIR?

changing into

public protocol RedisClient {
    var eventLoop: EventLoop { get }
    var logger: Logger { get }
    func logging(to logger: Logger)

    func send(
        command: String, with arguments: [RESPValue], 
        context: BaggageContextCarrier
    ) -> EventLoopFuture<RESPValue>

sounds good to me? It allows the baggage to be passed through the operations, and it allows the customization of the logger ad hoc...


:warning: Crazy future looking side-note / idea section :warning:

Now there still remains a lot to be done and Swift's concurrency story is not fleshed out yet, but we know a part of it will be asynchronous functions. We also know that we have property wrappers, and there have been ideas floating around for function wrappers as well ( Prepitch: function wrappers not official proposal, but great idea ).

So... if we have all libraries adopt the context passing, in a similar matter, we're in better position pitch an use-case of such wrappers that would be @Traced, and could look like this:

@Traced
func hello(context: BaggageContext?) async -> String { // allow optional? :<
  // magically does: 
  //   let span = tracer.startSpan("#function", context: context)
  //   defer { span.end() } 
  return await produceThatGreeting()
}

which would automatically trace the function and attach the span to the right context. But we need this context parameter around (or one of the ...Carrier types).

If/when a concurrency story lands in Swift... we could further discuss: okey we have this context that is "local" to this "task", is this a common thing in concurrent systems (I think it is), and could we put some sweet swifty sugar on top of it?

This cannot be solved well using thread locals (in-thread proposal about them, see also discussion in Server Distributed Tracing - #28 by tachyonics) due to how diverse and reactive the async landscape is in Swift (Combine, Dispatch, NIO, just pthreads, various OSS Future impls, ...), but...

Swift will have a concurrency story someday, then we would be able to formulate and express the need for "async local" (terrible working name) abstraction, and then we'll be able to replace our explicit context passing, with async local context passing.

// โš ๏ธ crazy speculation, no idea if we'd ever get async context, but!
// at least it is a very specific, existing, well defined thing we
// can _point at_ and solve, rather than wait.
@Traced
func hello() async -> String {
  // magically does:                      // AsyncLocal<BaggageContext>.value
  //   let span = tracer.startSpan("#function", context: .current) 
  //   defer { span.end() } 
  return await produceThatGreeting()
}

So async functions could carry around a "context" implicitly and we'd use this to attach baggage context there. Obviously, no idea if or when such thing would manifest in Swift, but the general idea is sound, and we are but one of the specific and widely deployed uses of it then. It is very useful for arguing such features to have an existing working ecosystem, and show a painpoint (the explicit context passing), that the compiler feature would address โ€“ my hope with this work is to do just that, prove the utility of tracing and justify the work on such async local capabilities some day.

Perhaps we should re-consider optional context parameters? This would make future migrations painless...

@Traced
func hello(context: BaggageContext? = nil) { 
  // ... tracer.startSpan(context ?? AsyncLocal context)
}

after all...
:warning: End of crazy future looking side-note / idea section :warning:


This looks slightly wrong, or rather, it looks like a "mix" of things:

  • (a) either a library is not instrumented, so we need to start spans per operation on its behalf,
  • or (b) a library is instrumented so I pass it a context and it starts/ends spans.

I think it's important to put the full snippets here, because what seems to be a simple small difference, is actually pretty huge in terms of noise in user code:

So it is either:

// (a) the library does not start spans for the operation:
func handleSomething(buckedName: String) -> ??? {
  let span = tracer.startSpan("Segment 2", context: context)
  let x = s3/* .tracing(with: span.context) -- what would that do? */
      .listObjects(.init(bucketName: bucketName))
  x.onComplete { 
      case .failure(let error): 
          span.addError(error) // and more "status" etc
          span.end()
      case .success:     
      span.end()
  } 
  return x 
}

or

// (b) the library DOES start spans when necessary for the operation:func
handleSomething(buckedName: String, context: LoggingBaggageContextCarrier) -> ??? {  
  s3.listObjects(.init(bucketName: "myBucket2"), context: context) // emits spans
}

I'm also worried about looking at such snippets in isolation; such calls are never made in isolation but in real applications, so more like:

Lambda.run { hello, context, complete in 
  let result = handleSomething(bucketName: "bucket", context: context)
  complete(result)
}

Which looks pretty natural.

Now, yes, we have to pass around the context... but we were going to anyway most likely, be it for logging, or tracing or for "framework reasons". For the baggage context specifically, I do have hopes that whatever concurrency story we end up with, we can move away from explicit passing, to implicit passing of it (see crazy idea section above). But in today's Swift we do not have that luxury.

Yes this sounds right btw! :slight_smile:


Complication of tracing(...) vs. (context: ...)

So overall this pattern of logging(to:).thing() is quite nice :+1:

I think it's great for configuration, so things like the event loop preference, the logger as well I suppose, but it really does not fit context propagation -- context really should be passed to operations "along" the way the operations go, not via some side channel;

s3.tracing(with: span.context) // super worrying (!)
    .listObjects(.init(bucketName: "myBucket"))

I understand the wanting to tone on noise in the APIs, but (baggage / trace) context is really special and really must follow the execution paths to it's easy to mentally map spans to actual operations made. I think it would be much better if we loosen up a bit and allow optional context: if that is the main problem these patterns are trying to address. Optional context can still be cause of issues but it is much much easier to spot, and indeed it's less of a pain for those who don't need tracing... (Though I'd argue you "need" tracing when it's "too late" and you don't have time to suddenly go around and fix all your codebase).

Is the concern that we do not always have or want to pass context? Or is it about the library maintainers having to accept one?

If so, I think we can warm up to the optional context parameter... It has its own problems and risks, but it's a tradeoff. If people really do want to call APIs making distributed calls without context they could... It will break traces and cause issues, but if that'd get us one step in the door with distributed tracing in the majority of APIs that's already a good win. And we can hope to make async context a reality in the future, when the explicit context could be removed.


Why the big deal about "always" passing context, anyway?

Tracing can be expensive, it costs computation and slows down your responses, but it's also very useful.

So in reality, systems are never always traced, they are always passing context, and sometimes that context is recording a trace.

This works by the two parameters, like in Open Telemetry SDK guidelines:

  • IsRecording field of a Span . If true the current Span records tracing events (attributes, events, status, etc.), otherwise all tracing events are dropped. Users can use this property to determine if expensive trace events can be avoided. Span Processors will receive all spans with this flag set. However, Span Exporter will not receive them unless the Sampled flag was set.
  • Sampled flag in TraceFlags on SpanContext . This flag is propagated via the SpanContext to child Spans. For more details see the W3C Trace Context specification. This flag indicates that the Span has been sampled and will be exported. Span Processor and Span Exporter will receive spans with the Sampled flag set for processing.

The same is true for swift-tracing.

Now imagine this use-case:

// framework:
//   let span = startSpan(.empty) // Sampler decision to NOT trace
//   // span isRecording: false, context.sampled: false
// defer { span.end() } 
// then invokes:
handle { httpRequest, context in
  let span = tracer.startSpan("a", context: context)
  span.attributes.[...] = request // set a lot of attributes
  // all those operations are no-ops, because we're not tracing (!)
  span.end()

  something() // miss to pass context
}

func something(context: BaggageContext? = nil) { 
  let span = tracer.startSpan("something", context: context)
   // Sampler decision was TRACE (!), 
   // suddenly we're tracing from the middle of a request :-(
   // this isn't good -- we uselessly consume capacity, and the traces are 
   // "partrial" and weird -- not what we want to see (!)
  ... 
}

This generally depends on how a tracing backend is implemented... but you see the problem. Passing context even if not tracing a specific request is important.


Maybe the common ground is optional context:?

I'll be the first one to admit though, yes, passing context objects around if you're not even using tracing at all is annoying, yes.

Thus perhaps we can hit a middle ground with:

  • context parameter SHOULD be the last (before defaulted non-function) parameter, of any operation function that is either intended to be traced, or may contain (!) operations that will be
    • context MAY be optional and defaulted to nil, although this is discouraged in general
    • (users may always easily pass .empty after all)

Then note, because this rule is so mechanical...

  • we could offer a linter check for people who do use tracing. Simply find all functions where a context object is present, and if a value is not passed to it explicitly, warn about this. This will help people who are serious about using tracing :+1: We would not be able to write such linter or fixit's for xcode if the pattern is anything more fancy like the "tracing(to:)" or similar.

The optional parameter as a "MAY be optional" leaves the door open to APIs which do not want to be explicit about it (although again, it comes at a cost...).

(Please comment if these sound reasonable? )


I've gone far too long on this post, so I'll leave it at this and let's continue discussions until we're all reasonably happy with the recommendations, guidelines and types involved :slight_smile:

Sorry again that the post grew big, but due to my time away a bunch of topics accumulated here and I wanted to make sure to not miss the important bits... Thanks and I hope this is bearable to read!

8 Likes