The Context Passing Problem

Quite a lot going on here -- good! It's an important topic we need to find solutions for as a group, and agree on some general style for, so the various libraries compose well :slight_smile:

Sorry for the delayed response, I had some days off to recharge a little bit. Now then, let's dive in :fish:


Protocol-based context passing / "FrameworkContext" types / protocols "carrying" (baggage) context

I agree that we'll want to allow and encourage protocols that carry around the "common stuff" :+1:

I've been calling these "framework contexts" that carry a baggage context, so it seems we're thinking about the same general shape :slight_smile: There are some fine details we need to iron out here though.

It's somewhat of a mix between the Go Context passing style and a uniquely Swift style which we can do so well thanks to protocols. I hope we can figure out a pattern that'll work for all use-cases.

I think it'd be very useful to review, and keep in mind as we discuss things going forward, the protocols proposed in "Holding baggage in framework protocols" (#9) in the baggage context library itself. If those are not enough (and I'm more than happy to discuss naming etc.), we need to figure out what would be. All this context passing is relatively "simple" while there is only one library and one concurrency model involved, but we don't have that luxury! We need to compose across various libraries which may or may not depend on each other, and may or may not have the same concurrency patterns. (More on that later.)

Before diving into the types and relationships, let's highlight some of the pitfalls we can fall into, which were brought up already:

Pitfall: passing logger separately from baggage context

The timing of https://github.com/swift-server/async-http-client/pull/227 is a bit unfortunate, but in my countless discussions with Johannes on that, we've been aware that the current API is not final -- we expected to change this API; of course it would be better not to need to... but here we are.

We should not just add the context parameter though IMHO. We can do better than that, by following a protocol context style, somewhat similar to what Tanner proposes, however using types that the BaggageContext project has contained for a while now: the ...Carrier protocol types.

First, let me highlight why client.get("...", logger: myLogger, context: myBaggageContext()) may be slightly suboptimal.

The first annoying thing about it is of course the doubled noise. Go and similar languages avoid this by NOT passing the logger around, but storing the logger or creating it ad hoc, and always passing the context to the logger (and through all operations/functions). It also means that there is no obvious relation between the logger and the baggage context -- that's bad, because we need this relationship to be predictable and boringly well known, rather than something we have to look up.

Specifically, I think that an API that takes a context will want its logs to include those trace identifiers as well; thus the logger must be made aware of the baggage context. It must be done efficiently, only when log statements are actually made, etc.

This means the client always has to do this dance:

func get(..., logger: Logger, context: BaggageContext) {
    logger.with(context).info("...") // ok, trace IDs included
    logger.warning("...")  // whoops! lost trace IDs from those logs...
    somewhere(logger: logger) // whoops! lost trace IDs from those logs...
}

Note that logger.with(context:) already exists and is backed by an efficient baggage-aware LogHandler, implemented in "Holding baggage in framework protocols" by ktoso · Pull Request #9 · slashmo/gsoc-swift-baggage-context · GitHub, available via import BaggageLogging.

This is a bit repetitive. Also, it is my understanding that many developers want to avoid having to "pass both the logger and the context" (please confirm or deny :wink:), thus I agree we should do better than that and use "context protocols".
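For intuition, here's a naive sketch of what bridging baggage into a Logger could look like. This is NOT the PR #9 implementation (that one uses a dedicated LogHandler, so the bridging happens lazily, only when a log statement is actually emitted), and the iteration API on BaggageContext is an assumption:

import Logging

extension Logger {
    /// Naive, eager bridge: copy every baggage item into the logger's metadata,
    /// so all subsequent log statements carry the trace identifiers.
    /// (Illustration only; the real LogHandler-based bridge can defer this work.)
    func eagerlyBridging(_ baggage: BaggageContext) -> Logger {
        var copy = self
        baggage.forEach { key, value in // assumed iteration API
            copy[metadataKey: key.name] = .string(String(describing: value))
        }
        return copy
    }
}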

Now, since HTTPClient cannot depend on Vapor (yet Vapor depends on HTTPClient), it must accept some context carrier protocol type; thankfully we have such a type in BaggageContext:

/// A `LoggingBaggageContextCarrier`'s purpose is to be adopted by frameworks which already provide a "FrameworkContext",
/// allowing such frameworks to pass their context wherever a `BaggageContextCarrier` is expected.
public protocol LoggingBaggageContextCarrier: BaggageContextCarrier {
     /// The logger associated with this carrier context.
     ///
     /// It should automatically populate the loggers metadata based on the `BaggageContext` associated with this context object.
     ///
     /// ### Implementation note
     ///
     /// Libraries and/or frameworks which conform to this protocol with their "Framework Context" types,
     /// SHOULD implement this logger by wrapping the "raw" logger associated with this context with the `logger.with(BaggageContext:)` function,
     /// which efficiently handles the bridging of baggage to logging metadata values.
     ///
     /// Writes to the `logger` metadata SHOULD NOT be reflected in the `baggage`,
     /// however writes to the underlying `baggage` SHOULD be reflected in the `logger`.
     var logger: Logger { get set }
 }

See "Holding baggage in framework protocols" by ktoso · Pull Request #9 · slashmo/gsoc-swift-baggage-context · GitHub

:warning: Not in love with the names we found so far... they're a bit long, although descriptive... Open to ideas, please chime in here or in PR #9 / the baggage context repo via issues.

So with those in place, in HTTPClient we can do:

func get(_ url: URI, ..., context: LoggingBaggageContextCarrier) -> EventLoopFuture<Response> {
  context.logger.info("...") // definitely also logs the trace ids etc
  // it also allows the framework context to hand-tune the logger,
  // e.g. "I only want to log the trace id, but not the other baggage information"
  // so the control over loggers remains intact, but it's harder to get wrong, and easier to pass around
}

So users can pass any context there. We could also conform Logger itself to the carrier, I suppose... allowing context: logger? A bit weird, but it would still allow the simplest use case to work ("just log")?

And Database can:

protocol Database: LoggingBaggageContextCarrier { ... }

and thus this will work, as you wished for in your initial example, @tanner0101:

app.get("users") { req in 
    User.query(on: req.db).all() // yay
}

It also composes with other libraries, in case they accept a raw BaggageContext or, better, any baggage carrier (e.g. they do not log, but need to pass the context around for some reason), like so:

extension Request: LoggingBaggageContextCarrier {} 
extension Database: LoggingBaggageContextCarrier {} 

func first(context: BaggageContextCarrier) {}
func second(context: LoggingBaggageContextCarrier) {}

app.get("users") { request in 
  someLibrary(context: request)
}

Since Vapor is a strongly opinionated framework and wants to cut down on parameter passing a lot, I think it's okay if it conforms all of its Request and similar types to the context-carrying protocols. For "normal", less opinionated libraries, I would strongly suggest following the general context passing guidelines:

// my library
func take(thing: Thing, context: BaggageContextCarrier) 
func take(thing: Thing, context: BaggageContextCarrier, callback: () -> ()) 

etc.

As for the Client wrapper... wouldn't Vapor be okay with not wrapping it then, if it could (optionally) pass the request context to requests? I assume you would like to move away from the wrapper, so that is a breaking change anyway, and the shape would become <???application???>.client.get(url, context: request), which would free Vapor from having to maintain those API wrappers? WDYT @tanner0101?


Let's visit another API from this thread:

First, I do realize that yes, it is annoying to pass context around; no-one loves it, but in reality that's the only way we can get proper end-to-end tracing in Swift today. What API guarantees are you worried about upholding, such that adding an optional context parameter would be such a nightmare?

(I'll comment below again why thread-locals are a no-go, as that was mentioned as well).

So let's look at:

    @inlinable
    public func zlexcount<Value: CustomStringConvertible>(
        of key: RedisKey,
        withValuesBetween range: (min: RedisZLexBound<Value>, max: RedisZLexBound<Value>)
    ) -> EventLoopFuture<Int> {
        let args: [RESPValue] = [
            .init(from: key),
            .init(bulk: range.min.description),
            .init(bulk: range.max.description)
        ]
        return self.send(command: "ZLEXCOUNT", with: args)
            .tryConverting()
    }

In order to have it be traced, we absolutely need a context available here, so it'd become:

    @inlinable
    public func zlexcount<Value: CustomStringConvertible>(
        of key: RedisKey,
        withValuesBetween range: (min: RedisZLexBound<Value>, max: RedisZLexBound<Value>),
        context: BaggageContext? // << added
    ) -> EventLoopFuture<Int> {
        let args: [RESPValue] = [
            .init(from: key),
            .init(bulk: range.min.description),
            .init(bulk: range.max.description)
        ]
        return self.send(command: "ZLEXCOUNT", with: args, context: context)
            .tryConverting()
    }

In other words, as @pokryfka mentions:

(Baggage) Context MUST be per operation. Why is that so? Because in a trace view it is the operations we need to see, and in order to attach them to the right parents in a trace we need that context available, because it contains all the trace parent identifiers etc.

As I'm not sure everyone is on the same page about how these traces look, I quickly put together a Zipkin tracer over the Swift Tracing APIs, and here's how they end up looking:

Now, we'd like Swift to give a best-in-class user experience when operating systems in the cloud, and a best-in-class tracing experience is a prerequisite for that. And tracing is not only about HTTP calls, but also databases like Redis. For example, here's what we'll gain if all client libraries participate in this context passing chain:

(Screenshot: trace waterfall view of a /dispatch request, showing the individual Redis calls as spans.)

(Screenshots from Take OpenTracing for a HotROD ride | by Yuri Shkuro | OpenTracing | Medium -- great article / example app)

So right away, we'll be able to know: "aha, during my /dispatch handling I'm hitting Redis n times, here's the waterfall, and here's how I can improve my response latency", etc. It is tremendously powerful, and I hope the vision of all SSWG projects participating in this context passing dance and tracing instrumentation is a vision you can be excited about as well :slight_smile:

But yes, it does mean that we need this context parameter in all operations. I believe it is a small price to pay for the benefits and for a coherent ecosystem where our users will come to expect "of course it just works!".

Side note: There is also a difference between APIs making network calls (database drivers, HTTP clients) and "just futures" in terms of "weight" and what needs to be instrumented by default, and what not.

For example, we would not want all NIO futures involved in a request to show up in a trace -- that'd be way too much information, and mostly useless information as well! For futures, we would only want to trace specific ones, so something like eventLoop.traced(name, context).makePromise(...) is an excellent case for such "traced(...)" style APIs, but for network APIs context propagation should be the default.
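To make that concrete, here's a speculative sketch of what such a traced(...) helper on futures could look like (not an existing NIO API; tracer is assumed to be available, as in the other snippets in this post):

extension EventLoopFuture {
    // Wraps this future in a span: the span ends (recording any error) once the future completes.
    func traced(_ name: String, context: BaggageContext) -> EventLoopFuture<Value> {
        let span = tracer.startSpan(name, context: context)
        return self.always { result in
            if case .failure(let error) = result {
                span.addError(error)
            }
            span.end()
        }
    }
}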

In any case, looping back to the APIs... similar to what @tanner0101 and @cory are saying in the linked Redis PR, the following:

public protocol RedisClient {
    var eventLoop: EventLoop { get }
    var logger: Logger { get }

    func send(
        command: String, with arguments: [RESPValue]
    ) -> EventLoopFuture<RESPValue>
    
    func setLogging(to logger: Logger) // not thread safe AFAIR?
}

changing into

public protocol RedisClient {
    var eventLoop: EventLoop { get }
    var logger: Logger { get }
    func logging(to logger: Logger)

    func send(
        command: String, with arguments: [RESPValue], 
        context: BaggageContextCarrier
    ) -> EventLoopFuture<RESPValue>
}

sounds good to me? It allows the baggage to be passed through the operations, and it allows ad hoc customization of the logger...
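For completeness, here's roughly how a convenience command could then forward the context to that single send entry point (GET is used as a hypothetical example here; RediStack's real convenience APIs differ in shape):

extension RedisClient {
    func get(_ key: RedisKey, context: BaggageContextCarrier) -> EventLoopFuture<RESPValue> {
        let args: [RESPValue] = [.init(from: key)]
        return self.send(command: "GET", with: args, context: context)
    }
}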


:warning: Crazy future looking side-note / idea section :warning:

Now, there still remains a lot to be done and Swift's concurrency story is not fleshed out yet, but we know a part of it will be asynchronous functions. We also know that we have property wrappers, and there have been ideas floating around for function wrappers as well (see the "Prepitch: function wrappers" thread -- not an official proposal, but a great idea).

So... if we have all libraries adopt context passing in a similar manner, we're in a better position to pitch a use-case for such wrappers, which could be @Traced and look like this:

@Traced
func hello(context: BaggageContext?) async -> String { // allow optional? :<
  // magically does: 
  //   let span = tracer.startSpan("#function", context: context)
  //   defer { span.end() } 
  return await produceThatGreeting()
}

which would automatically trace the function and attach the span to the right context. But we need this context parameter around for that (or one of the ...Carrier types).

If/when a concurrency story lands in Swift... we could further discuss: okay, we have this context that is "local" to this "task"; is this a common thing in concurrent systems (I think it is), and could we put some sweet Swifty sugar on top of it?

This cannot be solved well using thread locals (there is an in-thread proposal about them; see also the discussion in Server Distributed Tracing - #28 by tachyonics) due to how diverse and reactive the async landscape is in Swift (Combine, Dispatch, NIO, just pthreads, various OSS Future impls, ...), but...

Swift will have a concurrency story someday; then we would be able to formulate and express the need for an "async local" (terrible working name) abstraction, and then we'll be able to replace our explicit context passing with async-local context passing.

// โš ๏ธ crazy speculation, no idea if we'd ever get async context, but!
// at least it is a very specific, existing, well defined thing we
// can _point at_ and solve, rather than wait.
@Traced
func hello() async -> String {
  // magically does:                      // AsyncLocal<BaggageContext>.value
  //   let span = tracer.startSpan("#function", context: .current) 
  //   defer { span.end() } 
  return await produceThatGreeting()
}

So async functions could carry around a "context" implicitly, and we'd use this to attach baggage context there. Obviously, no idea if or when such a thing would manifest in Swift, but the general idea is sound, and we would then be but one of the specific and widely deployed uses of it. It is very useful when arguing for such features to have an existing working ecosystem and to show a pain point (the explicit context passing) that the compiler feature would address -- my hope with this work is to do just that: prove the utility of tracing and justify the work on such async-local capabilities some day.

Perhaps we should reconsider optional context parameters? This would make future migrations painless...

@Traced
func hello(context: BaggageContext? = nil) { 
  // ... tracer.startSpan(context ?? AsyncLocal context)
}

after all...
:warning: End of crazy future looking side-note / idea section :warning:


This looks slightly wrong, or rather, it looks like a "mix" of things:

  • (a) either a library is not instrumented, so we need to start spans per operation on its behalf,
  • or (b) a library is instrumented so I pass it a context and it starts/ends spans.

I think it's important to put the full snippets here, because what seems like a simple small difference is actually pretty huge in terms of noise in user code:

So it is either:

// (a) the library does not start spans for the operation:
func handleSomething(bucketName: String, context: BaggageContext) -> ??? {
  let span = tracer.startSpan("Segment 2", context: context)
  let x = s3 /* .tracing(with: span.context) -- what would that do? */
      .listObjects(.init(bucketName: bucketName))
  x.whenComplete { result in
      switch result {
      case .failure(let error):
          span.addError(error) // and more "status" etc
          span.end()
      case .success:
          span.end()
      }
  }
  return x
}

or

// (b) the library DOES start spans when necessary for the operation:
func handleSomething(bucketName: String, context: LoggingBaggageContextCarrier) -> ??? {
  s3.listObjects(.init(bucketName: "myBucket2"), context: context) // emits spans
}

I'm also worried about looking at such snippets in isolation; such calls are never made in isolation but in real applications, so it's more like:

Lambda.run { hello, context, complete in 
  let result = handleSomething(bucketName: "bucket", context: context)
  complete(result)
}

Which looks pretty natural.

Now, yes, we have to pass around the context... but we were most likely going to anyway, be it for logging, tracing, or "framework reasons". For the baggage context specifically, I do have hopes that whatever concurrency story we end up with, we can move away from explicit passing to implicit passing of it (see the crazy idea section above). But in today's Swift we do not have that luxury.

Yes this sounds right btw! :slight_smile:


Complication of tracing(...) vs. (context: ...)

So overall this pattern of logging(to:).thing() is quite nice :+1:

I think it's great for configuration, so things like the event loop preference, and the logger as well I suppose, but it really does not fit context propagation -- context really should be passed to operations "along" the way the operations go, not via some side channel:

s3.tracing(with: span.context) // super worrying (!)
    .listObjects(.init(bucketName: "myBucket"))

I understand wanting to tone down the noise in the APIs, but (baggage / trace) context is really special and really must follow the execution paths, so it's easy to mentally map spans to the actual operations made. I think it would be much better if we loosen up a bit and allow an optional context:, if that is the main problem these patterns are trying to address. An optional context can still be a cause of issues, but it is much, much easier to spot, and indeed it's less of a pain for those who don't need tracing... (Though I'd argue you "need" tracing exactly when it's "too late" and you don't have time to suddenly go around and fix your whole codebase.)

Is the concern that we do not always have or want to pass context? Or is it about the library maintainers having to accept one?

If so, I think we can warm up to the optional context parameter... It has its own problems and risks, but it's a tradeoff. If people really do want to call APIs making distributed calls without context, they could... It will break traces and cause issues, but if that gets us one step in the door with distributed tracing in the majority of APIs, that's already a good win. And we can hope to make async context a reality in the future, at which point the explicit context could be removed.


Why the big deal about "always" passing context, anyway?

Tracing can be expensive, it costs computation and slows down your responses, but it's also very useful.

So in reality, systems are not "always traced"; rather, they are always passing context, and sometimes that context is recording a trace.

This works via two flags, as described in the OpenTelemetry SDK guidelines:

  • IsRecording field of a Span . If true the current Span records tracing events (attributes, events, status, etc.), otherwise all tracing events are dropped. Users can use this property to determine if expensive trace events can be avoided. Span Processors will receive all spans with this flag set. However, Span Exporter will not receive them unless the Sampled flag was set.
  • Sampled flag in TraceFlags on SpanContext . This flag is propagated via the SpanContext to child Spans. For more details see the W3C Trace Context specification. This flag indicates that the Span has been sampled and will be exported. Span Processor and Span Exporter will receive spans with the Sampled flag set for processing.

The same is true for swift-tracing.
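For example, expensive attribute computation can be guarded by the recording flag; a sketch, assuming a Span API in the spirit of the OpenTelemetry guidelines quoted above (the isRecording property and the attribute subscript shape are assumptions):

let span = tracer.startSpan("handle /dispatch", context: context)
defer { span.end() }

if span.isRecording {
    // only compute and attach expensive attributes when the span is actually recorded
    span.attributes["http.method"] = "GET"
    span.attributes["http.target"] = request.uri
}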

Now imagine this use-case:

// framework:
//   let span = startSpan(.empty) // Sampler decision to NOT trace
//   // span isRecording: false, context.sampled: false
// defer { span.end() } 
// then invokes:
handle { httpRequest, context in
  let span = tracer.startSpan("a", context: context)
  span.attributes[...] = request // set a lot of attributes
  // all those operations are no-ops, because we're not tracing (!)
  span.end()

  something() // whoops, forgot to pass the context
}

func something(context: BaggageContext? = nil) { 
  let span = tracer.startSpan("something", context: context)
   // Sampler decision was TRACE (!), 
   // suddenly we're tracing from the middle of a request :-(
   // this isn't good -- we uselessly consume capacity, and the traces are 
   // "partrial" and weird -- not what we want to see (!)
  ... 
}

This generally depends on how the tracing backend is implemented... but you see the problem. Passing context, even if a specific request is not being traced, is important.


Maybe the common ground is optional context:?

I'll be the first one to admit it: yes, passing context objects around if you're not even using tracing at all is annoying.

Thus perhaps we can hit a middle ground with:

  • the context parameter SHOULD be the last of the non-defaulted, non-function parameters of any operation function that is either intended to be traced, or may contain (!) operations that will be
    • context MAY be optional and defaulted to nil, although this is discouraged in general
    • (users may always easily pass .empty after all)

Then note, because this rule is so mechanical...

  • we could offer a linter check for people who do use tracing: simply find all calls to functions that take a context parameter, and warn when no value is passed to it explicitly. This will help people who are serious about using tracing :+1: We would not be able to write such a linter, or fixits for Xcode, if the pattern were anything fancier, like the "tracing(to:)" style or similar.

The "MAY be optional" wording leaves the door open to APIs which do not want to be explicit about it (although, again, that comes at a cost...).
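To make the guideline concrete, signatures following it could look roughly like this (hypothetical functions, purely illustrative):

// context is the last of the non-defaulted, non-function parameters:
func fetchProfile(id: String, context: BaggageContextCarrier) -> EventLoopFuture<Response>

// MAY be optional and defaulted (discouraged); defaulted parameters still come after it:
func fetchProfile(id: String, context: BaggageContextCarrier? = nil, deadline: NIODeadline? = nil) -> EventLoopFuture<Response>

// trailing function parameters come after the context:
func fetchProfile(id: String, context: BaggageContextCarrier, callback: (Response) -> Void)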

(Please comment if these sound reasonable? )


I've gone on far too long in this post, so I'll leave it at this, and let's continue the discussion until we're all reasonably happy with the recommendations, guidelines and types involved :slight_smile:

Sorry again that the post grew so big, but due to my time away a bunch of topics accumulated here and I wanted to make sure not to miss the important bits... Thanks, and I hope this is bearable to read!


This looks great :+1: and was exactly what I was thinking.

Now this is where I get confused or perhaps disagree. Vapor's AHC wrapper currently works like this:

app.get("foo") { req in 
    req.client.get("https://vapor.codes")
}

req.client wraps the application-wide HTTPClient instance. It sends the request's EventLoop and Logger to every call to HTTPClient.execute.

Even if EventLoop, Logging, and Baggage were all in one context type, the API would suffer:

app.get("foo") { req in 
    req.client.get("https://vapor.codes", context: req)
}

Why do I need to pass req as context if I am calling this method on req in the first place?

Another way to look at this is Vapor moves context to the left instead of the right. The context is stored on the thing you call instead of passed into the call itself.

Unless AHC gives us some way to pass this context via the client instance, then we must continue to wrap it. For a concrete example, why not declare HTTPClient like so:

protocol BaggageContextCarrier {
    var baggage: BaggageContext { get }
    func using(baggage: BaggageContext) -> Self
}
protocol LoggingContextCarrier {
    var logger: Logger { get }
    func using(logger: Logger) -> Self
}
typealias LoggingBaggageContextCarrier = LoggingContextCarrier & BaggageContextCarrier

protocol HTTPClient: LoggingBaggageContextCarrier {
    func execute(request: HTTPRequest) -> EventLoopFuture<HTTPResponse>
}
extension HTTPClient {
	// Convenience API
    func get(_ url: String) -> EventLoopFuture<HTTPResponse>
    func post(_ url: String) -> EventLoopFuture<HTTPResponse>
    func patch(_ url: String) -> EventLoopFuture<HTTPResponse>
    ...
}

// actual HTTP client implementation
class HTTPConnectionManager {
	init(group: EventLoopGroup)
    func execute(
        request: HTTPRequest, 
        logger: Logger, 
        baggage: BaggageContext
    ) -> EventLoopFuture<HTTPResponse>
}

extension HTTPConnectionManager {
    // use the manager to create a "context aware" client
    func client(logger: Logger = .init(), baggage: BaggageContext = .init()) -> HTTPClient {
    	HTTPConnectionManagerClient(connections: self, logger: logger, baggage: baggage)
    }
}

// this struct is essentially HTTPConnectionManager + context
private struct HTTPConnectionManagerClient: HTTPClient {
	let connections: HTTPConnectionManager
	var logger: Logger
	var baggage: BaggageContext
	
    func using(logger: Logger) -> Self {
    	var copy = self
    	copy.logger = logger
    	return copy
    }
	
    func using(baggage: BaggageContext) -> Self {
    	var copy = self
    	copy.baggage = baggage
    	return copy
    }
    
    func execute(request: HTTPRequest) -> EventLoopFuture<HTTPResponse> {
    	self.connections.execute(
    		request: request,
    		logger: logger,
    		baggage: baggage
		)
    }
}

// Usage:

let connections = HTTPConnectionManager(...)
defer { connections.shutdown() }

let client = connections.client()
// default context
client.get("https://vapor.codes")
// custom logger
client.using(logger: myLogger).get("https://vapor.codes")
// custom baggage
client.using(baggage: myBaggage).get("https://vapor.codes")

Again, this isn't something AHC needs to do. But it would allow Vapor to use the HTTPClient protocol directly.

I don't understand the difference between exposing BaggageContext via a protocol versus passing it into each method call. As long as it's the same BaggageContext, who cares how it is delivered? The only way I could see this mattering is if it were passed inout and the mutations from each call needed to be recorded in order. But this is not the case (and it would also not work well with futures).

If the issue is that BaggageContext must gain additional context as it is passed down the chain, then that makes sense. That is what the using(baggage:) calls are for. In Vapor 4's ORM (Fluent) for example, Request.logger gets a database-id metadata value added. This becomes the new contextual logger that everything lower in the chain uses, meaning that PostgresNIO (the Postgres client package), for example, includes both Vapor's request-id and Fluent's database-id in its logs.

For example, take this pseudo-code for how req.db is generated:

extension Request {
    var db: Database { self.db(nil) }

    func db(_ databaseID: DatabaseID?) -> Database {
        var logger = self.logger
        logger["database-id"] = databaseID
        return self.application.databases.get(databaseID)
            .using(eventLoop: self.eventLoop)
            .using(logger: logger)
            .using(baggage: self.baggage) // Future thing?
    }
}

For me the issue is that the context is already available on the type you are calling on. Again from my example:

req.client.get(...)

As far as I understand, req already has all the context you need. It has an EventLoop, Logger, and in the future will have BaggageContext (with trace ids deserialized from the incoming request). So can't we keep the actual call site clean?

It's possible I'm misunderstanding something fundamental about BaggageContext that requires different handling than Logger / EventLoop though. If so my whole argument falls apart.

I also want to reiterate that I'm not arguing that the way Vapor does context passing is best/only way it should be done. I think that explicit context passing is easier to understand and perhaps more appropriate for low-level libraries like AHC. My argument is only that the protocol-based pattern of context-passing that Vapor uses eliminates the need for wrapping and convenience API duplication.

As a final example of this, take the PostgresDatabase protocol in Vapor's PostgresNIO package. This is the lowest level way of interacting with Postgres. It maps 1:1 to the wire commands being serialized. The protocol looks like this:

public protocol PostgresDatabase {
    // context like EventLoop, Logger, etc
    func send(
        _ request: PostgresRequest
    ) -> EventLoopFuture<Void>
}

All commands are added as extensions on this protocol. Like simpleQuery, query, etc. What this means is that in Vapor, you can do:

app.get("foo") { req in
    if let psql = req.db as? PostgresDatabase {
        psql.simpleQuery("SELECT version();")
    } else { 
        fatalError("not postgres")
    }
}

The critical piece here is that the actual instance of PostgresDatabase you get is essentially a "request-specific Postgres database". In other words, it's using the request's logger and event loop (and in the future tracing) automatically. This is important:

1: A low level API from a package not depending on Vapor was used directly.
2: No explicit context passing was needed
3: Everything just worked. The best possible logger and event loop were used.

And Vapor not having to wrap means:

  • Less code duplication.
  • New features added in low-level libraries are available immediately to Vapor users.

Again, for something like AHC the code duplication is minimal. But for something like Redis or Mongo, the convenience API surface can be huge. Re-implementing / wrapping those becomes a real problem.

There are no API guarantee concerns related to adding new parameters, especially at the end (most likely) of declarations.

The maintenance burden is the same as the age old problem of supporting both variable arguments and the array syntax of the exact same method.

If I need to add one more parameter to all my methods, I have to add it to every single one of those 200+ methods, as well as update their documentation.

This is in contrast to what Tanner has been saying of letting the object receiving the operation "provide" the context for the operations.


For the record in my refactor PR, RedisClient is changing from:

protocol RedisClient {
  var eventLoop: EventLoop { get }
  var logger: Logger { get }

  func send(command: String, with: [RESPValue]) -> EventLoopFuture
  func setLogging(to: Logger) // yes, this is not thread safe
}

to

protocol RedisClient {
  var eventLoop: EventLoop { get }

  func send(command: String, with: [RESPValue]) -> EventLoopFuture
  func logging(to: Logger) -> RedisClient // This will create a new operation specific context provider
}

So it matches the pattern that's being established (and seemingly approved of in this thread)

let connection = RedisConnection.connect(/* ... */)
defer { _ = try? connection.close().wait() }

let results = connection
  .logging(to: myCustomLogger)
  .get("foo") // any logs generated in send(command:with:) will use 'myCustomLogger'
  .and(connection.get("bar")) // any logs generated in send(command:with) will use the "default" connection logger

Thanks a lot Tanner & Nathan for giving this some thought! Let's unroll and deep dive into the differences here.


Before we dive in, allow me to share some context in which we're operating, as it shapes a lot of the needs and tradeoffs we need to take when designing these APIs.

Very roughly, one can claim that there are two "kinds" of users (and everyone is somewhere on a line between those extremes), which shapes what requirements they pose on their tools and frameworks:

  • [a] teams with a simple not-very-distributed system
    • it may have one or two databases, caches, maybe it talks to one or two other services in the same company; it still "fits in your head" and maybe you even know the teams which operate the other services
    • (though note: all systems start small, and some of them then inevitably become more complex, large distributed systems.
      • We want to avoid "trapping" users in situations where they and their frameworks can't "grow" with them -- i.e. let's avoid having people hit a wall and go off writing "we had to rewrite our system from Swift to X because the tools did not scale as our needs grew!"; let's avoid this by planning ahead.)
  • [b] teams operating in large organizations, interacting with tens or hundreds of other services
    • in such teams/organizations, it may even be a requirement that a language/runtime has a great observability (meaning: logging + metrics + tracing) story, otherwise it'll be dismissed as a cute toy, not "serious" enough for the backend needs of such organizations.
    • for Swift to break through and be taken seriously in such organizations, the core SSWG APIs such as Logging, Metrics and Tracing (often called the "Three Pillars of Observability") are crucial, and already with Logging and Metrics we saw the situation improve by a lot.
      • Now we need to take the 3rd step here to offer a complete picture to serious, large organizations evaluating whether they should or shouldn't trust Swift on their backends. In other words, this is about answering the infamous "is it ready for production yet?" with a resounding yes!, even when the organization asking has very strong requirements for operating their services.

So today, Swift is already doing great with [a]-style use-cases on the server! And this is very much thanks to all the libraries/frameworks people in this thread have developed -- this is no small feat and we should keep catering to this audience.

But the [b]-style use-cases still have a lot of nitpicks they can point at to dismiss Swift without seriously evaluating it. Tracing is one of those boxes that we can not only tick "somewhat", but do truly great, which will cause such teams looking at Swift to see it in a much better light, even as an improvement over what they used to use for their backends. This is what we need to aim for here.

These two "styles" are obviously somewhat conflicting: [a] want the least amount of noise and friction possible, they may not care much about all these things which matter only in larger deployments, yet we also need to grow server side swift in those large organizations / systems [b] which have more requirements, and also a slightly larger tolerance for some noise to get the benefits they care about (of course no noise would be best, but we can't get there in one magical step).

So, how do we make APIs that can serve both types of users, and grow the impact Swift has on the server?


Why context is not configuration, and should not be hidden away

Let's use the most recent example of the "hide context in client protocol" style that Nathan provided:

// what-if (this API shape has issues/pitfalls)
protocol RedisClient {
  var eventLoop: EventLoop { get }

  func send(command: String, with: [RESPValue]) -> EventLoopFuture

  func logging(to: Logger) -> RedisClient
  func with(context: BaggageContextCarrier) -> RedisClient
}

This is easy but not simple: it's easy for library maintenance, by asking users to always do the right thing, but it invites a few kinds of programmer errors:

First, storing a client with context:

// don't do this
var redis: RedisClient
func something(context: BaggageContext) -> Future<...> {
  redis = redis.with(context: context) // oh no!

  // many redis operations follow
  ... = redis.something(...)
  ... = redis.something(...) // maybe ok, maybe broken
}

func other(context: BaggageContext) -> Future<...> { 
  // <100 lines of code here>, 
  // because that's how real world projects get messy after a long time...
  redis.get("foo") // whoops!
}

So I have a few Redis operations here, and I got lazy and decided "oh well, it's all in the same context anyway!", and I can accidentally cause a mistake like this. Let's say the code in something() maybe is even still correct, but I stored the client somewhere, and in other() (maybe because of some refactoring, a merge conflict gone wrong, etc.) I missed doing this, and now it not only has no trace, it has a completely unrelated trace :scream: Such things are very painful to discover when "production is on fire", and that is exactly when we look at those traces/metrics.

One can always say "don't do that", but it is a good API's job to lead to the right outcomes and prevent abuse and mistakes. We'll also discuss the linter situation below -- we can't write good linters for this pattern, which makes it more difficult to recommend to larger teams.

:warning: client protocol style: this pattern makes it possible to make silly, hard to "visually" (or automatically) track down mistakes;

This problem is not new to Swift; the same thing, albeit with more fatal consequences, manifests with getting pointers to underlying storage. It should not be valid to just "get the pointer out and store it", and in the same way it should not be possible to store a tracing context. This however would yield a "meh" API, such as this:

let res: Res = client.with(context: context) { client in 
  client.operation(...) -> Res
}

// in the same style / for the same reason as:
// array.withUnsafeBytes { pointer ... } 

this is "better" because it stops me from doing the storing the wrong context mistake, but since context should be passed "in a lot of places" this is obviously too annoying of an API. So I don't think this is viable.

:warning: client protocol style: Trying to use the protocol as a with() { ... } would prevent the bugs of storing contexts, but is quite annoying both for API designers and end-users.

:white_check_mark: explicit context style: it is easy to express the rule of "don't store context objects, just pass them along", and as we'll see below, we can write linters to help with this.

To see how this pattern really is more complex, and makes some automation and tooling impossible, let's look at function wrappers as well as tooling and linters below:

Simplicity and "Function wrappers"

While function wrappers have not been proposed "seriously", it is not unreasonable to speculate they would be a very useful and natural addition to Swift. And since we're trying to form a standard way to pass context around for instrumentation/tracing, and such annotations would be tremendously useful to these use cases, we should make sure we're future-proof for those.

Today, in order to trace executions one has to do "manual weaving" of managing spans, such as:

func send(work: Work, context: BaggageContextCarrier) -> EventLoopFuture<> { 
  let span = tracer.startSpan("#function-\(work.type)", context: context)
  
  let someFuture = ...
    .always { result in
      if case .failure(let error) = result {
        span.addError(error)
      }
      span.end()
    }
  return someFuture
}

This is a bit annoying, and while it is fine for "that one single send() function in my entire library", it is not something one would like to sprinkle all around one's codebase every time one wants to trace how long a piece of code takes. (Sure, we can work around this with sugar on top of futures and with...-style functions etc., but that's not the end-game, because we will have asynchronous functions.)

Specifically, in the Redis client and other database clients there usually is exactly one function (or very few) which handles the "send the query" part, and then a lot of convenience API on top of it. This dance today only has to be performed once, in that send function.

What we'd want to be able to do though, is the following:

@Traced
func send(work: Work, context: BaggageContextCarrier) -> EventLoopFuture<> { 
  return ...
}

much nicer...! And in the future, the same composes just as perfectly with async functions:

@Traced
func send(work: Work, context: BaggageContextCarrier) async -> Thing { 
  return ...
}

Now, we could not nicely express this if the context were hidden away in library-specific types, since each library has its own "client" type. Trying to express this as "the thing I call has to conform to some protocol on which I can set a context before I call it" becomes all kinds of weird and hard to understand IMO. (I tried to PoC this a bit and it looks terrible each time.)

:warning: Hide context in client-protocol: It is impossible to implement @Traced if we hide away the context parameter from the function.

:white_check_mark: Explicit context (even if optional): It will be possible to implement @Traced naturally if/when some form of function wrappers were to land in the language.

So yes, it is natural to express the context as "flowing with the calls" (just as a stacktrace isn't a thing "on the side", a trace is not a thing on the side either -- it is the lineage of the execution). But we still have that pesky explicit parameter that people don't like...

Bear with me and read on, we'll get rid of the explicit context parameter in the future (!).

Simplicity and Tooling

Keeping context passing Simple is very powerful, not only to avoid "having to remember which special DSL I need to call here" but also because it enables tooling.

I mentioned this above a little bit, but let's dive into it deeper. Funnily enough, I had forgotten this is also a reason for why Go's context works like it does (and here I thought I invented the reason -- silly me, of course not, it's been done before :wink:):

Note:

// context package - context - Go Packages
Do not store Contexts inside a struct type; instead, pass a Context explicitly to each function that needs it. The Context should be the first parameter, typically named ctx:

(We're not doing "first" but "last", after consultation, sswg and nio teams and the std-lib team and just general swift function naming guidelines; but the general idea remains similar)

And notably:

// context package - context - Go Packages
Programs that use Contexts should follow these rules to keep interfaces consistent across packages and enable static analysis tools to check context propagation [...]

Here is what this specifically means for us. The following "hidden via protocol" pattern:

client.with(context: context).hello()

client.helloAgain() // โš ๏ธ it... *probably* does not have the context?
someClient.hello() // โš ๏ธ does it have the context or not? we *can't* know!

How could we decide where to issue warnings with a linter, or even Xcode itself? It becomes impossible to statically, definitively decide whether a call has the right context or not :warning:

The following however is trivially machine-decidable:

// hello(<other-parameters>, context: BaggageContextCarrier?)
client.hello(one, two, three) // ๐Ÿ’ก missing context pass spotted!

:bulb: Note: We can say we'd allow an optional context, if that eases adoption and allows people who don't care about traces to ignore the context dance. This is a tradeoff, but one we can make to ease adoption and future evolution IMO.

This allows [a]-style users to not care about it, while [b]-style users can use a linter, which we can build already today using SwiftLint's analyze mode, that would detect missing context passes easily. It can emit warnings or errors, and thus it can be tuned to what a team needs. Say a team really wants to make sure tracing works because they have a complex system and need all the insight they can get during an outage or performance tuning: they can make a missing context pass fail the build.

We can also provide fixits:

client.hello(name: "Charlie", option: .something)
// vvvvvvvvv
client.hello(name: "Charlie", option: .something, context: context)

* not a real fixit in Xcode :wink: just an image to keep your attention :sparkles:

This is why consistently naming the context parameter (whether it's a BaggageContext instance or a FrameworkContext that carries one) always "context" is important. We can easily fix things, and it will be a smooth ride to add propagation when needed.

:warning: hiding context in client protocols: impossible to write linters and fixits for teams which care and want to always ensure the highest standard and quality of their traces.

:white_check_mark: explicit context passing: It is possible and relatively simple to build tooling, linters, and fixits that would fill in the missing parameter.

What we'd really want: Asynchronous Contexts / "Execution Scoped Variables"

Let's also keep reminding ourselves that this isn't just about the Swift of today, but about where Swift is heading. And while John has not yet published his concurrency proposal, it is fair to say it involves some form of awaiting and coroutines (although not just that).

So these APIs need to work well with the async/await style, and the arrival of these features means having to change all signatures and patterns anyway, e.g. because you may want to avoid returning EventLoopFutures, i.e.

func hello(context: BaggageContext?) -> EventLoopFuture<Hello> {} 
// vvvvvvv
func hello(context: BaggageContext?) async -> Hello {} 

Then, it is only natural to want to express "some data that flows with the execution" along with async tasks and functions. This would be an "Async Context" (or similar; other languages call these scope variables), and if we get to that point we can remove the explicit passing around -- semantically it is exactly the same, and migration will be simple:

@available(*, deprecated, message: "explicit passing is not necessary anymore")
func hello(context: BaggageContext?) -> EventLoopFuture<Hello> {} 

// In Swift "X"
func hello() -> EventLoopFuture<Hello> {
  BaggageContext.current // which would use AsyncContext<BaggageContext>.get
} 

:bulb: this looks like, but is not, thread-local variables. Thread locals cannot work well in Swift with its various executors and concurrency libraries; the use of thread locals to pass around context should not be part of the core story. Though if you are some single-threaded framework... you could get away with it -- but really, it's 2020 and being async and non-blocking is the norm, and the same will soon enough be true of distributed tracing.

In any case, we don't know when or if we'll get this, so it does not make sense to sit and wait until this feature lands, but we should make the evolution towards what we really want a straight line, and then let the language help us remove more and more noise from it.

:white_check_mark: explicit context passing: is a natural "predecessor" to asynchronous, implicit, context propagation that could be offered by a coroutine runtime, which has proven useful in existing coroutine implementations, and as such it is not unreasonable to expect such feature to exist someday.

More reading on this topic:

  • A good read on CoroutineContext and Scope (Scope.coroutineContext) in Kotlin: Coroutine Context and Scope :eyes: highly recommended to read this
    • this specifically applies a lot to us, as concurrency's fundamental building block is basically known to be coroutines by now; note how a coroutine context can have values added to it, and retrieved from it by type -- this is the same way we'd carry baggage in such a "coroutine context" / "async context" (see the sketch below)
  • Java's pursuit of Scope Variables in Project Loom (coroutines)
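For reference, a rough sketch of that typed-key access pattern as the baggage context does it (the exact key protocol name and requirements in the baggage library may differ slightly):

enum TraceIDKey: BaggageContextKey {
    typealias Value = String
}

var context = BaggageContext()
context[TraceIDKey.self] = "abc-123"       // add a value to the context...
let traceID = context[TraceIDKey.self]     // ...and retrieve it by its key type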

A few specific ways forward to consider

As I see it, we have a few "ways" forward, which I'll list explicitly below. We have to keep in mind the upcoming "Swift Concurrency" story, so things will change; the various ways we can phrase the APIs can either prepare for migration to that future concurrency, or we could wait it out and do nothing until the new things land (but I think it'd be a waste to completely punt/wait until then).

Let's focus on the concerns you mention first though:

I understand your desire to keep the context out of here, I really do.

But looking at the examples above, I really think we're trying to "have our cake and eat it too" if we want to provide a super fluent DSL that strongly relies on the Vapor style / protocol style.

I really do understand your desire to keep the APIs as lean as possible -- this is what Vapor promises its users -- but I do not think (see the examples above) this is the right style for context passing, and it is counter to how asynchronous "execution scoped variables" (which is what (baggage) context is) really work. I fear that the more we try to hide it, the more pain we'll be inflicting on future adoption, unless we hold out i-don't-know-how-many years until async contexts become a thing.

Sadly I don't know if or when those would land (that would solve the entire "explicit" problem of explicit context passing), but I don't think we should adopt patterns which hide what we're really doing, as that will only make composition, tooling, and future evolution towards async functions (and more) very difficult.

What is wrong with not wrapping all those clients and doing the same thing everywhere? Vapor need not be in the business of re-exposing every possible HTTP / database / service client's API with a super fluent DSL -- the goal is consistency, not short function calls regardless of the complexity cost to get there.

If one wants traces, pass the context; plus, it will prepare us well for the future asynchronous Swift world. I can't make promises about the future, but the trajectory of such APIs is consistent and in alignment with how coroutines and asynchronous contexts usually work. :pray:


Hi all,

first of all: Thank you @ktoso for those vast and hugely informative reads. They are long, but I'm very thankful for the depth in which you build your argument. :+1:

Having had the experience with a 100% traced Go backend, I'm totally on board with the story on explicit context passing. Especially the linter story you outline is a huge benefit IMHO. As Swift has decided to keep the context parameter always at the end, this becomes especially relevant, since it is mentally way more complex in code reviews to check for a context at the end, compared to a context as the first argument.

+1 As of today I think that the biggest pool of potential server-side Swift developers are iOS developers that want to get their feet wet in the backend. Taking away stuff that is not 100% necessary in the beginning eases the learning curve and drives adoption. If a developer/team decides to add tracing at a later point, it can be added quite easily thanks to a potentially aggressive linter, as you outlined.


Right, yeah! Thanks for making that point explicitly. Of course we would not want to alienate folks coming from other Apple platforms, or make it much harder for them to get started with the server side (kind of my "[a]" type use-cases), while at the same time we can become better for "[b]" type server developers coming to Swift who'd otherwise be missing these things -- with this road forward I think either group can tune the way they use server-side Swift to what they really care about :slight_smile:

I think there's an important part of this discussion being missed that I want to highlight, because I feel like it's not being heard clearly. When @mordil says that he's concerned about adding a parameter to 200+ methods, and when I say that a sufficient case has not yet been made for API stability breakage on AHC, your response so far has been of the form:

I think you have not meaningfully tackled the objection being raised. So let me try to clarify it a bit in the hope that you'll more clearly address it.

Firstly, the common ground: I agree with you that optional context is not a good design. I agree that it's too easy to fail to propagate the context in the case where the parameter is defaulted. Users should be encouraged to make choices in order to improve the ability of developers to audit the code and understand the context passing requirements. I personally am convinced that the right design is to add context, non-defaulted, to all relevant invocation methods.

Where we disagree is that I don't think anything I just said in the above paragraph is sufficient to justify the addition of "and therefore we must break the API". Let me bring in another principle of mine: as a general rule, the less frequently you can break your APIs, the better. API breaks are occasionally necessary to clear away cruft and to embrace new features more fully, but the more users you have the more painful each break gets. The result of this is that there is positive value in preserving your major version number where you can, and cost in revving it. This doesn't mean we shouldn't break API, but it does mean that a strong, affirmative case has to be made that doing so is important and right.

For AHC, I am extremely confident that such a case hasn't been made. This is because for AHC we can much more effectively do a "deprecate-and-replace" dance. In this case, we'd keep the old code around, but deprecate it in favour of the new one that passes a concrete context type. These warnings will discourage new code from using the old API, and will also positively encourage newer users to adopt the preferred API, but will continue to allow users to have access to those older APIs. This opens the door to those users remaining supported, getting security fixes and performance improvements, and whatever else, without forcing them into a churn cycle or causing dependency hell in the dependency graph.
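A minimal sketch of that deprecate-and-replace dance (MyClient and its methods are placeholders, not the actual AHC API):

extension MyClient {
    // The old entry point stays around, deprecated, and forwards to the new context-accepting one.
    @available(*, deprecated, message: "Use execute(_:context:) instead")
    func execute(_ request: Request) -> EventLoopFuture<Response> {
        self.execute(request, context: .empty) // assumes the new overload and an "empty" context value
    }
}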

For RediStack, things are a bit trickier. As @mordil has noted, RediStack requires substantial work to add new parameters. I still think on balance that RediStack should consider adopting a deprecate-and-replace dance for this change, but it may be worth considering whether an alternative approach for RediStack is worth it to avoid revving to 2.0.

All of this context work is really important, and I'm glad the community is tackling it. I just want to be counselling caution on revving major version numbers where we don't have to.


Thanks Cory for reminding us / bringing the focus back to the "how do we get there", now that it seems we can (hopefully) agree on where we want to get to.

You're right, I didn't do a great job at addressing the compatibility/migration aspect and focused on the why and where-to so far... Let's fix that! I'll follow up shortly with a non-breaking migration strategy for AHC and RediStack as case studies. (For RediStack I believe that, with some tooling help we can build, this should be possible without making it a huge maintenance burden for @Mordil.)

@ktoso thank you for the detailed response. I feel you understood and represented my points well!

This in particular resonated the most with me. I disagree with some of your arguments (mostly that explicit context passing is objectively less bug prone than protocol-based context passing), but overall I agree being explicit is simpler and more common. If that means we can coalesce the entire community behind one way of doing things that's a huge win. Especially if that means we can then collectively ask Swift and our tooling to make using this pattern easier.

This gave me a lot of good food for thought. I'll take some time to digest this further and think about what this would mean for Vapor.

+1 to this. Making the context non-optional is not critical enough to merit a major break. Something else will crop up eventually that requires one such as NIO 3, async await, etc. At that point we can easily drop the deprecated methods and fully require context.


Thanks @tanner0101, glad these helped wrap some heads around the tradeoffs.

Now, to outline some (potential) adoption plans:

Case: HTTP Client

The primary "glue" between many systems is HTTP calls, as such, instrumenting the HTTP Client takes highest priority (followed by other RPC systems, e.g. gRPC), then followed by "leafs" :leaves:.

The HTTP client can gain additional overloads accepting context parameters, and use those to instrument outgoing calls, automatically creating Spans and propagating the baggage context (incl. tracing headers).

Its current API shape is roughly:

public func get(url: String, deadline: NIODeadline? = nil) -> EventLoopFuture<Response> {
    return self.get(url: url, deadline: deadline, logger: HTTPClient.loggingDisabled)
}

public func get(url: String, deadline: NIODeadline? = nil, logger: Logger) -> EventLoopFuture<Response> {
    return self.execute(.GET, url: url, deadline: deadline, logger: logger)
}

overloaded for get/post/patch/delete and execute calls.

The non-breaking addition we propose is the following:

// addition:
public func get(
    url: String, 
    deadline: NIODeadline? = nil, 
    context: LoggingBaggageContextCarrier // new
  ) -> EventLoopFuture<Response> {
    return self.execute(.GET, url: url, deadline: deadline, context: context)
}

and the existing overloads would be expressed as:

// change to delegate:
public func get(url: String, deadline: NIODeadline? = nil) -> EventLoopFuture<Response> {
    return self.get(
        url: url, deadline: deadline, 
        context: HTTPClient.emptyContext
    )
}

// change to delegate:
public func get(url: String, deadline: NIODeadline? = nil, logger: Logger) -> EventLoopFuture<Response> {
    return self.execute(
        .GET, url: url, deadline: deadline, 
        context: DefaultBaggageLoggingContext(logger: logger, context: .empty)
    )
}
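
To make the shape concrete, here is a sketch of how a caller could use the proposed context-accepting overload. This is not existing AsyncHTTPClient API; it assumes the proposed get(url:context:) overload and uses DefaultBaggageLoggingContext and .empty as shown above:

import AsyncHTTPClient
import Logging
// plus the baggage context module providing DefaultBaggageLoggingContext

let httpClient = HTTPClient(eventLoopGroupProvider: .createNew)
defer { try? httpClient.syncShutdown() }

// In a real server this context would come from the incoming request
// rather than being constructed ad hoc like here.
let context = DefaultBaggageLoggingContext(
    logger: Logger(label: "example"),
    context: .empty
)

let response = try httpClient.get(
    url: "https://example.com/users",
    context: context
).wait()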

There are other parameters used in execute, such as:

// addition:
    public func execute<Delegate: HTTPClientResponseDelegate>(
        request: Request,
        delegate: Delegate,
        eventLoop eventLoopPreference: EventLoopPreference,
        context: LoggingBaggageContextCarrier, // new
        deadline: NIODeadline? = nil
    ) -> Task<Delegate.Response> {

Focusing on the "miscellaneous" parameters (so except request/delegate, which are core part of the API and the "main" parameters really), we'd have:

  • event loop - the preference is not defaulted in this call, thus it precedes context
  • context - is the "last parameter in the set of non-defaulted, non-function parameters", thus it'd be added here
  • deadline - is optional, so we'd keep it at the end, following Swift style guidance of placing defaulted parameters after non-defaulted ones

Again, delegation to this function from the previous overloads would be done with an "empty logging context".

As a reminder, LoggingBaggageContextCarrier contains both a logger and a context, and the logger uses that context when logging as well (thus logs remain nicely correlated).
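
For reference, the rough shape of those carrier protocols looks like this. This is a simplified sketch; the authoritative definitions live in the baggage context package and may differ in naming and detail:

import Logging
// plus the baggage module from gsoc-swift-baggage-context (module name may differ)

// Simplified sketch, not the authoritative definitions:
public protocol BaggageContextCarrier {
    /// The baggage to propagate across library boundaries.
    var baggage: BaggageContext { get set }
}

public protocol LoggingBaggageContextCarrier: BaggageContextCarrier {
    /// A logger already bound to `baggage`, so trace identifiers
    /// automatically show up on log statements made through it.
    var logger: Logger { get }
}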

Composition

With "any" other library

Composition with other APIs/libraries/frameworks can be done by passing other context objects as the last parameter:

http.get(url: "", deadline: deadline, context: someContext)

Interesting side-discussion... The currently used deadline: NIODeadline is the "local deadline/timeout", but there is another type of deadline -- a distributed deadline (!). For this reason I think it is fine to keep the existing APIs passing deadline as-is, even IF someone later wanted to adopt distributed deadlines (meaning "if you're a server, and it's already a second past that distributed deadline, don't even bother processing this request").

Composition with Vapor (today / options)

Vapor 4 today already has a wrapper type for HTTPClient. We're aware this is sub-optimal and would like to avoid having it, but it's there today.

Since the wrapper exists today, it could be used to delegate through to the actual client, passing context: self.request, if Vapor wanted to raise the required AsyncHTTPClient version. Though this would embrace the wrapper style more rather than less; it is a road that could be taken -- up to Tanner and team whether that's worth it.

Alternatively, or "the new way" the http client can be used as:

app.get("users") { req in 
req.(client (actual HTTPClient type, not wrapper))
  .get(url: ..., context: request)

if the request conformed to LoggingBaggageContextCarrier, which it should be able to do.

This means moving away from providing those wrapper types, which I think is what @tanner0101 was hoping to get out of the business of doing (wrapping all clients).

Note that this is not a new API (the get) in Vapor, but simply the newly surfaced HTTP client, which CAN accept a request if needed.

We'll discuss below (after the Redis discussion) the what/when of adoption and deprecations, i.e.:

  • should the wrapper be deprecated? (up to the Vapor community to decide that)
    • probably it can be deprecated slowly; only once the next Vapor major release lands, and we have concurrency in Swift, would there be "good reasons to move people off the other API asap" I suppose? It's up to the Vapor community though.

Case: RediStack

Redis clients, including RediStack, have a slightly tougher time adopting additional parameters without breaking source, because of the sheer number of commands Redis exposes.

Sidenote: some clients I'm aware of use source generation from https://github.com/redis/redis-doc/blob/master/commands.json with an "overlay" which contains all the extra information, like docs or extra types and snippets to use in code generation. This lessens the maintenance burden of having multiple command overloads, as they're all generated anyway.

Just throwing it out there as an idea we've seen done elsewhere; RediStack is not doing this though, so adding parameters is painful and multiplies the API surface maintenance burden.

Okay, let's look at the specific APIs and implementation we have. There are a lot of functions that roughly follow a similar pattern:

    /// ... docs ... 
    @inlinable
    public func setnx<Value: RESPValueConvertible>(
        _ key: RedisKey, 
        to value: Value
    ) -> EventLoopFuture<Bool> {
        let args: [RESPValue] = [
            .init(from: key),
            value.convertedToRESPValue()
        ]
	
        return self.send(command: "SETNX", with: args)
            .map(to: Int.self)
            .map { $0 == 1 }
    }

We'd need to add the following overload:

    /// ... docs ... 
    @inlinable
    public func setnx<Value: RESPValueConvertible>(
        _ key: RedisKey, 
        to value: Value,
        context: BaggageContextCarrier
    ) -> EventLoopFuture<Bool> {
        let args: [RESPValue] = [
            .init(from: key),
            value.convertedToRESPValue()
        ]
	
        return self.send(command: "SETNX", with: args, context: context)
            .map(to: Int.self)
            .map { $0 == 1 }
    }

These two only differ in the presence of the context parameter. Indeed, mirroring all existing APIs would be a pain.

My assumptions:

  • we would not want to / cannot add an optional context parameter?
    • the version of RediStack is 1.0.0-alpha.11, so I'm not sure if it promised stronger source-compat guarantees than the version implies...? Just to call it out -- it seems we specifically don't want to solve this via an optional parameter, right?
  • mirroring all the APIs by hand is not workable / too much of a maintenance burden
  • RediStack is interested in participating in tracing -- it would be great, since we know of teams using it which would be happy if it did so.

Proposal: "Compat shim" source generation

As @Mordil said, the same pain point comes up in multiple areas in Swift, e.g. wanting to support a var-args API mirroring another, or now needing to add a context param etc.

I've had some good experiences with swift-syntax this year, using it for use cases very similar to this, and I could offer help building a small tool which would do the following:

    /// ... docs ... 
    // @genCompatibilityShim(skipParam:context) <<<<<
    @inlinable
    public func setnx<Value: RESPValueConvertible>(
        _ key: RedisKey, 
        to value: Value,
        context: BaggageContextCarrier
    ) -> EventLoopFuture<Bool> {

which is easy to pick up using swift-syntax:

final class GatherActorables: SyntaxVisitor {
    // ... 

    override func visit(_ node: FunctionDeclSyntax) -> SyntaxVisitorContinueKind {
        // FIXME: only check the comment trivia of the node, not the entire node text
        guard "\(node)".contains("@genCompatibilityShim") else { 
            return .skipChildren
        }

        // TODO: parse the mini-DSL ("aha, omit the context param, okay")
        // TODO: collect all such nodes and their shim DSL information
        return .skipChildren // don't descend into the function body
    }
}

From the collected information the tool would emit a slightly transformed node, i.e. "remove that context parameter" and delegate to the real function, going from e.g. RediStack/Commands/HashCommands.swift to RediStack/Commands/HashCommands+CompatibilityShims.swift. Note that this can rather easily also mirror documentation comments, which is a nice win since they'd remain in sync.
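
To make it concrete, the generated file could contain something along these lines. This is only a sketch; `emptyContext` is a made-up placeholder for whatever empty/default context the shim would pass:

// RediStack/Commands/HashCommands+CompatibilityShims.swift (generated; sketch)

    /// ... docs mirrored from the annotated function ...
    @inlinable
    public func setnx<Value: RESPValueConvertible>(
        _ key: RedisKey, 
        to value: Value
    ) -> EventLoopFuture<Bool> {
        // Delegate to the context-accepting overload, passing an "empty"
        // context; `emptyContext` is a hypothetical placeholder name.
        return self.setnx(key, to: value, context: emptyContext)
    }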

I have worked on similar things before and am more than happy to help out building such a tool, especially since it might come in handy in all kinds of compatibility-shim situations which would otherwise be too painful to handle manually.

Are my assumptions in this section correct and do you think this would help RediStack adopt context propagation?

Note that the context support is just new functions; the old ones can remain as-is if you do not want to "jump ship" right away, and we can also take it easy / slow with deprecating anything -- your call how fast or slow to adopt these.

When to deprecate?

Then there's the discussion about when to deprecate the functions that don't require a context, and honestly I think that's really up to each library to decide... In HTTPClient I believe we should deprecate them, and I think that's what @lukasa is saying as well; it'd remain source compatible. If this is too early however -- then let's not, and we can do this later. There's no need to completely rush this.

And as mentioned before, we could also offer those linter checks if APIs need to keep their non-context-requiring versions around for a longer period of time. I'm also happy to take on or help with this work.

Swift Concurrency Considerations

We already briefly mentioned that some form of "coroutine context" or "async context" may be something that happens at some point (no guarantees!).

However, even if it does, it may not necessarily address context/trace propagation in non-async functions / non-async contexts (!). We'd still need a way to deal with "oh no, I can't use the automatic, language-provided task-local context propagation here", and in those cases we'd want to fall back to the common pattern of just passing it around explicitly.

I think embracing explicit passing right now is not a roadblock for adopting the language features if and when they happen; kind of as Tanner highlights above, it's a stepping stone.

Note also that the language features, if/when they come, would arrive alongside major language changes and the Swift concurrency model, and at that time a lot of APIs would need to make more or less drastic adjustments anyway. So it's fair to adopt explicit passing right now, and if we do get some language features, we can then deprecate the explicit functions if people really hate them :upside_down_face:
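
Purely to illustrate the kind of shape such a language feature might take, here's a toy sketch in a hypothetical task-local style. The Baggage type below is a stand-in, not the real package type, and non-async code would still need the explicit context: parameter:

// Toy sketch: implicit propagation in async code via a task-local value.
struct Baggage { var traceID: String? }   // stand-in, not the real type

enum CurrentContext {
    @TaskLocal static var baggage = Baggage()
}

func downstream() async {
    // Sees whatever value the caller bound, without an explicit parameter.
    print("trace:", CurrentContext.baggage.traceID ?? "<none>")
}

func handleRequest() async {
    await CurrentContext.$baggage.withValue(Baggage(traceID: "abc-123")) {
        await downstream()
    }
}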

Timing...

I think we have perhaps become over-excited here, and I'd like to mention that other than the HTTPClient I don't think we're necessarily in a rush. Teams using raw NIO and HTTP clients (and gRPC) will be able to slowly ramp up and start using this, and the more they do, the more they'll want other parts of the system to trace as well. (I do admit we think mostly about supporting internal teams when I say this -- however those teams also love the SSWG OSS projects, so there's a general direction and wish that the SSWG ecosystem libraries eventually become able to participate in tracing -- but if it's not right now, that's okay.)

We're still wrangling with some details of the Tracing API as @slashmo is finishing his GSoC this month, but we'll continue work on this and move towards a stable reliable API everyone can rely on.

I would like to keep asking for feedback though, also on the Tracing APIs! The ongoing PoC embedding it in Lambda https://github.com/swift-server/swift-aws-lambda-runtime/pull/162 is a useful proving ground where we can see it "in action" in a more complex runtime than the HTTP client. So any feedback, including from adopting these APIs in your frameworks, would be very useful -- while we're slowly getting to locking them down. :slight_smile:


@ktoso This has been a fantastic and well-thought-out reply that has changed my opinion on a few points.

I'm going to think about this throughout this week and make sure to reply by Friday on some thoughts on how to move forward.


For the record:

aws-sdk-swift does generate (hundreds of) operations using a template, see https://github.com/swift-aws/aws-sdk-swift/blob/master/CodeGenerator/Templates/api.stencil

@adam-fowler

(Not sure if it makes sense, but) technically it would be relatively easy to have two "flavors" in a fully automated way:

A tag on the master branch, which prioritizes API compatibility, could trigger a workflow which:

  • pulls the code
  • generates API using different template (which requires context parameter in API calls)
  • runs tests
  • checks in changes (on another branch)
  • tags it using the original tag and a suffix

A user could choose to import v5.0.0 if they'd rather not break existing code,
or v5.0.0-EXPERIMENTAL_CONTEXT to make sure context propagation is enforced at compile time.

During the next major (API-breaking) release it could be decided to keep just one approach.


AWSSDKSwift does generate all its API entry points using a template. You could create a second branch and set of tags in an automated way. It provides a slightly confusing story for users of the package though; I would rather there was one HEAD version of the API. Also, enforcing a non-optional context would most likely propagate changes down into the core library aws-sdk-swift-core, which would also need an EXPERIMENTAL_CONTEXT branch. Updating that would be a great deal harder to automate and would need some manual maintenance.

We are nearing the end of the alpha stage of aws-sdk-swift 5.0.0 and are still in a phase where we are able to make breaking changes. I have already spoken with @ktoso about the timing of releases and will ensure we have some tracing support for the final release of 5.0.0. I'm ready to hold off until the official release of swift-trace.

I believe for the moment we should go with an optional BaggageContextContainer and rely on the linter functionality @ktoso has previously mentioned. It doesn't require immediate adoption of tracing. In addition, AWSSDKSwift can be used on iOS clients, and requiring tracing there doesn't really make sense.

This thread has very much moved away from discussing the other context variables like EventLoop and Logger. I'm currently debating what I should be doing with those as well. Do people think a mixture of explicit/implicit methods makes sense, or should we be passing everything directly? I'm concerned the mixture of methods will look a little confused:

let loggingS3 = s3.logging(to: logger)
loggingS3.listObjects(.init(bucketName: "confused"), context: baggage)

But passing everything directly means we get back to the issue @tanner0101 was originally trying to resolve.


Thanks Adam!

On the other hand, what about offering overloads that accept a context, i.e. having it not be required? I recently talked with @tomerd about this and we drew a distinction between "very glue" libraries, such as HTTP clients, HTTP servers and RPC systems (e.g. gRPC), and "leafs" -- in some ways AWSSDK is more of a leaf than "glue", if you know what I mean. So maybe an optional context carrier parameter would be a good path to adoption? It would not be source breaking; and since you're generating functions anyway, it seems (please correct me if I got that wrong) the AWSSDK can more easily decide "let's add another overload" whenever ready, and that would not be breaking?

Yeah, I think that's right. I agree we don't need to force it "hard" on all libraries (exceptions being the "glue between services": http/grpc/rpc), and an optional carrier should do the job well here.
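
For illustration, such an optional, defaulted carrier parameter could look roughly like this; everything below is hypothetical, not actual AWSSDKSwift API:

import NIO

// Hypothetical types, purely to show the shape of a non-breaking,
// optional context parameter on a "leaf" library API.
struct Baggage {} // stand-in for the real carrier/baggage type

struct ExampleLeafClient {
    let eventLoop: EventLoop

    // The defaulted `context` keeps existing call sites compiling unchanged,
    // while instrumented callers can opt in by passing one.
    func fetchThing(id: String, context: Baggage? = nil) -> EventLoopFuture<String> {
        if context != nil {
            // a tracer would start a span here and inject propagation headers
        }
        return eventLoop.makeSucceededFuture("thing-\(id)")
    }
}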

We discussed some more with @tanner0101 as well, which resulted in making these tickets you all may be very interested in and may want to voice some ideas in:

  • Design composable Carrier protocols #22, which extends the current notion of Carriers (or rather, asks whether we should extend them or not)
    • I'm a bit worried about this one, but that's why the ticket exists: to dive deep into it and also address the event loop preference question; let's figure it out either in this forum thread or in that GitHub issue.
  • and Revisiting naming #23, since we're not in love with the long names the carriers produce, and there are a lot of libraries that already use the word Context. I noted some of the naming inspirations and worries there; it's something we should discuss before locking in the baggage context APIs.

Would love to hear everyone's thoughts there. As a heads up, I think I may be a bit slow to reply to those for a while, as I need to focus on something else for a week or two. But timing-wise I think things should align...

As for importance: it is far more important to lock down the baggage context API (naming and carriers) than the tracing APIs per se, because tracing, and also OpenTelemetry which we're aiming to provide a compatibility layer for, are still a very aggressively evolving space "globally" (across all standards/tracers), while baggage shows up visibly in user APIs (and tracing does not; it is internal to adopting implementations).

I think that's a great time to loop back to this, and I'd love to discuss those more. The ticket Design composable Carrier protocols #22 is what we'd like to resolve through these discussions. So... do we need a few "well known carriers" that libraries can use? Or is that too much of a risk for future evolution, etc.?

Personally, carrying a logger with a context is very natural, so context: thingThatHasLoggerAndBaggage is definitely part of the plan -- since the logger basically always wants to be able to reach the baggage to log trace IDs etc. There are open questions about the event loop preference.

Thanks a lot for keeping the conversation going! :pray: Let's slowly but surely keep thinking about those; as mentioned before, I think the most important things are: the baggage API, so we don't have to rename anything after we declare it good; adoption (non-breaking) in the HTTP client and gRPC; and then other leaf libraries, whenever a library is ready and wants to participate. I agree that slowly rolling out and keeping the carriers/context optional is a good way for many libraries to adopt this, so let's take it easy and adopt without breaking APIs wherever possible :+1:

OK, it's been a very long and busy week so it took much longer than expected to answer here.

I've been convinced of the value of not only the context passing, but the explicit context passing. Enough to sign on to the maintenance burden that I have a feeling this might cause.

Here is my plan for RediStack:

  1. I'm going to ship 1.0.0 with the protocol-based "wrapping" that Tanner described and I've currently designed around.
    • I'm going to refine my current refactor branch to go deeper into the stack, primarily the connection pool to provide as many "context specific" logs as possible
  2. Once the baggage API has become stable, I will redesign around it and explicit context passing for a 2.0.0 release (I need to cut one anyway for some Redis 6 stuff)

That's great news - thank you, Nathan! :+1:

Your roadmap also sounds very good :slight_smile: This way we can take the time to polish the APIs and you can confidently adopt it when ready. Thanks again for spending the time to dive into this :+1:

Hello again everyone,
I would like to request everyone's input on what I perceive to be the final proposal of the API shapes: https://github.com/slashmo/gsoc-swift-baggage-context/pull/34

The general "...carrier" and "anyone can add any carrier protocol they want" idea did not fly, as it's "anyone/anything" part made compositionally fall flat on its face. The more I worked on this the more I kept going back to the core idea we need to follow and express here: logger and context are special, because they follow the same execution path (because tracing), and everything else is not the same as the trace execution path.

Please see the revised proposal.

It allows for Vapor's feature request of:

handle { request in 
  http.get(url: "", context: request) 
}

by opting into "being a" context by conforming to ContextProtocol.

We're using explicit passing, as was discussed in the thread.

We are not passing the event loop as part of the context. I really feel this would be mixing too many things, and it is not what we will want to have in the long-term future at the core of the context passing story.

Please have a look at the PR and let's discuss there? Thanks for your input :pray:

Roadmap:

  • I would like to try to close up this API and make a 1.0 release candidate once we find a new "non-gsoc" home for these repos,
  • We'll polish up the PoC PRs which only depend on the context,
  • and then we'll continue to finalize and freeze the Tracing APIs.