The Context Passing Problem

Quite a lot going on here -- good! It's an important topic we need to find solutions for as a group, and agree on some general style for, so the various libraries compose well :slight_smile:

Sorry for the delayed response, I had some days off to recharge a little bit. Now then, let's dive in :fish:


Protocol-based context passing / "FrameworkContext" types / protocols "carrying" (baggage) context

I agree that we'll want to allow and encourage protocols that carry around the "common stuff" :+1:

I've been calling these "framework contexts" that carry a baggage context, so it seems we're thinking about the same general shape :slight_smile: There are some fine details we need to iron out here though.

It's somewhat of a mix between the Go Context passing style and a uniquely Swift style which we can do so well thanks to protocols. I hope we can figure out a pattern that'll work for all use-cases.

I think it'd be very useful to review, and keep in mind as we discuss things going forward, the protocols proposed in "Holding baggage in framework protocols" (#9) in the baggage context library itself. If those are not enough (and I'm more than happy to discuss naming etc.), we need to figure out what would be. All this context passing is relatively "simple" while there is only one library and one concurrency model involved, but we don't have that luxury! We need to compose across various libraries which may or may not depend on each other, and may or may not have the same concurrency patterns. (More on that later.)

Before diving into the types and relationships, let's highlight some of the pitfalls we can fall into, which were brought up already:

Pitfall: passing logger separately from baggage context

The timing of https://github.com/swift-server/async-http-client/pull/227 is a bit unfortunate, but in my countless discussions with Johannes on that, we've been aware that the current API is not final -- we expected to change this API; of course it would be better not to need to... but here we are.

We should not just add the context parameter though IMHO. We can do better than that, by following a protocol context style, somewhat similar to what Tanner proposes, however using types that the BaggageContext project has contained for a while now: the ...Carrier protocol types.

First, let me highlight why client.get("...", logger: myLogger, context: myBaggageContext()) may be slightly suboptimal.

The first annoying thing about it is of course the doubled noise. Go and similar languages avoid this by NOT passing the logger around, but storing the logger or creating it ad hoc, and always passing the context to the logger (and through all operations/functions). It also means that there is no obvious relation between the logger and the baggage context -- that's bad, because we need this relationship to be predictable and boringly well known, rather than something we have to look up.

Specifically, I think that an API that takes a context will want its logs to include those trace identifiers as well; thus the logger must be made aware of the baggage context. It must be done efficiently, only when log statements are actually made, etc.

This means the client always has to do this dance:

func get(..., logger: Logger, context: BaggageContext) {
    logger.with(context).info("...") // ok, trace IDs included
    logger.warning("...")  // whoops! lost trace IDs from those logs...
    somewhere(logger: logger) // whoops! lost trace IDs from those logs...
}

Note that logger.with(context:) already exists and is backed by an efficient baggage-aware LogHandler, implemented in "Holding baggage in framework protocols" by ktoso · Pull Request #9 · slashmo/gsoc-swift-baggage-context · GitHub, available via import BaggageLogging.

This is a bit repetitive. Also, it is my understanding that many developers want to avoid having to "pass both the logger and the context" (please confirm or deny :wink:), thus I agree we should do better than that and use "context protocols".
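For intuition, here's a naive sketch of what bridging baggage into a Logger could look like. This is NOT the PR #9 implementation (that one uses a dedicated LogHandler, so the bridging happens lazily, only when a log statement is actually emitted), and the iteration API on BaggageContext is an assumption:

import Logging

extension Logger {
    /// Naive, eager bridge: copy every baggage item into the logger's metadata,
    /// so all subsequent log statements carry the trace identifiers.
    /// (Illustration only; the real LogHandler-based bridge can defer this work.)
    func eagerlyBridging(_ baggage: BaggageContext) -> Logger {
        var copy = self
        baggage.forEach { key, value in // assumed iteration API
            copy[metadataKey: key.name] = .string(String(describing: value))
        }
        return copy
    }
}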

Now, since HTTPClient cannot depend on Vapor (yet Vapor depends on HTTPClient), it must accept some context carrier protocol type; thankfully we have such a type in BaggageContext:

/// A `LoggingBaggageContextCarrier`'s purpose is to be adopted by frameworks which already provide a "FrameworkContext",
/// allowing such frameworks to pass their context wherever a `BaggageContextCarrier` is expected.
public protocol LoggingBaggageContextCarrier: BaggageContextCarrier {
     /// The logger associated with this carrier context.
     ///
     /// It should automatically populate the loggers metadata based on the `BaggageContext` associated with this context object.
     ///
     /// ### Implementation note
     ///
     /// Libraries and/or frameworks which conform to this protocol with their "Framework Context" types,
     /// SHOULD implement this logger by wrapping the "raw" logger associated with this context with the `logger.with(BaggageContext:)` function,
     /// which efficiently handles the bridging of baggage to logging metadata values.
     ///
     /// Writes to the `logger` metadata SHOULD NOT be reflected in the `baggage`,
     /// however writes to the underlying `baggage` SHOULD be reflected in the `logger`.
     var logger: Logger { get set }
 }

See "Holding baggage in framework protocols" by ktoso · Pull Request #9 · slashmo/gsoc-swift-baggage-context · GitHub

:warning: Not in love with the names we found so far... they're a bit long, although descriptive... Open to ideas, please chime in here or in PR #9 / the baggage context repo via issues.

So with those in place, in HTTPClient we can do:

func get(_ url: URI, ..., context: LoggingBaggageContextCarrier) -> EventLoopFuture<Response> {
  context.logger.info("...") // definitely also logs the trace ids etc
  // it also allows the framework context to hand-tune the logger,
  // e.g. "I only want to log the trace id, but not the other baggage information"
  // so the control over loggers remains intact, but it's harder to get wrong, and easier to pass around
}

So users can pass any context there. We could also conform Logger itself to the carrier, I suppose... allowing context: logger? A bit weird, but it would still allow the simplest use case to work ("just log")?

And Database can:

protocol Database: LoggingBaggageContextCarrier { ... }

and thus this will work, as you wished for in your initial example, @tanner0101:

app.get("users") { req in 
    User.query(on: req.db).all() // yay
}

It also composes with other libraries, in case they accept a raw BaggageContext or, better, any baggage carrier (e.g. they do not log, but need to pass the context around for some reason), like so:

extension Request: LoggingBaggageContextCarrier {} 
extension Database: LoggingBaggageContextCarrier {} 

func first(context: BaggageContextCarrier) {}
func second(context: LoggingBaggageContextCarrier) {}

app.get("users") { request in 
  someLibrary(context: request)
}

Since Vapor is a strongly opinionated framework and wants to cut down on parameter passing a lot, I think it's okay if it conforms all of its Request and similar types to the context-carrying protocols. For "normal", less opinionated libraries, I would strongly suggest following the general context passing guidelines:

// my library
func take(thing: Thing, context: BaggageContextCarrier) 
func take(thing: Thing, context: BaggageContextCarrier, callback: () -> ()) 

etc.

As for the Client wrapper... wouldn't Vapor be okay with not wrapping it then, if it could (optionally) pass the request context to requests? I assume you would like to move away from the wrapper, so that is a breaking change anyway, and the shape would become <???application???>.client.get(url, context: request), which would free Vapor from having to maintain those API wrappers? WDYT @tanner0101?


Let's visit another API from this thread:

First, I do realize that yes, it is annoying to pass context around; no-one loves it, but in reality that's the only way we can get proper end-to-end tracing in Swift today. What API guarantees are you worried about upholding, such that adding an optional context parameter would be such a nightmare?

(I'll comment below again why thread-locals are a no-go, as that was mentioned as well).

So let's look at:

    @inlinable
    public func zlexcount<Value: CustomStringConvertible>(
        of key: RedisKey,
        withValuesBetween range: (min: RedisZLexBound<Value>, max: RedisZLexBound<Value>)
    ) -> EventLoopFuture<Int> {
        let args: [RESPValue] = [
            .init(from: key),
            .init(bulk: range.min.description),
            .init(bulk: range.max.description)
        ]
        return self.send(command: "ZLEXCOUNT", with: args)
            .tryConverting()
    }

In order to have it be traced, we absolutely need a context available here, so it'd become:

    @inlinable
    public func zlexcount<Value: CustomStringConvertible>(
        of key: RedisKey,
        withValuesBetween range: (min: RedisZLexBound<Value>, max: RedisZLexBound<Value>),
        context: BaggageContext? // << added
    ) -> EventLoopFuture<Int> {
        let args: [RESPValue] = [
            .init(from: key),
            .init(bulk: range.min.description),
            .init(bulk: range.max.description)
        ]
        return self.send(command: "ZLEXCOUNT", with: args, context: context)
            .tryConverting()
    }

In other words, as @pokryfka mentions:

(Baggage) Context MUST be per operation. Why is that so? Because in a trace view it is the operations we need to see, and in order to attach them to the right parents in a trace we need that context available, because it contains all the trace parent identifiers etc.

As I'm not sure everyone is on the same page about how these traces look, I quickly put together a Zipkin tracer over the Swift Tracing APIs, and here's how they end up looking:

Now, we'd like Swift to give a best-in-class user experience when operating systems in the cloud, and a best-in-class tracing experience is a prerequisite for that. And tracing is not only about HTTP calls, but also databases like Redis. For example, here's what we'll gain if all client libraries participate in this context passing chain:

(Screenshot: trace waterfall view of a /dispatch request, showing the individual Redis calls as spans.)

(Screenshots from Take OpenTracing for a HotROD ride | by Yuri Shkuro | OpenTracing | Medium -- great article / example app)

So right away, we'll be able to know: "aha, during my /dispatch handling I'm hitting Redis n times, here's the waterfall, and here's how I can improve my response latency", etc. It is tremendously powerful, and I hope the vision of all SSWG projects participating in this context passing dance and tracing instrumentation is a vision you can be excited about as well :slight_smile:

But yes, it does mean that we need this context parameter in all operations. I believe it is a small price to pay for the benefits and for a coherent ecosystem where our users will come to expect "of course it just works!".

Side note: There is also a difference between APIs making network calls (database drivers, HTTP clients) and "just futures" in terms of "weight" and what needs to be instrumented by default, and what not.

For example, we would not want all NIO futures involved in a request to show up in a trace -- that'd be way too much information, and mostly useless information as well! For futures, we would only want to trace specific ones, so something like eventLoop.traced(name, context).makePromise(...) is an excellent case for such "traced(...)" style APIs, but for network APIs context propagation should be the default.
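To make that concrete, here's a speculative sketch of what such a traced(...) helper on futures could look like (not an existing NIO API; tracer is assumed to be available, as in the other snippets in this post):

extension EventLoopFuture {
    // Wraps this future in a span: the span ends (recording any error) once the future completes.
    func traced(_ name: String, context: BaggageContext) -> EventLoopFuture<Value> {
        let span = tracer.startSpan(name, context: context)
        return self.always { result in
            if case .failure(let error) = result {
                span.addError(error)
            }
            span.end()
        }
    }
}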

In any case, looping back to the APIs... similar to what @tanner0101 and @cory are saying in the linked Redis PR, the following:

public protocol RedisClient {
    var eventLoop: EventLoop { get }
    var logger: Logger { get }

    func send(
        command: String, with arguments: [RESPValue]
    ) -> EventLoopFuture<RESPValue>
    
    func setLogging(to logger: Logger) // not thread safe AFAIR?
}

changing into

public protocol RedisClient {
    var eventLoop: EventLoop { get }
    var logger: Logger { get }
    func logging(to logger: Logger)

    func send(
        command: String, with arguments: [RESPValue], 
        context: BaggageContextCarrier
    ) -> EventLoopFuture<RESPValue>
}

sounds good to me? It allows the baggage to be passed through the operations, and it allows ad hoc customization of the logger...
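For completeness, here's roughly how a convenience command could then forward the context to that single send entry point (GET is used as a hypothetical example here; RediStack's real convenience APIs differ in shape):

extension RedisClient {
    func get(_ key: RedisKey, context: BaggageContextCarrier) -> EventLoopFuture<RESPValue> {
        let args: [RESPValue] = [.init(from: key)]
        return self.send(command: "GET", with: args, context: context)
    }
}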


:warning: Crazy future looking side-note / idea section :warning:

Now, there still remains a lot to be done and Swift's concurrency story is not fleshed out yet, but we know a part of it will be asynchronous functions. We also know that we have property wrappers, and there have been ideas floating around for function wrappers as well (see the "Prepitch: function wrappers" thread -- not an official proposal, but a great idea).

So... if we have all libraries adopt context passing in a similar manner, we're in a better position to pitch a use-case for such wrappers, which could be @Traced and look like this:

@Traced
func hello(context: BaggageContext?) async -> String { // allow optional? :<
  // magically does: 
  //   let span = tracer.startSpan("#function", context: context)
  //   defer { span.end() } 
  return await produceThatGreeting()
}

which would automatically trace the function and attach the span to the right context. But we need this context parameter around for that (or one of the ...Carrier types).

If/when a concurrency story lands in Swift... we could further discuss: okay, we have this context that is "local" to this "task"; is this a common thing in concurrent systems (I think it is), and could we put some sweet Swifty sugar on top of it?

This cannot be solved well using thread locals (there is an in-thread proposal about them; see also the discussion in Server Distributed Tracing - #28 by tachyonics) due to how diverse and reactive the async landscape is in Swift (Combine, Dispatch, NIO, just pthreads, various OSS Future impls, ...), but...

Swift will have a concurrency story someday; then we would be able to formulate and express the need for an "async local" (terrible working name) abstraction, and then we'll be able to replace our explicit context passing with async-local context passing.

// โš ๏ธ crazy speculation, no idea if we'd ever get async context, but!
// at least it is a very specific, existing, well defined thing we
// can _point at_ and solve, rather than wait.
@Traced
func hello() async -> String {
  // magically does:                      // AsyncLocal<BaggageContext>.value
  //   let span = tracer.startSpan("#function", context: .current) 
  //   defer { span.end() } 
  return await produceThatGreeting()
}

So async functions could carry around a "context" implicitly, and we'd use this to attach baggage context there. Obviously, no idea if or when such a thing would manifest in Swift, but the general idea is sound, and we would then be but one of the specific and widely deployed uses of it. It is very useful when arguing for such features to have an existing working ecosystem and to show a pain point (the explicit context passing) that the compiler feature would address -- my hope with this work is to do just that: prove the utility of tracing and justify the work on such async-local capabilities some day.

Perhaps we should reconsider optional context parameters? This would make future migrations painless...

@Traced
func hello(context: BaggageContext? = nil) { 
  // ... tracer.startSpan(context ?? AsyncLocal context)
}

after all...
:warning: End of crazy future looking side-note / idea section :warning:


This looks slightly wrong, or rather, it looks like a "mix" of things:

  • (a) either a library is not instrumented, so we need to start spans per operation on its behalf,
  • or (b) a library is instrumented so I pass it a context and it starts/ends spans.

I think it's important to put the full snippets here, because what seems like a simple small difference is actually pretty huge in terms of noise in user code:

So it is either:

// (a) the library does not start spans for the operation:
func handleSomething(bucketName: String, context: BaggageContext) -> ??? {
  let span = tracer.startSpan("Segment 2", context: context)
  let x = s3 /* .tracing(with: span.context) -- what would that do? */
      .listObjects(.init(bucketName: bucketName))
  x.whenComplete { result in
      switch result {
      case .failure(let error):
          span.addError(error) // and more "status" etc
          span.end()
      case .success:
          span.end()
      }
  }
  return x
}

or

// (b) the library DOES start spans when necessary for the operation:
func handleSomething(bucketName: String, context: LoggingBaggageContextCarrier) -> ??? {
  s3.listObjects(.init(bucketName: "myBucket2"), context: context) // emits spans
}

I'm also worried about looking at such snippets in isolation; such calls are never made in isolation but in real applications, so it's more like:

Lambda.run { hello, context, complete in 
  let result = handleSomething(bucketName: "bucket", context: context)
  complete(result)
}

Which looks pretty natural.

Now, yes, we have to pass around the context... but we were most likely going to anyway, be it for logging, tracing, or "framework reasons". For the baggage context specifically, I do have hopes that whatever concurrency story we end up with, we can move away from explicit passing to implicit passing of it (see the crazy idea section above). But in today's Swift we do not have that luxury.

Yes this sounds right btw! :slight_smile:


Complication of tracing(...) vs. (context: ...)

So overall this pattern of logging(to:).thing() is quite nice :+1:

I think it's great for configuration, so things like the event loop preference, and the logger as well I suppose, but it really does not fit context propagation -- context really should be passed to operations "along" the way the operations go, not via some side channel:

s3.tracing(with: span.context) // super worrying (!)
    .listObjects(.init(bucketName: "myBucket"))

I understand wanting to tone down the noise in the APIs, but (baggage / trace) context is really special and really must follow the execution paths, so it's easy to mentally map spans to the actual operations made. I think it would be much better if we loosen up a bit and allow an optional context:, if that is the main problem these patterns are trying to address. An optional context can still be a cause of issues, but it is much, much easier to spot, and indeed it's less of a pain for those who don't need tracing... (Though I'd argue you "need" tracing exactly when it's "too late" and you don't have time to suddenly go around and fix your whole codebase.)

Is the concern that we do not always have or want to pass context? Or is it about the library maintainers having to accept one?

If so, I think we can warm up to the optional context parameter... It has its own problems and risks, but it's a tradeoff. If people really do want to call APIs making distributed calls without context, they could... It will break traces and cause issues, but if that gets us one step in the door with distributed tracing in the majority of APIs, that's already a good win. And we can hope to make async context a reality in the future, at which point the explicit context could be removed.


Why the big deal about "always" passing context, anyway?

Tracing can be expensive, it costs computation and slows down your responses, but it's also very useful.

So in reality, systems are not "always traced"; rather, they are always passing context, and sometimes that context is recording a trace.

This works via two flags, as described in the OpenTelemetry SDK guidelines:

  • IsRecording field of a Span . If true the current Span records tracing events (attributes, events, status, etc.), otherwise all tracing events are dropped. Users can use this property to determine if expensive trace events can be avoided. Span Processors will receive all spans with this flag set. However, Span Exporter will not receive them unless the Sampled flag was set.
  • Sampled flag in TraceFlags on SpanContext . This flag is propagated via the SpanContext to child Spans. For more details see the W3C Trace Context specification. This flag indicates that the Span has been sampled and will be exported. Span Processor and Span Exporter will receive spans with the Sampled flag set for processing.

The same is true for swift-tracing.
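For example, expensive attribute computation can be guarded by the recording flag; a sketch, assuming a Span API in the spirit of the OpenTelemetry guidelines quoted above (the isRecording property and the attribute subscript shape are assumptions):

let span = tracer.startSpan("handle /dispatch", context: context)
defer { span.end() }

if span.isRecording {
    // only compute and attach expensive attributes when the span is actually recorded
    span.attributes["http.method"] = "GET"
    span.attributes["http.target"] = request.uri
}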

Now imagine this use-case:

// framework:
//   let span = startSpan(.empty) // Sampler decision to NOT trace
//   // span isRecording: false, context.sampled: false
// defer { span.end() } 
// then invokes:
handle { httpRequest, context in
  let span = tracer.startSpan("a", context: context)
  span.attributes[...] = request // set a lot of attributes
  // all those operations are no-ops, because we're not tracing (!)
  span.end()

  something() // whoops, forgot to pass the context
}

func something(context: BaggageContext? = nil) { 
  let span = tracer.startSpan("something", context: context)
   // Sampler decision was TRACE (!), 
   // suddenly we're tracing from the middle of a request :-(
   // this isn't good -- we uselessly consume capacity, and the traces are 
   // "partrial" and weird -- not what we want to see (!)
  ... 
}

This generally depends on how the tracing backend is implemented... but you see the problem. Passing context, even if a specific request is not being traced, is important.


Maybe the common ground is optional context:?

I'll be the first one to admit it: yes, passing context objects around if you're not even using tracing at all is annoying.

Thus perhaps we can hit a middle ground with:

  • the context parameter SHOULD be the last of the non-defaulted, non-function parameters of any operation function that is either intended to be traced, or may contain (!) operations that will be
    • context MAY be optional and defaulted to nil, although this is discouraged in general
    • (users may always easily pass .empty after all)

Then note, because this rule is so mechanical...

  • we could offer a linter check for people who do use tracing: simply find all calls to functions that take a context parameter, and warn when no value is passed to it explicitly. This will help people who are serious about using tracing :+1: We would not be able to write such a linter, or fixits for Xcode, if the pattern were anything fancier, like the "tracing(to:)" style or similar.

The "MAY be optional" wording leaves the door open to APIs which do not want to be explicit about it (although, again, that comes at a cost...).
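To make the guideline concrete, signatures following it could look roughly like this (hypothetical functions, purely illustrative):

// context is the last of the non-defaulted, non-function parameters:
func fetchProfile(id: String, context: BaggageContextCarrier) -> EventLoopFuture<Response>

// MAY be optional and defaulted (discouraged); defaulted parameters still come after it:
func fetchProfile(id: String, context: BaggageContextCarrier? = nil, deadline: NIODeadline? = nil) -> EventLoopFuture<Response>

// trailing function parameters come after the context:
func fetchProfile(id: String, context: BaggageContextCarrier, callback: (Response) -> Void)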

(Please comment if these sound reasonable? )


I've gone on far too long in this post, so I'll leave it at this, and let's continue the discussion until we're all reasonably happy with the recommendations, guidelines and types involved :slight_smile:

Sorry again that the post grew so big, but due to my time away a bunch of topics accumulated here and I wanted to make sure not to miss the important bits... Thanks, and I hope this is bearable to read!


This looks great :+1: and was exactly what I was thinking.

Now this is where I get confused or perhaps disagree. Vapor's AHC wrapper currently works like this:

app.get("foo") { req in 
    req.client.get("https://vapor.codes")
}

req.client wraps the application-wide HTTPClient instance. It sends the request's EventLoop and Logger to every call to HTTPClient.execute.

Even if EventLoop, Logging, and Baggage were all in one context type, the API would suffer:

app.get("foo") { req in 
    req.client.get("https://vapor.codes", context: req)
}

Why do I need to pass req as context if I am calling this method on req in the first place?

Another way to look at this is Vapor moves context to the left instead of the right. The context is stored on the thing you call instead of passed into the call itself.

Unless AHC gives us some way to pass this context via the client instance, then we must continue to wrap it. For a concrete example, why not declare HTTPClient like so:

protocol BaggageContextCarrier {
    var baggage: BaggageContext { get }
    func using(baggage: BaggageContext) -> Self
}
protocol LoggingContextCarrier {
    var logger: Logger { get }
    func using(logger: Logger) -> Self
}
typealias LoggingBaggageContextCarrier = LoggingContextCarrier & BaggageContextCarrier

protocol HTTPClient: LoggingBaggageContextCarrier {
    func execute(request: HTTPRequest) -> EventLoopFuture<HTTPResponse>
}
extension HTTPClient {
	// Convenience API
    func get(_ url: String) -> EventLoopFuture<HTTPResponse>
    func post(_ url: String) -> EventLoopFuture<HTTPResponse>
    func patch(_ url: String) -> EventLoopFuture<HTTPResponse>
    ...
}

// actual HTTP client implementation
class HTTPConnectionManager {
	init(group: EventLoopGroup)
    func execute(
        request: HTTPRequest, 
        logger: Logger, 
        baggage: BaggageContext
    ) -> EventLoopFuture<HTTPResponse>
}

extension HTTPConnectionManager {
    // use the manager to create a "context aware" client
    func client(logger: Logger = .init(), baggage: BaggageContext = .init()) -> HTTPClient {
    	HTTPConnectionManagerClient(connections: self, logger: logger, baggage: baggage)
    }
}

// this struct is essentially HTTPConnectionManager + context
private struct HTTPConnectionManagerClient: HTTPClient {
	let connections: HTTPConnectionManager
	var logger: Logger
	var baggage: BaggageContext
	
    func using(logger: Logger) -> Self {
    	var copy = self
    	copy.logger = logger
    	return copy
    }
	
    func using(baggage: BaggageContext) -> Self {
    	var copy = self
    	copy.baggage = baggage
    	return copy
    }
    
    func execute(request: HTTPRequest) -> EventLoopFuture<HTTPResponse> {
    	self.connections.execute(
    		request: request,
    		logger: logger,
    		baggage: baggage
		)
    }
}

// Usage:

let connections = HTTPConnectionManager(...)
defer { connections.shutdown() }

let client = connections.client()
// default context
client.get("https://vapor.codes")
// custom logger
client.using(logger: myLogger).get("https://vapor.codes")
// custom baggage
client.using(baggage: myBaggage).get("https://vapor.codes")

Again, this isn't something AHC needs to do. But it would allow Vapor to use the HTTPClient protocol directly.

I don't understand the difference between exposing BaggageContext via a protocol versus passing it into each method call. As long as it's the same BaggageContext, who cares how it is delivered? The only way I could see this mattering is if it were passed inout and the mutations from each call needed to be recorded in order. But this is not the case (and it would also not work well with futures).

If the issue is that BaggageContext must gain additional context as it is passed down the chain, then that makes sense. That is what the using(baggage:) calls are for. In Vapor 4's ORM (Fluent) for example, Request.logger gets a database-id metadata value added. This becomes the new contextual logger that everything lower in the chain uses, meaning that PostgresNIO (the Postgres client package), for example, includes both Vapor's request-id and Fluent's database-id in its logs.

For example, take this pseudo-code for how req.db is generated:

extension Request {
    var db: Database { self.db(nil) }

    func db(_ databaseID: DatabaseID?) -> Database {
        var logger = self.logger
        logger["database-id"] = databaseID
        return self.application.databases.get(databaseID)
            .using(eventLoop: self.eventLoop)
            .using(logger: logger)
            .using(baggage: self.baggage) // Future thing?
    }
}

For me the issue is that the context is already available on the type you are calling on. Again from my example:

req.client.get(...)

As far as I understand, req already has all the context you need. It has an EventLoop, Logger, and in the future will have BaggageContext (with trace ids deserialized from the incoming request). So can't we keep the actual call site clean?

It's possible I'm misunderstanding something fundamental about BaggageContext that requires different handling than Logger / EventLoop though. If so my whole argument falls apart.

I also want to reiterate that I'm not arguing that the way Vapor does context passing is best/only way it should be done. I think that explicit context passing is easier to understand and perhaps more appropriate for low-level libraries like AHC. My argument is only that the protocol-based pattern of context-passing that Vapor uses eliminates the need for wrapping and convenience API duplication.

As a final example of this, take the PostgresDatabase protocol in Vapor's PostgresNIO package. This is the lowest level way of interacting with Postgres. It maps 1:1 to the wire commands being serialized. The protocol looks like this:

public protocol PostgresDatabase {
    // context like EventLoop, Logger, etc
    func send(
        _ request: PostgresRequest
    ) -> EventLoopFuture<Void>
}

All commands are added as extensions on this protocol. Like simpleQuery, query, etc. What this means is that in Vapor, you can do:

app.get("foo") { req in
    if let psql = req.db as? PostgresDatabase {
        psql.simpleQuery("SELECT version();")
    } else { 
        fatalError("not postgres")
    }
}

The critical piece here is that the actual instance of PostgresDatabase you get is essentially a "request-specific Postgres database". In other words, it's using the request's logger and event loop (and in the future tracing) automatically. This is important:

1: A low level API from a package not depending on Vapor was used directly.
2: No explicit context passing was needed
3: Everything just worked. The best possible logger and event loop were used.

And Vapor not having to wrap means:

  • Less code duplication.
  • New features added in low-level libraries are available immediately to Vapor users.

Again, for something like AHC the code duplication is minimal. But for something like Redis or Mongo, the convenience API surface can be huge. Re-implementing / wrapping those becomes a real problem.

There are no API guarantee concerns related to adding new parameters, especially at the end (most likely) of declarations.

The maintenance burden is the same as the age old problem of supporting both variable arguments and the array syntax of the exact same method.

If I need to add one more parameter to all my methods, I have to add it to every single one of those 200+ methods, as well as update their documentation.

This is in contrast to what Tanner has been saying of letting the object receiving the operation "provide" the context for the operations.


For the record in my refactor PR, RedisClient is changing from:

protocol RedisClient {
  var eventLoop: EventLoop { get }
  var logger: Logger { get }

  func send(command: String, with: [RESPValue]) -> EventLoopFuture
  func setLogging(to: Logger) // yes, this is not thread safe
}

to

protocol RedisClient {
  var eventLoop: EventLoop { get }

  func send(command: String, with: [RESPValue]) -> EventLoopFuture
  func logging(to: Logger) -> RedisClient // This will create a new operation specific context provider
}

So it matches the pattern that's being established (and seemingly approved of in this thread)

let connection = RedisConnection.connect(/* ... */)
defer { _ = try? connection.close().wait() }

let results = connection
  .logging(to: myCustomLogger)
  .get("foo") // any logs generated in send(command:with:) will use 'myCustomLogger'
  .and(connection.get("bar")) // any logs generated in send(command:with) will use the "default" connection logger

Thanks a lot Tanner & Nathan for giving this some thought! Let's unroll and deep dive into the differences here.


Before we dive in, allow me to share some context in which we're operating, as it shapes a lot of the needs and tradeoffs we need to take when designing these APIs.

Very roughly, one can claim that there are two "kinds" of users (and everyone is somewhere on a line between those extremes), which shapes what requirements they pose on their tools and frameworks:

  • [a] teams with a simple not-very-distributed system
    • it may have one or two databases, caches, maybe it talks to one or two other services in the same company; it still "fits in your head" and maybe you even know the teams which operate the other services
    • (though note: all systems start small, and some of them then inevitably become more complex, large distributed systems.
      • We want to avoid "trapping" users in situations where they and their frameworks can't "grow" with them -- i.e. let's avoid having people hit a wall and go off writing "we had to rewrite our system from Swift to X because the tools did not scale as our needs grew!"; let's avoid this by planning ahead.)
  • [b] teams operating in large organizations, interacting with tens or hundreds of other services
    • in such teams/organizations, it may even be a requirement that a language/runtime has a great observability (meaning: logging + metrics + tracing) story, otherwise it'll be dismissed as a cute toy, not "serious" enough for the backend needs of such organizations.
    • for Swift to break through and be taken seriously in such organizations, the core SSWG APIs such as Logging, Metrics and Tracing (often called the "Three Pillars of Observability") are crucial, and already with Logging and Metrics we saw the situation improve by a lot.
      • Now we need to take the 3rd step here to offer a complete picture to serious, large organizations evaluating whether they should or shouldn't trust Swift on their backends. In other words, this is about answering the infamous "is it ready for production yet?" with a resounding yes!, even when the organization asking has very strong requirements for operating their services.

So today, Swift is already doing great with [a]-style use-cases on the server! And this is very much thanks to all the libraries/frameworks people in this thread have developed -- this is no small feat and we should keep catering to this audience.

But the [b]-style use-cases still have a lot of nitpicks they can point at to dismiss Swift without seriously evaluating it. Tracing is one of those boxes that we can not only tick "somewhat", but do truly great, which will cause such teams looking at Swift to see it in a much better light, even as an improvement over what they used to use for their backends. This is what we need to aim for here.

These two "styles" are obviously somewhat conflicting: [a] want the least amount of noise and friction possible, they may not care much about all these things which matter only in larger deployments, yet we also need to grow server side swift in those large organizations / systems [b] which have more requirements, and also a slightly larger tolerance for some noise to get the benefits they care about (of course no noise would be best, but we can't get there in one magical step).

So, how do we make APIs that can serve both types of users, and grow the impact Swift has on the server?


Why context is not configuration, and should not be hidden away

Let's use the most recent example of the "hide context in client protocol" style that Nathan provided:

// what-if (this API shape has issues/pitfalls)
protocol RedisClient {
  var eventLoop: EventLoop { get }

  func send(command: String, with: [RESPValue]) -> EventLoopFuture

  func logging(to: Logger) -> RedisClient
  func with(context: BaggageContextCarrier) -> RedisClient
}

This is easy but not simple: it's easy for library maintenance, by asking users to always do the right thing, but it invites a few kinds of programmer errors:

First, storing a client with context:

// don't do this
var redis: RedisClient
func something(context: BaggageContext) -> Future<...> {
  redis = redis.with(context: context) // oh no!

  // many redis operations follow
  ... = redis.something(...)
  ... = redis.something(...) // maybe ok, maybe broken
}

func other(context: BaggageContext) -> Future<...> { 
  // <100 lines of code here>, 
  // because that's how real world projects get messy after a long time...
  redis.get("foo") // whoops!
}

So I have a few Redis operations here, and I got lazy and decided "oh well, it's all in the same context anyway!", and I can accidentally cause a mistake like this. Let's say the code in something() maybe is even still correct, but I stored the client somewhere, and in other() (maybe because of some refactoring, a merge conflict gone wrong, etc.) I missed doing this, and now it not only has no trace, it has a completely unrelated trace :scream: Such things are very painful to discover when "production is on fire", and that is exactly when we look at those traces/metrics.

One can always say "don't do that", but it is a good API's job to lead to the right outcomes and prevent abuse and mistakes. We'll also discuss the linter situation below -- we can't write good linters for this pattern, which makes it more difficult to recommend to larger teams.

:warning: client protocol style: this pattern makes it possible to make silly, hard to "visually" (or automatically) track down mistakes;

This problem is not new to Swift; the same thing, albeit with more fatal consequences, manifests with getting pointers to underlying storage. It should not be valid to just "get the pointer out and store it", and in the same way it should not be possible to store a tracing context. This however would yield a "meh" API, such as this:

let res: Res = client.with(context: context) { client in 
  client.operation(...) -> Res
}

// in the same style / for the same reason as:
// array.withUnsafeBytes { pointer ... } 

this is "better" because it stops me from doing the storing the wrong context mistake, but since context should be passed "in a lot of places" this is obviously too annoying of an API. So I don't think this is viable.

:warning: client protocol style: Trying to use the protocol as a with() { ... } would prevent the bugs of storing contexts, but is quite annoying both for API designers and end-users.

:white_check_mark: explicit context style: it is easy to express the rule of "don't store context objects, just pass them along", and as we'll see below, we can write linters to help with this.

To see how this pattern really is more complex, and makes some automation and tooling impossible, let's look at function wrappers as well as tooling and linters below:

Simplicity and "Function wrappers"

While function wrappers have not been proposed "seriously", it is not unreasonable to speculate they would be a very useful and natural addition to Swift. And since we're trying to form a standard way to pass context around for instrumentation/tracing, and such annotations would be tremendously useful to these use cases, we should make sure we're future-proof for those.

Today, in order to trace executions one has to do "manual weaving" of managing spans, such as:

func send(work: Work, context: BaggageContextCarrier) -> EventLoopFuture<> { 
  let span = tracer.startSpan("#function-\(work.type)", context: context)
  
  let someFuture = ...
    .always { result in
      if case .failure(let error) = result {
        span.addError(error)
      }
      span.end()
    }
  return someFuture
}

This is a bit annoying, and while it is fine for "that one single send() function in my entire library", it is not something one would like to sprinkle all around one's codebase every time one wants to trace how long a piece of code takes. (Sure, we can work around this with sugar on top of futures and with...-style functions etc., but that's not the end-game, because we will have asynchronous functions.)

Specifically, in the Redis client and other database clients there usually is exactly one function (or very few) which handles the "send the query" part, and then a lot of convenience API on top of it. This dance today only has to be performed once, in that send function.

What we'd want to be able to do though, is the following:

@Traced
func send(work: Work, context: BaggageContextCarrier) -> EventLoopFuture<> { 
  return ...
}

much nicer...! And in the future, the same composes just as perfectly with async functions:

@Traced
func send(work: Work, context: BaggageContextCarrier) async -> Thing { 
  return ...
}

Now, we could not nicely express this if the context were hidden away in library-specific types, since each library has its own "client" type. Trying to express this as "the thing I call has to conform to some protocol on which I can set a context before I call it" becomes all kinds of weird and hard to understand IMO. (I tried to PoC this a bit and it looks terrible each time.)

:warning: Hide context in client-protocol: It is impossible to implement @Traced if we hide away the context parameter from the function.

:white_check_mark: Explicit context (even if optional): It will be possible to implement @Traced naturally if/when some form of function wrappers were to land in the language.

So yes, it is natural to express the context as "flowing with the calls" (just as a stacktrace isn't a thing "on the side", a trace is not a thing on the side either -- it is the lineage of the execution). But we still have that pesky explicit parameter that people don't like...

Bear with me and read on, we'll get rid of the explicit context parameter in the future (!).

Simplicity and Tooling

Keeping context passing Simple is very powerful, not only to avoid "having to remember which special DSL I need to call here" but also because it enables tooling.

I mentioned this above a little bit, but let's dive into it deeper. Funnily enough, I had forgotten this is also a reason for why Go's context works like it does (and here I thought I invented the reason -- silly me, of course not, it's been done before :wink:):

Note:

// context package - context - Go Packages
Do not store Contexts inside a struct type; instead, pass a Context explicitly to each function that needs it. The Context should be the first parameter, typically named ctx:

(We're not doing "first" but "last", after consultation, sswg and nio teams and the std-lib team and just general swift function naming guidelines; but the general idea remains similar)

And notably:

// context package - context - Go Packages
Programs that use Contexts should follow these rules to keep interfaces consistent across packages and enable static analysis tools to check context propagation [...]

Here is what this specifically means for us. The following "hidden via protocol" pattern:

client.with(context: context).hello()

client.helloAgain() // โš ๏ธ it... *probably* does not have the context?
someClient.hello() // โš ๏ธ does it have the context or not? we *can't* know!

How could we decide where to issue warnings with a linter, or even Xcode itself? It becomes impossible to statically, definitively decide whether a call has the right context or not :warning:

The following however is trivially machine-decidable:

// hello(<other-parameters>, context: BaggageContextCarrier?)
client.hello(one, two, three) // ๐Ÿ’ก missing context pass spotted!

:bulb: Note: We can say we'd allow an optional context, if that eases adoption and allows people who don't care about traces to ignore the context dance. This is a tradeoff, but one we can make to ease adoption and future evolution IMO.

This allows [a]-style users to not care about it, while [b]-style users can use a linter, which we can build already today using SwiftLint's analyze mode, that would detect missing context passes easily. It can emit warnings or errors, and thus it can be tuned to what a team needs. Say a team really wants to make sure tracing works because they have a complex system and need all the insight they can get during an outage or performance tuning: they can make a missing context pass fail the build.

We can also provide fixits:

client.hello(name: "Charlie", option: .something)
// vvvvvvvvv
client.hello(name: "Charlie", option: .something, context: context)

* not a real fixit in Xcode :wink: just an image to keep your attention :sparkles:

This is why consistently naming the context parameter (whether it's a BaggageContext instance or a FrameworkContext that carries one) always "context" is important. We can easily fix things, and it will be a smooth ride to add propagation when needed.

:warning: hiding context in client protocols: impossible to write linters and fixits for teams which care and want to always ensure the highest standard and quality of their traces.

:white_check_mark: explicit context passing: It is possible and relatively simple to build tooling, linters, and fixits that would fill in the missing parameter.

What we'd really want: Asynchronous Contexts / "Execution Scoped Variables"

Let's also keep reminding ourselves that this isn't just about the Swift of today, but about where Swift is heading. And while John has not yet published his concurrency proposal, it is fair to say it involves some form of awaiting and coroutines (although not just that).

So these APIs need to work well with the async/await style, and the arrival of these features means having to change all signatures and patterns anyway, e.g. because you may want to avoid returning EventLoopFutures, i.e.

func hello(context: BaggageContext?) -> EventLoopFuture<Hello> {} 
// vvvvvvv
func hello(context: BaggageContext?) async -> Hello {} 

Then, it is only natural to want to express "some data that flows with the execution" along with async tasks and functions. This would be an "Async Context" (or similar; other languages call these scope variables), and if we get to that point we can remove the explicit passing around -- semantically it is exactly the same, and migration will be simple:

@available(*, deprecated, message: "explicit passing is not necessary anymore")
func hello(context: BaggageContext?) -> EventLoopFuture<Hello> {} 

// In Swift "X"
func hello() -> EventLoopFuture<Hello> {
  BaggageContext.current // which would use AsyncContext<BaggageContext>.get
} 

:bulb: this looks like, but is not, thread-local variables. Thread locals cannot work well in Swift with its various executors and concurrency libraries; the use of thread locals to pass around context should not be part of the core story. Though if you are some single-threaded framework... you could get away with it -- but really, it's 2020 and being async and non-blocking is the norm, and the same will soon enough be true of distributed tracing.

In any case, we don't know when or if we'll get this, so it does not make sense to sit and wait until this feature lands, but we should make the evolution towards what we really want a straight line, and then let the language help us remove more and more noise from it.

:white_check_mark: explicit context passing: is a natural "predecessor" to asynchronous, implicit, context propagation that could be offered by a coroutine runtime, which has proven useful in existing coroutine implementations, and as such it is not unreasonable to expect such feature to exist someday.

More reading on this topic:

  • A good read on CoroutineContext and Scope (Scope.coroutineContext) in Kotlin: Coroutine Context and Scope :eyes: highly recommended to read this
    • this specifically applies a lot to us, as concurrency's fundamental building block is basically known to be coroutines by now; note how a coroutine context can have values added to it, and retrieved from it by type -- this is the same way we'd carry baggage in such a "coroutine context" / "async context" (see the sketch below)
  • Java's pursuit of Scope Variables in Project Loom (coroutines)
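For reference, a rough sketch of that typed-key access pattern as the baggage context does it (the exact key protocol name and requirements in the baggage library may differ slightly):

enum TraceIDKey: BaggageContextKey {
    typealias Value = String
}

var context = BaggageContext()
context[TraceIDKey.self] = "abc-123"       // add a value to the context...
let traceID = context[TraceIDKey.self]     // ...and retrieve it by its key type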

A few specific ways forward to consider

As I see it, we have a few "ways" forward, which I'll list explicitly below. We have to keep in mind the upcoming "Swift Concurrency" story, so things will change; the various ways we can phrase the APIs can either prepare for migration to that future concurrency, or we could wait it out and do nothing until the new things land (but I think it'd be a waste to completely punt/wait until then).

Let's focus on the concerns you mention first though:

I understand your desire to keep the context out of here, I really do.

But looking at the examples above, I really think we're trying to "have our cake and eat it too" if we want to provide a super fluent DSL that strongly relies on the Vapor style / protocol style.

I really do understand your desire to keep the APIs as lean as possible -- this is what Vapor promises its users -- but I do not think (see the examples above) this is the right style for context passing, and it is counter to how asynchronous "execution scoped variables" (which is what (baggage) context is) really work. I fear that the more we try to hide it, the more pain we'll be inflicting on future adoption, unless we hold out i-don't-know-how-many years until async contexts become a thing.

Sadly I don't know if or when those would land (that would solve the entire "explicit" problem of explicit context passing), but I don't think we should adopt patterns which hide what we're really doing, as that will only make composition, tooling, and future evolution towards async functions (and more) very difficult.

What is wrong with not wrapping all those clients and doing the same thing everywhere? Vapor need not be in the business of re-exposing every possible HTTP / database / service client's API with a super fluent DSL -- the goal is consistency, not short function calls regardless of the complexity cost to get there.

If one wants traces, pass the context; plus, it will prepare us well for the future asynchronous Swift world. I can't make promises about the future, but the trajectory of such APIs is consistent and in alignment with how coroutines and asynchronous contexts usually work. :pray:


Hi all,

first of all: Thank you @ktoso for those vast and hugely informative reads. They are long, but I'm very thankful for the depth in which you build your argument. :+1:

Having had the experience with a 100% traced Go backend, I'm totally on board with the story on explicit context passing. Especially the linter story you outline is a huge benefit IMHO. As Swift has decided to keep the context parameter always at the end, this becomes especially relevant, since it is mentally way more complex in code reviews to check for a context at the end, compared to a context as the first argument.

+1 As of today I think that the biggest pool of potential server-side Swift developers are iOS developers that want to get their feet wet in the backend. Taking away stuff that is not 100% necessary in the beginning eases the learning curve and drives adoption. If a developer/team decides to add tracing at a later point, it can be added quite easily thanks to a potentially aggressive linter, as you outlined.


Right, yeah! Thanks for making that point explicitly. Of course we would not want to alienate folks coming from other Apple platforms, or make it much harder for them to get started with the server side (kind of my "[a]" type use-cases), while at the same time we can become better for "[b]" type server developers coming to Swift who'd otherwise be missing these things -- with this road forward I think either group can tune the way they use server-side Swift to what they really care about :slight_smile:

I think there's an important part of this discussion being missed that I want to highlight, because I feel like it's not being heard clearly. When @mordil says that he's concerned about adding a parameter to 200+ methods, and when I say that a sufficient case has not yet been made for API stability breakage on AHC, your response so far has been of the form:

I think you have not meaningfully tackled the objection being raised. So let me try to clarify it a bit in the hope that you'll more clearly address it.

Firstly, the common ground: I agree with you that optional context is not a good design. I agree that it's too easy to fail to propagate the context in the case where the parameter is defaulted. Users should be encouraged to make choices in order to improve the ability of developers to audit the code and understand the context passing requirements. I personally am convinced that the right design is to add context, non-defaulted, to all relevant invocation methods.

Where we disagree is that I don't think anything I just said in the above paragraph is sufficient to justify the addition of "and therefore we must break the API". Let me bring in another principle of mine: as a general rule, the less frequently you can break your APIs, the better. API breaks are occasionally necessary to clear away cruft and to embrace new features more fully, but the more users you have the more painful each break gets. The result of this is that there is positive value in preserving your major version number where you can, and cost in revving it. This doesn't mean we shouldn't break API, but it does mean that a strong, affirmative case has to be made that doing so is important and right.

For AHC, I am extremely confident that such a case hasn't been made. This is because for AHC we can much more effectively do a "deprecate-and-replace" dance. In this case, we'd keep the old code around, but deprecate it in favour of the new one that passes a concrete context type. These warnings will discourage new code from using the old API, and will also positively encourage newer users to adopt the preferred API, but will continue to allow users to have access to those older APIs. This opens the door to those users remaining supported, getting security fixes and performance improvements, and whatever else, without forcing them into a churn cycle or causing dependency hell in the dependency graph.
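A minimal sketch of that deprecate-and-replace dance (MyClient and its methods are placeholders, not the actual AHC API):

extension MyClient {
    // The old entry point stays around, deprecated, and forwards to the new context-accepting one.
    @available(*, deprecated, message: "Use execute(_:context:) instead")
    func execute(_ request: Request) -> EventLoopFuture<Response> {
        self.execute(request, context: .empty) // assumes the new overload and an "empty" context value
    }
}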

For RediStack, things are a bit trickier. As @mordil has noted, RediStack requires substantial work to add new parameters. I still think on balance that RediStack should consider adopting a deprecate-and-replace dance for this change, but it may be worth considering whether an alternative approach for RediStack is worth it to avoid revving to 2.0.

All of this context work is really important, and I'm glad the community is tackling it. I just want to be counselling caution on revving major version numbers where we don't have to.


Thanks Cory for reminding us / bringing the focus back to the "how do we get there", now that it seems we can (hopefully) agree on where we want to get to.

You're right, I didn't do a great job at addressing the compatibility/migration aspect and focused on the why and where-to so far... Let's fix that! I'll follow up shortly with a non-breaking migration strategy for AHC and RediStack as case studies. (For RediStack I believe that, with some tooling help we can build, this should be possible without making it a huge maintenance burden for @Mordil.)

@ktoso thank you for the detailed response. I feel you understood and represented my points well!

This in particular resonated the most with me. I disagree with some of your arguments (mostly that explicit context passing is objectively less bug prone than protocol-based context passing), but overall I agree being explicit is simpler and more common. If that means we can coalesce the entire community behind one way of doing things that's a huge win. Especially if that means we can then collectively ask Swift and our tooling to make using this pattern easier.

This gave me a lot of good food for thought. I'll take some time to digest this further and think about what this would mean for Vapor.

+1 to this. Making the context non-optional is not critical enough to merit a major break. Something else will crop up eventually that requires one such as NIO 3, async await, etc. At that point we can easily drop the deprecated methods and fully require context.


Thanks @tanner0101, glad these helped wrap some heads around the tradeoffs.

Now, to outline some (potential) adoption plans:

Case: HTTP Client

The primary "glue" between many systems is HTTP calls, as such, instrumenting the HTTP Client takes highest priority (followed by other RPC systems, e.g. gRPC), then followed by "leafs" :leaves:.

The HTTP client can gain additional overloads accepting context parameters, and use those to instrument outgoing calls, automatically creating Spans and propagating the baggage context (incl. tracing headers).

Its current API shape is roughly:

public func get(url: String, deadline: NIODeadline? = nil) -> EventLoopFuture<Response> {
    return self.get(url: url, deadline: deadline, logger: HTTPClient.loggingDisabled)
}

public func get(url: String, deadline: NIODeadline? = nil, logger: Logger) -> EventLoopFuture<Response> {
    return self.execute(.GET, url: url, deadline: deadline, logger: logger)
}

overloaded for get/post/patch/delete and execute calls.

The non-breaking addition we propose is the following:

// addition:
public func get(
    url: String, 
    deadline: NIODeadline? = nil, 
    context: LoggingBaggageContextCarrier // new
  ) -> EventLoopFuture<Response> {
    return self.execute(.GET, url: url, deadline: deadline, context: context)
}

and the existing overloads would be expressed as:

// change to delegate:
public func get(url: String, deadline: NIODeadline? = nil) -> EventLoopFuture<Response> {
    return self.get(
        url: url, deadline: deadline, 
        context: HTTPClient.emptyContext
    )
}

// change to delegate:
public func get(url: String, deadline: NIODeadline? = nil, logger: Logger) -> EventLoopFuture<Response> {
    return self.execute(
        .GET, url: url, deadline: deadline, 
        context: DefaultBaggageLoggingContext(logger: logger, context: .empty)
    )
}
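
To make the shape concrete, here is a sketch of how a caller could use the proposed context-accepting overload. This is not existing AsyncHTTPClient API; it assumes the proposed get(url:context:) overload and uses DefaultBaggageLoggingContext and .empty as shown above:

import AsyncHTTPClient
import Logging
// plus the baggage context module providing DefaultBaggageLoggingContext

let httpClient = HTTPClient(eventLoopGroupProvider: .createNew)
defer { try? httpClient.syncShutdown() }

// In a real server this context would come from the incoming request
// rather than being constructed ad hoc like here.
let context = DefaultBaggageLoggingContext(
    logger: Logger(label: "example"),
    context: .empty
)

let response = try httpClient.get(
    url: "https://example.com/users",
    context: context
).wait()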

There are other parameters used in execute, such as:

// addition:
    public func execute<Delegate: HTTPClientResponseDelegate>(
        request: Request,
        delegate: Delegate,
        eventLoop eventLoopPreference: EventLoopPreference,
        context: LoggingBaggageContextCarrier, // new
        deadline: NIODeadline? = nil
    ) -> Task<Delegate.Response> {

Focusing on the "miscellaneous" parameters (so except request/delegate, which are core part of the API and the "main" parameters really), we'd have:

  • event loop - the preference is not defaulted in this call, thus it precedes context
  • context - is the "last parameter in the set of non-defaulted, non-function parameters", thus it'd be added here
  • deadline - is optional, so we'd keep it at the end, following Swift style guidance of placing defaulted parameters after non-defaulted ones

Again, delegation to this function from the previous overloads would be done with an "empty logging context".

As a reminder, LoggingBaggageContextCarrier contains both a logger and a context, and the logger uses that context when logging as well (thus logs remain nicely correlated).
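
For reference, the rough shape of those carrier protocols looks like this. This is a simplified sketch; the authoritative definitions live in the baggage context package and may differ in naming and detail:

import Logging
// plus the baggage module from gsoc-swift-baggage-context (module name may differ)

// Simplified sketch, not the authoritative definitions:
public protocol BaggageContextCarrier {
    /// The baggage to propagate across library boundaries.
    var baggage: BaggageContext { get set }
}

public protocol LoggingBaggageContextCarrier: BaggageContextCarrier {
    /// A logger already bound to `baggage`, so trace identifiers
    /// automatically show up on log statements made through it.
    var logger: Logger { get }
}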

Composition

With "any" other library

Composition with other APIs/libraries/frameworks can be done by passing other context objects as the last parameter:

http.get(url: "", deadline: deadline, context: someContext)

Interesting side-discussion... The currently used deadline: NIODeadline is the "local deadline/timeout", but there is another type of deadline -- a distributed deadline (!). For this reason I think it is fine to keep the existing APIs passing deadline as-is, even IF someone later wanted to adopt distributed deadlines (meaning "if you're a server, and it's already a second past that distributed deadline, don't even bother processing this request").

Composition with Vapor (today / options)

Vapor 4 today already has a wrapper type for HTTPClient. We're aware this is sub-optimal and would like to avoid having it, but it's there today.

Since the wrapper exists today, it could be used to delegate through to the actual client, passing context: self.request, if Vapor wanted to raise the required AsyncHTTPClient version. Though this would embrace the wrapper style more rather than less; it is a road that could be taken -- up to Tanner and team whether that's worth it.

Alternatively, or "the new way" the http client can be used as:

app.get("users") { req in 
req.(client (actual HTTPClient type, not wrapper))
  .get(url: ..., context: request)

if the request conformed to LoggingBaggageContextCarrier, which it should be able to do.

This means moving away from providing those wrapper types, which I think is what @tanner0101 was hoping to get out of the business of doing (wrapping all clients).

Note that this is not a new API (the get) in Vapor, but simply the newly surfaced HTTP client, which CAN accept a request if needed.

We'll discuss below (after the Redis discussion) the what/when of adoption and deprecations, i.e.:

  • should the wrapper be deprecated? (up to the Vapor community to decide that)
    • probably it can be deprecated slowly; only once the next Vapor major release lands, and we have concurrency in Swift, would there be "good reasons to move people off the other API asap" I suppose? It's up to the Vapor community though.

Case: RediStack

Redis clients, including RediStack, have a slightly tougher time adopting additional parameters without breaking source, because of the sheer number of commands Redis exposes.

Sidenote: some clients I'm aware of use source generation from https://github.com/redis/redis-doc/blob/master/commands.json with an "overlay" which contains all the extra information, like docs or extra types and snippets to use in code generation. This lessens the maintenance burden of having multiple command overloads, as they're all generated anyway.

Just throwing it out there as an idea we've seen done elsewhere; RediStack is not doing this though, so adding parameters is painful and multiplies the API surface maintenance burden.

Okay, let's look at the specific APIs and implementation we have. There are a lot of functions that roughly follow a similar pattern:

    /// ... docs ... 
    @inlinable
    public func setnx<Value: RESPValueConvertible>(
        _ key: RedisKey, 
        to value: Value
    ) -> EventLoopFuture<Bool> {
        let args: [RESPValue] = [
            .init(from: key),
            value.convertedToRESPValue()
        ]
	
        return self.send(command: "SETNX", with: args)
            .map(to: Int.self)
            .map { $0 == 1 }
    }

We'd need to add the following overload:

    /// ... docs ... 
    @inlinable
    public func setnx<Value: RESPValueConvertible>(
        _ key: RedisKey, 
        to value: Value,
        context: BaggageContextCarrier
    ) -> EventLoopFuture<Bool> {
        let args: [RESPValue] = [
            .init(from: key),
            value.convertedToRESPValue()
        ]
	
        return self.send(command: "SETNX", with: args, context: context)
            .map(to: Int.self)
            .map { $0 == 1 }
    }

These two only differ in the presence of the context parameter. Indeed, mirroring all existing APIs would be a pain.

My assumptions:

  • we would not want to / cannot add an optional context parameter?
    • the version of RediStack is 1.0.0-alpha.11, so I'm not sure if it promised stronger source-compat guarantees than the version implies...? Just to call it out -- it seems we specifically don't want to solve this via an optional parameter, right?
  • mirroring all the APIs by hand is not workable / too much of a maintenance burden
  • RediStack is interested in participating in tracing -- it would be great, since we know of teams using it which would be happy if it did so.

Proposal: "Compat shim" source generation

As @Mordil said, the same pain point comes up in multiple areas in Swift, e.g. wanting to support a var-args API mirroring another, or now needing to add a context param etc.

I've had some good experiences with swift-syntax this year, using it for use cases very similar to this, and I could offer help building a small tool which would do the following:

    /// ... docs ... 
    // @genCompatibilityShim(skipParam:context) <<<<<
    @inlinable
    public func setnx<Value: RESPValueConvertible>(
        _ key: RedisKey, 
        to value: Value,
        context: BaggageContextCarrier
    ) -> EventLoopFuture<Bool> {

which is easy to pick up using swift-syntax:

final class GatherActorables: SyntaxVisitor {
    // ... 

    override func visit(_ node: FunctionDeclSyntax) -> SyntaxVisitorContinueKind {
        // FIXME: only check the comment trivia of the node, not the entire node text
        guard "\(node)".contains("@genCompatibilityShim") else { 
            return .skipChildren
        }

        // TODO: parse the mini-DSL ("aha, omit the context param, okay")
        // TODO: collect all such nodes and their shim DSL information
        return .skipChildren // don't descend into the function body
    }
}

From the collected information the tool would emit a slightly transformed node, i.e. "remove that context parameter" and delegate to the real function, going from e.g. RediStack/Commands/HashCommands.swift to RediStack/Commands/HashCommands+CompatibilityShims.swift. Note that this can rather easily also mirror documentation comments, which is a nice win since they'd remain in sync.
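
To make it concrete, the generated file could contain something along these lines. This is only a sketch; `emptyContext` is a made-up placeholder for whatever empty/default context the shim would pass:

// RediStack/Commands/HashCommands+CompatibilityShims.swift (generated; sketch)

    /// ... docs mirrored from the annotated function ...
    @inlinable
    public func setnx<Value: RESPValueConvertible>(
        _ key: RedisKey, 
        to value: Value
    ) -> EventLoopFuture<Bool> {
        // Delegate to the context-accepting overload, passing an "empty"
        // context; `emptyContext` is a hypothetical placeholder name.
        return self.setnx(key, to: value, context: emptyContext)
    }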

I have worked on similar things before and am more than happy to help out building such a tool, especially since it might come in handy in all kinds of compatibility-shim situations which would otherwise be too painful to handle manually.

Are my assumptions in this section correct and do you think this would help RediStack adopt context propagation?

Note that the context support is just new functions; the old ones can remain as-is if you do not want to "jump ship" right away, and we can also take it easy / slow with deprecating anything -- your call how fast or slow to adopt these.

When to deprecate?

Then there's the discussion about when to deprecate the functions that don't require a context, and honestly I think that's really up to each library to decide... In HTTPClient I believe we should deprecate them, and I think that's what @lukasa is saying as well; it'd remain source compatible. If this is too early however -- then let's not, and we can do this later. There's no need to completely rush this.

And as mentioned before, we could also offer those linter checks if APIs need to keep their non-context-requiring versions around for a longer period of time. I'm also happy to take on or help with this work.

Swift Concurrency Considerations

We already briefly mentioned that some form of "coroutine context" or "async context" may be something that happens at some point (no guarantees!).

However, even if it does, it may not necessarily address context/trace propagation in non-async functions / non-async contexts (!). We'd still need a way to deal with "oh no, I can't use the automatic, language-provided task-local context propagation here", and in those cases we'd want to fall back to the common pattern of just passing it around explicitly.

I think embracing explicit passing right now is not a roadblock for adopting the language features if and when they happen; kind of as Tanner highlights above, it's a stepping stone.

Note also that the language features, if/when they come, would arrive alongside major language changes and the Swift concurrency model, and at that time a lot of APIs would need to make more or less drastic adjustments anyway. So it's fair to adopt explicit passing right now, and if we do get some language features, we can then deprecate the explicit functions if people really hate them :upside_down_face:
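
Purely to illustrate the kind of shape such a language feature might take, here's a toy sketch in a hypothetical task-local style. The Baggage type below is a stand-in, not the real package type, and non-async code would still need the explicit context: parameter:

// Toy sketch: implicit propagation in async code via a task-local value.
struct Baggage { var traceID: String? }   // stand-in, not the real type

enum CurrentContext {
    @TaskLocal static var baggage = Baggage()
}

func downstream() async {
    // Sees whatever value the caller bound, without an explicit parameter.
    print("trace:", CurrentContext.baggage.traceID ?? "<none>")
}

func handleRequest() async {
    await CurrentContext.$baggage.withValue(Baggage(traceID: "abc-123")) {
        await downstream()
    }
}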

Timing...

I think we have perhaps become over-excited here, and I'd like to mention that other than the HTTPClient I don't think we're necessarily in a rush. Teams using raw NIO and HTTP clients (and gRPC) will be able to slowly ramp up and start using this, and the more they do, the more they'll want other parts of the system to trace as well. (I do admit we think mostly about supporting internal teams when I say this -- however those teams also love the SSWG OSS projects, so there's a general direction and wish that the SSWG ecosystem libraries eventually become able to participate in tracing -- but if it's not right now, that's okay.)

We're still wrangling with some details of the Tracing API as @slashmo is finishing his GSoC this month, but we'll continue work on this and move towards a stable reliable API everyone can rely on.

I would like to keep asking for feedback though, also on the Tracing APIs! The ongoing PoC embedding it in Lambda https://github.com/swift-server/swift-aws-lambda-runtime/pull/162 is a useful proving ground where we can see it "in action" in a more complex runtime than the HTTP client. So any feedback, including from adopting these APIs in your frameworks, would be very useful -- while we're slowly getting to locking them down. :slight_smile:


@ktoso This has been a fantastic and well-thought-out reply that has changed my opinion on a few points.

I'm going to think about this throughout this week and make sure to reply by Friday on some thoughts on how to move forward.


For the record:

aws-sdk-swift does generate (hundreds of) operations using a template, see https://github.com/swift-aws/aws-sdk-swift/blob/master/CodeGenerator/Templates/api.stencil

@adam-fowler

(Not sure if it makes sense, but) technically it would be relatively easy to have two "flavors" in a fully automated way:

A tag on the master branch, which prioritizes API compatibility, could trigger a workflow which:

  • pulls the code
  • generates API using different template (which requires context parameter in API calls)
  • runs tests
  • checks in changes (on another branch)
  • tags it using the original tag and a suffix

A user could choose to import v5.0.0 if they'd rather not break existing code,
or v5.0.0-EXPERIMENTAL_CONTEXT to make sure context propagation is enforced at compile time.

During the next major (API-breaking) release it could be decided to keep just one approach.


AWSSDKSwift does generate all its API entry points using a template. You could create a second branch and set of tags in an automated way. It provides a slightly confusing story for users of the package though; I would rather there was one HEAD version of the API. Also, enforcing a non-optional context would most likely propagate changes down into the core library aws-sdk-swift-core, which would also need an EXPERIMENTAL_CONTEXT branch. Updating that would be a great deal harder to automate and would need some manual maintenance.

We are nearing the end of the alpha stage of aws-sdk-swift 5.0.0 and are still in a phase where we are able to make breaking changes. I have already spoken with @ktoso about the timing of releases and will ensure we have some tracing support for the final release of 5.0.0. I'm ready to hold off until the official release of swift-trace.

I believe for the moment we should go with an optional BaggageContextContainer and rely on the linter functionality @ktoso has previously mentioned. It doesn't require immediate adoption of tracing. In addition, AWSSDKSwift can be used on iOS clients, and requiring tracing there doesn't really make sense.

This thread has very much moved away from discussing the other context variables like EventLoop and Logger. I'm currently debating what I should be doing with those as well. Do people think a mixture of explicit/implicit methods makes sense, or should we be passing everything directly? I'm concerned the mixture of methods will look a little confused:

let loggingS3 = s3.logging(to: logger)
loggingS3.listObjects(.init(bucketName: "confused"), context: baggage)

But passing everything directly means we get back to the issue @tanner0101 was originally trying to resolve.


Thanks Adam!

On the other hand, what about offering overloads that accept a context, i.e. having it not be required? I recently talked with @tomerd about this and we drew a distinction between "very glue" libraries, such as HTTP clients, HTTP servers and RPC systems (e.g. gRPC), and "leafs" -- in some ways AWSSDK is more of a leaf than "glue", if you know what I mean. So maybe an optional context carrier parameter would be a good path to adoption? It would not be source breaking; and since you're generating functions anyway, it seems (please correct me if I got that wrong) the AWSSDK can more easily decide "let's add another overload" whenever ready, and that would not be breaking?

Yeah, I think that's right. I agree we don't need to force it "hard" on all libraries (exceptions being the "glue between services": http/grpc/rpc), and an optional carrier should do the job well here.
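
For illustration, such an optional, defaulted carrier parameter could look roughly like this; everything below is hypothetical, not actual AWSSDKSwift API:

import NIO

// Hypothetical types, purely to show the shape of a non-breaking,
// optional context parameter on a "leaf" library API.
struct Baggage {} // stand-in for the real carrier/baggage type

struct ExampleLeafClient {
    let eventLoop: EventLoop

    // The defaulted `context` keeps existing call sites compiling unchanged,
    // while instrumented callers can opt in by passing one.
    func fetchThing(id: String, context: Baggage? = nil) -> EventLoopFuture<String> {
        if context != nil {
            // a tracer would start a span here and inject propagation headers
        }
        return eventLoop.makeSucceededFuture("thing-\(id)")
    }
}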

We discussed some more with @tanner0101 as well, which resulted in making these tickets you all may be very interested in and may want to voice some ideas in:

  • Design composable Carrier protocols #22, which extends the current notion of Carriers (or rather, asks whether we should extend them or not)
    • I'm a bit worried about this one, but that's why the ticket exists: to dive deep into it and also address the event loop preference question; let's figure it out either in this forum thread or in that GitHub issue.
  • and Revisiting naming #23, since we're not in love with the long names the carriers produce, and there are a lot of libraries that already use the word Context. I noted some of the naming inspirations and worries there; it's something we should discuss before locking in the baggage context APIs.

Would love to hear everyone's thoughts there. As a heads up, I think I may be a bit slow to reply to those for a while, as I need to focus on something else for a week or two. But timing-wise I think things should align...

As for importance: it is far more important to lock down the baggage context API (naming and carriers) than the tracing APIs per se, because tracing, and also OpenTelemetry which we're aiming to provide a compatibility layer for, are still a very aggressively evolving space "globally" (across all standards/tracers), while baggage shows up visibly in user APIs (and tracing does not; it is internal to adopting implementations).

I think that's a great time to loop back to this, and I'd love to discuss those more. The ticket Design composable Carrier protocols #22 is what we'd like to resolve through these discussions. So... do we need a few "well known carriers" that libraries can use? Or is that too much of a risk for future evolution, etc.?

Personally, carrying a logger with a context is very natural, so context: thingThatHasLoggerAndBaggage is definitely part of the plan -- since the logger basically always wants to be able to reach the baggage to log trace IDs etc. There are open questions about the event loop preference.

Thanks a lot for keeping the conversation going! :pray: Let's slowly but surely keep thinking about those; as mentioned before, I think the most important things are: the baggage API, so we don't have to rename anything after we declare it good; adoption (non-breaking) in the HTTP client and gRPC; and then other leaf libraries, whenever a library is ready and wants to participate. I agree that slowly rolling out and keeping the carriers/context optional is a good way for many libraries to adopt this, so let's take it easy and adopt without breaking APIs wherever possible :+1:

OK, it's been a very long and busy week so it took much longer than expected to answer here.

I've been convinced of the value of not only the context passing, but the explicit context passing. Enough to sign on to the maintenance burden that I have a feeling this might cause.

Here is my plan for RediStack:

  1. I'm going to ship 1.0.0 with the protocol-based "wrapping" that Tanner described and I've currently designed around.
    • I'm going to refine my current refactor branch to go deeper into the stack, primarily the connection pool to provide as many "context specific" logs as possible
  2. Once the baggage API has become stable, I will redesign around it and explicit context passing for a 2.0.0 release (I need to cut one anyway for some Redis 6 stuff)

That's great news - thank you, Nathan! :+1:

Your roadmap also sounds very good :slight_smile: This way we can take the time to polish the APIs and you can confidently adopt it when ready. Thanks again for spending the time to dive into this :+1:

Hello again everyone,
I would like to request everyone's input on what I perceive to be the final proposal of the API shapes: https://github.com/slashmo/gsoc-swift-baggage-context/pull/34

The general "...carrier" and "anyone can add any carrier protocol they want" idea did not fly, as it's "anyone/anything" part made compositionally fall flat on its face. The more I worked on this the more I kept going back to the core idea we need to follow and express here: logger and context are special, because they follow the same execution path (because tracing), and everything else is not the same as the trace execution path.

Please see the revised proposal.

It allows for Vapor's feature request of:

handle { request in 
  http.get(url: "", context: request) 
}

by opting into "being a" context by conforming to ContextProtocol.

We're using explicit passing, as was discussed in the thread.

We are not passing the event loop as part of the context. I really feel this would be mixing too many things, and it is not what we will want to have in the long-term future at the core of the context passing story.

Please have a look at the PR and let's discuss there? Thanks for your input :pray:

Roadmap:

  • I would like to try to close up this API and make a 1.0 release candidate once we find a new "non-gsoc" home for these repos,
  • We'll polish up the PoC PRs which only depend on the context,
  • and then we'll continue to finalize and freeze the Tracing APIs.