Logging

Historically, my approach has essentially been one of the hybrid forms - a global Logger object is accessible which provides a "generic" logging facility, with an extension that supports adding context (typically Vapor's Request object) and utility methods such that individual request handler classes would do private let log: RequestSpecificLogger and do something like self.log = req.make(Logger.self).logger(forRequest: req) in their init (the latter call being necessary due to design limitations in Vapor's Services framework). The logger(forRequest:) method then saves off the Request object and chains it to the log(request:level:message:file:line:) method(s) on the "global" Logger, which through the extensions that take a Request input know how to obtain appropriate context such as a request ID, and ultimately funnel the contextualized log message down to Logger's original log() methods.

This approach is great if the context you want to add for any given log message is consistent across most or all of your implementation, and the same "fake currying" technique allows for creating per-subsystem loggers and other various levels of specificity.

The drawbacks, of course, are the additional layers of complexity and leaving the burden of making sure the appropriate logger object is used in the appropriate places on the library/app author. I'm not entirely sure what a generic form of this approach would look like, but I wanted to put it out there as a variation on some of the approaches suggested so far in this thread.

Sure, but it doesn't make it meaningfully harder either.

I would strongly encourage reducing the scope of the discussion here by aggressively considering some proposals as "out of scope" for an initial prototype. For large group discussions with many stakeholders it is very easy to become buried in the pile of great features that people love from other programming ecosystems and to attempt to satisfy them all on the first go. This is a recipe for failure.

We'll likely be best served by identifying which things have to be decided correctly up front and working on those initially, while ensuring we do not prevent any of the other, richer, features we want from emerging. I'd argue that logging contexts fall outside the "must get right the first time" category. They can be added after the fact with small API changes, and are not mandatory for many use cases. Additionally, they are themselves a quite substantial feature that deserves the proper focus of this WG, not something that we should assume is simply solved by throwing a String around.

2 Likes

Thanks @tomerd, that's a good summary and as you point out, the MDC approach won't work if not all thread-crossing libraries collaborate which I don't think will happen in the Swift space. Dispatch doesn't do that today (on Linux) and we also don't want to wait for a year.

@gps The issue with MDC is that it needs to have special support from all involved libraries that do cross threads. In Swift I think this is unrealistic as we first and foremost would all need to switch to a Dispatch wrapper that propagates the context. I think MDC as is a necessary crutch which implicitly propagates the context in a 'magic' and hard to understand way. When it was designed we lived in a thread-per-request world and thread locals seemed like a good solution. And in a mostly blocking world, thread-locals do a fine job, when the Java world moved more and more toward non-blocking they just had to make MDC work by having the libraries all propagate the context implicitly like the thread-locals did before.

I wanted to discuss the explicit passing of context in Swift because we have a fresh start and because lots of Cocoa is already asynchronous so I think our issue with automatic context propagation is harder than it ever was in Java. Just to point it out: There is os_activity which does propagation of some sort of context through Dispatch/XPC/... but it's Darwin-only and is only documented in the manpages.

@IanPartridge queue/thread locals can indeed be used to hold the context but something needs to put it in there and that's the complicated part. So every queue/thread crossing library needs to forward the context.

1 Like

After thinking about this a bit last week and discussing with @johannesweiss, I think the best solution would be a global Logger that allows you to pass an optional context.

Logger.info("foo")
Logger.info("foo", ctx)

In cases where you can't easily pass this context (such as a computed property getter), you could just use the global log method. For cases where you can get access to the context, passing it would enhance your log output.

As @johannesweiss has mentioned, it seems there is no way in Swift to relate asynchronous work w/ a given "activity" (i.e., request). For example, if an incoming request triggers an async database query, there is no way for SwiftNIO to know that static logging done as a result of the async database query should share a context with the request that triggered it. The only way to form a relation between the request and other async work it performs is with a context.

The good news is that if SwiftNIO can provide general-purpose context type, this would be useful tool for solving other problems (such as metrics). Frameworks built on top of NIO would be expected to assist in passing around this context as much as possible.

I agree that omitting an argument of the type we are discussing (e.g. context, subsystem) from the logging API doesn't make adding a context or subsystem value to the log message meaningfully harder. However, what is harder is arranging for all log messages the system emits to format those values in a consistent manner. Thus, I think there is value in including separate parameters in the API, especially since swift parameter defaults can make it easy for users to ignore those parameters if they want to. Plus the documentation for those parameters in the API can describe how well behaved libraries (and applications) are to use the parameters. That documentation can also point out to users and library authors how valuable information like request context is during debugging to encourage them to go to the trouble of using it. Finally the parameter names (for information which must be passed around) can be adopted as standards by users and library authors making those systems easier to design and understand.

Another parameter which I would suggest adding to the logging API is

data: [(name: String, value: CustomStringConvertible?)]? = nil

I find that having a convenient way of passing in name/value pairs to my logging system without having to format them in application code encourages me to think about what data would be useful when reading the message and encourages me to actually include it with the message. It would also enable application developers to format all of the name/value data in their log messages consistently system wide.

Wouldn't it be fairly simple to build a "global" API out of "local-first" API? I don't think the inverse would be quite as easy.

As Heß suggested, maybe this should focus on the protocol and APIs, plus a good "local" drop-in solution, then more global solutions can simply wrap around that.

Maybe a Logger should take a protocol-typed object; one that has a log level and a CustomStringConvertible.

That way you could have your basic info/debug/warning/error/whatever levels, but people who need to pass in more context have a way to build up a value to do so

Agree. IMO this would include topics like how we configure loggers in a way that allows configuration to be changed at runtime without restarting the process

IMO, this is absolutely something we should get right up front. Personally, I don't think I've ever deployed a web service without contextual logging.

Agreed. I'm not a huge fan of how MDC accomplishes what it does, but it does solve a messy problem.

Why not have a global logger factory that can create a local logger instead (i.e. what @tomerd suggested)? The local logger would have context associated with it. So, when you use the local logger, you need not pass in ctx. In situations where you don't want to create a local logger, you can use a "root" global logger, and pass in ctx explicitly (or omit it if you don't want it).

This would allow for frameworks to create and pass through local loggers through their DI mechanisms (i.e. what @Helge_Hess1 suggested):

func get(_ req: Request) throws -> Future<Foo> {
    let logger = req.make(Logger.self) // Returns local logger scoped to this request (creates if necessary), with context
    logger.ctx["someKey"] = "someValue" // Add to context
    logger.info("Something") // logs with context
    // ...
}

Similarly, the same local logger instance can be handed off to NIO and other libraries by a higher level framework so they always log with context as well.

For frameworks that don't want to deal with context, they can either use the root logger, or create one logger for their module and use it (similar to the example above).

as far as i can tell, the key remaining open issue is how to make context available to the loggers. this is an important requirement for servers / distributed systems for the reasons mentioned earlier in the thread. there are two ways we debated in this thread:

  1. explicit: passing the context around between functions as an argument, such that eventually the logger has access to it. since we need to be able and log with context from potentially anywhere in the program, and since there is no macro solution that can help provide this argument "automatically" this means adding such argument to most/all function of the program and manually passing it through. golang has gone down that path with it's context package, and also added things like deadlines and cancelation to it
    pros: no side-effects: clean, pure, functional
    cons: intrusive: almost every function will need to take the context argument to make it useful. and as such not pragmatic / maintainable at a ecosystem scale

  2. implicit: in lack of other options that i am aware of, this means adopting an "activity" level storage (similar to mdc) where libraries that schedule work such as dispatch, swift-nio (or extensions of such libraries) build compatibility layers that can handle storing and retrieving logging context (or activity identifier, eg os_activty) as they move work on/off threads
    pros: complexity is concentrated
    cons: stateful / has-side-effects and as such error prone; no support in dispatch (afaik)

from where i stand, we need to choose between the lesser of two evils:

explicit: passing context around all functions is intrusive and probably won't be adopted by the larger community
or
implicit: dispatch does not support what we need atm (afaik), so work offloaded through dispatch will lose context; not "clean" as one would hope in a new language and ecosystem

my vote goes to the implicit solution, since the impact is concentrated to libraries that do scheduling (and as such are already complex in nature) while keeping user-land APIs simpler and easier. my assumption is that over time we could get the support we need from dispatch if we make the case

1 Like

@tomerd I could get behind implicit iff it would be able to correctly relate separate async events. To recap my concern:

If I am wrong, and it is indeed possible to relate these types of async tasks without an explicit context, then that will change how I view this problem quite a bit. I love the idea of implicit context, but if it cannot be correctly shared between separate but related async work, I think it is much less useful.

This is never going to be possible with implicit contexts without a quite complex notion of what a "context" is. Consider a simple NIO model where you can attach a context to a Channel, and all channel handler logs will log under that Channel's context (incidentally, this is too simplistic, but let's not worry about it).

What happens if you have a ChannelHandler that does this?

func channelRead(ctx: ChannelHandlerContext, data: NIOAny) {
    // ... some stuff...
    someOtherChannel.write(something)
    // ... some other stuff...

The write was clearly triggered by the Channel that this handler is in, but it operates on a different Channel. In which of the two contexts should log events triggered by this write fire?

It gets worse, because what happens if the code is actually:

func channelRead(ctx: ChannelHandlerContext, data: NIOAny) {
    // ... some stuff...
    someOtherChannel.write(something).then { 
        // What context applies here?
        log.info("successful result")
    }
    // ... some other stuff...

What context applies in the closure? A? B? There is no a priori obviously correct answer. So passing the context around implicitly is really hard, because it's not clear when the context needs to be propagated and when it does not.

As a way to mitigate some of the downsides to the explicit option, couldn't such a context object be stored globally and closed-over by functions instead of requiring it as an explicit parameter?

Obviously it would depend on the style of programming you choose to use, but at least with the explicit route you have the choice between these two styles.

// parameter approach
func oneOperation(_ context: TheContext) {
  howeverWeDoLogging("something happened", context)
}

// closed-over approach
var GlobalContext = TheContext()

func anotherOperation() {
  howeverWeDoLogging("something happened", GlobalContext)
}

I may be off base here — I'm not that familiar with logging contexts — but in principle this should work, right? Plus, an implicit system could be built around this, I think.

Sadly that only allows a single logging context at any one time. This doesn't work well with concurrent systems, which need to switch the logging context around whenever the specific task they're working on changes.

@tanner0101 both the implicit and explicit options talk about how to make context available, or iow, we agree that in a server/distributed system losing the context is a non-starter. to @lukasa point, libraries that schedule work like swift-nio will most likely have to do explicit handling of context, and this may be true to higher level frameworks like vapor/kitura as well. the idea with implicit context is that that the complexity is concentrated in such libraries and that the user-land APIs stay simple. the main challenge there will be dealing with work users offload using dispatch, which is something we need to figure out with the core team if we cant find a work around

True, that would complicate things. That's likely the main use case, too.
I thought global variables were thread-local. Is that true?
Could we solve that by passing the context as a parameter across thread boundaries and then using it globally within the thread?

You are correct today (on Linux) AFAIK.

Yes but it's still possible to do an 'best effort' way. That's what I understand os_activity is doing. Let's use NIO as an example, it's conceivable that we add an 'activity UUID' (read integer) field to the promises. We could also add a globally readable 'current activity UUID' which would be always set to the current Channel's UUID automatically by the Channel(Pipeline) before calling out. On creation, the promises would store the "current Channel's" UUID as their activity UUID and when calling the callbacks they would restore it and make it the 'current activity UUID'. That way your code example

func channelRead(ctx: ChannelHandlerContext, data: NIOAny) {
    // currentActivity set to ctx.channel's activity UUID

    // write creates a promise which stores the current activity UUID (ie. ctx.channel's activity UUID)
    someOtherChannel.write(something).then { 
        // the future call back would restore the stored 'current activity UUID' which still is the channel's
        log.info("successful result")
    }
    // ... some other stuff...

would actually work. Then the 'only' thing that we would need is a map from activity UUID to some diagnostic context. This is all not easy but possible in theory. MDC also tries to achieve exactly that and is very successful in Java-land but my opinion is that this won't happen in Swift, at least not anytime soon. Step 1 would be Dispatch support for carrying around such an 'activity UUID' and AFAIK it just doesn't do that on Linux today.

What I described is also pretty much how I understand os_activity works. Basically on q.async we capture an activity UUID which will be restored when the closure is run.

Sure, this is not perfect and somewhat best effort (and obviously needs help from all thread-crossing libraries like SwiftNIO and Dispatch) but in most cases it should actually work (for diagnostic information where a missing bit of information isn't a functional issue to the program). But Dispatch doesn't offer this today and I don't think we want to wait with logging until/if such functionality arrives.

No, global variables are not thread-local but you can create thread-local variables and they often do feel like globals sometimes even without the programmer noticing.

Yes, passing the context around everywhere does solve all the issues but is very invasive and as @tomerd pointed out probably not practical...

Right, let me just propose some actual code and a way with which we could proceed with something that people could use today.

As @tomerd summarised, we have two options: explicit or implicit logger/context passing (which I initially called local and global loggers, but explicit v. implicit is better terminology). Whilst explicit context passing would solve all the logging context issues it's probably too invasive for the real world without language support. We also don't want to delay having a shared logging solution so I don't think it's useful to wait for either language or Dispatch support.

That does - as stated before - mean we can't have the correct logging context available in all places and we will need to live with that. We can however declare logging context to be a responsibility of the concrete logger implementation and have it do best effort. That would mean that Vapor and Kitura might use logging systems that work completely differently but crucially share an API. For synchronous systems, it could just use a thread-local. For asynchronous systems the context needs to come from elsewhere either passed around or hung off say the 'request' type. In both cases this will mean that the logging context isn't necessarily reliably available everywhere. Especially not in asynchronous systems that call out to external libraries which are not prepared to get a logger passed explicitly.

So let's start with some code:

/// The most important type, the one the users will actually interact with and for example write:
///
///     let myModuleLogger = LoggerFactory.make(label: "example.myproject.FooModule")
///     class MyClass {
///         func something() {
///             myModuleLogger.warn("something isn't a great function name")
///         }
///     }
public protocol Logger {

    public init(identifier: String)

    /// not called directly, only by the helper methods like `info(...)`
    // EDIT2: Should change to `message: @autoclosure () -> String`
    public func _log(level: LogLevel, message: String, file: String, function: String, line: UInt)

    /// This adds diagnostic context to a place the concrete logger considers appropriate. Some loggers
    /// might not support this feature at all.
    public subscript(diagnosticKey diagnosticKey: String) -> String? { get set }
}

extension Logger {

    // EDIT2: Should change to `message: @autoclosure () -> String`
    public func info(_ message: String, file: String = #file, function: String = #function, line: UInt = #line) {
        self._log(level: .info, message: message, file: file, function: function, line: line)
    }

    // EDIT2: Should change to `message: @autoclosure () -> String`
    public func warn(_ message: String, file: String = #file, function: String = #function, line: UInt = #line) {
        self._log(level: .warn, message: message, file: file, function: function, line: line)
    }

    // EDIT2: Should change to `message: @autoclosure () -> String`
    public func error(_ message: String, file: String = #file, function: String = #function, line: UInt = #line) {
        self._log(level: .error, message: message, file: file, function: function, line: line)
    }
    // lots of to bikeshed more _log methods
}

public enum LogLevel {
    case info
    case warn
    case error
}

/// The second most important type, this is where users will get a logger from.
public enum LoggerFactory {
    // this is used to create a logger for a certain unit which might be a module, file, class/struct, function, whatever works for the concrete application. Systems that pass the logger explicitly would not use this function.
    public static func make(identifier: String) -> Logger {
        return LoggerImplementation.loggerType.init()
    }
}

/// Ships with the logging module, really boring just prints something using the `print` function
public struct StdoutLogger: Logger {
   [...]
}

/// This is the place to select the concrete logger implementation globally. The selection of the concrete logger type
/// that is used globally should only be done by the application itself (and not by libraries).
///
///     LoggerImplementation.loggerType = MySuperFancyLogger.self
public enum LoggerImplementation {
    private static var _loggerType: Logger.Type = StdoutLogger.self
    public static var loggerType: Logger.Type {
        get {
            return self.lock.withLock {
                return self._loggerType
            }
       }
       set {
            return self.lock.withLock {
                self._loggerType = newValue
            }
       }
}

That way whenever someone needs a logger, they can just LoggerFactory.make(identifier: ...) and have a logger which offers pretty much the convenience of a truly global and static Log.info("something"). But asynchronous systems like Vapor could carry around a Logger instance in their HTTPRequest.

So in an asynchronous system like Vapor you might write

myRequest.logger.info("something")

and Vapor libraries might never actually use the LoggerFactory as they always get the logger explicitly through the request.

To add that logging context, Vapor internally could run similar to this

currentRequest.logger[diagnosticKey: "request_id"] = UUID().description

By the virtue of hanging the logger of the HTTP request, Vapor could explicitly pass the logger around in its own libraries and to all external libraries that support it. Sure, any library that doesn't accept a logger will not have any logging context associated.

Synchronous systems could make their concrete logger implementation store the diagnostic context in a thread-local instead of the logger instance itself.

I should mention some tradeoffs I made here:

  • only one String typed message, no fanciness like os_log. Why? Because a) we can always extend that later b) this will not be the fastest implementation of a logger anyway we'll need to take locks, go through existentials etc to log c) this seems to mostly be what people use today
  • no configuration of log-levels etc. This is all the responsibility of the concrete logger implementations
  • a lot is down to the concrete implementation of a logger
  • EDIT: I should point out that I didn't make the message @autoclosure right now and there's good arguments it should be but that'd lead to an allocation as the concrete logger can't be inlined because it's not known at compile time... EDIT2: @Joe_Groff said that if an API relies on an efficient implementation @autoclosure this could be prioritised; therefore I think we should use it here.
  • it's the hybrid model described above

Pros:

  • we can start with this today
  • ease of use as the logger is reachable from anywhere
  • can kind of support explicit logger passing where available (if one controls the whole project)

Cons:

  • no reliable propagation of log contexts
  • everything is Strings
  • any Logger implementation will likely take locks (even if nothing turns out to be logged) and do syscalls (only if something's logged)
  • using a synchronous system's concrete logger (which might store the logging context in a thread-local) with an asynchronous system will lead to weird stuff and the logging context is propagated to random requests :slight_smile:

To be clear: This is by no means complete or what I consider to be a perfect API or anything like that. I just wanted to propose some concrete code. To get slightly less theoretical I would ask anybody to come forward with features they consider to be absolutely necessary which cannot be supported by this API. I hope that through iteration we arrive at something that anybody can live with and we can start implementing that. How does that sound?

8 Likes