Logging

Thanks @John_McCall and @Joe_Groff for pitching in, this is very relevant information. Should we kick off a separate thread on how we could flush log buffers in case of a crash? While doing that, we can also chat about 'soft-landing sub-processes that crash', and maybe we'll find a good way that works for both systems.

To make more immediate progress on logging, I feel we should continue to talk about the API first. I think that how the underlying logging system works won't surface in the actual logging API, so we can do both in parallel and start with a very basic logging mechanism, for example something that just writes to stdout and blocks. What we do still need to decide, however, is how we reach the logger: explicitly through dependency injection (what I called a 'local logger') or implicitly through a global (what I called a 'global logger').

Paging @IanPartridge do you think it makes sense to separate the discussions on how the logging system works from the API?

Wrt how to reach the logger, I think having global functions that "do the right thing" without the user having to pass around a logger (for the common cases) would be very valuable.

If I'm building some new component, and the "proper" way of logging is just as easy as calling print is, I'll use the proper way for sure. But if I need to figure out some way to inject a logger into the new component or do some other setup, I'm probably going to fall back to using print at least some of the time, if I'm honest with myself.

I think the Rust log crate would be a nice model to follow/build upon.

It's built on one premise that I think is a good one: in the default case, only binaries should control where and how logs end up. Libraries are encouraged to use the log macros that log into a global logger instance, but are discouraged from setting or otherwise touching that global logger. In fact, the global logger can only be set once. The default global logger, when nothing is set, is a no-op implementation.

This does not stop libraries from passing around their own logger instance to do specialized logging to some file or service or something, but makes sure that the path of least resistance (just writing debug!("something")) doesn't mean clients of libraries have to deal with stuff in stdout/err they don't want.

The Log trait is also very small, and doesn't include anything relating to nesting etc. That is left to the individual loggers to design an API for, which keeps the log crate very small and focused, just providing the logging macros, the Log trait, and a way to set the global logger.
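To make that shape concrete, here's a minimal sketch of what the model could look like in Swift. All names here are hypothetical, and this is only an illustration of the crate's premise (tiny protocol, set-once global, no-op default, free functions as the path of least resistance), not a proposed API:

```swift
import Foundation

enum LogLevel: Int, Comparable {
    case debug, info, warning, error
    static func < (lhs: LogLevel, rhs: LogLevel) -> Bool {
        lhs.rawValue < rhs.rawValue
    }
}

/// Deliberately small, mirroring the Log trait: no nesting, no configuration.
protocol Logger {
    func log(level: LogLevel, message: String, target: String)
}

/// Default when nothing is configured: messages go nowhere,
/// so library logging never pollutes a client's stdout/err.
struct NoopLogger: Logger {
    func log(level: LogLevel, message: String, target: String) {}
}

enum GlobalLog {
    private(set) static var logger: Logger = NoopLogger()
    private static var isSet = false

    /// Mirrors the crate's "can only be set once" rule; the binary calls this.
    static func bootstrap(_ newLogger: Logger) {
        precondition(!isSet, "global logger may only be set once")
        logger = newLogger
        isSet = true
    }
}

/// The path of least resistance for libraries, analogous to `debug!(...)`.
func debug(_ message: String, target: String = #file) {
    GlobalLog.logger.log(level: .debug, message: message, target: target)
}
```

Libraries would just call `debug("something")`; only the binary ever calls `GlobalLog.bootstrap`.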

Then there are some other niceties:

Every log has a target (module path by default), that the global logger can use to filter things (think different log levels for different modules, we could probably use #file until a proper module system comes along).

They optimize performance when logging is disabled by asking the logger beforehand whether a log statement with a given target and log level would even be logged. If it wouldn't, the message is never constructed. We might not need this and just use an auto-closure instead, but I'm not sure how expensive those are to construct.
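For comparison, here's a hedged sketch of the autoclosure alternative mentioned above (names hypothetical): the message expression is wrapped in `@autoclosure`, so it is only ever evaluated once the level check passes.

```swift
struct LazyLogger {
    var minimumLevel: Int  // e.g. 0 = debug, 1 = info, ...

    /// Because `message` is an autoclosure, an expensive string interpolation
    /// is never evaluated when the level check fails.
    func log(level: Int, _ message: @autoclosure () -> String) {
        guard level >= minimumLevel else { return }
        print(message())
    }
}
```

At a call site like `logger.log(level: 0, "state: \(expensiveDump())")`, `expensiveDump()` never runs when debug is filtered out, which is essentially the same optimization without a separate "would this be logged?" query.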

(whoops, wall of text :sweat_smile: I'll see if I have the time to hack together a quick Swift-clone of the log crate later today, to see how well it would translate)

One way to solve the local/global "reaching the logger" question is to provide an "execution context" i.e. implicit parameters. Silently passing extra arguments to all function calls is likely to be prohibitive from a performance standpoint, so an acceptable implementation would likely use something like thread local storage and end up requiring somewhat sophisticated machinery to make sure the "context" is what you expect it to be. Even if implemented, I can see such a feature being abused and decreasing the legibility of Swift code.

That said, Swift already supports some semantics which can also be abused to great effect (like custom operators) and while there are examples of projects doing so, the wider community seems to avoid it.

Yes I think this is a good idea.

Our focus in the work group should be on finding ways we can improve the server-side ecosystem today. I don't think that solutions which require potentially complex compiler changes, changes to the stdlib (which would have to go through the Swift Evolution process) etc. should be our focus at the moment.

A common logging abstraction which blocks while writing to stdout would still be an improvement on the current situation.

To go back to your distinction between "global" and "local" loggers, my personal view is that a solution that requires passing a logging existential around everywhere as a parameter to every function is not likely to see widespread adoption, simply because of how ugly it makes the code. People won't bother, they will think "oh I'll never want to log from this tiny function, I won't bother passing in the logger, just to keep it simple". And then in the future you find that of course you do want to log from there and now you have a refactoring exercise on your hands...

If it could be stashed in a struct/class and accessed via self.logger.log() then maybe that would see more usage, but there is a lot to be said for the simplicity of offering a global Log.info("hello world").

This is a little off-topic, but it might be worth considering:

Logging is really just another form of gathering analytics; generally the places where you log things are also the places where you end up wanting to gather analytics. What if the logging API were built in such a way as to support gathering analytics?

That would be extremely interesting to see, thanks for offering.

Right, I hear you (and others). Having a global logger used like Log.info("hello world") however means that you won't be able to have extra payload (such as request UUID) automatically attached to all log messages issued by anything that logs whilst a particular HTTP request is being handled.

Just for the record, in synchronous systems thread-local storage can be used to store this extra payload. And for synchronous systems it is possible to wrap all libraries that cross threads and make them propagate thread-local storage appropriately. I do however believe that this is infeasible in Swift because we don't really want to wrap libdispatch, NIO's futures, and everything else that doesn't just synchronously run on the calling thread.
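For illustration, a sketch of that thread-local approach using Foundation's `Thread.current.threadDictionary` (all names hypothetical). As noted, this only works while a request stays on one thread; the stored metadata does not survive hops through libdispatch or NIO futures:

```swift
import Foundation

/// Hypothetical thread-local metadata store (MDC-style). Valid only while a
/// request is handled synchronously on a single thread.
enum LogMetadata {
    private static let key = "log.metadata"

    static var current: [String: String] {
        get { Thread.current.threadDictionary[key] as? [String: String] ?? [:] }
        set { Thread.current.threadDictionary[key] = newValue }
    }
}

/// A logger could then prepend the metadata automatically, so a request UUID
/// set once shows up on every log line from that thread.
func logWithMetadata(_ message: String) -> String {
    let context = LogMetadata.current
        .map { "\($0.key)=\($0.value)" }
        .sorted()
        .joined(separator: " ")
    return context.isEmpty ? message : "[\(context)] \(message)"
}
```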

@IanPartridge for Kitura deployments, do you add extra information to all log messages issued by a certain request? If so, how? And what about metrics?

Indeed. We usually wire an explicit Context through to everywhere and you retrieve the logger as well as metrics systems from that context. And you rightly point out that all the problems we're discussing with logging also apply for metrics and other things.

If we allow both "global" and "local" loggers, the problem is that programmers can't easily switch from a global logger to a local one when they want to improve diagnostics. They have to change every single log statement.

Whereas with a local logger they could, for example, switch from:

common.swift:
internal let log = Log()

file1.swift:
log.info("...")

file2.swift:
log.info("...")

to

file1.swift:
fileprivate let log = Log(file: "file1")
log.info("...")

file2.swift:
fileprivate let log = Log(file: "file2")
log.info("...")

edit. typo

I've uploaded a playground here: Hackish Swift clone of Rust's `log` crate · GitHub

It would work pretty nicely, if it were not for the fact that there's no good replacement for Rust's module paths (those look something like my_app::model::sync). #file is often an absolute path, depending on how the compiler is invoked. A #module or #package would be pretty useful here, and I suspect for any logging solution. A structured #sourceLocation would be even better.

on how to obtain an instance of a logger, aka "global" vs "local" loggers, i believe there is a third option: a global logger factory, which is then used to obtain logger instances in the local scope. this model offers nice separation of concerns, where the logger factory handles the configuration aspects and the local logger handles the logging API (e.g. trace, info, debug, etc).

so in the local scope, you will do something like:

let logger = LoggerFactory.getLogger("foo")
logger.info("hello world!")

having LoggerFactory be a global singleton that is initialized by the application, or by a higher-level framework like Vapor/Kitura using config files, or programmatically at application start, etc.

(class and method names for illustration purposes only)
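Purely to illustrate that separation of concerns, here's a rough Swift sketch of the global-factory/local-instance model. All names are made up, and the `sink` closure stands in for whatever appenders/backends the configuration would actually select:

```swift
/// Hypothetical global factory: it owns configuration, while the instances it
/// hands out own the logging API (trace, info, debug, ...).
final class LoggerFactory {
    static let shared = LoggerFactory()

    private var minimumLevel = 1                   // 0 = debug, 1 = info, ...
    var sink: (String) -> Void = { print($0) }     // stand-in for appenders

    /// Called once at application start, by the app itself or by a framework
    /// like Vapor/Kitura reading config files.
    func configure(minimumLevel: Int) { self.minimumLevel = minimumLevel }

    func getLogger(_ name: String) -> NamedLogger {
        NamedLogger(name: name, factory: self)
    }

    fileprivate func emit(level: Int, _ line: String) {
        guard level >= minimumLevel else { return }
        sink(line)
    }
}

/// The local instance: just the logging API, no configuration knobs.
struct NamedLogger {
    let name: String
    let factory: LoggerFactory

    func debug(_ message: String) { factory.emit(level: 0, "[\(name)] DEBUG: \(message)") }
    func info(_ message: String)  { factory.emit(level: 1, "[\(name)] INFO: \(message)") }
}
```

Local code then only ever sees `LoggerFactory.shared.getLogger("foo")` and the instance methods; everything about destinations and levels stays behind the factory.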

the key disadvantage of this approach is that it does not allow you to attach context-level metadata to log entries and cascade it down. for example, it's difficult to attach a "requestId" and "userId" that will then be part of all log entries related to that user request, since the various user code and libraries involved in processing such a request will obtain their own logger instance with no context. and this is where passing the logger/context object around really shines. that said, while i love the technical attributes of passing the logger/context around, i believe asking users to do so is too high of a bar and not practical in a large ecosystem like the one we want to build. happy to learn about creative ways to achieve that, if anyone has ideas

the java server world has largely adopted this global factory, local instance model through the SLF4J library, and it seems to work reasonably well there. it also jells well with some of the other proposals made in this thread about local handlers.

what SLF4J does to address the context challenge is to offer something called MDC, which is basically a temporary store of metadata shared between all the code running on the same processing unit (e.g. a thread); the logger instances then read the metadata and include it in the log entries as they write them out. in the simplest form of MDC, the metadata storage is thread locals, so that everything processed on that thread shares the metadata.

using thread locals works well when work (e.g. a request) is guaranteed to stay on the same thread, but obviously systems like NIO require different handling to correctly pass it around as they move work on/off threads. i am not familiar enough with dispatch to know if an MDC-like solution could be made to work with it, so it would be interesting to better understand that aspect from someone who is. if they can't play nice together, we can consider building this in stages, where we address the context aspect of work done through dispatch if/when the core team gives us additional extension points to make it work

IMO this is the ideal way to go.

There is a Java framework called Ratpack, that is based on Netty, and its API is not very dissimilar to Vapor in that it is future based, and makes no guarantees about a given request being completed on the same thread.

It does however have something called MDCInterceptor which allows you to use MDC as you always would: Documentation, Source code.

Works pretty well!

Perhaps DispatchSpecificKey<T> could be used for this.
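For example, a small sketch of what that might look like, assuming each request is handled on its own queue (names hypothetical):

```swift
import Dispatch

// Hypothetical sketch: hang a request ID off a per-request queue via
// queue-specific storage, so code running on that queue can read it back
// without it being passed as a parameter.
let requestIDKey = DispatchSpecificKey<String>()

let requestQueue = DispatchQueue(label: "request.handler")
requestQueue.setSpecific(key: requestIDKey, value: "req-42")

func currentRequestID() -> String? {
    // Non-nil only when called from a queue (or a queue targeting one)
    // that has the key set; nil everywhere else.
    DispatchQueue.getSpecific(key: requestIDKey)
}
```

The open question remains propagating the value when NIO or dispatch moves work between queues, which this alone doesn't solve.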

Besides the obvious lack of context, this is a really nice and lightweight solution, and I came up with the same idea in my implementation of a LoggerManager factory:

  • Appenders allow plugging different implementations.
  • I also gather some file / lineno information.

Example:

LoggerManager.configure(appenders: [LogstashAppender()])
let logger = LoggerManager.getLogger(name: "test")
logger.debug("test")

I'm wondering about the hierarchical contexts, to have more structured logging, though.

Surely adding context: String? = nil to the logging API is not a huge burden to anyone. Well designed subsystems could arrange to accept and pass around such a context string. Users and library authors who can't be bothered could ignore it; those of us who actually need to debug our systems would probably use it.
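For illustration, such a signature might look like this (hypothetical names; the method returns the formatted line just to make the effect of the default easy to see):

```swift
// Sketch of the suggestion: `context` defaults to nil, so callers who can't
// be bothered simply ignore it, while well-designed subsystems thread a
// context string through.
struct ContextLogger {
    func info(_ message: String, context: String? = nil) -> String {
        context.map { "[\($0)] \(message)" } ?? message
    }
}
```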

GitHub - Wolg/awesome-swift: A curated list of awesome Swift frameworks, libraries and software. lists some more Swift logging frameworks that may be worth comparing

Logging level should be accompanied by an optional category, making it possible to select (for example) all dataloss level messages in the networking category, or all normal level messages in the user login category. If no category is provided for a given logging call, it matches all/no categories, depending on the query.

The defined levels and categories should be enums so that the abstract logging framework can be shaped to the specific application.
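A sketch of what that could look like (names illustrative): the framework defines the levels, the application supplies its own category enum, and a query decides whether uncategorized entries match.

```swift
// The framework's levels plus an application-supplied category enum. A query
// can then select e.g. all `dataloss` messages in `networking`, and choose
// how to treat entries logged with no category.
enum Level: Int, Comparable {
    case normal, warning, dataloss
    static func < (lhs: Level, rhs: Level) -> Bool { lhs.rawValue < rhs.rawValue }
}

struct Entry<Category: Hashable> {
    let level: Level
    let category: Category?          // nil = no category given at the call site
    let message: String
}

struct Query<Category: Hashable> {
    let minimumLevel: Level
    let category: Category?          // nil = match any category
    let includeUncategorized: Bool   // whether a nil entry category matches

    func matches(_ entry: Entry<Category>) -> Bool {
        guard entry.level >= minimumLevel else { return false }
        guard let wanted = category else { return true }
        guard let actual = entry.category else { return includeUncategorized }
        return actual == wanted
    }
}
```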

Sometimes the necessary logging makes your code harder to read.

Any ambitions of the Swift community to tackle the task with aspect oriented programming or some functional programming magic instead?

@neallester the challenge with making context an argument of the logging api is that, unless there is an implicit way to retrieve the context, such an argument needs to also be added to most/all other functions of the program and libraries, since one would want to log from potentially anywhere in the program. as for ignoring context: when the context carries critical information like tracing (e.g. requestId), log entries become useless without it: you are left to do timestamp matching, which is non-deterministic in a system that does concurrency, such as a server. i take your point that in a small, well-contained system, passing context around between functions is achievable, but imo at ecosystem scale this will prove to be "too hard" and not pragmatic

Omitting the argument from the logging API doesn’t make it easier.