Logging


(Ian Partridge) #1

One area the Swift Server working group may look to improve is server-side logging. Currently, there is no standard logging solution which works across frameworks and libraries.

We are interested to collect ideas and requirements people have, especially where they are informed by experience in other language ecosystems, or experience deploying server-side Swift applications in production today.

Once there is a clearer picture of what we need, we could collaborate on developing a solution through the working group.

To kick the discussion off, I'll reply to this message with some initial thoughts of mine.


(Ian Partridge) #2

Different applications are likely to want different types of concrete logging e.g. to stdout, to a file (which may optionally rotate), over the network, to logstash etc.

Building one logging package to rule them all seems impractical.

More useful may be a logging abstraction which concrete loggers can be plugged into.

Wishlist:

  • Loggers could/should be pluggable at compile and/or runtime.
  • Log level could/should be changeable at compile and/or runtime.
  • Logging should be lightweight at the logsite with no requirement for libraries to maintain a separate "log message registry".
  • Logging should interoperate well with existing Swift functionality such as ExpressibleByStringLiteral and Codable.
  • Minimal performance overhead - preferably zero in the case logging is disabled.
  • The obvious support for different log levels, #file and #line information, etc.
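A minimal sketch of what such an abstraction could look like, covering the pluggable-backend, log-level, #file/#line, and zero-cost-when-disabled points above. All names here (LogLevel, LogHandler, StdoutLogHandler) are hypothetical, not a proposed API:

```swift
// Hypothetical names throughout; a sketch, not a proposed API.
enum LogLevel: Int, Comparable {
    case trace, debug, info, warning, error
    static func < (lhs: LogLevel, rhs: LogLevel) -> Bool {
        return lhs.rawValue < rhs.rawValue
    }
}

// Libraries log against this protocol; applications plug in a backend.
protocol LogHandler {
    var logLevel: LogLevel { get set }
    func log(level: LogLevel, message: String, file: String, line: UInt)
}

// One concrete, pluggable backend: plain stdout.
struct StdoutLogHandler: LogHandler {
    var logLevel: LogLevel = .info
    func log(level: LogLevel, message: String, file: String, line: UInt) {
        print("[\(level)] \(file):\(line) \(message)")
    }
}

// #file/#line are captured at the log site; @autoclosure keeps disabled
// log sites cheap because the message string is never built.
extension LogHandler {
    func info(_ message: @autoclosure () -> String,
              file: String = #file, line: UInt = #line) {
        guard logLevel <= .info else { return }
        log(level: .info, message: message(), file: file, line: line)
    }
}
```

The level check lives in the protocol extension, so a disabled message costs one comparison and no string construction.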

I am aware of the following active projects in the Swift community - it would be great to add to this list so we can survey the current situation:


(Michael Ilseman) #3

BTW, improving the language support for logging is one of the purposes of SE-0228, which begins review today, though that does not prescribe a specific library presentation.

CC @ravikandhadai and @brentdax, who have been thinking a lot about how to surface logging on Darwin platforms. Obviously we'd want a unified approach across platforms; basically do to logging what Random did to arc4random.


(Johannes Weiss) #4

Thanks for raising that question @IanPartridge. I had a brief look into the 4 projects you mentioned. If I see this correctly, all packages allow the concrete logger type to be configured, by virtue of having either a protocol Logger with multiple implementations or a log registry.

For changing the loggers/log level these four libraries seem to be in two totally different camps:

  • global logger: IBM-Swift's LoggerAPI and SwiftyBeaver seem to use a global log configuration.
  • local logger: Vapor's Logging and Perfect's Log require you to manually pass the Logger existential around to wherever you want to log, through some mechanism (for example, passing it as a parameter).
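A hypothetical sketch contrasting the two camps (all names made up): the global camp reaches shared static state, the local camp hands a logger value around explicitly.

```swift
// Global camp: configuration lives in shared static state,
// reachable from anywhere in the program.
enum Log {
    static var write: (String) -> Void = { print($0) }
    static func info(_ message: String) { write("[info] \(message)") }
}

// Local camp: the logger is a value you pass to whoever needs it.
struct Logger {
    var label: String
    var write: (String) -> Void = { print($0) }
    func info(_ message: String) { write("[\(label)] [info] \(message)") }
}

func handleRequest(logger: Logger) {
    Log.info("reachable from anywhere")   // global: no parameter needed
    logger.info("explicit dependency")    // local: must be handed in
}

handleRequest(logger: Logger(label: "web"))
```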

So I think we need to settle on which model we want.

Below I'll just brain dump my thoughts about those models:

the two models

I totally made up the terminology local logger and global logger; if there's a better, well-known term, please let me know.

EDIT: Thanks to feedback from @Ponyboy47 I want to add that it's technically feasible to use a hybrid approach too.

global logger

I define something to be a global logger if the logging configuration is accessible globally in the program (usually through class/static logging functions). The logging configuration is at least 1) the concrete logging mechanism (are we logging to a file, stdout/err, a database, whatever) and 2) the log level.

Pros of the global logger:

  • it's easy, just Log.info("hello world") from anywhere in the program
  • (not really a pro, but still) the decision whether to log can be made without a virtual call (vtable / protocol witness table), which is great; but then we need a lock to protect the state (unless the log level is constant at runtime), which is much worse than a vtable call.

Cons of the global logger:

  • changing the log level for only a certain subset of requests seems hard/impossible as the way to change the log level is to globally modify the logger
  • having some payload (for example a request UUID) that gets always logged (and is distinct per request) is hard/impossible
  • synchronisation (ie. locks) is needed whenever anything attempts to log: even if nothing ends up being logged, we need to acquire a lock to read the logger's state (at least the concrete logger and the log level). Interestingly, neither IBM's LoggerAPI nor SwiftyBeaver seems to actually synchronise the memory access, if I see this correctly. That means you will trigger undefined behaviour if your program logs from more than one thread and you make any log configuration changes at runtime after spawning a thread that does any logging whatsoever. In reality, then, neither SwiftyBeaver nor LoggerAPI allows changing any logging configuration at runtime. In LoggerAPI's case we have public static var logger: Logger?, which is unsynchronised, and in SwiftyBeaver's case it's public private(set) static var destinations = Set<BaseDestination>()
  • EDIT: Thanks to feedback from @neallester I want to add that testing the logging system is problematic with a global logger, and pretty much impossible with parallel test runs.
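To make the synchronisation point concrete, here is a sketch (made-up names) of what a *correctly* synchronised global configuration entails: every log call reads shared state under a lock, even when the message is ultimately discarded.

```swift
import Foundation

// A correctly synchronised global log configuration (illustrative only).
final class GlobalLogConfig {
    static let shared = GlobalLogConfig()
    private let lock = NSLock()
    private var _level = 1          // 0 = debug, 1 = info, ...

    var level: Int {
        get { lock.lock(); defer { lock.unlock() }; return _level }
        set { lock.lock(); defer { lock.unlock() }; _level = newValue }
    }
}

func logInfo(_ message: @autoclosure () -> String) {
    // The lock is taken on every call, even if nothing is emitted.
    guard GlobalLogConfig.shared.level <= 1 else { return }
    print("[info] \(message())")
}

logInfo("this line pays for the lock whether or not it prints")
```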

local loggers

Local loggers store the logging configuration locally, for example through a parameter that is the Logger.

Pros of the local loggers:

  • synchronisation is not an issue because the logger object itself holds the state
  • trivial to implement things that have a different log-level per request
  • very easy to implement some extra state that gets always logged (such as a request ID)
  • EDIT: Thanks to feedback from @neallester I want to add that testing the logging system using local loggers is very straightforward, even with multiple tests run in parallel.
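A sketch of those first three pros (hypothetical names, not a proposed API): a value-type local logger carries per-request state, so a request ID and a per-request log level come along automatically with no synchronisation.

```swift
import Foundation

// A value-type local logger holding its own configuration and metadata.
struct Logger {
    var level = 1                          // 0 = debug, 1 = info, ...
    var metadata: [String: String] = [:]   // e.g. the request UUID

    func info(_ message: String) {
        guard level <= 1 else { return }
        let context = metadata.map { "\($0.key)=\($0.value)" }
            .sorted().joined(separator: " ")
        print("[info] \(context) \(message)")
    }
}

var requestLogger = Logger()
requestLogger.metadata["request-id"] = UUID().uuidString
requestLogger.level = 0                    // verbose for this request only
requestLogger.info("request started")      // the request ID is attached
```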

Cons of the local loggers:

  • some fairly invasive mechanism is needed to pass the actual logger around. Well-known mechanisms include 1) passing the logger as an explicit function parameter to any function that wants to log or that calls a function that wants to log, 2) putting the logger in a 'context' and passing that context around explicitly, and 3) storing the logger in thread-locals. In Swift I can personally only see (1) and (2) really working (happy to expand on why), which means handing either the logger or some form of context containing it through to every single place that wants to log
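A sketch of option (2) above, with made-up names, to show why it is invasive: the context must be threaded through every intermediate call on the path.

```swift
// A minimal logger and a per-request context that carries it.
struct Logger {
    var prefix: String
    func info(_ message: String) { print("\(prefix) [info] \(message)") }
}

struct RequestContext {
    var logger: Logger
    // ...other per-request state (deadline, trace ID, ...) would live here
}

// Every function on the call path takes the context explicitly; that is
// the "invasive" part of the local-logger model.
func handleRequest(_ context: RequestContext) {
    context.logger.info("handling request")
    loadUser(context)
}

func loadUser(_ context: RequestContext) {
    context.logger.info("loading user")
}

handleRequest(RequestContext(logger: Logger(prefix: "req-42")))
```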

Performance Considerations

Ideally, we'd want logging to be completely free at runtime if no log messages are issued. Short of having a constant logging configuration (ie. one that can't be changed at run time), completely free is pretty much impossible. Apple's os_log has a fairly static configuration and uses some amazing tricks to reduce the overhead to pretty much zero. But it seems that in the server world we need to be more dynamic and offer more features, so I believe we just won't get it for free at run time.
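A sketch of the usual trick for at least keeping disabled log sites cheap: @autoclosure defers building the message until we know it will be emitted (Logger here is a made-up minimal type):

```swift
// @autoclosure keeps a disabled log site from paying for string building.
struct Logger {
    var level = 1                         // 0 = debug, 1 = info, ...

    func debug(_ message: @autoclosure () -> String) {
        guard level <= 0 else { return }  // disabled: message() never runs
        print("[debug] \(message())")
    }
}

var expensiveCallCount = 0
func expensiveDescription() -> String {
    expensiveCallCount += 1
    return "costly state dump"
}

let logger = Logger()                     // level 1, so debug is disabled
logger.debug("state: \(expensiveDescription())")
// expensiveCallCount is still 0: the interpolation was never evaluated
```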

So if not free, we need to figure out how much overhead is acceptable. A long time ago I asked on swift-dev what people think about 'Zero-cost "Service Provider Interface"/Signature Packages', which is sort of relevant here. Today I think maybe some overhead is okay, and I wanted to list the kinds of overhead that the logging solutions I have seen in Swift incur at runtime, so we can decide what's acceptable:

  1. synchronisation overhead (usually for global loggers only)
  2. reference counting overhead
  3. protocol existential dispatch overhead (through virtual calls)

And I think people usually want two degrees of freedom at runtime:

a) changing the logging mechanism (file, stdout/err, database)
b) changing the logging level at runtime

My personal opinion (today) is that incurring reference-counting (2) and existential-dispatch (3) overhead is probably fine, but synchronisation (1) probably is not. The benefit is that we'd get full freedom at runtime.

disclaimer

Please let me know if I misread some of the code. This was quite literally a 10 min analysis of the four loggers that @IanPartridge mentioned and I thought I should write down my initial thoughts. Also I obviously haven't touched on the API itself, like how many log levels, naming etc. And I also realise that there's a lot of stuff in my post that might not be clear enough. Please ask if anything's unclear I'm very happy to expand on anything.


(Joe Groff) #5

Apple's client platforms get a lot of benefit from the low overhead of os_log, which allows apps to collect a large volume of runtime info without much overhead. There are similar mechanisms on other systems like journald or Windows' service logs. Independent of any platform-specific logging runtime, maybe we could get the same freedom of implementation by providing an efficient immediate logging mechanism, which can be a low-overhead, lazily-rendered mechanism in the vein of os_log, and a flexible protocol for interpreting the log buffer, since the rendering or storage of log records can happen lazily and asynchronously with the rest of the program. That may require standardizing some aspects of the system (like the format and set of types supported by the logging buffer and string renderer), but could still hopefully give you the flexibility to feed rendered messages into your medium of choice, and change that on the fly without too much overhead in the code emitting the logs.
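The lazily-rendered idea can be sketched roughly like this (made-up names; os_log itself works very differently under the hood): the emitter captures a closure instead of a rendered string, and the renderer decides later, possibly on another thread and possibly never, whether to materialise the text.

```swift
// A record stores a deferred rendering closure, not a rendered string.
struct LogRecord {
    let level: Int
    let render: () -> String
}

final class LazyLogBuffer {
    private var records: [LogRecord] = []

    // The emitter only captures the closure; no formatting happens here.
    func emit(level: Int, _ render: @autoclosure @escaping () -> String) {
        records.append(LogRecord(level: level, render: render))
    }

    // Rendering happens only when someone asks for the records.
    func drain(minimumLevel: Int) -> [String] {
        defer { records.removeAll() }
        return records.filter { $0.level >= minimumLevel }
                      .map { $0.render() }
    }
}

let buffer = LazyLogBuffer()
buffer.emit(level: 2, "error details")
buffer.emit(level: 0, "chatty debug")
let rendered = buffer.drain(minimumLevel: 1)
// rendered == ["error details"]; the debug record was never formatted
```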


(Johannes Weiss) #6

Thanks @Joe_Groff and @Michael_Ilseman for your input about os_log. I agree it would be great to have a system like os_log, but unfortunately on the server that would probably complicate deployment, because a lazily rendered system would need at least some form of bespoke log format. There's lots of tooling (like, for example, splunk) that will send the logs elsewhere, so they will need to be rendered already anyway. A lot of other tooling just expects logging to always arrive on stdout, which is even required for 12-factor apps.

What we could obviously do is spawn a separate 'logger daemon' process/thread and firehose the data lazily over there, which is what I believe os_log does. For a separate process we could use localhost UDP packets or some form of mapped memory, which would be pretty cool, but again it would complicate the deployment story quite a bit. If we decide to do it in-process with a dedicated logging thread, there are plenty of options. Again, the benefit is unclear to me, as you usually ship the logs to standard tools in standard ways, and that usually requires textual rendering of every message (different people use different tools, so it's not enough to find one system that supports optimised, non-textual data).


(Joe Groff) #7

My thinking was that we could provide something that just renders to stdout as fast as it can as an off-the-shelf implementation. It seems to me that even in that simple case, you don't want the log-emitting code to necessarily block on rendering strings and synchronizing in order to write to stdout; you'd still benefit from having that happen concurrently. Even in the Swift compiler itself, we struggle with the synchronization issues of trying to emit coherent diagnostics from parallel subtasks to one stderr stream. If we could provide the infrastructure to do that as part of the standard server environment, it'd help with the deployment issues.
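A sketch of that idea (hypothetical names): emitters enqueue onto a serial dispatch queue and return immediately, and one background writer drains to the output, so lines stay coherent even with many emitting threads.

```swift
import Dispatch

// Emitters never block on I/O; a serial queue acts as the single writer.
final class AsyncLogger {
    private let queue = DispatchQueue(label: "log-writer")
    private let sink: (String) -> Void

    init(sink: @escaping (String) -> Void = { print($0) }) {
        self.sink = sink
    }

    func log(_ message: String) {
        queue.async { self.sink(message) }  // enqueue and return immediately
    }

    func flush() {
        queue.sync {}                       // wait for pending writes
    }
}

let logger = AsyncLogger()
DispatchQueue.concurrentPerform(iterations: 4) { i in
    logger.log("message \(i)")              // safe from many threads at once
}
logger.flush()
```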


(Jacob Williams) #8

I don't see why we couldn't/shouldn't do both. Both have their pros and cons so why not let the developer decide which is best for them? Create a Logger protocol that has basic requirements for the actual writing of a log entry to some destination (File, Network, stdout/stderr, etc). Maybe give it a context property which can be used for writing those special extra states (like a request ID). Be sure to make a .default or .global attribute/function that just grabs the root logger.

I think a large set of needs can be met with a global logger and this would probably be the logger used by most projects. However, a generic Logging protocol would allow developers to create their own special abstractions for the various other use cases that we know are out there, but can't possibly account for.

All that swift has to expose in the stdlib is the Logger protocol with some useful default implementations and a DefaultLogger or whatever, then the community can do their own work for their own special needs.

I like @Joe_Groff's idea for an off-the-shelf implementation that just writes to stdout (or whatever file handle is its destination) as quickly as possible.


(Johannes Weiss) #9

Ok, that makes sense. So you'd think a fast firehose to some 'logger daemon thread' which then renders and prints to stdout, right? And the log-emitting thread would only be blocked if the firehose's buffer is full and the logger can't keep up, correct? Because we will need back-pressure here.

I can totally see the benefit of having the writing to stdout done from another thread with a buffer in between. What's less clear to me is the benefit of the lazy rendering. Why wouldn't I just eagerly render it and put the rendered text in the firehose, given that we always want to write the string?

If I understand your suggested model correctly, we would select the concrete log delivery mechanism (stdout, file, db) in that logger daemon thread? Or would you also want to move the log-level decision into that thread (in which case I can see the benefit of the lazy rendering on the server side too)?


(Johannes Weiss) #10

Very good question. My feeling is that programmers are lazy and would always go for the 'global' option, which means your log stream is incomplete if you use the 'local' option. What if you carefully configure your local logger and then want to hand it on to another library? That library might be written to only ever use the global logger, which in the end means you'll lose the logs of that library.

Again, this is just my feeling but I encourage everybody to comment here on what they think about the hybrid approach.

EDIT: Often I see people using the request ID as a filter amongst lots of requests. In other words if a log message doesn't carry a request ID it might just as well not have been issued in the first place. With the hybrid approach I'm pretty sure we won't have a reliable mechanism that will carry the request ID around.


(Joe Groff) #11

Yeah, that's what I had in mind; the "logger daemon thread" could be the pluggable part of the system, based on a common implementation of fast "firehose" buffering of log parameters. The benefits of lazy rendering are to minimize latency on the log emitter's side, to avoid spending time rendering messages that are never asked for, and to allow for out-of-process rendering; this is important for os_log, where the renderer is a separate system daemon, no rendering happens if you never open the console, and certain log levels are ephemeral and discarded immediately if no renderers are observing them, but maybe that's all less relevant for the common server case. If we were going to attempt One True Logging Mechanism that spans all of these use cases, lazy rendering doesn't seem actively harmful, though.

Backpressure on the logging buffer is an interesting question; I believe os_log just ring-buffers lower-priority messages and lets them get overwritten if the buffer fills up.
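The overwrite-when-full behaviour can be sketched with a tiny ring buffer (made-up names; os_log's actual buffering is far more sophisticated): when capacity is reached, the oldest message is silently overwritten rather than blocking the emitter.

```swift
// A fixed-capacity ring buffer that drops the oldest entry when full.
struct LogRingBuffer {
    private var storage: [String?]
    private var next = 0

    init(capacity: Int) {
        storage = Array(repeating: nil, count: capacity)
    }

    mutating func append(_ message: String) {
        storage[next] = message            // overwrites the oldest when full
        next = (next + 1) % storage.count
    }

    // Messages in arrival order, oldest surviving entry first.
    var contents: [String] {
        let rotated = storage[next...] + storage[..<next]
        return rotated.compactMap { $0 }
    }
}

var ring = LogRingBuffer(capacity: 2)
ring.append("a")
ring.append("b")
ring.append("c")
// ring.contents == ["b", "c"]: "a" was overwritten, nothing blocked
```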


(Neal Lester) #12

I’ve worked with global logging architectures which were hard to use in the context of tests. Log entries are system outputs which should be tested. Whatever design is selected should support verification of logging output within tests.


(Jacob Williams) #13

You're probably right, we are a lazy bunch.

What if we took an approach similar to Python's and used a global logger with log handlers? Then when you create a 'local' logger you're really just creating a 'local handler', and in order for a handler to flush its write operations to the 'logger daemon thread' it must be added to the 'global' logger's handlers?
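A rough sketch of that handler-based idea in Swift (all names made up, loosely modelled on Python's logging module): one global logger fans each message out to a mutable list of handlers, and a "local" logger is just a handler you register.

```swift
// A handler decides where a message goes (stdout, file, collector, ...).
protocol LogHandler {
    func emit(_ message: String)
}

// The single global logger fans out to all registered handlers.
final class GlobalLogger {
    static let shared = GlobalLogger()
    private var handlers: [LogHandler] = []

    func addHandler(_ handler: LogHandler) {
        handlers.append(handler)
    }

    func log(_ message: String) {
        for handler in handlers { handler.emit(message) }
    }
}

// A "local" logger in this model: a handler that collects messages.
final class CollectingHandler: LogHandler {
    var messages: [String] = []
    func emit(_ message: String) { messages.append(message) }
}

let handler = CollectingHandler()
GlobalLogger.shared.addHandler(handler)
GlobalLogger.shared.log("hello")
// handler.messages == ["hello"]
```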


(Johannes Weiss) #14

@Joe_Groff thanks! Ok, I can see the argument that 'lazy rendering doesn't hurt'; my only counter-argument is that it adds complexity. The systems I know of send the logs to other machines and then you analyse them later with kibana/splunk etc, so rendering them lazily wouldn't work.

I do like the ring buffer; however, I feel that many folks rely on tracing what a program did by looking at which log messages were or were not issued. That unfortunately doesn't work with a ring buffer, and I'm not sure that's obvious. Obviously, blocking a thread is bad (especially when using SwiftNIO), and we'll need to decide whether we'd rather drop messages or block. I'm unable to say which is better and would suggest looking at prior art in the server space. My feeling is to aim not to block (by giving the firehose a 'big enough' buffer), but if we're logging too much there is something to be said for blocking the issuing thread, as that will show up in profiles of the application.


(Johannes Weiss) #15

Sounds interesting; unfortunately I'm not familiar with what Python does here. Would you mind posting a short example of how we could reconfigure a logger only locally? For example, attaching a request UUID or changing the log level for some local scope (say, one request) only?


(Helge Heß) #16

I know little about all this, but I would guess there is at least a chance that it is actually faster to ship a small (80-byte?) string to a thread than to ship the actual structured data (and all the CoW that is potentially required here). (I'm sure you measured that for os_log, but who knows ;-) )


(Joe Groff) #17

In os_log's case, lazy rendering support is confined to primitive scalar types and constant strings. Dynamically-generated strings have to be buffered, and objects rendered with %@ formatters use their dynamically-generated -description string, so the performance guarantees are weaker using those. The interpolation still happens lazily, though. I agree the benefits become marginal in that general case.


(Helge Heß) #18

Maybe I'm talking utter nonsense here, but what I'm thinking of is Apache-style logging, where the logger gets access to the request structure and the user/admin can select/configure what they want to get logged.
That "%(remote-ip)s %(method)s %(uri)s" kind of thing.

But shipping a complex structure (across threads) can be pretty expensive in itself.

I know some people who use Splunk ;-), but I'm actually not sure how that works. Does it get structured data (even if only as JSON)? Or does it always just parse text?


(Jacob Williams) #19

I wrote up a gist in Swift (~200 lines) along with a couple of unit tests to demonstrate its usage. The tests don't actually do any validation, but they do run successfully to demonstrate how the logger protocol and the accompanying GlobalLogger could be used.

This wasn't a very well-thought-out example or anything; I'm mostly just trying to prove the concept that there can be a global logger and separate loggers and they won't collide with each other. I apologize in advance if you find it poorly written and not very well documented. I just whipped it up in half an hour or so and wanted it to compile/run.

With something like this, the stdlib could include a few useful LogDestinations (I only wrote a very basic FileLogDestination) that developers could easily take advantage of. With the public LogDestination protocol, 3rd party devs could create all kinds of types that people could easily use without having to write their own Logger types. The LogContext could be extended to easily support date/time formatting, including process id's, etc.

EDIT: It may be useful to know that I tested this on Linux (Ubuntu 18.04) using the swift 4.2 Convergence snapshot


(Johannes Weiss) #20

Thanks @Ponyboy47 for the example, that shows your proposal well!

The only issue I have with this is that it's really the hybrid model, where we have both local and global loggers. I think this hybrid will lead to everybody using the global one (because people are lazy), which makes the local loggers useless (because why would I configure them if not every library is using them?).

For example, let's assume we configure a new local logger (webLogger in your example) in the application itself. If we now call a library function that just uses logger (the global logger), then that library won't have the request UUID attached, so those log lines are essentially lost if you filter by request UUID.