I am thinking about the best way to implement logging, especially in (potentially) asynchronous contexts. Of course, we already have swift-log, and logging in Vapor is – I think – based on it; there you do not need to use await when logging. Of course, parallel access to loggers has to be coordinated, e.g. when logging to a file; in those cases a DispatchQueue is used, see e.g. this implementation.
Using a DispatchQueue in those cases avoids having to work in an async context. Conversely, if you define a logger protocol with async methods (and implement the loggers as actors), you also need async functions wherever you log.
What do you think is conceptually better, and what about performance? There is some discussion about the performance of DispatchQueue vs. async/await, but maybe the main point is that logging with actors leads to ubiquitous async functions. Of course, the question is how much logging is done compared to all the other work; logging should be rare in comparison to everything else. From other discussions I have the impression that pervasive use of await is not a good thing: the annotation is meant to hint that something special is going on (work might be suspended), and using it just because you might log something seems to devalue that signal.
On the other hand, when loggers are implemented as actors, parallel access is organized "naturally", i.e. you do not need to implement the serialization yourself, so actors might be a good choice here. And with DispatchQueue you also have (another kind of) suspended work, you are just not explicit about it.
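To make the trade-off concrete, here is a minimal sketch of the two shapes I have in mind (the type names are placeholders of mine, not real library types): the actor variant gets serialization for free but forces await onto every call site, while the queue variant keeps the call sites synchronous.

    import Foundation

    // Actor variant: serialization is built in, but every call site needs `await`.
    actor ActorLogger {
        private var entries: [String] = []
        func log(_ message: String) {
            entries.append(message)   // imagine writing to a file here instead
        }
    }

    // Queue variant: call sites stay synchronous; a serial queue does the serialization.
    final class QueueLogger {
        private let queue = DispatchQueue(label: "logger")   // serial by default
        func log(_ message: String) {
            queue.async {
                // imagine writing `message` to a file here
                print(message)
            }
        }
    }

    func work(actorLogger: ActorLogger, queueLogger: QueueLogger) async {
        await actorLogger.log("step done")   // async spreads to the caller
        queueLogger.log("step done")         // no await needed
    }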
Async/await is viral, so I wouldn't make logging async. "DispatchQueue.someSerialQueue.async" is one solution (simple but not very efficient); a more efficient solution would be based on a ring buffer: "log" fills that ring buffer atomically (it is possible to do this wait-free if needed, or just mutex-protect it), and a separate thread reads from the ring buffer and writes to a log file.
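A rough sketch of the second idea (only an illustration: a real ring buffer would be fixed-size and possibly wait-free, here a mutex-protected array stands in for it, drained by a dedicated writer thread):

    import Foundation

    final class BufferedFileLogger {
        private var buffer: [String] = []              // stand-in for a fixed-size ring buffer
        private let lock = NSLock()
        private let hasWork = DispatchSemaphore(value: 0)
        private let handle: FileHandle

        init(url: URL) throws {
            FileManager.default.createFile(atPath: url.path, contents: nil)
            handle = try FileHandle(forWritingTo: url)
            // Dedicated writer thread: takes whatever is buffered and writes it out.
            Thread.detachNewThread { [weak self] in
                while let self = self {
                    self.hasWork.wait()                // sleep until something was logged
                    self.lock.lock()
                    let pending = self.buffer
                    self.buffer.removeAll(keepingCapacity: true)
                    self.lock.unlock()
                    for line in pending {
                        try? self.handle.write(contentsOf: Data((line + "\n").utf8))
                    }
                }
            }
        }

        // Cheap on the caller's side: no file I/O, just an append under a mutex.
        func log(_ message: String) {
            lock.lock()
            buffer.append(message)
            lock.unlock()
            hasWork.signal()
        }
    }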
await suspends the current task, waits for the asynchronous operation to finish, and then resumes the task.
When you're logging something, it's because some event already happened, and the act of logging does not produce any meaningful result which would affect the remainder of the task. Therefore, suspending and waiting for the log operation to complete is unnecessary, and a "fire and forget"/detached operation would be more appropriate.
DispatchQueue.async is fine for that. Alternatively, you could use Task.detached { ... }
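For example (just a sketch; the logger actor here is a placeholder, not a real library type):

    actor FileLogger {
        func log(_ message: String) {
            // imagine appending to a file here
            print(message)
        }
    }

    func handleEvent(logger: FileLogger) {
        // Fire and forget: the caller never suspends; the log entry is
        // written on its own detached task at some later point.
        Task.detached {
            await logger.log("event handled")
        }
    }

One caveat: separate detached tasks give no ordering guarantee between entries, so messages may need a timestamp if order matters.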
That's a good point - it might; I guess it depends on your requirements. You could just accumulate the messages as they come and sort them later (e.g. if they have a timestamp).
Thank you all for your very good answers. Especially @tera and @Karl: you are both very committed, and I always appreciate your comments.
Yes, logging is not something that should determine whether a context is async or not, because logging does not concern the actual processing, so to speak. An await statement should express "this piece of work could be suspended, and for good reasons."
Of course, formulating an actor to do logging would be very easy; with other mechanisms it gets a little more complicated (if you have to serialize the writing of log entries in a concurrent context). So I have written a "general" class that can be extended when formulating a specific logger, and defining a logger becomes very easy again. I am using DispatchQueue because that seemed to me the easiest way, but one could easily switch the implementation by just changing the general class.
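The idea, in very simplified form (the names here are just for illustration, not my actual code):

    import Foundation

    // General class: owns the serial queue, so a concrete logger only has to
    // describe how a single log entry is written.
    class SerializedLogger {
        private let queue = DispatchQueue(label: "logger.serialized")

        final func log(_ message: String) {
            queue.async { self.write(message) }
        }

        // Override point for specific loggers.
        func write(_ message: String) { }
    }

    // A specific logger only overrides `write`.
    final class PrintLogger: SerializedLogger {
        override func write(_ message: String) {
            print(message)
        }
    }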
Hmm. This is doing filtering etc., but does this solve the problem of having to serialize the writing of messages in a concurrent context?
Yes, DispatchQueue might not be the most efficient option. If using a mutex manually is much faster, I would consider it. Is a completely wait-free solution (with a ring buffer?) possible with an unknown number of log messages?
The question is whether the actual logging of messages (e.g. writing them to a file) should be deferred that much: when an application crashes you want to see fairly recent log entries, and with many log messages they should not accumulate too much (but I think I have this problem with my DispatchQueue approach, too).
In my experience, this is an over-generalization - logging at significant "checkpoints" in the application is a pretty common pattern, especially for things that could be flaky (network operations, etc.). In these cases, I'm specifically looking for the logging to complete before moving on to the next phase, so that if something dies I can quickly acquire context for what's going on.
For these kinds of log messages I explicitly want a "log barrier", and modeling this with a suspension point in an async context seems perfectly natural and very much in line with the blocking synchronous counterpart.
This also plays into @sspringer's point about crashing with log messages queued up, and using a suspension point neatly sidesteps a lot of the race conditions present in that model.
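For example (a minimal sketch; the actor and the flaky call are placeholders of mine):

    actor BarrierLogger {
        func log(_ message: String) {
            // Write the entry out right here (file, stderr, ...), so that when
            // the caller's `await` returns, the message is already recorded.
            print(message)
        }
    }

    func callFlakyAPI(logger: BarrierLogger) async throws {
        await logger.log("about to call the flaky API")   // the "log barrier"
        try await someFlakyExternalCall()                  // placeholder for the risky work
    }

    func someFlakyExternalCall() async throws { }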
We use dispatchQueue.async { } to log and it works nicely on all versions of iOS and macOS from multiple threads/tasks/queues/whatever, all of the items are logged in order, and the time hit on the main thread is minimal.
The only time I don't like it is when tracking down a crash - the app can crash before the log code gets executed, but we can fix that by using a manual switch to use dispatchQueue.sync { } in those circumstances.
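In simplified form, the switch looks roughly like this (illustrative names, not our real code):

    import Foundation

    final class SwitchableLogger {
        /// Flip to `true` while chasing a crash so every entry is on disk
        /// before the caller continues.
        var synchronous = false
        private let queue = DispatchQueue(label: "logger.switchable")   // serial

        func log(_ message: String) {
            if synchronous {
                queue.sync { self.write(message) }
            } else {
                queue.async { self.write(message) }
            }
        }

        private func write(_ message: String) {
            // append `message` to the log file here
            print(message)
        }
    }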
We also have the ability to get/view logs within the app and I think that lends itself more to async/await.
Are you talking about logging for apps or servers?
It seems like you're talking about debug logging, where you're building in debug mode, and are sitting in front of the machine with a debugger attached, waiting for something to succeed or fail. That's one particular use of logging, and I'm not sure it is what the question was about.
Since this question is tagged server, I understand it as referring to logging in release mode, where no debugger is attached, nobody is sitting in front of the server waiting for it to crash, and performance becomes an important consideration. In that case, yes, awaiting on every log statement will add significant latency to any operation you want additional insight about.
As the name suggests, await waits. It's not "blocking", because the thread can be reused by other, unrelated tasks, but otherwise it has similar characteristics to something like file/network IO - your task (the operation you're currently working on) is suspended, and an arbitrary amount of time passes before it gets resumed.
Doing that on every log call is potentially very expensive, meaning you're unlikely to be able to log as often, which defeats the purpose of this kind of logging. For logging deployed to production, you don't expect the application to crash, but if it does, you want as much information as possible about what led to that crash.
For what it's worth, I'm talking about servers in specific since that's primarily where I use Swift (AWS Lambdas in particular); however, the same concepts can apply to any application (like having a user submit logs with a bug report from an iOS app, for instance).
There are simply some classes of messages that it's not okay to "fire and forget" as the next operations in the program may cause a crash before the message gets fully logged - thus the need for a "log barrier" with await. Since you apparently haven't encountered any of these scenarios before, here are a couple off the top of my head that I have encountered:
Significant branches in program logic: knowing what macro-level branch you followed for a web request can help you quickly narrow in on a specific section of code for debugging when something in production is crashing
Start of requests with known-flaky external APIs: if you're interfacing with an external API (some particularly shoddy C libraries I've had to integrate with fall into this bucket), knowing that you invoked the external API (and maybe with what parameters you called it) can be immensely helpful to cut down on debugging time
We agree on this point entirely - in production, critical log messages (whatever that means to you and your app) should never be lost into the void because the program crashed before the log completed. I would rather pay a small performance penalty for this subset of messages than risk losing them.
I never said that await was blocking, only that awaiting is a nice parallel to a synchronous, blocking logging call and shares some characteristics with that pattern.
I am thinking about libraries that are used on the server and on clients alike, some of them doing longer work (batch processing), and with potentially parallel execution. So server is an important context.
I would think that using DispatchQueue.async also has some cost (somehow you have to manage concurrent access), and there was some talk about async/await being potentially even more efficient (but I think nobody could really confirm this with a test). (Update: Having to use async/await all the way up through your chain of calls might make it more expensive?)
So I am with you here: logging should be cheap. If you have something cheaper than DispatchQueue.async, I would consider it (or is DispatchQueue.async fast enough?). And I think accepting that some log entries could be lost in the case of a crash is OK for "normal" logging (but see the idea at the end):
Yes, in many cases you are able to reproduce the error, I was also thinking about using such a switch.
Well then, maybe use two different loggers: one cheap (with potential loss of log entries in the case of a crash), and another more expensive one (e.g. using DispatchQueue.sync) that always logs immediately, giving you enough information so you might be able to reproduce the crash (e.g. which item is being worked on). This is also expressed by @MPLewis in the last comment (added while I am typing my response):
"async" could be OK at the top level of your logic and might in many cases be the only place where it is needed (info about the work item/request), but when you need a log barrier further down in your application logic (e.g. when calling something "dangerous"), then with "await" you will again have many "awaits" all the way up to the top. That might be OK, I am not sure about it, but you could use something like group.wait() with DispatchQueue instead (or is DispatchQueue.sync enough? Update: See the documentation for DispatchQueue.sync, it is not clear to me ...).
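For illustration, the two-logger idea on a single serial queue might look like this (just a sketch of my own; if I understand the documentation correctly, a sync call on a serial queue returns only after all previously enqueued work has run):

    import Foundation

    final class CheapLogger {
        private let queue = DispatchQueue(label: "logger.cheap")   // serial

        // Cheap path: fire and forget.
        func log(_ message: String) {
            queue.async { /* write `message` */ print(message) }
        }

        // Barrier path: returns only after everything logged so far,
        // including `message`, has actually been written.
        func logAndWait(_ message: String) {
            queue.sync { /* write `message` */ print(message) }
        }
    }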
Just FYI, DispatchQueue.async isn't as lightweight as you'd think. In Alamofire we dispatch all of the calls to our EventMonitor protocol and it results in a 5% drop in total request throughput (Requests per second) just having a single monitor active. Any logging solution interested in having minimal impact likely wants to avoid that.
I have learned some things about low-overhead logging in C that may also apply in Swift:
If you are logging strings with interpolated values, e.g. log("\(n) widgets processed in \(elapsed) nanoseconds"), and you have multiple logging levels/modes/channels, then defer the interpolation until after you decide whether or not to log the message based on the level, mode, or channel state.
In C, I use a macro that tests the level, mode, or channel state, with a branch-prediction hint (__builtin_expect for GCC/Clang) that helps the compiler put the do-not-log case on the fast path. Formatting and recording the log message happens on the slow path.
In Swift, I'm not sure if branch-prediction hints are available, but I know that you can use an autoclosure to defer the string interpolation until after the log condition is checked. BTW, it is probably best to inline the test-and-branch.
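A sketch of that pattern in Swift (as far as I know, swift-log uses the same autoclosure trick for its message parameter; the names below are made up):

    enum LogLevel: Int, Comparable {
        case debug = 0, info, error
        static func < (lhs: LogLevel, rhs: LogLevel) -> Bool { lhs.rawValue < rhs.rawValue }
    }

    struct CheapLogHandler {
        var threshold: LogLevel = .info

        // The message is an @autoclosure, so the interpolation is never
        // evaluated on the do-not-log path. @inline(__always) is an
        // (underscored) hint to keep the test-and-branch inlined.
        @inline(__always)
        func log(_ level: LogLevel, _ message: @autoclosure () -> String) {
            guard level >= threshold else { return }   // fast path: a cheap comparison
            emit(message())                            // slow path: build and record the string
        }

        private func emit(_ text: String) {
            print(text)
        }
    }

    // Usage: the interpolation below is skipped entirely because .debug < .info.
    let handler = CheapLogHandler()
    let n = 42, elapsed = 1_300
    handler.log(.debug, "\(n) widgets processed in \(elapsed) nanoseconds")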
To the greatest extent practical, let threads (or tasks) that run independently log independently. In C, that usually means recording log messages to thread-local ring buffers. If you need permanent logs, then you can dedicate a thread to emptying the ring buffers to a file, or you can empty the buffers in the log(...) calls. Single-writer, single-reader ring buffers are all you need; writing a log message to a buffer or emptying a buffer should virtually never require acquiring a lock.
In pure Swift, task-local storage looks easy; thread-local storage may suffice, but Swift may make that more difficult. Regardless, writing a log message to a buffer should not require locking.
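A rough Swift-native approximation of the per-task part (not a wait-free ring buffer, just a task-local buffer that is handed to a writer when the task ends; all names here are my own):

    final class TaskLogBuffer: @unchecked Sendable {   // only the owning task ever touches it
        var lines: [String] = []
    }

    enum TaskLog {
        @TaskLocal static var buffer: TaskLogBuffer? = nil

        static func log(_ message: String) {
            buffer?.lines.append(message)              // no lock: the buffer is task-local
        }

        static func withBuffer<T>(flushTo writer: ([String]) -> Void,
                                  _ body: () async -> T) async -> T {
            let buffer = TaskLogBuffer()
            defer { writer(buffer.lines) }             // drain once, e.g. to a file
            return await $buffer.withValue(buffer) {
                await body()
            }
        }
    }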
Sometimes one has to debug a program whose threads just have to be "free running" (i.e., heavy log synchronization has to be avoided or else the program performance drastically changes) and that one just knows is going to crash sometimes.
I suggest constructing your logging so that, by examining a crash dump with your favorite symbolic debugger, you can locate and extract all of the task-/thread-local ring buffers and in that way reconstruct every message logged prior to the crash. I don't know if there is a good Swift library for examining core dumps (maybe there's a nice DWARF library?), but if there is, it may not be too hard to automate the reconstruction.
I'd do a global ring buffer - but the fear of losing the last log entries just before a crash is real, as someone already noted up the thread. I'd only invest in my own logging scheme if the existing built-in alternative proved to be inadequate (see below).
OSLog is specific to Apple platforms. I could not find much substantial information about it other than, e.g., the judgement that "OSLog has a low performance overhead", which of course sounds promising. I would like to know how it is implemented; is the source available?
The problem (well, one problem) with OSLog is that you can't easily get the logs from user devices. We log to text files so users can press a button to send us their logs.