Ladybird Browser - Event Loop integration with Swift Concurrency

This is a follow-up to my previous topic on Ladybird.

One of the most exciting prospects of using Swift to implement new browser features is the guarantees and patterns that structured concurrency gives us. However, for that to work, we need to teach the Swift concurrency runtime and its compile-time checks how to interact with our existing application runtime.

Ladybird's Event Loop

The event loop library used by each process in Ladybird is named Core::EventLoop. It's part of LibCore, the OS abstraction library inherited from SerenityOS. The API of LibCore's event loop and related classes draws heavily on Qt: the API for Core::Timer maps almost 1:1 to QTimer, for example, just with snake_case instead of camelCase names.
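
For instance, a repeating timer reads almost like the QTimer equivalent. A sketch, assuming current factory names (they have shifted a bit between Ladybird revisions):

auto timer = Core::Timer::create_repeating(250, [] {
  // Fires every 250ms, dispatched by whichever event loop is current
  // on the thread that created the timer.
  dbgln("tick");
});
timer->start();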

There are a few architectural choices in Core::EventLoop that make it an interesting challenge to integrate with Swift 6's concurrency APIs. The most obvious difference from how Swift concurrency sees the world is that there's no such thing as the "main event loop", or even the "main thread event loop". With LibCore, no thread is treated specially by the library. Higher-level libraries or application logic can of course enforce such a thing, but LibCore doesn't care. So mapping a specific LibCore event loop to @MainActor, or creating a @globalActor from one, doesn't make much sense.

To not get too long-winded, here's a list of interesting properties of LibCore EventLoops:

  • EventLoop internal state is per-OS-thread
  • EventLoops are expected to live on the stack of each thread
  • EventLoops can be directly referenced cross-thread to post messages to another thread
  • EventLoops are linked into (morally) a per-thread intrusive list; nested event loops are used to achieve things like modal dialogs
  • Events are processed per-thread by the top-most EventLoop on that thread's stack

LibCore EventLoops are heavily reliant on thread-local state, and their identity is tied to the OS thread they live on. For example, if you post a message to another event loop, you are guaranteed that the message handler will be called from that thread when its event loop gets around to processing its thread-local message queue. Many objects in LibCore also implicitly reference the "current event loop", defined as the top-most event loop in the event loop stack for the current thread. While messages are posted to the thread-local message queue, the actual event loop object referenced is the top-most one. This is mostly useful for modal dialogs in SerenityOS's LibGUI, but the fact remains that you can put another event loop on the OS stack and call exec() on it, and it will remain the top-most event loop for that thread until application code calls quit() on it.

The most obvious implication of this per-thread behavior comes with file descriptor notifiers, called Core::Notifier. If you register a file descriptor or other OS handle with an event loop, it will be registered with the top-most event loop of the calling thread.
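
As a sketch (API names approximate), registering a read notifier looks like this; note that nothing in the call names an event loop, because the top-most loop of the calling thread is picked up implicitly:

auto notifier = Core::Notifier::construct(socket_fd, Core::Notifier::Type::Read);
notifier->on_activation = [] {
  // Called on this thread, by this thread's top-most event loop,
  // whenever socket_fd becomes readable.
};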

This thread-statefulness is handy from an application writer's perspective, as long as they are writing an application with one or two threads: a GUI thread and a worker thread, say, or a GUI thread and an IPC message-handler thread. Concurrency issues can be sidestepped by posting work directly to another thread's event loop, and the cost is simply that the work is deferred until that thread is done handling its current file descriptor notification or other application event.
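
Sketched out, the handoff is just this (assuming you hold a reference to the worker thread's loop; the helper called inside is hypothetical):

// On the GUI thread, with a reference to the worker thread's event loop:
worker_loop.deferred_invoke([result = move(result)] {
  // Runs later, on the worker thread, once its loop pumps its
  // thread-local message queue.
  process_result(result); // hypothetical handler
});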

However, it causes issues when interacting with other libraries and runtimes that create threads on their own. For example, with the default behavior of Swift concurrency :). It's my understanding that on macOS, Tasks simply defer work to an internal pool of dispatch queues, and dispatch queues themselves have an internal thread pool, at least in user-space applications. On Linux, I believe the implementation is backed by some form of libdispatch and pthreads, though I've spent less time looking at stack traces from failed assumptions on that platform.

The fact is, LibCore expects that your main looks something like this:

int main() {
  // process arguments
  Core::EventLoop ev; // or other Application class that has-a EventLoop
  // set up IPC and fd callbacks
  return ev.exec();
}

LibCore also expects any worker threads you create to have an EventLoop at the top of their thread function. It's not exactly an error to heap-allocate an EventLoop, but it's definitely not how the library was designed to be used. EventLoop does important clean-up of thread-local state in its destructor, so having one outlive its thread's stack is quite a foot-gun.
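
So a worker thread ends up looking something like this sketch (LibThreading names approximate):

auto worker = Threading::Thread::construct([]() -> intptr_t {
  Core::EventLoop loop; // Lives on this thread's stack; its destructor
                        // cleans up the thread-local state on exit.
  // Set up this thread's notifiers, timers, and IPC connections here.
  return loop.exec();
});
worker->start();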

Built on top of LibCore: Ladybird's many daemons

While it might be obvious how a UI process would use an EventLoop, the core application logic of Ladybird's WebContent, RequestServer and ImageDecoder processes also uses Core::EventLoop. In terms of SerenityOS patterns, each of these processes is an "IPC server": its core functionality is to block on several file descriptors, which in Unix-land correspond to Unix domain sockets, and wait for requests from its client(s). This holds true for RequestServer (the networking process) and ImageDecoder (a sandboxed process that parses untrusted raw bytes and hands back bitmaps).

WebContent, on the other hand, is the workhorse of the entire browser. Each tab in the UI gets its own WebContent process to back its navigation. It acts as the "renderer" process, in more generic browser terms. That's where the Web engine LibWeb, the JS engine LibJS and the WebAssembly implementation LibWasm parse markup and scripts from the web and transform them into display lists, textures and bitmaps to render in the body of each browser tab's UI. An important architectural artifact of LibWeb to understand, though, is that it is entirely single-threaded and thread-hostile. Trying to execute JavaScript and parse HTML for two concurrent browser tabs inside the same process would only end in disaster. Thankfully, processes are cheap :slight_smile: (... if you pretend mobile doesn't exist).

The HTML EventLoop

To meet the needs of the HTML specification, LibWeb includes an abstraction on top of the Core::EventLoop called Web::HTML::EventLoop. This is our implementation backing the event loop concept from the specification.

The HTML event loop as described in the specification is quite abstract. It is supposed to process 'tasks' that are submitted to 'task queues'. You're supposed to be able to 'spin' the event loop until a certain condition is met; in practice this means processing as many submitted tasks as possible, on any of the queues, until you reach your goal condition. However, the way the event loop is specified makes assumptions about the ability to suspend tasks that Ladybird's implementation cannot satisfy without piling more and more tasks onto the native OS stack.

The reality is that HTML::EventLoop works by having a zero-timeout timer attached to the Core::EventLoop of the WebContent process. Whenever work is submitted to the HTML event loop, we poke that timer to make it expire immediately. That means that IPC messages, which are based on file descriptor notifications, and HTML events are serviced in concert on the same "main thread event loop". While LibCore does not have a concept of a main thread event loop, LibWeb sure does.

The specification says that each "agent" should have its own event loop, and makes no restriction on how many agents can live in a process. In fact, it's careful to avoid making any restrictions on thread or process models, to allow for implementation diversity and flexibility. In practice, Ladybird limits itself to one agent per process due to thread-safety concerns in the implementation.
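
Circling back to that zero-timeout timer: the scheduling mechanism is tiny. A sketch of its shape (member names approximate):

// Inside Web::HTML::EventLoop: poke the Core::EventLoop so it will
// drain the HTML task queues on its next iteration.
void EventLoop::schedule()
{
  if (!m_system_event_loop_timer)
    m_system_event_loop_timer = Core::Timer::create_single_shot(0, [this] {
      process(); // Run as many queued HTML tasks as possible.
    });
  if (!m_system_event_loop_timer->is_active())
    m_system_event_loop_timer->start();
}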

The specification also has a concept of performing algorithms and work "in parallel". In theory, "in parallel" work executes in some abstract wonderland that will eventually submit work back to the HTML event loop by calling one of the "queue a task" algorithms. In practice, work that is supposed to happen "in parallel" is simply deferred as a "deferred work" event on the LibCore event loop, which processes it separately from HTML tasks and IPC notifications. Crucially, though, there's no inherent need to do this work on the same thread that is processing HTML events, handling input events from the UI, or executing JavaScript.
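
In code, a spec step that runs "in parallel" currently takes roughly this shape (a sketch; the helpers are hypothetical):

// Defer the "parallel" step onto the same LibCore event loop...
Core::deferred_invoke([] {
  auto result = do_expensive_spec_work(); // hypothetical helper
  // ...and make the result visible to author code the only legal way:
  // by queueing a task back onto the HTML event loop (more on this below).
  queue_result_task(result); // hypothetical helper
});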

Enter Swift

Wouldn't it be nice to use some language abstractions to defer work to "somewhere else", with the right guard rails to make sure the results can be submitted back to the HTML event loop safely? In theory, Swift concurrency should be able to help with this. We just need to model things correctly!

After scouring Swift forums threads and documentation, I discovered a few things:

  • It should be possible to model our LibCore event loop as an "executor"
  • Actors are things that exist, and use executors in some way
  • The lifetimes, responsibilities, and expected behaviors of actors are a complete mystery to me
  • Tasks are ways to group units of work, but they make absolutely no guarantees about when, where or how that work is executed
  • You can limit the "where" by tying a task to an executor, but the "when" is still a big shrug, along with "in what order"
  • Swift concurrency and global state are somewhat friends, as long as "global state" means "main thread state".
  • Swift concurrency and threads are not friends, and the default backends of things like Tasks assume that the concurrency runtime should be able to execute work on whatever OS threads it wants

Going off of these discoveries, I created a pull request for Ladybird here that adds an executor and actor protocol for Core::EventLoop.
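
The C++ half of such a bridge boils down to handing Swift's jobs to the loop. A hypothetical sketch of the glue (the PR's actual naming and plumbing differ):

// Hypothetical glue: the Swift executor's enqueue(_:) calls this with a
// context pointer and a trampoline that runs the job. The job then
// executes on the thread that owns this Core::EventLoop.
void enqueue_swift_job(Core::EventLoop& loop, void (*run_job)(void*), void* job_context)
{
  loop.deferred_invoke([run_job, job_context] {
    run_job(job_context);
  });
}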

Can "in parallel" mean "in Swift"?

There are a lot more places to go with this exploration, though. One thing I'd really like to do is model the HTML specification's "in parallel" execution model for deferred work via Swift concurrency tasks. If specification authors are following all the rules, then the only way for the results of this work to become visible to "author code" (JavaScript) is for there to be a call at the end of the algorithm to "queue a task", "queue a global task" or "queue an element task" onto the HTML event loop with the result. There is one glaring issue though: memory safety.

As I explained in my first post and the comments thereof, nearly everything in LibWeb is garbage-collected on the JavaScript heap of LibJS. Since that post, though, we have moved the GC into its own library we call LibGC (original, I know). This does pose a problem for how to defer work onto async tasks, however. LibGC's garbage collector is quite naive: it is non-moving, non-compacting, and non-generational. On top of that, it relies on a conservative stack scan to find possible heap pointers during its mark-and-sweep collection.

If we wish to operate on heap-allocated objects from inside a Swift task, things get complicated. If a collection is triggered from a thread other than the one the heap was created on, everything explodes. The GC::Heap class assumes that the application is single-threaded, and saves the stack bounds of the thread it was created on for its mark-and-sweep pass during periodic collection. In order to allocate any GC object from a Swift task, we must first make the GC heap smarter about how it scans thread stacks. It's more exciting than that, though: we would need to make the entire allocator thread-safe. That, or we would need to force Swift tasks to keep deferring work to the main thread event loop in LibWeb, and sidestep the thread issue until later :tm: .

But why do we need to allocate things from the "in parallel" tasks? And why do we need to refer to GC objects from this separate context? It's kind of simple: in order to "queue a task", we need to know for whom. That means we need a reference to the HTML Document or HTML Window object (or Worker object or ...) that we wish to post the task to.

Another complication is that the queue-task APIs all require a GC function. That's a special wrapper around AK::Function, our std::function-like class, that knows how to scan the captures of whatever closure or function-like object it wraps for GC pointers by inspecting the function's raw storage. This lets us wrap lambdas that capture GC pointers by copy into GC functions, and ensures those objects will be marked as live during a mark-and-sweep pass. Of course, calling GC::create_function inside a Task will cause issues if there's threading involved.
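
Putting those pieces together, the queue-back step from deferred work looks roughly like this today (a sketch; the exact queue_global_task overload varies, and the event-firing helper is hypothetical):

// Wrap a lambda that captures a GC pointer by copy. The GC::Function's
// storage is scanned during mark-and-sweep, keeping `document` alive
// until the queued task has run.
HTML::queue_global_task(HTML::Task::Source::DOMManipulation, window,
  GC::create_function(window.heap(), [document = GC::Ref { *document }] {
    // Runs later on the HTML event loop, back on the "main" thread.
    fire_the_result_event(document); // hypothetical helper
  }));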

What's next?

All that may or may not make sense if you're not intimately familiar with the Ladybird codebase, but the result is that there's still work to do. One day I'd like to be able to write idiomatic async Swift code that transparently interacts with the HTML event loop and the Ladybird garbage collector, but it's clear there's a lot of thinking left to do.


For discussion of "in parallel", this is the important context, though it is a bit speccy and application-specific: 8.1.7.5 Dealing with the event loop from other specifications

Hi Andrew!
Thanks for taking the chat to the forums :waving_hand:

It's really exciting to see you dig into the possibilities here. Thanks for all the background info, it helps paint a general picture of what you're trying to do.

Yeah, I think you're on the right track that your event loops are good candidates to become executors (SerialExecutors, which back actors' mutual exclusion guarantees, or TaskExecutors).

It does sound like taking over both the global and main executors would give you a way to stick all the work to one of your event loops. For the others, you'd perhaps use tasks with a task executor preference, to execute a task and all its child tasks on a specific event loop. You might be interested in the [Pitch 2] Custom Main and Global Executors proposal, which would allow you to effectively disable (or sidestep) any of Swift's existing executors, and therefore not have any threads that are not managed by your runtime.

As for the actual modeling of your queues and the sending between them, I'm not entirely sure... but as far as threading and isolation go, actor isolation in general (the model where an actor owns some data, and only Sendable-conforming types can be sent to other actors, i.e. other isolation domains) may be what's helpful to you.

We discussed your PoC executor on chat; it seems like a good first step! Let us know when you hit some specific issues and we can try to figure out how to model them in Swift then :thinking:
