Ladybird GC and imported class hierarchies

Looking at Brainstorming how to import base classes for C++ interop - #17 by zoecarver, it seems like there is no general solution for importing C++ class hierarchies. Though if there's a more recent forum post, please let me know :)

At any rate, here's the problem I want to solve. It's related to the GC integration I prototyped in this pull request as discussed in Ladybird Browser and Swift - Garbage Collection.

TL;DR: I have a Ladybird PR and a Swift issue that could use some feedback.

Ladybird's LibGC: Overview

In Ladybird's LibGC, every GC-allocated type is expected to derive from GC::Cell. The Cell class has a number of important virtual functions, including void visit_edges(Cell::Visitor&) and void finalize(), and a couple of important boolean bitfields for state (alive or dead) and marked/unmarked.

The visit_edges function is the most important one, though finalize() is useful if you need to interact with virtual functions before object destruction. LibGC's garbage collection is implemented with a conservative stack-scan and a registered root-based mark and sweep. That is to say, during a periodic GC scan for dead cells, first we scan the stack, and then we scan known strong roots for live cells. This is done via an implementation of GC::Cell::Visitor which goes through and marks every visited cell. After that, every unvisited cell is destroyed/deallocated, and cells are unmarked. The GC is therefore architecturally a stop-the-world, non-moving, non-generational, single-threaded GC.
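To make the mark and sweep part concrete, here is a minimal sketch in plain C++. Everything in it (ToyCell, ToyNode, ToyHeap, demo_collect) is a made-up stand-in and not LibGC's actual code, and it leaves out the conservative stack scan entirely: only explicitly registered roots are considered.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Toy stand-ins for illustration only; these are NOT LibGC's real classes.
struct ToyCell {
    struct Visitor {
        virtual ~Visitor() = default;
        virtual void visit(ToyCell& cell) = 0;
    };

    virtual ~ToyCell() = default;
    // Subclasses override this to report every cell they point to.
    virtual void visit_edges(Visitor&) { }

    bool marked = false;
};

struct ToyNode : ToyCell {
    ToyNode* next = nullptr;

    void visit_edges(Visitor& visitor) override
    {
        ToyCell::visit_edges(visitor);
        if (next)
            visitor.visit(*next);
    }
};

class ToyHeap {
public:
    template<typename T>
    T* allocate()
    {
        auto cell = std::make_unique<T>();
        T* raw = cell.get();
        m_cells.push_back(std::move(cell));
        return raw;
    }

    void add_root(ToyCell& cell) { m_roots.push_back(&cell); }
    std::size_t live_cell_count() const { return m_cells.size(); }

    // Mark everything reachable from the roots, then sweep the rest.
    // Returns the number of cells that were destroyed.
    std::size_t collect_garbage()
    {
        MarkingVisitor marker;
        for (ToyCell* root : m_roots)
            marker.visit(*root);

        std::vector<std::unique_ptr<ToyCell>> survivors;
        for (auto& cell : m_cells) {
            if (cell->marked) {
                cell->marked = false; // unmark for the next collection
                survivors.push_back(std::move(cell));
            }
        }
        std::size_t destroyed = m_cells.size() - survivors.size();
        m_cells = std::move(survivors); // unmarked cells are destroyed here
        return destroyed;
    }

private:
    struct MarkingVisitor final : ToyCell::Visitor {
        void visit(ToyCell& cell) override
        {
            if (cell.marked)
                return; // already seen, which also stops cycles
            cell.marked = true;
            cell.visit_edges(*this);
        }
    };

    std::vector<std::unique_ptr<ToyCell>> m_cells;
    std::vector<ToyCell*> m_roots;
};

// Usage: a -> b is rooted, and a third node is unreachable garbage.
inline std::size_t demo_collect()
{
    ToyHeap heap;
    auto* a = heap.allocate<ToyNode>();
    auto* b = heap.allocate<ToyNode>();
    heap.allocate<ToyNode>(); // never referenced by anything
    a->next = b;
    heap.add_root(*a);

    std::size_t destroyed = heap.collect_garbage();
    assert(heap.live_cell_count() == 2);
    return destroyed;
}
```

Because nothing moves and marking runs to completion before the sweep, raw pointers between cells stay valid as long as each one is reported from visit_edges.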

Every GC-allocated type must implement a visit_edges() override that visits any member variables that are themselves GC-allocated. For example:

#include <B.h> // B is also a GC::Cell

class A : public GC::Cell {
  GC_CELL(A, GC::Cell);
public:
  static GC::Ref<A> create(GC::Heap& heap, GC::Ref<B> b) {
    return heap.allocate<A>(b);
  }
private:
  A(GC::Ref<B> b) : m_b(b) {}

  void visit_edges(GC::Cell::Visitor& visitor) override
  {
    Base::visit_edges(visitor);
    visitor.visit(m_b);
  }

  GC::Ref<B> m_b;
};

The GC_CELL macro defines a using Base = ... alias for its second argument, makes GC::Heap a friend, and creates an override for GC::Cell's name() virtual function. GC::Ref and GC::Ptr are class templates that are nothing more than sugar for plugins and clang-tidy checks to analyze for errors. GC::Ref also asserts that the value it holds is never null.
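As a rough illustration of what a macro like the one described above does, here is a toy version. TOY_CELL, ToyCell, ToyHeap, Widget and class_name() are all hypothetical names for this sketch and the real macro may well differ in its details. The toy heap also just hands back a std::unique_ptr, so there is no garbage collection involved here.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

class ToyHeap;

class ToyCell {
public:
    virtual ~ToyCell() = default;
    virtual char const* class_name() const { return "ToyCell"; }
};

// Hypothetical macro: declares the Base alias, overrides the name function,
// and befriends the heap. The trailing semicolon comes from the use site.
#define TOY_CELL(class_, base_class)                            \
public:                                                         \
    using Base = base_class;                                    \
    char const* class_name() const override { return #class_; } \
    friend class ToyHeap

class ToyHeap {
public:
    // Uses plain new because std::make_unique could not reach a private
    // constructor; the friend declaration only covers ToyHeap itself.
    template<typename T, typename... Args>
    std::unique_ptr<T> allocate(Args&&... args)
    {
        return std::unique_ptr<T>(new T(std::forward<Args>(args)...));
    }
};

class Widget : public ToyCell {
    TOY_CELL(Widget, ToyCell);

public:
    int id() const { return m_id; }

private:
    // Private, so only the befriended heap can construct a Widget.
    explicit Widget(int id) : m_id(id) { }
    int m_id;
};

inline std::string demo_toy_cell_macro()
{
    ToyHeap heap;
    auto widget = heap.allocate<Widget>(7);
    assert(widget->id() == 7);
    return widget->class_name();
}
```

The friend declaration is what lets a private constructor coexist with heap-only allocation, which is the same shape as the create() function in class A above.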

Ladybird's CI runs a clang plugin to verify a few properties about GC-allocated types. For example, if a GC-allocated class has any GC::Ref or GC::Ptr member variables that are not visited in visit_edges, the plugin generates an error diagnostic. If the visit_edges function does not call Base::visit_edges(visitor) as the first statement, it also generates an error diagnostic.

GC::Cell::Visitor has a lot of handy function overloads to visit() to make it the least amount of work possible for the programmer to visit everything properly. There's an overload for Cell*, Cell const*, Cell&, Cell const&, Ptr<T>, Ref<T>, Vector<Ref<T>>, HashTable<Ptr<T>>, HashSet<T, Ref<U>>, ... etc. In the end, all the helpful overloads end up calling the Cell pointer or reference overloads 1 to N times.
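The forwarding pattern looks roughly like the sketch below. ToyVisitor, ToyPtr and CountingVisitor are invented for this example, and std::vector stands in for Ladybird's own container types, so treat it as the general shape and not the actual overload set.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins; LibGC's real Visitor and Ptr types differ.
struct ToyCell {
    virtual ~ToyCell() = default;
};

struct ToyNode : ToyCell { };

template<typename T>
struct ToyPtr {
    T* ptr = nullptr;
};

class ToyVisitor {
public:
    virtual ~ToyVisitor() = default;

    // The two "real" entry points.
    void visit(ToyCell& cell) { visit_impl(cell); }
    void visit(ToyCell* cell)
    {
        if (cell)
            visit_impl(*cell);
    }

    // Convenience overloads that forward to the pointer overload.
    // The static_cast only compiles when T derives from ToyCell.
    template<typename T>
    void visit(ToyPtr<T> const& pointer)
    {
        visit(static_cast<ToyCell*>(pointer.ptr));
    }

    template<typename T>
    void visit(std::vector<T> const& elements)
    {
        for (auto const& element : elements)
            visit(element);
    }

protected:
    virtual void visit_impl(ToyCell&) = 0;
};

struct CountingVisitor final : ToyVisitor {
    int visited = 0;

protected:
    void visit_impl(ToyCell&) override { ++visited; }
};

inline int demo_overloads()
{
    ToyNode a;
    ToyNode b;
    std::vector<ToyPtr<ToyNode>> children {
        ToyPtr<ToyNode> { &a },
        ToyPtr<ToyNode> { &b },
        ToyPtr<ToyNode> { nullptr },
    };

    CountingVisitor visitor;
    visitor.visit(a);        // reference overload: one visit
    visitor.visit(children); // container overload: one visit per non-null entry
    return visitor.visited;
}
```

Note that the upcast from T* to the cell pointer happens inside C++, where the compiler knows the inheritance relationship. That is exactly the step that is missing on the Swift side.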

Unless I'm mistaken, these GC-allocated types must be modeled as "look but don't touch" reference types in the Swift type system. Their lifetimes are shorter than the program's lifetime, but externally managed. If Swift code stores pointers or references to GC-allocated types anywhere other than the native call stack, that is memory unsafe and the value may be deleted underneath the Swift compiler's nose. More precisely, it is unsafe unless the stored value is marked as a strong root, or is visited by some entity that is itself eventually reached from a strong root during a GC scan.

Limitations of HeapAllocatable protocol from before

As implemented in the original pull request and discussed in Ladybird Browser and Swift - Garbage Collection, the solution for creating GC-allocated swift types includes a protocol called HeapAllocatable. As part of this protocol, the swift object will include a GC::Cell pointer that ultimately points to a GC::ForeignCell object.

Visiting objects that implement HeapAllocatable is straightforward. One must only add an extension to GC::Cell::Visitor that accepts any HeapAllocatable type, accesses its cell getter, and calls visit on that. C++ code that wants to visit a HeapAllocatable type can do so through the ForeignPtr<T> or ForeignCell<T> interfaces easily.

However, a far more interesting use case is when a Swift HeapAllocatable wants to reference a GC-allocated C++ type as a member variable. In LibWeb, this will likely often be a reference to a JS::Realm, JS::Value, Web::WebIDL::Promise, Web::HTML::Document, Web::HTML::Window, Web::DOM::Node, or any other arbitrary web platform objects. When delegating work to a swift object, we must have a way to return the work results back to the caller. In the general case that will need to be done by calling Web::HTML::queue_global_task(HTML::Task::Source, JS::Object& global, GC::Ref<GC::Function<void()>> steps) to enqueue js-visible work onto the HTML Event Loop (as described in my previous post on the event loop in Ladybird).

This is where we get to the major limitation: When an imported C++ type is-a GC::Cell or a JS::Cell (which itself is-a GC::Cell), Swift knows nothing about it. Therefore trying to do something like this is impossible:

public final class SpeculativeHTMLParser: HeapAllocatable {
    var parser = Web.HTML.HTMLParserGCPtr()  // FIXME: Want HTMLParserGCRef here, but how to initialize it?

    public init(cell: GC.Cell) {
        self.cell = cell
    }
    public var cell: GC.Cell

    public static func create(on heap: GC.Heap, `for` parser: Web.HTML.HTMLParserGCPtr) -> GC.Cell {
        precondition(heap.is_gc_deferred())
        let _self = allocate(on: heap)
        _self.pointee.parser = parser
        return _self.pointee.cell
    }

    public func visitEdges(_ visitor: GC.Cell.Visitor) {
        visitor.visit(parser.ptr()) // compile error: Web.HTML.HTMLParser does not model is-a GC.Cell
    }
}

It's impossible for this class to visit its parser without telling the type system "not to worry, I know what I'm doing". My first attempt at that looks something like this:

extension GC.Cell.Visitor {
    public func visitUnsafe<T>(_ hopefullyCell: T) {
        visit(unsafeBitCast(hopefullyCell, to: UnsafeMutableRawPointer.self).assumingMemoryBound(to: GC.Cell.self).pointee)
    }
}

or

extension GC.Cell.Visitor {
  public func visitUnsafe<T>(_ hopefullyCell: T) {
    withUnsafePointer(to: hopefullyCell) {
      $0.withMemoryRebound(to: GC.Cell.self, capacity: 1) {
        visit($0.pointee)
      }
    }
  }
}

But both of these crash a main +assertions build in IRGen. So I was stuck casting the pointer to void* and doing the dangerous cast from void* to Cell* in C++ in order to visit.
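For reference, the C++ side of that workaround has roughly the following shape. The names here (ToyCell, ToyVisitor, ToyParser, visit_unsafe) are placeholders for this sketch and not the code in the PR.

```cpp
#include <cassert>

// Placeholder types for illustration; not the real LibGC or LibWeb classes.
struct ToyCell {
    virtual ~ToyCell() = default;
    bool marked = false;
};

struct ToyVisitor {
    void visit(ToyCell& cell) { cell.marked = true; }
};

struct ToyParser : ToyCell { };

// The "dangerous cast": the caller promises that the void* was produced from
// a ToyCell*. If it was produced from a derived-class pointer instead, the
// address may not be the ToyCell subobject (for example with multiple
// inheritance), and this cast would silently yield a bad pointer.
inline void visit_unsafe(ToyVisitor& visitor, void* hopefully_cell)
{
    assert(hopefully_cell != nullptr);
    visitor.visit(*static_cast<ToyCell*>(hopefully_cell));
}

inline bool demo_visit_unsafe()
{
    ToyParser parser;
    ToyCell* as_cell = &parser; // upcast while the static type is still known
    void* erased = as_cell;     // type information is lost from here on

    ToyVisitor visitor;
    visit_unsafe(visitor, erased);
    return parser.marked;
}
```

The sketch upcasts before erasing the type so that the round trip through void* is well defined. Nothing checks that promise at the call site, which is why this is not a pattern to spread around.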

Conclusion

I have a PR to ladybird open here: LibWeb+LibGC: Teach Swift about GC::Cell, and start writing a Swift class in LibWeb that is gc-allocated by ADKaster · Pull Request #4053 · LadybirdBrowser/ladybird · GitHub
and a swift issue open here, about the inheritance issue: [cxx-interop] Imported class heirarchy of reference types unable to (implicitly) upcast to base · Issue #80231 · swiftlang/swift · GitHub

How can I get my preferred visitor.visit(parser.ptr()) (or even visitor.visit(parser)) to compile without too much boilerplate?

How can I get main to not crash on my visitUnsafe?

How can I convince Swift to synthesize types that are aliases of GC::Ref<T> or GC::Ptr<T> for any GC::Cell type? In theory, a perfect world's macro could do this, but I think the type would need to exist in the C++ header file, right?

Is there another pattern I could use to let HeapAllocatable types own C++ types derived from GC::Cell? Could I invent a protocol and add a manual protocol conformance into my GC_CELL macro? Or can manual protocol conformance only be done outside the class body?

Thanks for coming to my Ted Talk. I'd like to cc @Douglas_Gregor as his comments on my initial GC work were very helpful.
