Pitch: A vision for COM Interoperability in Swift

Hello All,

One thing that has often been discussed is the idea of bidirectional COM interop for Swift. I think that we are finally at a point where we can consider what such a design would look like. I'd like to kick off a discussion to build a shared model of a vision for COM interop.

Thanks to @al45tair for early feedback on this as a starting point.

I am sharing the document here, but will subsequently create a PR for swift-evolution to add this as a vision that will eventually need approval from PSG to be accepted. The subsequent design components will straddle the boundaries between the PSG and LSG and often require input from both. The goal here is to draw upon knowledge from the community and ensure that the design and vision integrate the broader thoughts.


A Vision for COM Interoperability in Swift

Introduction

The Component Object Model (COM) is the foundational binary interface standard underlying the Windows platform. Every significant Windows API surface, from Win32 shell extensions to Direct3D, from Office automation to the Windows Runtime (WinRT), is defined in terms of COM interfaces. COM is also used cross-platform: Mozilla's XPCOM and Apple's IOKit both implement COM-compatible binary interfaces, and projects like MiniCOM demonstrate that the binary model is genuinely portable across Linux, macOS, Android, iOS, and WebAssembly.

Swift on Windows currently has no first-class story for COM. Developers who need to call COM APIs from Swift must drop to unsafe C pointer manipulation, hand-write vtable structs, and manage reference counts manually alongside Swift ARC. This is fragile, verbose, and error-prone. This vision describes how Swift can make COM interoperability as natural as its existing Objective-C interoperability.

The guiding principle is that COM is the binary interface; Swift is the language. Swift developers should never have to think about vtable layout, reference counting, interface identity, or memory allocation domains. They write idiomatic Swift — protocols, classes, properties, throws, async/await, for/in — and the compiler handles the COM binary interface automatically.

Goals

Toll-free bidirectional bridging

Swift code should be able to consume existing COM interfaces, implement COM interfaces, declare new COM interfaces, and expose them to other COM-speaking languages, all with the same type fidelity and performance as native C++ COM code. A COM interface should feel like a Swift protocol. A COM coclass should feel like a Swift class. Crossing the language boundary should cost nothing beyond what the COM ABI itself requires.

Safety without ceremony

The ARC bridge should be the sole owner of the reference count. Swift developers should never call AddRef or Release. QueryInterface should be expressed as Swift's as? operator. Memory allocation domains (CoTaskMem, BSTR, HSTRING) should be handled automatically by synthesised wrappers. Error handling should use throws, not manual HRESULT checks.
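
As a rough illustration of the HRESULT-to-throws idea, here is a minimal sketch in plain Swift. Everything in it is a stand-in: the `HRESULT` alias, this `COMError` struct, `checked`, `rawGetWidth`, and `getWidth` are hypothetical names for illustration, not the pitched COM module's API. It relies only on the COM convention that a negative HRESULT signals failure.

```swift
// Hypothetical stand-ins; real types would come from the Windows SDK
// and the pitched COM module.
typealias HRESULT = Int32

struct COMError: Error, Equatable {
    let hresult: HRESULT
}

// COM convention: a negative HRESULT (severity bit set) is a failure.
func checked(_ hr: HRESULT) throws {
    if hr < 0 { throw COMError(hresult: hr) }
}

// A fake HRESULT-returning call with an out parameter, in the style of
// a C-level COM method.
func rawGetWidth(_ out: inout Int32, fail: Bool) -> HRESULT {
    if fail { return HRESULT(bitPattern: 0x8000_4005) } // E_FAIL
    out = 640
    return 0 // S_OK
}

// What a synthesised wrapper might look like: the HRESULT becomes
// `throws` and the out parameter becomes the return value.
func getWidth(fail: Bool = false) throws -> Int32 {
    var width: Int32 = 0
    try checked(rawGetWidth(&width, fail: fail))
    return width
}
```

A caller would write `let width = try getWidth()` and never see the HRESULT unless it chooses to catch the error.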

ABI compatibility

The object layout and vtable structure should be binary-compatible with C++ COM objects. A Swift-implemented COM object should be callable from C++, C#, or any other COM-speaking language without an adapter layer. The ABI should be lightweight enough to be toll-free in both directions.

Cross-platform

The core @COM attribute and ARC bridge should not be Windows-specific. They should work wherever Swift runs and COM-like interfaces exist: Windows COM, XPCOM, IOKit, MiniCOM. Platform-specific features (apartments, activation, registration) should be layered on top without changing the core model.

Incremental adoption

Existing C/C++ COM code imported via the Clang importer should be automatically elevated to idiomatic Swift without requiring annotation changes to the original headers. HRESULT-returning methods should become throws. Property accessor pairs should become Swift properties. Hungarian-notation parameters should be renamed. Counted arrays should become [T].
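
To make the counted-array case concrete, the following sketch shows one way a callee-allocated (pointer, count) pair could be turned into a `[T]`. The helper names `makeCountedArray` and `importCountedArray` are hypothetical, and the `free` closure stands in for whatever deallocator the allocation domain requires (for example `CoTaskMemFree` on Windows). Note that this approach copies the elements before freeing the original buffer.

```swift
// Hypothetical producer: models a callee-allocated counted array as it
// might be returned through (T**, ULONG*) out parameters.
func makeCountedArray() -> (base: UnsafeMutablePointer<Int32>, count: Int) {
    let count = 4
    let base = UnsafeMutablePointer<Int32>.allocate(capacity: count)
    for i in 0..<count {
        (base + i).initialize(to: Int32(i * 10))
    }
    return (base, count)
}

// Copies the elements into a Swift array, then hands the original
// buffer to the caller-supplied deallocator.
func importCountedArray<T>(
    _ base: UnsafeMutablePointer<T>,
    count: Int,
    free: (UnsafeMutablePointer<T>) -> Void
) -> [T] {
    defer { free(base) }
    return Array(UnsafeBufferPointer(start: UnsafePointer(base), count: count))
}
```

The copy is the price of returning an owning `[T]`; a non-owning view over the same memory would avoid it at the cost of a more constrained lifetime.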

Layered design

The interoperability is structured in three layers, each building on the previous:

Layer 1: Core COM (compiler and runtime)

The first layer provides the language-level support for COM's binary interface. This is the work that requires compiler changes:

  • A @COM attribute on protocols (declaring COM interfaces with IIDs) and classes (declaring COM coclasses with optional CLSIDs).
  • Compiler-synthesised QueryInterface, AddRef, and Release with a unified ARC reference count (no separate COM refcount).
  • A compact object layout where COM vtable pointers precede the Swift object header, with a vtable[−1] adjustment constant for pointer recovery.
  • An ISwiftObject COM interface for recovering the Swift heap object from any COM interface pointer, enabling as? to work across the COM boundary.
  • Table-driven QueryInterface with a shared implementation in the COM module, so that the QueryInterface logic is not duplicated per class.
  • Shared AddRef/Release thunks in the COM module, generic across all @COM classes.
  • Clang importer integration: structural IUnknown detection, HRESULT-to-throws promotion, [retval] promotion, property synthesis, name translation, and MIDL array annotation handling.
  • A COM standard library module providing IUnknown, ISwiftObject, COMError, activation overlays, threading types, and extensions on SDK-imported types (GUID, IID, CLSID, HRESULT).

This layer is platform-independent. It works with Windows COM, XPCOM, IOKit, and MiniCOM.
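
The table-driven QueryInterface item can be pictured with a small model in plain Swift. This is only a sketch under stated assumptions: IIDs are represented as strings, an "interface pointer" is reduced to a byte offset into the object, and `InterfaceEntry`, `sharedQueryInterface`, and `widgetTable` are hypothetical names. Only the IUnknown IID is a real value; the others are placeholders.

```swift
// Hypothetical model: one row per interface a class exposes.
struct InterfaceEntry {
    let iid: String
    let offset: Int // where this interface's vtable pointer sits in the object
}

// One shared lookup used by every class; each class contributes only a
// table. A nil result models E_NOINTERFACE.
func sharedQueryInterface(table: [InterfaceEntry], iid: String) -> Int? {
    return table.first(where: { $0.iid == iid })?.offset
}

// Table for an imaginary class implementing two interfaces.
let widgetTable = [
    InterfaceEntry(iid: "00000000-0000-0000-C000-000000000046", offset: 0), // IUnknown
    InterfaceEntry(iid: "hypothetical-IWidget", offset: 0),
    InterfaceEntry(iid: "hypothetical-ICanvas", offset: 8),
]
```

Because the lookup function is shared, adding a class costs one table rather than one more copy of the QueryInterface logic.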

Layer 2: Windows platform integration (compiler and library)

The second layer adds Windows-specific features that are meaningful only when targeting the Windows COM runtime:

  • Threading model declaration (@COM(CLSID:, ThreadingModel:)) and the COMThreadingModel enum.
  • COMExecutor and COMMultithreadedExecutor bridging COM apartments to Swift's concurrency model.
  • withCOMContext, COMContext, and @COMMain for COM initialisation lifecycle.
  • withActivationContext for scoped CLSCTX via @TaskLocal.
  • DllGetClassObject, IClassFactory, DllRegisterServer/DllUnregisterServer synthesis for @COM(CLSID:) classes.
  • @COMInit for designating the activation initialiser.
  • ISupportErrorInfo synthesis and IErrorInfo capture/population.
  • BSTR and LPWSTR string bridging with correct allocation domains.
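
As one possible shape for the scoped CLSCTX item, the sketch below uses Swift's existing `@TaskLocal` property wrapper. The `CLSCTX` option set, the `Activation` namespace, `currentActivationContext`, and this particular `withActivationContext` signature are assumptions made for illustration; only the two raw values mirror the Windows SDK constants.

```swift
// Raw values mirror CLSCTX_INPROC_SERVER and CLSCTX_LOCAL_SERVER from
// the Windows SDK; the type itself is a hypothetical stand-in.
struct CLSCTX: OptionSet, Sendable {
    let rawValue: UInt32
    static let inprocServer = CLSCTX(rawValue: 0x1)
    static let localServer = CLSCTX(rawValue: 0x4)
}

enum Activation {
    // Default used when no scope has overridden the context.
    @TaskLocal static var context: CLSCTX = .inprocServer
}

// Binds the context for the duration of `body`, then restores it.
func withActivationContext<R>(_ context: CLSCTX, _ body: () throws -> R) rethrows -> R {
    return try Activation.$context.withValue(context, operation: body)
}

// What an activation overlay would read when creating an object.
func currentActivationContext() -> CLSCTX {
    return Activation.context
}
```

Because the binding is task-local, nested scopes and child tasks see the innermost value, and the previous value is restored when the closure returns.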

Layer 3: WinRT projection (pure library)

The third layer projects the Windows Runtime onto Swift. This layer requires no compiler changes. It is a pure library built on the @COM infrastructure from Layer 1:

  • IInspectable synthesis (GetIids, GetRuntimeClassName, GetTrustLevel) implemented as library code reading compiler-emitted metadata.
  • Parameterised interface IID derivation via UUID v5 (SHA-1), using the WindowsRuntimeType protocol for type signatures. The concrete IID is computed from compile-time constant inputs and stored directly.
  • IAsyncOperation<T> and IAsyncAction bridged to Swift async/await via continuations.
  • WinRT events (add_/remove_ with EventRegistrationToken) bridged to AsyncSequence.
  • WinRT collections (IVector<T>, IMap<K,V>, IIterable<T>) bridged to Swift Sequence and Collection.
  • HSTRING bridging with the WindowsCreateStringReference fast-pass optimisation.
  • RoActivateInstance/RoGetActivationFactory overlays and the @WinRT(RuntimeName:) macro.

The fact that WinRT requires no compiler changes is a key architectural property. It means the WinRT projection can evolve independently of the Swift compiler, new WinRT patterns can be added as library code, and the compiler team's investment is focused on the COM layer that benefits all COM-family implementations.
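
To show why collection bridging can live entirely in library code, here is a minimal sketch of an adapter from an index-based vector interface to a Swift collection. `VectorLike`, `VectorCollection`, and `FakeVector` are hypothetical names; `VectorLike` only loosely imitates the size-plus-indexed-getter shape of `IVector<T>`.

```swift
// Hypothetical stand-in for a projected IVector<T>-style interface.
protocol VectorLike {
    associatedtype Element
    var size: UInt32 { get }
    func getAt(_ index: UInt32) -> Element
}

// Adapter that gives any VectorLike the RandomAccessCollection API
// (for/in, count, map, reversed, and so on).
struct VectorCollection<V: VectorLike>: RandomAccessCollection {
    let base: V
    var startIndex: Int { 0 }
    var endIndex: Int { Int(base.size) }
    subscript(position: Int) -> V.Element {
        base.getAt(UInt32(position))
    }
}

// Toy backing store used to exercise the adapter.
struct FakeVector: VectorLike {
    let items: [String]
    var size: UInt32 { UInt32(items.count) }
    func getAt(_ index: UInt32) -> String { items[Int(index)] }
}
```

With the adapter in place, `for item in VectorCollection(base: vector)` works without any compiler involvement.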

What the developer sees

Consuming a COM interface

// COM: ISpVoice inherits ISpEventSource inherits IUnknown
// Swift developer sees a protocol with methods and properties:
let voice = try SpVoice()
try voice.speak("Hello, world")

if let eventSource = voice as? any ISpEventSource {
    for await event in eventSource.events {
        print(event)
    }
}

No vtable structs. No QueryInterface calls. No AddRef/Release. No HRESULT checking. The COM binary interface is invisible.

Implementing a COM interface

@COM(CLSID: "...")
final class MyWidget: IWidget {
    func render() throws { ... }
    var name: String { get throws { "MyWidget" } }
}

The class is a COM coclass. It can be activated via CoCreateInstance from C++, C#, or any COM-speaking language. The compiler synthesises the vtable, the class factory, and the registration exports.

Declaring a new COM interface

@COM(IID: "...")
protocol ICanvas: IUnknown {
    func drawRect(_ rect: Rect) throws
    var background: Color { get throws set }
}

The protocol is a COM interface. Its IID and vtable layout are part of the module's ABI. Non-Swift consumers can use it through a generated C header or IDL file.

Relationship to existing interoperability

COM interoperability follows the model established by Objective-C and C++ interop in Swift:

  • Like -enable-objc-interop, COM interop is gated behind -enable-com-interop (enabled by default on Windows).
  • Like the ObjectiveC module, the COM module provides foundational types and overlays.
  • Like -emit-objc-header, a future -emit-com-header would generate C/C++ interface descriptions from Swift @COM declarations.
  • Like Objective-C's isa pointer, COM's vtable pointers precede the Swift object header in a layout that is ABI-compatible with C++ COM objects.
  • Like @objc on classes, @COM opts a type into the foreign type system while keeping it fully usable from Swift.

Both Objective-C interop and COM interop are platform-independent in principle (Objective-C depends on the ObjC runtime ABI, COM depends on the COM vtable ABI). The practical difference is that COM has multiple independent implementations across platforms (Windows COM, IOKit, XPCOM, MiniCOM), while Objective-C interop targets a single runtime family. The same @COM attribute works with all of these implementations. Platform-specific features (apartments, activation, WinRT) are layered on top.

Companion documents

The detailed design is split into two companion documents:

  • COM Interoperability Design (com-interop-design.md) — covers the compiler and language-level work in Layers 1 and 2: the @COM attribute, object layout, ARC bridge, QueryInterface implementation, Clang importer integration, threading model, activation, aggregation, and the COM module contents.

  • WinRT Projection Design (winrt-projection-design.md) — covers the pure-library Layer 3: IInspectable synthesis, parameterised interfaces, async bridging, event sequences, collection conformances, HSTRING bridging, and WinRT activation.

Future directions

Automation (IDispatch, VARIANT, SAFEARRAY)

IDispatch enables late-binding COM access from scripting engines and Office VBA. VARIANT is COM's dynamically-typed value container. These are important for Office automation and scripting interop but are deferred from the initial design.

DCOM and Swift's Distributed module

DCOM extends COM with cross-process and cross-machine invocation. The current design accommodates DCOM as a natural extension: a future COMDistributedActorSystem could bridge COM's proxy/stub infrastructure to Swift's distributed actor model with no changes to the core @COM attribute or ABI.

Interface export tooling

Generating C/C++ headers, MIDL IDL, and .winmd metadata from Swift @COM declarations would complete the bidirectional story, allowing non-Swift consumers to call Swift COM objects without hand-written interface descriptions.

Automation type bridging

DECIMAL, CURRENCY, DATE, MIDL unions, and SAFEARRAY need Swift mappings for full fidelity with the OLE Automation type system.

I love this pitch, and can attest that as we've gone deeper into Windows development on Swift it's become clear that being able to integrate more deeply with the operating system (and offer common functionality) would be a massive benefit for anyone coming to Swift on Windows.

I COMpletely love this pitch!

Love it! I’ve been working with Swift on Windows quite a bit recently, and this proposal sounds great.

Do you think that this would help solve the large compile times and binary sizes of thebrowsercompany’s winrt bindings?

Yes and no; it should help the binary sizes to a certain extent, but the binary size would correlate directly with the number of interfaces projected. With SPM, LSP, and IndexStore integration, it should be possible to trim the interfaces some. This should indirectly improve the compile times as there should be fewer interfaces to synthesize.

I’m curious how you’ll handle some COM esoterica. :slight_smile:

Every interface derives from IUnknown, from which it inherits AddRef(), Release(), and QueryInterface(). But unlike Swift, COM does not guarantee that all IUnknown descendants inherit the same implementations. So it’s possible for an expression like pUnk->QueryInterface(IID_One)->QueryInterface(IID_Two) to invoke two completely separate implementations of QueryInterface(). Presumably this will require a Swift expression like unk as? One as? Two to compile down to the same dance, in case unk is one of these pathological (or aggregated) COM objects. But is it possible that the Swift compiler might currently be trying to optimize chains of as? casts based on Swift’s semantics that IUnknown has exactly one implementation?

You are correct that Swift will need to lower this appropriately with the various QueryInterface calls. This is something that we would have to grapple with when we actually delve into the details (which will be at some point soon). I want to allow people to consider the vision itself, so that we do not end up taking the implementation as the vision, and instead let the vision drive the implementation.

Yes, sorry, I was just curious if you’d already hit this in your experimentation, or could foresee any other wrinkles in the translation. Because otherwise the pitch seems perfectly straightforward!

One problem with TBC's current swift-winrt projection is that the WinUI package can't be built with Swift 6 concurrency checking enabled. Will that be fixed by the COM interoperability work?

That is an implementation issue, not a fundamental issue. Can you be more specific about the issue with the vision and your concern about Concurrency? If it is specific to the TBC swift-winrt fork of my work, then that is best addressed on the fork rather than on this thread.

In practice, -enable-objc-interop is not really tunable independent of the platform; turning it off breaks ABI and creates a Swift runtime incompatible with the platform on Apple platforms, since the flag changes runtime behavior and data layouts of compiler-generated metadata in incompatible ways, and of course there is no Apple ObjC runtime to interoperate with anywhere else. Thankfully, it sounds like most of what you have planned for COM interop is more cleanly separable from the rest of the implementation, with maybe the exception of as? support, which would require changes to the Swift runtime to pass as? casts to COM types along to QueryInterface. An option you might consider in addition to the runtime change might be to provide interface queries via an .as(IFoo.self) extension method on IUnknown. Being a regular method, it could be used on platforms where the runtime does not have COM integration enabled, and as an added bonus, it also wouldn't be subject to interference from compiler assumptions about how as? should work that might not mesh with QueryInterface's real behavior, as @ksluder noted above.

as? integration would still be nice polish for platforms where the runtime can be made to cooperate. On such a platform, I wonder whether some of Swift's native metadata structures could also be altered to be more COM-friendly, like they're altered to be ObjC-compatible for Apple platforms. For instance, it might be interesting to have all class metadata records be IUnknown-shaped, so that AnyObject is IUnknown* compatible in a way similar to how it's id-compatible with ObjC interop. Protocol witness tables could similarly be prefixed with IUnknown methods so that existentials can be readily boxed into COM-compatible objects.

You're right that most everything can be separated, and that -enable-com-interop really just enables parsing @COM and the dispatch of as? and as! to a thunk that forwards to a library function call in the COM module.

Pulling some content from the future design notes, there is one core runtime change that needs to be wired through to support this: changing swift_retain and swift_release.

ptr @swift_retain(ptr) -> { ptr, i64 } @swift_retain(ptr)
void @swift_release(ptr) -> i64 @swift_release(ptr)

Now, on all currently supported architectures, this happens to be ABI compatible: the result will be splat across x0/x1 on ARM64 and r0/r1 on ARM. Similarly, on x64, this will return via rax/rdx. This is incompatible only on i686, I believe, but those platforms are not ABI stable.

This allows us to use swift_retain and swift_release with a small thunk as AddRef and Release.

The remainder of the design really is just the synthesis of the protocol and protocol metatypes from the @COM attribute and their associated data tables (itable, vtables, etc).

I think that we could certainly look into some of those shape changes in the future, but, the idea is that we could keep the core of this support relatively general and shared with the normal Swift path.

This would also be breaking for wasm, which doesn't allow for any sort of punning of function calling conventions. Now, the ABI isn't stable there, but wasm payloads do tend to be size-sensitive, and given the frequency of retain/release calls in generated code, the need to throw away an extra return value after every retain could accumulate. How bad would it be to lie about the reference count in your AddRef thunk and just return 1 or some other fixed result? Many framework classes on Apple's platform do that, and we generally strongly discourage developers from ever looking at retain counts for anything other than entertainment purposes (though I know Hyrum's law tends to be more tenacious in Windows land).
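
For concreteness, the fixed-result idea could look roughly like the sketch below, written against today's `Unmanaged` API rather than any runtime change. `addRefThunk`, `releaseThunk`, `Widget`, and `Counter` are hypothetical names; the thunks forward to Swift's native retain and release and always report 1, so the returned value is deliberately not the real reference count.

```swift
// Records deinitialisation so the object's lifetime can be observed.
final class Counter {
    var deinits = 0
}

final class Widget {
    let counter: Counter
    init(counter: Counter) { self.counter = counter }
    deinit { counter.deinits += 1 }
}

// Hypothetical AddRef thunk: retains via Swift's native refcount and
// returns a fixed value instead of the true count.
func addRefThunk(_ object: UnsafeMutableRawPointer) -> UInt32 {
    _ = Unmanaged<AnyObject>.fromOpaque(object).retain()
    return 1
}

// Hypothetical Release thunk: releases via Swift's native refcount and
// returns the same fixed value, even for the final release.
func releaseThunk(_ object: UnsafeMutableRawPointer) -> UInt32 {
    Unmanaged<AnyObject>.fromOpaque(object).release()
    return 1
}
```

The object's lifetime is still governed by the single Swift reference count; only callers that inspect the returned number would notice the difference.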

I think that we could change IRGen to actually return the structure properly, so I don't know if this is guaranteed to break WASM. I do wonder how much this will actually change size in practice; LLVM should be able to realise that the value is not used and ignore it for the non-COM cases. Additionally, with the work for the direct retain/release, I suspect that the WASM side should have a good way forward.

I'm worried about exactly that. Although the documentation mentions that the reference count is advisory, as it is in Swift, applications do tend to observe the refcount in C++ land. Fortunately, on the Swift side, it would be hidden from users.

Looks great, and very well mapped out! :+1:

Are the companion documents already accessible somewhere, or private for now?

I've kept them private for the moment; I just want to get people to sit with the vision for a while before we start drilling into my notes on how the implementation might function.

At first, I was worried only that since wasm is a stack machine, even ignoring the extra return value would require an extra pop instruction in the wasm payload. But looking at the actual code generation, it looks worse than that—even returning a 2-tuple goes through an indirect on-simulated-stack return slot rather than the wasm value stack:

I suppose a good implementation would still optimize this somewhat in the translation to native code, but the size of the original wasm payload itself is still important for efficient transfer from the server, and that's quite a bit more instructions per retain.

Perhaps we could instead add a new swift_retainAndGetCount(), which we only need to use for COM interop, instead of touching swift_retain itself. (If the "just return a fake value from AddRef" approach doesn't work in practice.)

Yes, this was the alternative that I have in mind if we find that we want to retain (no pun intended) the current RR shape. I do think that it would be nice to have the single one if possible though.

I think that this is an artifact of Swift. We do not enable the necessary support in LLVM and do not generate IR appropriately.

It is a single extra drop instruction, as expected.

I see. I had checked the equivalent code in Clang as well, which doesn't appear to set the flag by default either. (Though to be fair, the Compiler Explorer doesn't appear to host actual native wasm toolchains, so this could also be a side effect of me hacking a non-purpose-built compiler with the -target flag to force it.) I do think even a single extra instruction per retain could still be a noticeable regression for some users.

I've been thinking that this should have been the case for many years. Naïvely it seems like Swift is a great choice in a COM ecosystem.

It's been 9 years since I touched anything COM, so I'm not much of an interested party anymore, but I wanted to say:

You might want to see if Span could work here (or some other variant that allows aliasing memory–cc @Douglas_Gregor). Array is almost certainly going to require copies every time you go in or out of an API that uses counted pointers.