Windows Cross-Module Protocol Conformance

Windows models inter-module accesses differently from MachO and ELF. Much like MachO, there is two-level namespacing (each symbol is addressed as {module, symbol}), but the access itself is indirected. Interfaces which a module exports for others to use are enumerated and placed into the Export Address Table. Uses of those interfaces are indirected through a synthetic import symbol which resides in the Import Address Table. This breaks down when you are looking at static initialization of data with pointers, so when you have a type conforming to a protocol defined remotely (e.g. TextOutputStream), the link cannot be fulfilled. Since the ABI has not yet been finalized, I was wondering if it would be possible to accommodate something like this so that the implementation does not diverge across targets. This item is being tracked as SR-6489.

For a concrete example, SwiftPrivate's IO.swift defines _Stderr as conforming to TextOutputStream which is defined in swiftCore.dll. This causes a failure to build this module.

Thoughts @John_McCall, @Arnold?

CC: @Torust

I’ll add that this applies to value witness tables as well as protocol descriptors, and can currently be worked around by statically linking the relevant symbols into the target object. A proper solution for dynamic linking would be great, though.

It might also be worth mentioning that MinGW/Cygwin get around this issue in dynamic linking using a pseudo-relocation, which is admittedly a fairly ugly workaround but would be viable if changing the ABI to accommodate Windows here is not possible.

The pseudo-reloc approach means that you are limited to binutils as a linker, which is extremely slow, and breaks down in a number of cases. Additionally, on the other platforms that binutils supports, we have seen incorrect behavior from the linker, resulting in invalid binaries. I think that trying to use that linker would be a large undertaking on its own.

The one thing that comes to mind, which I think would be expensive, would be to have a C++ initializer for each table and have it fill in the table entries. This would also mean that the data could not be marked as read-only (or rather the equivalent of .relro on ELF).
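To make the cost concrete, here is a minimal sketch of the initializer approach. All of the names (ProtocolDescriptor, ConformanceRecord, conformanceTable, externalDescriptor) are hypothetical stand-ins rather than the real runtime types, and the "imported" descriptor is defined locally so that the example is self-contained; in a real build it would live in another DLL.

```cpp
#include <cassert>

// Hypothetical stand-in for a descriptor exported by another DLL.
struct ProtocolDescriptor { const char *name; };
ProtocolDescriptor externalDescriptor{"TextOutputStream"};

// Hypothetical conformance record holding an absolute pointer.
struct ConformanceRecord {
  const ProtocolDescriptor *protocol; // cannot be filled in at link time across DLLs
  const char *typeName;
};

// Deliberately not const: the slot is written at load time, so the table
// cannot be placed in read-only data.
ConformanceRecord conformanceTable[] = {
  {nullptr, "_Stderr"},
};

// Dynamic initializer that runs before main() and patches the entry.
static const bool conformanceTableInitialized = [] {
  assert(conformanceTable[0].protocol == nullptr); // slot starts out empty
  conformanceTable[0].protocol = &externalDescriptor;
  return true;
}();
```

Every such table pays for an initializer at load time and gives up its read-only placement, which is why this looks like the expensive option.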

CC: @John_McCall @Douglas_Gregor @Arnold @Erik_Eckstein

Which data structure in particular has the cross-module pointer reference? For a cross-module protocol conformance, like your _Stderr: TextOutputStream, I would expect only the ProtocolConformanceDescriptor for the conformance to contain a reference to the protocol descriptor from the standard library. We already optimize that data structure not to include embedded relocations, by using a RelativeIndirectablePointer<ProtocolDescriptor> to refer to the possibly-external protocol descriptor. On Mach-O and ELF, this will contain the offset to the GOT entry for the descriptor (with the low bit set to 1 to indicate that it’s doubly-indirected). On Windows you might be able to form relative references to the import address table entry instead.
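As a rough illustration of how such a reference is resolved at run time, here is a simplified model. It is a sketch under assumptions, not the runtime's actual RelativeIndirectablePointer; the Image struct, relativeOffset, and demoImage are hypothetical helpers that keep everything within 32-bit range of each other, and importSlot plays the role of a GOT or import address table entry.

```cpp
#include <cassert>
#include <cstdint>

struct ProtocolDescriptor { const char *name; };

// Simplified model: a 32-bit offset from the field's own address, with the
// low bit set when the target is a slot holding the real address.
struct RelativeIndirectableRef {
  int32_t offsetPlusIndirect;

  const ProtocolDescriptor *get() const {
    if (offsetPlusIndirect == 0) return nullptr;
    intptr_t base = reinterpret_cast<intptr_t>(this);
    intptr_t target = base + (offsetPlusIndirect & ~1);
    if (offsetPlusIndirect & 1) // indirect: load the address from the slot
      return *reinterpret_cast<const ProtocolDescriptor *const *>(target);
    return reinterpret_cast<const ProtocolDescriptor *>(target);
  }
};

// Hypothetical layout of one binary image.
struct Image {
  RelativeIndirectableRef localRef;
  RelativeIndirectableRef importedRef;
  ProtocolDescriptor localDescriptor;
  const ProtocolDescriptor *importSlot; // filled in by the loader
};

// Stand-in for a descriptor that lives in another image.
static ProtocolDescriptor externalDescriptor{"CustomStringConvertible"};

int32_t relativeOffset(const void *from, const void *to) {
  intptr_t delta = reinterpret_cast<intptr_t>(to) - reinterpret_cast<intptr_t>(from);
  assert(delta == static_cast<int32_t>(delta)); // must fit in 32 bits
  assert((delta & 1) == 0);                     // low bit must be free for the flag
  return static_cast<int32_t>(delta);
}

Image &demoImage() {
  static Image image{};
  image.localDescriptor = {"P"};
  image.importSlot = &externalDescriptor; // what the loader would do
  image.localRef.offsetPlusIndirect =
      relativeOffset(&image.localRef, &image.localDescriptor);
  image.importedRef.offsetPlusIndirect =
      relativeOffset(&image.importedRef, &image.importSlot) | 1;
  return image;
}
```

The point is that the reference itself never holds an absolute address, so nothing in the record needs to be relocated at load time; only the slot does.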

It’s possible that the way we encode the GOT reference in LLVM is problematic when targeting Windows. We rely on a bit of cleverness in the Mach-O object writer to recognize relative offsets to an unnamed_addr constant containing the external symbol’s address as being semantically interchangeable with a @GOTPCREL relocation. Without this optimization, LLVM would just emit a global variable initialized to the external address. You might be able to replicate that cleverness in the PE writer to allow an import address table entry to stand in for a global unnamed_addr pointer constant in the same way.

I take it that the Windows linker makes code pointers from data work by using the address of a thunk? Because otherwise I can’t imagine how any cross-DLL C++ code works at all.

I think the only places where we have pointers to value witness tables from a different image are the optimizations for re-using VWTs for classes and small PODs. That would not be difficult to disable (in favor of a locally-uniqued copy) if we needed to; of course, it would affect code size. And classes could be special-cased quite easily to still use the common VWT by just initializing it during class realization, assuming we do still have a well-defined realization step when ObjC interop is disabled.

Yeah, code pointers work by taking the address of a thunk. However, for the vfptr (vtable), they are replicated locally.

Local replication does sound very much like exactly what Microsoft does for their C++ implementation (yes, pointer equality of functions is broken in their scheme).

This sounds promising, would you happen to have any pointers to where to get started with trying to prototype something like this?

Yes, that particular case has an unresolvable reference to $Ss16TextOutputStreamMp -- protocol descriptor for Swift.TextOutputStream. I don’t think that LLVM really will ever generate the address of an import for a constant data initializer. I’ll take a look at the trickery that you mentioned and see if that makes sense to replicate for PE/COFF.

An easy way to get something working might be to link these common value witness tables into a static .lib so you can link a local copy into Swift executables. You might be able to get that to work without any compiler changes. Otherwise, you can look at the code in getAddrOfValueWitnessTable that special-cases the prefab type layouts and just disable it on Windows. There are a few places in the Swift runtime where we look for the identity of the common class value witness table for layout optimizations, but that’s a bit of a hack that we could probably do in a better way.

Yeah, code pointers work by taking the address of a thunk. However, for the vfptr (vtable), they are replicated locally.

You’re probably just noticing normal inline-function duplication. Functions defined out-of-line cannot be reliably replicated locally because the compiler does not know what to duplicate.

This sounds promising, would you happen to have any pointers to where to get started with trying to prototype something like this?

It’s just various hard-coded things in GenMeta.cpp where we call getAddrOfValueWitnessTable with a concrete builtin type.

How much does LLVM IR generally try to abstract away the __imp_ symbols for import address table entries? If it’s OK for IR to directly reference them, you could tweak getAddrOfLLVMVariableOrGOTEquivalent so that, if we’re looking for the GOT equivalent of a symbol with dllimport storage class, instead of creating a variable, we instead declare a reference to the __imp_ symbol and use that instead.

If it’s not OK for IR to mess with __imp_ symbols, then it seems to me like it’d be OK in general for the PE/COFF writers to take an unnamed_addr constant that’s initialized with the address of a dllimport symbol and lower them to an alias for the __imp_ symbol.

LLVM IR almost entirely abstracts it away. It adds an attribute to indicate that it is DLL imported and then will handle the indirection in ISEL.

I don’t understand your idea though. The use of it is for a relative offset from the label. However, that is the problem: we don’t have an address to calculate the offset from. The __imp_ symbol is a symbol which contains the address, so we would need to offset from the value at the address, which I believe is not representable. What am I misunderstanding?

Mach-O and ELF dynamic linking have the exact same problem, that you can’t form a relative offset to a symbol from outside the current binary. When we want to reference a symbol from outside the current binary, we reference the GOT entry instead, and set a bit in the relative reference so the runtime knows there’s an extra level of indirection. The GOT serves an analogous purpose to the import address table on Windows, so you should be able to do more or less the same thing on Windows; when we ask to getAddrOfLLVMVariableOrGOTEquivalent of a dllimport symbol, the __imp_ can serve the purpose of the “GOT equivalent”.

To try to make it clearer, here’s an example Swift program that defines a type, a conformance to a protocol in the same module, and a conformance to a protocol from the standard library:

struct X { let description = "x" }

protocol P {}

extension X: P {}
extension X: CustomStringConvertible { }

If we do swiftc -emit-ir, we get this conformance table (with relative reference noise elided for clarity):

; This variable stands in for the GOT entry for the standard library 
; CustomStringConvertible protocol descriptor
@got._T0s23CustomStringConvertibleMp = private unnamed_addr constant %swift.protocol* @_T0s23CustomStringConvertibleMp

@"\01l_protocol_conformances" = private constant [2 x %swift.protocol_conformance] [
  %swift.protocol_conformance {
    ...@_T03foo1PMp to i64..., ; protocol descriptor for P
    ...@_T03foo1XVMf..., ; type metadata for X
  },
  %swift.protocol_conformance {
    i32 add (...@got._T0s23CustomStringConvertibleMp..., i32 1), ; GOT entry for CustomStringConvertible protocol descriptor, +1
    ...@_T03foo1XVMf... ; type metadata for X
  }
]

For the local conformance, the conformance table records the direct offset to the local protocol descriptor symbol, but for the standard library conformance, it records the offset to a variable that contains the address of the external symbol, and LLVM’s Mach-O writer knows that this variable can be aliased to an entry in the Mach-O binary’s GOT. Since an import address table entry on Windows is more or less the same thing, a variable in memory that gets initialized with the address of an external symbol, you should be able to do the same optimization, either by aliasing an unnamed_addr variable to an import symbol in LLVM somewhere, or by modifying swiftc to use import symbols as stand-ins for GOT equivalents directly.
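The choice the compiler would make can be modelled as a toy function. This is only a sketch of the idea, not the code in getAddrOfLLVMVariableOrGOTEquivalent; the types Linkage, ObjectFormat, Symbol and RelativeReference and the function referenceFor are hypothetical. The got. prefix mirrors the variable name in the IR above, and __imp_ is the import symbol prefix discussed earlier in the thread.

```cpp
#include <cassert>
#include <string>

enum class Linkage { Local, External };
enum class ObjectFormat { MachO, ELF, COFF };

struct Symbol {
  std::string name;
  Linkage linkage;
};

// What a relative reference in a load-time-inert table would point at.
struct RelativeReference {
  std::string target; // symbol the 32-bit offset is computed against
  bool indirect;      // true if the low bit is set (one extra load at run time)
};

RelativeReference referenceFor(const Symbol &symbol, ObjectFormat format) {
  assert(!symbol.name.empty());
  // Symbols in the current binary can be referenced directly.
  if (symbol.linkage == Linkage::Local)
    return {symbol.name, false};
  // On PE/COFF, let the import address table entry act as the GOT equivalent.
  if (format == ObjectFormat::COFF)
    return {"__imp_" + symbol.name, true};
  // Elsewhere, reference the GOT-equivalent variable.
  return {"got." + symbol.name, true};
}
```

For example, referenceFor({"_T0s23CustomStringConvertibleMp", Linkage::External}, ObjectFormat::COFF) would name the import symbol and mark the reference as indirect, while the local descriptor for P stays a direct reference on every format.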

I think that I can adjust the compiler to generate the reference to the __imp_ prefixed symbols by materializing them manually. However, I think that there may be a few other tables. One thing that I am worried about is that the GOT is contiguous but the IAT is segmented by the module. Since the GOT was already used, I assume that the indirection is taken care of at the various references that are used. I have put up PR-14710 which takes a first stab at this.

However, there are a few other tables which are emitted, and I think that there may be other places that we will have to accommodate the cross-module references. If they can all be handled by manually materializing the import symbol, that is great; otherwise, I think that we will need to consider @John_McCall’s suggestion of locally emitting the tables everywhere (at least on Windows).

The contiguity of the GOT isn’t something we rely on, so modifying the code that generates GOT equivalents to instead refer to __imp_ symbols should cover any cross-module references we’d emit as part of our load-time-inert data structures. Outside of that, there are only a couple other places I can think of where we’d intentionally emit a cross-module reference:

  • Pre-fab value witness tables get referenced from the standard library dll, as John noted. It should be fine to re-emit copies of these on Windows; the only cost would be code size, and loss of a few optimizations in the runtime where we check for types with class-like layout by value witness table identity (which we should do some other way anyway).
  • Non-overridden vtable slots will directly reference the inherited implementations from their base class, or at least, they would pre-resilience. @Slava_Pestov may have fixed that with his class resilience work. You could emit a local trampoline for these.

Other than that, I can’t think of anything else. You agree, @John_McCall?

Yes, I agree. And it sounds like v-tables aren’t a problem anyway because link.exe automatically turns pointers to external functions into pointers to thunks; I doubt that that’s somehow specific to C++ v-tables.

Yeah, external function calls will get a thunk inserted if we don’t get it handled properly, but there is a small optimization to be gained from avoiding that thunk. But, improving that is an incremental thing which should not break ABI and can be done later (it is just adjusting the DLL Storage on the calls).

link.exe will presumably only introduce that thunk for references from data when it’s necessary; I don’t think there’s anything to be done on the compiler side.

I think what you’re referring to is the code-generation sequence for dllimport calls, which is indeed something we can optimize later.

Oh, there’s a place we use a prefab dispatch table for keypaths too:

This could be duplicated in the current binary without any semantic effect too. I have a need to reference data across binaries for resilient key paths too, but I’ll keep the Windows restrictions in mind and ping you when I’m ready to land my changes there.