Embedded Swift Linkage Model

Embedded Swift uses a different linkage model from “Desktop” Swift. These differences become especially pronounced when building and linking static libraries with Swift code. This document explores their differences and provides a suggested design direction for a more suitable Embedded Swift linkage model that accounts for static libraries and provides more control over where Embedded Swift symbols are emitted.

Linkage models

Desktop Swift linkage model

When compiling a library, Desktop Swift generates symbols for all public, package, and open declarations in a module [1] into the resulting object file (.o). The definitions of some of these declarations might also be available for clients to inline (e.g., through the @inlinable attribute or cross-module optimization), but the canonical definition is always present in the object file.

Generic functions and types in this model require extensive use of type metadata. For example, a generic identity function

  func identity<T>(_ value: T) -> T { value }

compiled into an object file will require type metadata so that it can copy the T value. Protocol conformance requirements like T: P would require additional metadata. This metadata is not present in Embedded Swift, which therefore cannot emit object code for generic functions.

This linkage model allows some implementation hiding from clients. A non-@inlinable function’s definition will only be present in the compiled object file, so it can refer to entities made available via an internal or @_implementationOnly import. However, without library evolution, this does not in practice fully hide implementation dependencies from clients, as discussed in SE-0409.

Embedded Swift linkage model

When compiling a library, Embedded Swift always retains definitions of every entity in its intermediate representation (SIL) stored in the compiled module file (.swiftmodule). The compiler generates object code for either the transitive closure of the current module and all of its dependencies (the default, suitable for executables) or an empty object file (with -emit-empty-object-file, suitable for libraries).

Embedded Swift relies on specialization of generics. For example, when using the identity function above with a given generic argument (say, Int), Swift will create a “specialized” function identity<Int>. While specialization is an optimization in Desktop Swift (which has the generic implementation to fall back on), it is required in Embedded Swift because it eliminates all uses of type or protocol conformance metadata. Specialization requires access to the definition of the generic function (including across module boundaries), which is always possible in Embedded Swift because the compiler retains all function definitions in the .swiftmodule (as SIL).
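As a rough illustration of what specialization produces, the sketch below writes out by hand the kind of concrete functions the compiler conceptually derives from identity. The names identity_Int and identity_UInt8 are hypothetical stand-ins (the compiler picks its own mangled names), and this is ordinary Swift rather than anything specific to Embedded Swift.

```swift
// Generic definition. In Desktop Swift this body can be compiled once and
// driven by runtime type metadata for T.
func identity<T>(_ value: T) -> T { value }

// Hand-written stand-ins for specialized copies. These names are
// hypothetical; they only illustrate that no metadata for T is needed
// once the type is known.
func identity_Int(_ value: Int) -> Int { value }
func identity_UInt8(_ value: UInt8) -> UInt8 { value }

let genericInt = identity(42)                  // conceptually rewritten to identity_Int
let specializedInt = identity_Int(42)
let genericByte = identity(UInt8(7))           // conceptually rewritten to identity_UInt8
let specializedByte = identity_UInt8(7)

print(genericInt == specializedInt, genericByte == specializedByte)
```

Each specialized copy only needs to know how to copy its one concrete type, which is why the definition of the generic function must be visible wherever it is used.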

The fact that all definitions are stored in the .swiftmodule means that there is no implementation hiding: importing a module via internal or @_implementationOnly does not hide the dependency on that module from clients, because the full implementation is already exposed.

Analogy with the C(++) linkage model

C and C++ effectively provide both of these models, with fine-grained control over how a particular symbol is exposed. The “Desktop Swift” model is similar to defining a symbol in a .c or .cpp file, e.g.,

// hello.c or hello.cpp
void hello(void) {
  printf("Hello!\n");
}

The resulting object file will contain the _hello symbol (or __Z5hellov in C++ with the Itanium ABI) with the implementation. The interface that clients use is expressed in a separate header file that doesn’t have the definition:

// hello.h
void hello(void);

The “Embedded Swift” model is similar to making the function (static [2]) inline in the header:

// hello.h
/*static in C*/ inline void hello(void) {
  printf("Hello!\n");
}

Here, no symbol will be emitted unless hello is actually called. A client that calls hello will produce object code for the implementation of hello.

C++ templates follow a similar model to inline. When instantiating a template with a given set of template arguments, the definition of that template has to be provided in the header, and its object code will be emitted into the client. This is essentially what Embedded Swift does with specialization of generics.

C++ template instantiations and inline functions produce symbols that can be de-duplicated by the linker. Those definitions all need to be the same according to C++'s One Definition Rule (ODR).

Issues with the Embedded Swift linkage model

There are a few issues we’ve run into with the Embedded Swift linkage model.

Non-Swift clients of Swift libraries

The Embedded Swift linkage model assumes that the final executable will be built as Swift, and can generate all of the object code for that module and everything it depends on. If instead the Embedded Swift code is meant to be packaged into a static library for use by non-Swift clients (such as a C client calling into a @_cdecl Swift function), this model does not work as well.

One of the issues is that the Swift compiler will need to emit definitions for the module being compiled into the static library as well as the transitive closure of everything that module depends on. Let’s say we do this for two Swift modules B and C, both of which depend on a common Swift module A. If another module D links the static libraries for both B and C, it will result in duplicate symbol errors for the symbols in A.

[copy of A] [copy of A]
      |         |
      B         C
      |         |
      +----+----+
           |
           D

The Embedded Swift flag -mergeable-symbols emits all Swift symbols such that they can be de-duplicated by the linker, using the same mechanism as C++ does for template instantiations and inline functions. This eliminates linker errors due to duplication. However, it does not provide a way to ensure that specific symbols only have a single definition in one object file, the way that defining a function in a .c[pp] file does.

No link-time polymorphism

Link-time polymorphism refers to a technique where there are multiple implementations of a given interface, and one selects which implementation to use by linking in the static archive corresponding to one of the implementations. For example, in the C world, one can have a standard “hello” header:

// hello.h
void hello(void);

There might be multiple implementations of this interface built into different static libraries. For example, libhello_english.a might contain:

void hello(void) {
  printf("Hello");
}

whereas libhello_français.a might contain:

void hello(void) {
   printf("Bonjour");
}

Then, you can select the appropriate implementation at link time via -lhello_english or -lhello_français, respectively. If somehow both get linked into a binary, the linker will produce a duplicate-symbol error to detect the problem. The Desktop Swift model of compilation supports this approach in the same way C does for non-inlined code, although it takes care to ensure that these two modules share the same interfaces.

Embedded Swift does not work well here. The implementation of hello would be inlined into each of the clients, so it’s not possible to replace those implementations later. This would normally be detected by the linker as an error (so at least one wouldn’t accidentally make this mistake), but trying to address the first problem by using -mergeable-symbols would hide this problem.

No implementation hiding

All of the definitions within a Swift module are exposed in the .swiftmodule file, so that clients can generate code for them. If these definitions depend on some C headers in a module, those C headers and the module map that covers them must be available to all clients. In practice, this means that any dependency on a C header requires that header to be modularized, including in all of the clients.

The internal import feature (or its predecessor, @_implementationOnly import ) does not adequately protect against this. It will detect attempts to use entities from the internally-imported module in the signatures of public APIs, but does not hide those dependencies from client compiles. With Desktop Swift’s linkage model, one can hide those dependencies using library evolution and non-inlinable code, but no such affordance exists for Embedded Swift.

Proposed solution

Explicitly mark symbols as being part of the object file

We propose to introduce an attribute @alwaysEmitIntoObjectFile that specifies that a given non-inlinable function should be emitted into the object file, and that its definition should not be placed into the corresponding .swiftmodule file. For example, given this definition:

@_cdecl("hello") @alwaysEmitIntoObjectFile
public func hello() {
  printString("Hello")
}

The Swift compiler would emit the definition in the symbol _hello in the object file, as a strong definition (i.e., one that cannot be merged with others of the same name). However, it would not emit the definition into the .swiftmodule file, so clients cannot inline the function: they must call into that _hello symbol. Since this is a @_cdecl entry point, a C program could also link the library and call this function. From a C perspective, one can think of @alwaysEmitIntoObjectFile as moving the definition of the function into a .c file (rather than it being in the header file).

The Swift compiler may also need to emit additional code into the object file to support the definition of hello. For example, the printString function might come from another Swift module Printing, where its definition is only in the .swiftmodule file. Or printString might be another function within the same module. Either way, the definition of printString will be emitted into the object file (so hello can call it) as a symbol that can be merged by the linker.

There are some necessary restrictions on @alwaysEmitIntoObjectFile, including:

  • It cannot be used on a generic function, or a function within a generic type.
  • It cannot be used on a function marked @inlinable or @_alwaysEmitIntoClient.

For _main and other entry points, some mechanism will need to ensure that object code is generated for that entry point, as well as triggering object code generation for all of its dependencies. This object code must not be stripped from the resulting binary.

Module-level flag for “build an object file linkable by non-Swift tools”

Some Swift modules are “leaf” modules that are intended to be consumed by non-Swift tools. For example, they might be linked by C programs that don’t know about Swift, or be passed into the linker to build an executable or shared library.

Swift should provide a module-level flag that indicates that it’s building such a “leaf” module. This has the effect of inferring @alwaysEmitIntoObjectFile on every public, package, and open declaration in the module that is well-formed by the rules above (i.e., it is non-generic and isn’t marked @inlinable or @_alwaysEmitIntoClient). It should also emit any serialized @_used declarations and entry points from imported modules into object code, effectively encapsulating all of the Swift code.
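The inference rule for “leaf” modules can be modeled in a few lines. The sketch below is only a paraphrase of the rule as stated here: the Access and Decl types and the wouldInferObjectFileEmission function are hypothetical and do not correspond to anything in the compiler.

```swift
// Hypothetical model of a declaration's access level.
enum Access {
  case privateAccess, internalAccess, packageAccess, publicAccess, openAccess
}

// Hypothetical summary of the properties the rule looks at.
struct Decl {
  var name: String
  var access: Access
  var isGeneric = false               // generic function, or member of a generic type
  var isInlinable = false             // marked @inlinable
  var isAlwaysEmitIntoClient = false  // marked @_alwaysEmitIntoClient
}

// True when a "leaf" build would infer the proposed attribute for this declaration.
func wouldInferObjectFileEmission(_ decl: Decl) -> Bool {
  let isExported: Bool
  switch decl.access {
  case .publicAccess, .packageAccess, .openAccess:
    isExported = true
  case .privateAccess, .internalAccess:
    isExported = false
  }
  return isExported && !decl.isGeneric && !decl.isInlinable && !decl.isAlwaysEmitIntoClient
}

let decls = [
  Decl(name: "hello", access: .publicAccess),
  Decl(name: "identity", access: .publicAccess, isGeneric: true),
  Decl(name: "fastPath", access: .publicAccess, isInlinable: true),
  Decl(name: "helper", access: .internalAccess),
]

for decl in decls {
  print(decl.name, wouldInferObjectFileEmission(decl))
}
```

The model only answers whether the attribute is inferred. A declaration for which it returns false (such as helper) may still end up in object code when something that is emitted calls it.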

Don’t infer @alwaysEmitIntoObjectFile from other attributes

There are a few cases where we might be tempted to infer @alwaysEmitIntoObjectFile. For example:

  • @_used declarations could infer @alwaysEmitIntoObjectFile, because they will eventually need to end up in an object file.
  • The _main entry point will need to be emitted into object code, and we could infer @alwaysEmitIntoObjectFile on it to ensure that happens.
  • @cdecl @implementation functions provide Swift implementations of C functions that were declared in a C header. By definition, at least some of the clients of such functions are C clients, so they will need the symbols defined in object code, which could be accomplished by inferring @alwaysEmitIntoObjectFile.

In all of these cases, inferring @alwaysEmitIntoObjectFile defeats whole-program optimizations that could be important in Embedded Swift, because the object code is generated in the library’s context rather than in the context of the whole “leaf” module. Therefore, we should not pursue inference of @alwaysEmitIntoObjectFile from other attributes, and will instead rely on knowledge of which modules are “leaf” modules.

Implementation

I've started an implementation of this in a pull request here.

Doug

/cc @rauhul @kubamracek @Erik_Eckstein

I'm lacking in experience/familiarity with embedded Swift, so this might be an obvious question, but is the need for whole-program optimizations so great that it has to be served by the compiler at this stage, rather than something like LTO?

If LTO isn't an option (I concede it adds a lot of complexity), @alwaysEmitIntoObjectFile feels like the opposite default to what I would expect from experience with other languages. If I'm writing C code, most of that is going to go into .c files, and I'm opting into whole-program inlining for specific functions by writing them as static inline in my header files. Why not have the analogy for Swift be that non-generic decls are emitted into object files by default, and an attribute like @inlinable/@_alwaysEmitIntoClient is used to opt into inlining? (Generic decls would of course need to remain inlined for the reasons described, which is also analogous to C++ templates.)

My initial impression to this is extremely negative; I feel this is committing embedded Swift to a permanently different trajectory from regular Swift, permanently dialectizing it and creating more problems for developers trying to support both. Many of the aspects of embedded Swift today you cite, such as the reliance on deep specialization and exposure of all implementation details in swiftmodules, seem like things we should treat as limitations of the current implementation to eventually overcome rather than essential aspects of embedded Swift. And some of the problems you want to solve could be addressed in other ways that benefit Swift everywhere. For instance, for link-time polymorphism, we could either provide the ability for modules to specify explicit swiftinterfaces, or provide a programmatic way to validate that two implementations of a module implement the same swiftinterface, features developers of ABI-stable libraries for desktop Swift have been asking for for a while.

It isn't clear to me why we couldn't have this be a build-level setting to apply the existing Swift build model for libraries, where public declarations correspond to exported symbols in the build product. I do see that we would have to productize @_alwaysEmitIntoClient to make this viable for embedded as it exists today, and embedded developers might have to slather it more liberally than they might otherwise like to given the current limitations on the implementation. To @allevato's comment, that feels more like the better directionality for this concept.

The optimizations you need are whole-program in nature and really need to be done by Swift's optimizer, which can reason about Swift-specific properties (inheritance, conformances, specialization, etc.). Once you've lowered down to LLVM IR (on which LTO operates), you've lost much of the semantic structure you need.

It depends on what languages you're basing this on. C might have most of its code in .c files. C++ tends to involve a lot of templates that are necessarily in headers, including many exposed implementation details. Rust tends to keep more of its code in its own intermediate representation longer (the way Embedded Swift does), and the idea of "leaf" libraries I'm describing is similar to Rust's staticlib.

Unspecialized generics carry with them the need for runtime overheads that are not going to be acceptable in Embedded Swift. So I don't think the reliance on deep specialization is a "current limitation", I think it's fundamental to the problem. And deep specialization, in turn, requires exposing implementation details.

"Link-time polymorphism" is the least important part of this, and effectively falls out of having a model where we can reason about where symbols have their final definitions.

Yes, we could. We could skip over the @alwaysEmitIntoObjectFile attribute part of this proposal and only implement the "leaf" build setting, which makes the public declarations exported symbols (unless they are generic, which we cannot do, or marked always-emit-into-client).

In effect, "leaf" is the default for Desktop Swift. We need this concept for Embedded Swift, but it still seems like the wrong default for Embedded Swift, where we generally want to delay code generation until we have everything together.

Yes, we do need to productize @_alwaysEmitIntoClient. It's how we keep specific symbols out of the ABI, and library developers depend on it. It lets one express "header-like" semantics when you are building a "leaf" module. @alwaysEmitIntoObjectFile is the dual of @_alwaysEmitIntoClient when you are building a non-leaf module, i.e., the default for Embedded Swift.

Doug

Aside: @inlinable should have had the semantics of @_alwaysEmitIntoClient IMO

Thinking about this more, I'm not so sure the "leaf" model as it exists today is necessarily the best default for desktop Swift in all situations either. A build model where all module implementations are visible across modules for optimization, or at the very least generic specialization, purposes is the mental model that many people shipping packages and building end-user executables intuitively expect, and developers are disappointed that they don't get that model out of the box. Many package developers already have to deal with generously slapping @inlinable on things to get specialization where they expect it, and desire stronger controls over where unspecialized generics are allowed to occur. It could well be that embedded Swift is aggravating this existing user need by not (yet) supporting unspecialized generics at all.

This feels to me like something we could address with better defaults or build-system-level policy, rather than more manual control knobs for developers to manipulate, though in the spirit of not letting perfect get in the way of good, I could see the near-term utility in giving developers additional control here.

I wonder if we could look at "public interface" and "public implementation" as different aspects of declaring a public API, and organize these as modifiers on public itself:

// Only the interface is exported (alwaysEmitIntoObject)
public(interface) func foo()

// Only the implementation is exported (alwaysEmitIntoClient)
public(implementation) func bar()

// Both the interface and implementation are exported (today's @inlinable)
public(interface, implementation) func bas()

public without modifiers could mean "follow the prevailing build's policy", with the modified variants available for fine control. (It's not a slam dunk, though, since the modifiers aren't perfectly correlated to public; they would also make sense to apply to @usableFromInline, and any other feature that can introduce new exported symbols.)

Arguing about unspecialized generics that isn't essential to the proposal

I don't believe this is necessarily true. The only essential cost of an unspecialized generic interface is the dispatch through a protocol witness table. On desktop Swift, unspecialized generics get burdened with additional inessential overhead because of overeager metadata lookups, excessive implicit copy and destroy operations that go through the value witness tables, and the deep loss of specialization for generic conformances because we don't have specialized witness tables there. Embedded Swift doesn't support type metadata operations (and I am fine with that), and since it doesn't have a centralized runtime to coordinate with, it's in a much better position to specialize witness tables by need. We also have ~Copyable now to more easily control copies.
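To make the witness-table cost concrete, here is a hand-written model of the three shapes being compared: an ordinary constrained generic, the same function with the table passed explicitly, and a specialized copy. ShapeWitnessTable and the function names are hypothetical, and this is a conceptual picture rather than the compiler's actual lowering.

```swift
protocol Shape {
  func area() -> Int
}

struct Square: Shape {
  var side: Int
  func area() -> Int { side * side }
}

// Constrained generic: when left unspecialized, each call to area()
// is dispatched through the conformance's witness table.
func totalArea<T: Shape>(_ shapes: [T]) -> Int {
  var total = 0
  for shape in shapes { total += shape.area() }
  return total
}

// Hypothetical explicit model of that table: one entry per protocol requirement.
struct ShapeWitnessTable<T> {
  let area: (T) -> Int
}

// The same loop with the table passed by hand; the call is indirect.
func totalAreaViaTable<T>(_ shapes: [T], _ table: ShapeWitnessTable<T>) -> Int {
  var total = 0
  for shape in shapes { total += table.area(shape) }
  return total
}

// Specialized copy: the concrete type is known, so the call is direct.
func totalAreaOfSquares(_ shapes: [Square]) -> Int {
  var total = 0
  for shape in shapes { total += shape.side * shape.side }
  return total
}

let squares = [Square(side: 2), Square(side: 3)]
let squareTable = ShapeWitnessTable<Square>(area: { $0.area() })

print(totalArea(squares), totalAreaViaTable(squares, squareTable), totalAreaOfSquares(squares))
```

All three compute the same result; what differs is whether the call to area goes through a table or is resolved at compile time.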

Large C++ and Rust projects gradually grind down in build times under the accretion of generic interfaces that need to be ingested by the compiler all at once. The fact that our model readily supports separate compilation of generics is an important competitive advantage for larger-scale projects.

I agree that the model we have for Desktop Swift could be improved by allowing more cross-module optimization whenever there isn't a resilience boundary in the way. I'd love for that to be more automatic, so that package developers don't have to put @inlinable all over the place to improve performance across modules.

In a sense, Embedded Swift is closer to that model already, because by default it is serializing everything. That has downsides, like the inability to encapsulate anything in object code (e.g., references to non-modularized C headers), but it's good for the optimizer. I think, even if we line up the models perfectly, and even in Desktop Swift, I still want the ability to say "no, don't allow cross-module inlining for this particular entity, because it encapsulates something that shouldn't leak across module boundaries." Perhaps that's better spelled @onlyEmitInObjectFile or @notInlinable or something.

Doug

Speaking from experience of developing many packages where we were forced to essentially slap @inlinable on every API, not only generic ones but also non-generic ones, I share @Joe_Groff's concerns about the new proposed attribute. My hope has always been that we can bring over some of the optimizations from Embedded Swift into regular non-resilient Swift. There were already multiple instances in the past years where folks benchmarked Swift against languages like Rust and saw a huge performance difference simply because automatic cross-module optimization does not exist.

I really like the idea of inverting this. From just the non-resilient Swift point of view I think it is far more common that we want to emit a symbol into the swiftmodule for better optimizations than the other way around.

Maybe, but it also means you have to rebuild clients even if you've "just" changed the implementation of a function. There are tradeoffs here.

There are three different ways we can emit the definition of a given symbol:

  1. Only serialized, such that the client will have to generate code whenever it uses the symbol. At the symbol level, this can be selected by using @_alwaysEmitIntoClient.
  2. Both serialized and in the object file, such that the client can choose to emit a copy for inlining/specialization/etc., or can link the version in the library. This is what Embedded Swift currently does by default. At the symbol level, this can be selected by using @inlinable.
  3. Only in the object file, so a client must link the version in the library. This is what Desktop Swift currently does. At the symbol level, there is no way to select this, although I'm playing with @onlyEmitIntoObjectFile for this purpose.

There are benefits to each of these. You and Joe have expressed that Desktop Swift should move from (3) toward (2) over time, and I agree.

Embedded Swift benefits from (1) when all of the clients are Swift and you only want some parts of the library (e.g., it's great for the Swift standard library). Embedded Swift would benefit from (3) to allow dependency hiding and better linking with C clients. And (2) can remain a reasonable default for Embedded Swift if we start using mergeable symbols properly.

I've introduced a separate pull request that implements both (1) and (2) for Embedded Swift (triggered by experimental feature flags), and offers the @onlyEmitIntoObjectFile attribute to opt in to the semantics of (3).

I wish I could come up with good names for (1), (2), and (3), because it wouldn't be unreasonable to have them as options for compiling a module in Embedded Swift. The best I have right now are "on-demand", "optimizable", and "sealed".
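For readers keeping track of the three options, a small model may help. It uses the working names above as enum cases; the EmissionMode type and its properties are hypothetical and only restate the trade-offs listed in (1) through (3).

```swift
// Hypothetical model of the three ways a definition can be emitted.
enum EmissionMode: CaseIterable {
  case onDemand     // (1) serialized only
  case optimizable  // (2) serialized and in the object file
  case sealed       // (3) object file only

  // The definition is stored in the .swiftmodule (as SIL).
  var isSerialized: Bool { self != .sealed }

  // The library's object file contains the definition.
  var isInObjectFile: Bool { self != .onDemand }

  // Clients can emit their own copy for inlining or specialization.
  var clientCanInlineOrSpecialize: Bool { isSerialized }

  // Clients can link against the copy in the library.
  var clientCanLinkLibraryCopy: Bool { isInObjectFile }

  // Implementation dependencies stay hidden from client compiles.
  var hidesImplementation: Bool { !isSerialized }
}

for mode in EmissionMode.allCases {
  print(mode, mode.isSerialized, mode.isInObjectFile, mode.hidesImplementation)
}
```

Only the "sealed" mode hides the implementation, which is why it is the one that helps with dependency hiding and C clients.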

Doug

Is this actually true in practice today? (Maybe it's only in embedded Swift?) When I compile a small example for arm64-apple-macos:

public struct S {
    @_alwaysEmitIntoClient
    public func myFunc() -> Int { return 20 }
}

I see the full symbol in the emitted .o file:

                                    _$s1SAAV6myFuncSiyF:        // S.S.myFunc() -> Swift.Int
0000000000000008 80028052               mov        w0, #0x14
000000000000000c C0035FD6               ret

I would love to nail the behavior you've described down regardless of linkage model. I'm working on something (in "desktop Swift") where I have some small helper functions that only exist to reduce repetition in other generated code. I want them to be aggressively inlined because they reduce to simple memory reads/writes when the optimizer can see the whole thing end-to-end (this happening in debug builds would also be a nice-to-have, but some stdlib API use prevents that). It would never be correct to form an unapplied reference to one of these functions so I would even be fine with that being an error if attempted.

Having the compiler never emit the symbol would give me more confidence about minimal code size than hoping that the linker does the right thing and tosses the symbol. (Since they're called from generated code, possibly in different modules, the helpers have to be public and I'm relying on the experimental -internalize-at-link to make them removable from the final binary.)

If you turn on whole-module optimization for the module, the symbol will go away because nothing references it in the module.

What you're seeing is that we emit the symbol (as a weak definition) in non-WMO builds. We do that because there might be another source file in the same module that calls this function, and we don't have a way to reach across to the defining source file to get the implementation when we might need it. When you're in a separate module, you can rely on the serialized SIL from the defining module. It's effectively an architectural limitation of the compiler showing through here.

[Edit: when I did this locally, I also noticed that the symbol has weak_odr linkage rather than linkonce_odr linkage. The former requires the symbol to be kept by the linker, which seems wrong to me.]

Doug

Thanks! That makes a lot more sense (just confirmed it with my own example). I was thinking that it was "obvious" that the source of the function should be available across the entire module, but I guess a non-WMO compilation doesn't process enough of the non-primary file to be able to pull it out and do the inlining directly (as opposed to the C header analogy, where the entire textual body of the header is part of the same compilation unit).

One concern I have with this proposal is that it gives the impression that the costs of exposing a generic declaration in desktop and embedded Swift are roughly equivalent; particularly if you use embedded Swift without knowing about this new annotation.

For many of the folks who would be the target audience of embedded Swift, it might be astonishing to find out that all code is effectively produced into the equivalent of a header file (unless the declaration opts out) and that the dependents need to recompile to consume the change, rather than the default being to produce the declaration into the object file and have dependents merely relink to consume the change.

When comparing Embedded Swift's linkage model to C/C++, I think it's important to note that for C/C++, whenever a header file is modified the dependents may need to be recompiled, but whenever the translation unit sources are modified, the dependents only need to be relinked. Additionally, header files are generally only ever changed by hand; they're rarely produced as part of the build process, so the build itself likely won't cause a header to be modified. Swift always produces module files as part of a module build, so changes that require recompilation always end up transitive for Embedded Swift, whereas they are almost never transitive for C/C++.

Given the ways that this linkage approach can substantially amplify the build cost of a change, I think having an opt-in approach to having declarations in the module interfaces rather than an opt-out approach would give a clearer impression of the associated costs; if I have to explicitly mark the code as inlineable for it to end up in the swift module when building embedded, I know that it actually gets consumed like inlineable code. If I'm building the same code for both desktop and embedded and it's marked inlineable, then the build-time costs for both cases are more equivalent.

Additionally, having some equivalent notion of 'library evolution' for Embedded Swift, even if it's just a flag that prevents any inlineable or generic code from ending up in the swiftmodule, would help track where interfaces expose enough detail that 'backwards-compatible' changes may still require recompilation (much as evolution does for Desktop Swift).

Thanks for spelling out those three different ways of emitting symbols. While I agree that Desktop Swift should move from (3) to (2) over time, I think it misses a point that I tried to make earlier. Embedded Swift currently employs different optimizations and compiler flags, and I would argue some of those are also applicable to general non-resilient Swift. Those optimizations often lead to faster code and can also reduce binary size, especially in async-heavy code.

Similarly here, this is not only interesting for Embedded Swift but also for other use-cases with C clients.

I think this trade-off is almost always worth it in release builds. Similar to how we enable whole module optimization in release mode we should consider enabling CMO in release mode. This should keep the incremental debug builds fast and avoid any unnecessary recompilation.

These are fundamental differences between the C/C++ header/translation unit model and the Swift single-source model. In C/C++, you know exactly what you're exposing in the ABI because it's in the header, but it also embeds itself into the way you write code (e.g., always passing pointers around so you can change the layout underneath) such that it's hard to change later.

Build time is not the only concern we have. Unspecialized generics are a performance issue, and others in this same thread are asking for more code to be serialized by default in Desktop Swift because of the optimization opportunities that it unlocks. This is a trade-off, and the proposal is providing some knobs to let folks pick what's best for their environment.

It could be an option to require explicitness here. I think it would make for a poor default for most of the Swift user base, because it would add more friction to moving code between models (especially between Desktop and Embedded Swift) and most users won't try to skip rebuilds of dependents after making changes because their build system doesn't model it.

With library evolution, the code isn't the interesting part: it's the data layout that matters most, and it's more trade-offs. I'd rather explore that in a separate thread, though.

I agree. I don't know what prevents us from doing that now.

Doug

Thanks in general for pointing out & figuring out how the module/interface & the linkage model function together. A couple of clarifying questions, mostly for the Embedded Swift use cases.

Should internal declarations always be @alwaysEmitIntoObjectFile? I read @alwaysEmitIntoObjectFile as orthogonal to symbol visibility, and you did mention

Therefore, we should not pursue inference of @alwaysEmitIntoObjectFile from other attributes

But I also inferred that to be able to hide internal dependencies, all usages from those internal dependencies would also be removed from the emitted Swift modules and emitted into the object files. I am thinking of this as analogous to symbols with hidden visibility in libraries. Is this an incorrect assumption?

For the spelled-out restrictions:

Would you have a sense today if current embedded, but maybe also desktop, developers depend on defining internal generic functions? If so, those wouldn’t be able to adopt @alwaysEmitIntoObjectFile so would that be forbidden or still emitted into Swift modules and therefore may end up exposing internal module dependencies?