[Pitch #2] Low-level linkage control

kubamracek · January 31, 2024, 6:02pm

Hello, Swift Community!

I previously pitched this proposal in a thread about 6 months ago. This version is a more complete and more detailed take on the same idea, with one syntactic change: a unified attribute called "@linkage" replaces several standalone attributes. A few example usages (quoting from the proposal):

```swift
// place entry into a section, mark as "do not dead strip"
// also implicitly make the global guaranteed to be statically initialized
@linkage(section: "__DATA,mysection", used)
let myLinkerSetEntry: Int = 42

// initializer expressions that cannot be constant-folded trigger an error
@linkage(section: "__DATA,mysection", used)
let myLinkerSetEntry: Int = Int.random(in: 0 ..< 10) // error

// code for the function is placed into the custom section
@linkage(section: "__TEXT,boot")
func firmwareBootEntrypoint() { ... }

// attribute syntax allows for future extensions (not part of this proposal):
// @linkage(visibility: .external, alignment: 64, weak, ...)
```

Proposal document available at: github.com/kubamracek/swift-evolution/blob/low-level-linkage-control/proposals/xxxx-low-level-linkage-control.md

# Low-level linkage control

* Proposal: [SE-NNNN](NNNN-filename.md)
* Status: Pitch #2
* Discussion threads:
  * Pitch #1: https://forums.swift.org/t/pitch-low-level-linkage-control-attributes-used-and-section/65877
  * Pitch #2: https://forums.swift.org/t/pitch-2-low-level-linkage-control/69752

## Introduction

This proposal adds a new attribute, `@linkage`, to the language that can be used to directly control link-time properties of symbols that represent global variables and functions, overriding the compiler defaults. The goal is to enable certain systems and embedded programming use cases (e.g., placing code or data into a custom section) and to serve as a low-level building block for higher-level features (e.g., “linker sets”). The attribute is intended to be used rarely and only for specific use cases; high-level application code should not need to use it directly and should instead rely on libraries, macros and other abstractions built over the low-level attributes.

## Motivation

The [SE-0385 (Custom Reflection Metadata) proposal](https://github.com/apple/swift-evolution/blob/main/proposals/0385-custom-reflection-metadata.md) explains the need for testing frameworks to allow their clients to annotate their types and other declarations to make them discoverable and enumerable at runtime, as well as to let the annotations attach user-specified data to those declarations. However, [SE-0385 was returned for revision](https://forums.swift.org/t/returned-for-revision-se-0385-custom-reflection-metadata/63758) with the argument (among others) that static (compile-time) generation of the metadata should be preferred:

>*The runtime discovery mechanism relies on executing generator functions to produce run-time values for custom attribute metadata. This design precludes static extraction of custom metadata, such as a tool that extracts the set of available tests from a binary. The design should consider whether the generated metadata could be made more amenable to static extraction, as well as more efficient queries, by considering the interaction with constant initialization.*

This proposal is motivated by the same needs as SE-0385 (see the [Motivation section in SE-0385](https://github.com/apple/swift-evolution/blob/main/proposals/0385-custom-reflection-metadata.md#motivation)), and takes an approach that addresses the above-mentioned concern. However, this proposal isn’t trying to be a direct replacement for SE-0385: its scope is limited to providing only the lowest-level building blocks for producing custom, runtime-queryable compile-time information, and it also enables static / offline / out-of-process extraction.

This file has been truncated.

Any comments, questions and other feedback are welcome!

Kuba

Douglas_Gregor · February 5, 2024, 5:29pm

I like the unified @linkage spelling, and thanks for continuing to push this forward. A couple of comments and requests.

If you're looking for another use case in addition to the excellent @DebugDescription macro, swift-testing would use this to record metadata about each of the tests so that it can find them at runtime.

I feel like the feature "linker sets" isn't something that the document can rely on people already knowing, and your references to it aren't buying much. Perhaps this document could just focus on the linkage part and avoid "linker sets" entirely, noting only that extraction of information from specific sections is a separate issue?

I'm torn on some of the restrictions placed on the use of @linkage, such as:

* the variable must not be declared inside a generic context (either directly in a generic type or nested in a generic type)
* the initial expression assigned to the variable must be statically evaluable (see below)

The statically-evaluable requirement seems orthogonal to the specification of linkage. You could certainly want to require that a variable be statically evaluable in cases where you don't need linkage (i.e., all you care about is that there's no runtime initialization for the variable), and it seems plausible that you might not care much about static evaluability for other uses of linkage.

At the moment, static variables aren't even allowed within generic contexts, so I'm not sure you need to call out this limitation on use in generics.

I think we should pull weak into this proposal. We already have the underscored @_weakLinked attribute, so we'd be standardizing that. Weak linkage also applies to more than functions and global variables (it works for almost anything), so it will require some generalization in this proposal.

I'm somewhat inclined to want to pull mangledName (perhaps call it symbolName) into this proposal, again because it standardizes something that's existed for a long time as an underscored attribute (@_silgen_name), and is very much something one wants to do when in the realm of linkage specifications.

I find this to be unfortunate:

```swift
#if os(macOS)
@linkage(section: "__DATA,mysection")
#elif os(Linux)
@linkage(section: ".mysection")
#endif
var global: Int = 42
```

because (1) it's really easy to make a mistake and write a non-portable section name that's going to blow up on another platform (e.g., Windows requires very short names), and (2) the link between os strings and object file format is arcane knowledge. There are a couple of paths we could take here. We could add #if support for specific object file formats, e.g.,

```swift
#if objectFile(MachO)
@linkage(section: "__DATA,mysection")
#elif objectFile(ELF)
@linkage(section: ".mysection")
#endif
var global: Int = 42
```

which potentially would help other code that needs to look at the loaded image, such as the APIs to enumerate the entries in a given section that are mentioned in Future Directions. Or we could try to encode the information in the @linkage attribute itself, e.g.,

```swift
@linkage(section: [.macho: "__DATA,mysection", .elf: ".mysection"])
```

which eliminates duplication of the attributes and allows validation of all of the names regardless of what platform you compile for, but doesn't help other code. If the right answer is #if objectFile, then that's a separate proposal, but I'd like us to consider what we want this code to look like.
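To make the validation point concrete, here is a minimal sketch in plain Swift of the per-format check a compiler could run on every entry of the dictionary spelling, whatever the host platform. `ObjectFileFormat`, `SectionNameError`, `validateSectionName` and `validateAll` are hypothetical names, and the rules are simplified assumptions: Mach-O names are `segment,section` with each part limited by a 16-byte header field, PE/COFF image section names are limited to 8 bytes, and ELF only requires a non-empty name.

```swift
// Hypothetical model only: not part of the proposal or of any compiler API.
enum ObjectFileFormat: Hashable {
    case macho, elf, coff
}

enum SectionNameError: Error, Equatable {
    case empty
    case missingSegment        // Mach-O name lacks the "segment,section" form
    case tooLong(limit: Int)   // a component exceeds the format's field size
}

/// Checks one section name against simplified per-format rules.
func validateSectionName(_ name: String, for format: ObjectFileFormat) throws {
    guard !name.isEmpty else { throw SectionNameError.empty }
    switch format {
    case .macho:
        // "segment,section", each part stored in a 16-byte field.
        let parts = name.split(separator: ",", omittingEmptySubsequences: false)
        guard parts.count == 2, !parts[0].isEmpty, !parts[1].isEmpty else {
            throw SectionNameError.missingSegment
        }
        for part in parts where part.utf8.count > 16 {
            throw SectionNameError.tooLong(limit: 16)
        }
    case .coff:
        // PE/COFF image files keep section names in an 8-byte field.
        if name.utf8.count > 8 { throw SectionNameError.tooLong(limit: 8) }
    case .elf:
        break  // ELF names live in a string table; no fixed length limit.
    }
}

/// Validates every entry of a per-format map, regardless of host platform.
func validateAll(_ names: [ObjectFileFormat: String]) -> [ObjectFileFormat: SectionNameError] {
    var failures: [ObjectFileFormat: SectionNameError] = [:]
    for (format, name) in names {
        do {
            try validateSectionName(name, for: format)
        } catch let error as SectionNameError {
            failures[format] = error
        } catch {
            // validateSectionName only throws SectionNameError.
        }
    }
    return failures
}
```

Reusing `.mysection` for a COFF target would be reported on every platform, since the name is 10 bytes long; this is the kind of Windows-only breakage that per-platform `#if` blocks only surface when someone actually builds for Windows.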


This is one I would have said we could leave out, because we don't have existing underscored attributes to lead the way.

Doug

John_McCall · February 5, 2024, 5:36pm

Yeah, I think visibility control in particular would be easy to add and very appreciated by specific people.

FranzBusch · February 5, 2024, 6:53pm

One use case for which I would love to use this, together with the future directions mentioned here, is looking up the .note.gnu.build-id section to get the build ID generated by the linker. I often add that ID to my log metadata to quickly see which build is in use.
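Once the note's bytes are in hand (obtaining them at run time is the discovery problem discussed elsewhere in this thread), decoding the build ID is short. A minimal sketch, assuming little-endian data, 4-byte note alignment and the standard ELF note layout (`namesz`, `descsz`, `type`, then the padded owner name and descriptor). `gnuBuildID(fromNotes:)` is a hypothetical helper, not an existing API; `NT_GNU_BUILD_ID` is note type 3 with owner name `GNU`.

```swift
/// Hypothetical helper: scans raw ELF note bytes (for example the contents of
/// `.note.gnu.build-id`) and returns the GNU build ID as a hex string.
/// Assumes little-endian byte order and 4-byte note alignment.
func gnuBuildID(fromNotes bytes: [UInt8]) -> String? {
    let ntGNUBuildID: UInt32 = 3  // NT_GNU_BUILD_ID

    // Reads a little-endian UInt32 at `offset`, or nil if out of bounds.
    func readU32(_ offset: Int) -> UInt32? {
        guard offset >= 0, offset + 4 <= bytes.count else { return nil }
        var value: UInt32 = 0
        for i in 0..<4 { value |= UInt32(bytes[offset + i]) << UInt32(8 * i) }
        return value
    }
    // Note name and descriptor are each padded to a multiple of 4 bytes.
    func align4(_ n: Int) -> Int { (n + 3) & ~3 }

    var offset = 0
    while let nameSize = readU32(offset),
          let descSize = readU32(offset + 4),
          let type = readU32(offset + 8) {
        let nameStart = offset + 12
        let descStart = nameStart + align4(Int(nameSize))
        guard descStart + Int(descSize) <= bytes.count else { return nil }

        // The owner name is "GNU" followed by a NUL terminator.
        let name = bytes[nameStart..<nameStart + Int(nameSize)]
        if type == ntGNUBuildID, Array(name) == Array("GNU\0".utf8) {
            return bytes[descStart..<descStart + Int(descSize)]
                .map { (byte: UInt8) -> String in
                    let hex = String(byte, radix: 16)
                    return hex.count == 1 ? "0" + hex : hex
                }
                .joined()
        }
        offset = descStart + align4(Int(descSize))  // advance to the next note
    }
    return nil
}
```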

Joe_Groff · February 5, 2024, 8:23pm

Global variables that aren't statically initializable are exported essentially as computed properties, since all access has to go through the one-time initializer. We can only expose the variable symbol directly if the dynamic one-time initializer can be eliminated.
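The point about computed properties can be modeled in a few lines of plain Swift. This is a hypothetical sketch of the semantics, not how the compiler actually lowers globals (real globals use a thread-safe one-time initialization primitive, while this model is single-threaded); it shows why only the accessor, and not the storage, can be handed to clients when an initializer must run at run time.

```swift
/// Hypothetical model of a global whose initializer can't be constant-folded.
/// The storage is private; every access goes through `value`, which runs the
/// initializer exactly once. Not thread-safe, unlike real Swift globals.
final class LazyGlobal<Value> {
    private var storage: Value?
    private let initializer: () -> Value
    private(set) var initializerRuns = 0

    init(_ initializer: @escaping () -> Value) {
        self.initializer = initializer
    }

    /// The "computed property" clients must call; exporting `storage`
    /// directly would let them read it before initialization.
    var value: Value {
        if let existing = storage { return existing }
        initializerRuns += 1
        let fresh = initializer()
        storage = fresh
        return fresh
    }
}
```

With a constant initializer such as `42`, no accessor is needed at all: the value can be emitted directly into the symbol's storage, which is what makes exposing that symbol possible.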

I agree, though it would immediately invite a possibly-distracting bikeshed over the many different unrelated kinds of "weak" there are, and how to describe them so it's clear whether we're referring to "weak" as in "it's ok to resolve to null if the symbol isn't present in an older version of the module" as opposed to "weak" as in "please replace this definition with a 'strong' definition somewhere else if there is one". The latter could also be very interesting to embedded platforms, where we might want to let generic implementations of runtime stubs be superseded by the platform vendor's version if available.

This touches on something else: there isn't necessarily one symbol associated with a Swift declaration, and it isn't always desirable to have all the symbols share the same linkage controls. While some declarations may have an obvious "primary" symbol to control, even plain top-level functions potentially have both a C entry point (which can currently also be controlled in some basic ways by the long-underscored @_cdecl attribute) and a Swift entry point. In addition to the symbol names for the Swift and C entry points, you may want to control the other linkage attributes of each entry point independently. For types especially, there are a whole bunch of metadata structures whose linkage and mangling might be interesting to control. So it would be interesting to have a way to specify in the @linkage attribute which specific symbol related to a declaration is affected by the attribute, and to allow multiple attributes to modify different symbols related to the same declaration. Combined with the visibility controls John suggested, that would also finally allow you to export something only as a C function, or publicly as a Swift function while only internally available as a C function:

```swift
@linkage(for: swiftEntryPoint, symbolName: "_swift_fooBar", visibility: private)
@linkage(for: CEntryPoint, symbolName: "c_fooBar", visibility: public)
func fooBar(...)
```

For global variables without static initializers, you could potentially control the accessor's linkage even if the storage can't be exported.
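For comparison, this is roughly what can be expressed today with the underscored, officially unsupported `@_cdecl` attribute mentioned above. It gives one function both a C entry point (here named `c_fooBar`, an illustrative name) and its normal mangled Swift entry point, but offers no way to give the two entry points different linkage, which is the gap the proposed `for:` argument would fill.

```swift
// `@_cdecl` is underscored and may change: it emits an extra C entry point
// named "c_fooBar" next to the usual mangled Swift entry point. Nothing in
// this spelling lets the two symbols have independent visibility.
@_cdecl("c_fooBar")
public func fooBar(_ x: Int32) -> Int32 {
    x &+ 1  // wrapping add keeps the example free of overflow traps
}
```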

tshortli · February 5, 2024, 8:47pm

For a couple more concrete examples, this function has both a main symbol and also an opaque type descriptor symbol:

```swift
public func funcWithOpaqueResult() -> some P { ... }
```

This function has 8 associated symbols(!):

```swift
public dynamic func dynamicFuncOpaqueResult() -> some P { ... }
```

I do think it makes sense to extend linkage control to all of the associated symbols for a declaration. Unless you are very familiar with the Swift ABI or compiler implementation, though, you may not be aware of all the symbol kinds that you have to consider and that makes me wonder if the compiler ought to help you ensure you've provided exhaustive coverage of the symbols with your @linkage attributes.

tshortli · February 5, 2024, 8:56pm

I'd like to bikeshed the term in this example:

```swift
// place entry into a section, mark as "do not dead strip"
// ...
@linkage(section: "__DATA,mysection", used)
let myLinkerSetEntry: Int = 42
```

Is "used" such a well-established term of art for this purpose that it is the obvious choice? IIUC it aligns with the underlying LLVM concept, but I wonder if we could use a more explicit term in the surface language. For example, I think something like nostrip would better communicate the purpose of the attribute.

Joe_Groff · February 5, 2024, 9:03pm

I might be too narrowly focused in my thinking about use cases, but I think there are a fair number of use cases where you really do only care about controlling one related symbol specifically: for instance, you want to export a C API, so you only care about exporting the C symbol, or you want to make a global constant available to some foreign code, so you only care about exporting the storage.

tshortli · February 5, 2024, 9:36pm

This is an admittedly narrow use case, but since replacing @_silgen_name came up earlier in the thread, one use case that came to mind for me was maintaining ABI compatibility in a dylib when evolving its declarations in certain ways. For example, @_silgen_name is used here to preserve the rethrowing map() ABI in the standard library whilst replacing it with a version that uses typed throws instead:

```swift
  // ABI-only entrypoint for the rethrows version of map, which has been
  // superseded by the typed-throws version. Expressed as "throws", which is
  // ABI-compatible with "rethrows".
  @usableFromInline
  @_silgen_name("$sSTsE3mapySayqd__Gqd__7ElementQzKXEKlF")
  func __rethrows_map<T>(
    _ transform: (Element) throws -> T
  ) throws -> [T] {
    // ... (body not included in the quoted excerpt)
  }
```

If map() were instead a function that has multiple associated symbols, we wouldn't even have the option of using @_silgen_name to do this. But if we were using the new @linkage attribute to do a similar ABI preservation, we could achieve the goal but we'd want to make sure we covered all the applicable symbols. That said, maybe it's enough to just have a complete, documented list of symbol types to choose from since this use case is already pretty expert-level.

Joe_Groff · February 5, 2024, 9:46pm

Ah, for the case of managing an API change without breaking ABI, I could see being able to exhaustively control the symbols for a declaration to be very useful. Would it help to have a catch-all syntax to mean "all symbols not explicitly mentioned by another @linkage attribute"? That could be useful in combination with some way to induce an error if the attribute matches an actual emitted symbol:

```swift
@linkage(for: foo, ...)
@linkage(for: bar, ...)
@linkage(for: *, invalid) // error if any other symbols associated with this decl are emitted
```
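The resolution rule behind this catch-all syntax can be sketched as a small model: explicit `for:` rules win, the `*` rule covers everything else, and an `invalid` action turns any matching emitted symbol into an error. All type names and the symbol kinds below are hypothetical and illustrative, not an official or complete list of Swift symbol kinds.

```swift
// Hypothetical model; symbol kinds are illustrative, not exhaustive.
enum SymbolKind: Hashable {
    case swiftEntryPoint, cEntryPoint, opaqueTypeDescriptor
}

enum RuleTarget: Hashable {
    case symbol(SymbolKind)
    case wildcard  // "*": every symbol not named by another rule
}

enum RuleAction: Equatable {
    case apply(String)  // stand-in for real linkage settings
    case invalid        // emitting a matching symbol is an error
}

struct LinkageRule {
    var target: RuleTarget
    var action: RuleAction
}

enum LinkageError: Error, Equatable {
    case forbiddenSymbol(SymbolKind)
}

/// Decides which rule governs each emitted symbol. Symbols matched by no
/// rule (and no wildcard) keep default linkage and are omitted from the result.
func resolve(_ emitted: [SymbolKind], rules: [LinkageRule]) throws -> [SymbolKind: RuleAction] {
    var explicit: [SymbolKind: RuleAction] = [:]
    var wildcard: RuleAction?
    for rule in rules {
        switch rule.target {
        case .symbol(let kind): explicit[kind] = rule.action
        case .wildcard: wildcard = rule.action
        }
    }
    var result: [SymbolKind: RuleAction] = [:]
    for kind in emitted {
        // Explicit rules take precedence over the catch-all.
        if let action = explicit[kind] ?? wildcard {
            if action == .invalid { throw LinkageError.forbiddenSymbol(kind) }
            result[kind] = action
        }
    }
    return result
}
```

For the forward-declaration and ABI-preservation cases, the useful property is that adding `opaque` result types or `dynamic` to a declaration later would make `resolve` fail instead of silently emitting symbols with default linkage.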

tshortli · February 5, 2024, 9:57pm

Yeah, I think that's a reasonable syntax. A second use case for exhaustive coverage that I just thought of is forward declarations. @_silgen_name is sometimes used in circumstances where it is impractical to actually import the Swift/clang module for a library that's used at runtime (circular dependencies, for example). You'd definitely want checked exhaustiveness if you were redeclaring a declaration from another module.

compnerd · February 6, 2024, 4:05am

I definitely am amongst those interested in visibility. It also raises the related but disjoint property of DLL Storage.

Currently, we do not have the means of exposing API symbols across module boundaries; that is, symbols from module M may not be exported through module N if M is built statically. We currently rely on static versus dynamic linkage to infer the DLL storage. Having that level of control would be amazing, but requiring it through @linkage on each symbol seems like too much ceremony for something that common.

The separation of the notions of DLL storage and visibility is important for portability: there are some platforms (e.g., PlayStation) that actually support both simultaneously.

John_McCall · February 6, 2024, 6:59am

I don't know what you mean by that. PE dllimport/dllexport and ELF/MachO symbol visibility are fundamentally the same thing — they're just controls for which symbols end up in export tables and which symbol references are emitted in a way that allow resolution through an import table. The differences between them are mostly just the language models used by compilers. There are some significant differences in how forgiving the respective toolchains are about mismatches, but that forgiveness isn't sufficient to let us statically link multiple naively-compiled Swift modules into the same DLL, so if we ever want to do that, we need to understand those connections when compiling those modules. Once we have that, I don't think there's anything further that can be usefully expressed, because again, they're just different language models over essentially the same mechanisms.

compnerd · February 6, 2024, 2:52pm

They are somewhat related. Technically, the visibility attributes control how the symbol participates in dynamic linking (globally, locally, or globally without interposition), whereas the DLL storage only indicates where the symbol resides. They both end up affecting the export tables, but that doesn’t make them identical. In the case of PlayStation, there’s a custom ELF loader that supports both attributes simultaneously (not treated as equivalents of each other). While it has been a little while since I looked into this, I believe that is still the case. I think we should have the ability to specify both if we expose this.

I agree that we should figure out how the symbols interact and model that, since that is what we are after; but in that case, perhaps we should not define it in terms of visibility and DLL storage, but rather with something Swift-specific.

allevato · February 6, 2024, 3:59pm

The proposal draft says the following about support for structs:

> Custom structs with a frozen layout and a trivial initializer are constant-foldable if all the stored properties only use other constant-foldable types and the initializer call uses constant-foldable expressions

Does/will this include structs imported from C? I don't recall if they're imported in such a way that the check for @frozenness would also fall out naturally, but since a @frozen Swift struct isn't guaranteed to have the same layout as an equivalent C struct, being able to define static data that has exactly the same layout as it would in C by directly using that imported struct would fill an important gap.
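The quoted rule is recursive, which a toy model makes explicit. This is a hypothetical, simplified sketch (`TypeDescription` and `isConstantFoldable` are illustrative names, and initializer expressions are not modeled); whether an imported C struct should count as "frozen" is exactly the open question above, so the model takes it as an input flag rather than answering it.

```swift
/// Hypothetical, simplified description of a type for the foldability check.
enum TypeDescription {
    case builtin(name: String, foldable: Bool)  // e.g. Int, Bool
    case structType(name: String, frozen: Bool, trivialInit: Bool,
                    storedProperties: [TypeDescription])
}

/// A struct is foldable only if it is frozen, has a trivial initializer,
/// and every stored property is itself foldable (checked recursively).
func isConstantFoldable(_ type: TypeDescription) -> Bool {
    switch type {
    case .builtin(_, let foldable):
        return foldable
    case .structType(_, let frozen, let trivialInit, let properties):
        return frozen && trivialInit && properties.allSatisfy(isConstantFoldable)
    }
}
```

One non-frozen struct anywhere in the property tree is enough to make the outer type non-foldable, which is why the treatment of imported C structs matters even for Swift types that merely contain one.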

grynspan · February 9, 2024, 8:35pm

Something @kubamracek and I discussed off-forums was the need for a way to discover symbols at runtime. Darwin includes API for inspecting Mach-O binaries and finding segments/sections, so digging up "__TEXT,__my_great_section" is easy.

On Linux and ~~Windows~~, this isn't generally possible and the Swift runtime has had to use static constructors to cache section data it needs—but then there's no way to do that for an arbitrary section, so if e.g. swift-testing adopted @linkage, it would be unable to find the section it used when discovering tests.

Edit: I did some experimenting and it is possible to look up sections dynamically on Windows using Win32 API. It's a bit like the ending of Raiders of the Lost Ark, but it's possible.

Joe_Groff · February 9, 2024, 8:57pm

It seems like section control would at least still be a prerequisite to being able to do the link-time ordering of the metadata and generation of start/end symbols that those static constructors use, even if it isn't a whole solution unto itself.
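To illustrate the start/end idea: on ELF, GNU ld and lld synthesize `__start_<name>` and `__stop_<name>` symbols for output sections whose names are valid C identifiers, and a runtime can treat the range between them as an array of fixed-size records. The sketch below assumes such a layout; `TestRecord` and `forEachRecord` are hypothetical, and because referring to linker-generated symbols from Swift would need something extra (such as a C shim), the section is simulated here with an ordinary array.

```swift
/// Hypothetical record emitted into a custom section; every entry must share
/// one layout and be laid out contiguously at MemoryLayout<TestRecord>.stride.
struct TestRecord {
    var id: UInt32
    var flags: UInt32
}

/// Walks the records between a section's start and stop addresses, as a
/// runtime would with linker-generated __start_/__stop_ symbols.
func forEachRecord(from start: UnsafePointer<TestRecord>,
                   to stop: UnsafePointer<TestRecord>,
                   _ body: (TestRecord) -> Void) {
    var cursor = start
    while cursor < stop {
        body(cursor.pointee)
        cursor += 1  // advances by one record stride
    }
}
```

Usage, with an array standing in for the linked section:

```swift
let records = [TestRecord(id: 1, flags: 0), TestRecord(id: 2, flags: 1)]
records.withUnsafeBufferPointer { buffer in
    let start = buffer.baseAddress!
    forEachRecord(from: start, to: start + buffer.count) { print($0.id) }
}
```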

John_McCall · February 9, 2024, 9:07pm

I would like to just have a general feature for this kind of passive discovery, if we can agree on what it would look like.

grynspan · April 30, 2024, 2:31pm

Coming back to this a while later: it actually is possible to do the requisite discovery at runtime on Windows. I have a proof of concept here (untested! no guarantees! caveat executor!)
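The proof of concept itself isn't reproduced here. As an independent sketch of one piece of the Windows approach, this hypothetical helper searches a raw PE section table (an array of 40-byte `IMAGE_SECTION_HEADER` entries: 8-byte name, then `VirtualSize` and `VirtualAddress`) for a section by name. Locating the table inside a loaded image (via the DOS header's `e_lfanew`, the COFF file header and the optional header) is left out and would be an additional step.

```swift
/// Hypothetical helper: finds a section by name in a raw PE/COFF section table
/// and returns its relative virtual address and virtual size. Little-endian.
func findSection(named name: String, inSectionTable table: [UInt8]) -> (rva: UInt32, size: UInt32)? {
    let headerSize = 40  // sizeof(IMAGE_SECTION_HEADER)
    let wanted = Array(name.utf8)
    guard wanted.count <= 8 else { return nil }  // image section names fit in 8 bytes

    // Reads a little-endian UInt32; callers keep offsets within the header.
    func readU32(_ offset: Int) -> UInt32 {
        var value: UInt32 = 0
        for i in 0..<4 { value |= UInt32(table[offset + i]) << UInt32(8 * i) }
        return value
    }

    var offset = 0
    while offset + headerSize <= table.count {
        // Name is NUL-padded to 8 bytes; compare without the padding.
        let rawName = table[offset..<offset + 8].prefix(while: { $0 != 0 })
        if Array(rawName) == wanted {
            // VirtualSize is at +8, VirtualAddress (an RVA) at +12.
            return (rva: readU32(offset + 12), size: readU32(offset + 8))
        }
        offset += headerSize
    }
    return nil
}
```

Adding the returned RVA to the module's base address would give the section's in-memory location.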

Linux remains a problem, as there is no high-level API for walking an ELF binary, and writing custom code to do so is not a small task.

Joe_Groff · April 30, 2024, 4:32pm

Nice! If the PE section headers are reliably mapped into the executable, that should be sufficient. It would be nice then for the Swift runtime on Windows not to rely on Swift-RT.o file(s) to manually delimit the sections. (I think we'd still need the static constructor for DLLs loaded after process start if we don't want to use private NT interfaces to be notified on load.)

AIUI, there's still a more fundamental problem there: the ELF headers are not normally mapped into the process while it's running.