[Pitch] Low-level linkage control attributes: @used and @section

kubamracek · June 28, 2023, 3:24pm

Hello, Swift Community!

Pitch

I'd like to pitch adding two attributes for globals and functions in Swift that control exactly how symbols are exported at the linker level. Both attributes have existed in the C/C++/ObjC world for a long time:

@used ... the analogue to GCC's/Clang's __attribute__((used))
@section("section_name") ... the analogue to GCC's/Clang's __attribute__((section("section_name")))

Motivation

The motivation has two goals:

To provide low-level building blocks for building more high-level APIs, e.g. "linker sets" (see below) or custom per-type metadata as described in SE-0385 (swift-evolution/proposals/0385-custom-reflection-metadata.md at main · apple/swift-evolution · GitHub). This pitch doesn't try to add or design those high-level APIs; it only provides a path towards unblocking their design, separately.
To provide a low-level mechanism for systems programming use cases that are specific to one concrete system, where building a generally reusable high-level API doesn't make sense (the project author is still free to build such a high-level API as an internal mechanism in their project).

The "linker set" mechanism is an approach that Swift is already using: Nearly any kind of compiler-emitted metadata is put into a specifically-named section in the binary and given a fixed-layout record. Then when we want to do a lookup for some information -- say, to find the protocol conformances in the binary -- we ask the loader (dyld on Darwin) to give us the start/end address of that section in each of the loaded images, and then we can iterate through all of the records in those sections. It's also possible to extract some of that metadata from outside the process, or dig it out of the binary itself. We do this with the existing reflection library in, e.g., swift-inspect and swift-reflection-dump.

This pitch suggests we add the ability into the Swift language to express the first part of this mechanism: Placing fixed-layout records into specifically-named sections.
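
To make the "fixed-layout records between a start and an end address" idea concrete, here is a small simulation in plain Swift. It does not touch any real section: the Record type, collectIDs, and demo are made-up names, and an ordinary array stands in for the memory that the linker would lay out and the loader would report.

```swift
// A fixed-layout record: only trivial fields, so every entry has the same size and stride.
struct Record {
    var id: UInt32
    var flags: UInt32
}

// Walks the half-open range [start, end) one record at a time, the same way a
// runtime would walk a section once the loader has reported its bounds.
func collectIDs(from start: UnsafePointer<Record>, to end: UnsafePointer<Record>) -> [UInt32] {
    var ids: [UInt32] = []
    var cursor = start
    while cursor < end {
        ids.append(cursor.pointee.id)
        cursor += 1 // advances by MemoryLayout<Record>.stride bytes
    }
    return ids
}

// Stand-in for a section: the contiguous storage of an array (an assumption of
// this sketch; a real section is produced by the linker, not allocated at runtime).
func demo() -> [UInt32] {
    let entries = [Record(id: 1, flags: 0), Record(id: 2, flags: 1), Record(id: 3, flags: 0)]
    return entries.withUnsafeBufferPointer { buffer -> [UInt32] in
        guard let start = buffer.baseAddress else { return [] }
        return collectIDs(from: start, to: start + buffer.count)
    }
}

print(demo()) // prints "[1, 2, 3]"
```

The walk only works because every entry has the same layout, which is why the records have to be fixed-layout in the first place.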

Proposal

Some of this is already implemented under a feature flag as underscored attributes (@_section, @_used) in main via https://github.com/apple/swift/pull/65901. In summary:

@used attribute that flags a global variable or a top-level function as "do not dead-strip" via llvm.used, roughly the equivalent of __attribute__((used)) in C/C++.
@section("...") attribute that places a global variable or a top-level function into a section with that name, roughly the equivalent of __attribute__((section("..."))) in C/C++.

The annotations can only be applied to globals that are guaranteed to end up "statically initialized" (instead of lazily initialized via an init_once runtime call), because the annotation doesn't make sense otherwise. This opens an interesting question: What expressions are guaranteed to be "statically initialized" when used to initialize a global? The proposal is to start with a very basic set of expressions, and improve on that in the future. As of today, the mandatory optimizations pipeline already statically initializes integer literals, tuples of them, and simple arithmetic expressions, and we can cleanly reject the compilation at the end of the SIL pipeline if there's any global with a @section attribute that's not statically initialized. Then, as follow-up improvements, we should look at handling POD struct types in the mandatory optimizations pipeline as well, and allowing them to be used with @section.
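
As a rough illustration of where that line falls, the sketch below groups a few initializers by whether they belong to the basic set named above. The Candidates enum and its member names are invented for this example, static stored properties are used because they share the lazy-initialization model of globals, and the comments restate the claim in this post rather than a guarantee checked by this code.

```swift
enum Candidates {
    // Inside the basic set: an integer literal, a tuple of integer literals,
    // and a simple arithmetic expression over literals.
    static let literal: Int = 42
    static let pair: (Int, Int) = (1, 2)
    static let arithmetic: Int = 4 * 1024 + 16

    // Outside the basic set: the value is only known at run time, so this
    // property has to be initialized lazily on first access.
    static let runtimeValue: Int = CommandLine.arguments.count
}

print(Candidates.literal, Candidates.pair, Candidates.arithmetic)
print(Candidates.runtimeValue >= 0)
```

Under the rule described above, only the first three would be acceptable with @section; the last one would be rejected at the end of the SIL pipeline.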

While outside of the scope of this pitch, here's a sketch of what the runtime side of a "linker set" API could look like:

// in Module1
@used @section("__DATA,mysection") private let my_entry: Int = ...

// in Module2
@used @section("__DATA,mysection") private let my_entry: Int = ...

for entry in SwiftRuntime.section("__DATA,mysection", as: Int.self) { // this uses the loader's APIs to locate and iterate over the section
  ...
}
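
For a sense of what such a lookup could do under the hood on Darwin, here is a minimal sketch built on the existing dyld and getsectiondata C APIs. The function name records(inSegment:section:as:) is hypothetical and is not part of this pitch; the sketch assumes 64-bit Mach-O images, assumes T is a trivial fixed-layout type, ignores images being loaded or unloaded concurrently, and simply returns an empty array on other platforms.

```swift
#if canImport(MachO)
import MachO
#endif

/// Hypothetical helper: copies out every `T` stored in `segment,section`
/// across all images currently loaded into this process.
func records<T>(inSegment segment: String, section: String, as type: T.Type) -> [T] {
    var result: [T] = []
    #if canImport(MachO) && (arch(arm64) || arch(x86_64))
    for index in 0..<_dyld_image_count() {
        guard let header = _dyld_get_image_header(index) else { continue }
        // On 64-bit platforms every loaded image starts with a mach_header_64.
        let header64 = UnsafeRawPointer(header).assumingMemoryBound(to: mach_header_64.self)
        var size: UInt = 0
        // Returns nil when this image has no such section.
        guard let base = getsectiondata(header64, segment, section, &size) else { continue }
        let count = Int(size) / MemoryLayout<T>.stride
        let typed = UnsafeRawPointer(base).assumingMemoryBound(to: T.self)
        result.append(contentsOf: UnsafeBufferPointer(start: typed, count: count))
    }
    #endif
    return result
}

// No image defines this section, so the lookup finds nothing.
print(records(inSegment: "__DATA", section: "__no_such_sect", as: Int.self).count) // prints "0"
```

Nothing here checks that the bytes in the section really are values of type T; that is exactly the kind of type safety a higher-level API would need to add.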

And eventually, it might make sense to wrap this into a macro-based solution so that we don't expose the low-level attributes at all:

@LinkerSet(name: "myLinkerSet") private let myEntry: Int = 42

for entry in SwiftRuntime.linkerSet("myLinkerSet", as: Int.self) {
  ...
}

Or, in the case we want to attach metadata to types (as motivated by SE-0385):

@Registered(name: "My Favorite Type") // this creates a hidden global in a named section
class MyType { }

for regType in allRegisteredTypes { // queries over the entries in the section
  ...
}

Thoughts?

ksluder · June 28, 2023, 3:43pm

Is it worth splitting the arguments to @section, perhaps @section("mysection", segment: "__DATA")? It seems strange to expose what is effectively linker-specific CLI syntax through the language, especially on platforms where one has a choice of linkers.

Joe_Groff · June 28, 2023, 4:55pm

In the pitch thread for @convention(thin) function pointers, I had raised the idea of incorporating these sorts of modifiers into a general @symbol attribute, which could potentially be applied multiple times to the same declaration, in order to do things like export both Swift and C calling convention entry points for a Swift function:

@symbolName(swift: "swift_foo", visibility: internal)
@symbolName(c: "c_foo", visibility: public, section: "__TEXT,__fooplugn", used)
func foo(...)

This might fit better within Swift's implementation model, where declarations don't necessarily map 1:1 to a symbol, and we may emit any number of entry points depending on how the declaration was used. Once you talk about multiple different symbols for a declaration, it seems like all of these low level controls, including exact symbol name, visibility, section/segment name, and used-ness, potentially become independently interesting for each entry point.
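
For a concrete case of one declaration producing more than one entry point today, the existing underscored @_cdecl attribute is a reasonable analogy (it is not the attribute proposed here, and the names foo and c_foo are just placeholders). As far as I know, the compiler emits the usual Swift entry point under its mangled name plus a C-calling-convention thunk under the given name:

```swift
// `@_cdecl` is an existing, unofficial attribute, used here only as an analogy.
// This one declaration yields two symbols: the Swift-mangled function and a
// C-compatible thunk exported as `c_foo`.
@_cdecl("c_foo")
public func foo(_ value: Int32) -> Int32 {
    return value + 1
}

// Swift callers go through the Swift entry point. C code could declare
// `int32_t c_foo(int32_t);` and reach the same body through the thunk.
print(foo(41)) // prints "42"
```

Visibility, section, and used-ness could plausibly differ between those two symbols, which is the independence described above.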

jrose · June 28, 2023, 5:13pm

I tend to agree with Joe. For @used in particular I’d like to see what modern use cases there are, given that Swift has explicit access control already—not saying it can’t be useful, especially for main executables, but want to see it explicitly. Relatedly, could choosing a section imply “keep this” in Swift, or are there uses for explicitly-sectioned symbols that can still be dead-stripped?

EDIT: I’m thinking now about linker sets and how the built-in ones we have are actually very poor at knowing when it’s safe to dead-strip. I don’t know if today’s linkers provide good solutions to that problem, though. (Example: it is safe to strip unreferenced conformances if (though not only if) the type descriptor of the conforming type is never referenced/public, because that implies that particular concrete type is never used dynamically. That’s “symbol A depends on symbol B”, which is a pretty simple condition, but I don’t think we attempt that on any of Swift’s platforms.)
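
To spell out the "symbol A depends on symbol B" condition, here is a toy liveness computation. It models no real linker: the symbol names, the liveSymbols function, and the keepIfLive table are all invented for this sketch. A record is kept only if the symbol it depends on turns out to be live.

```swift
/// Toy mark phase. `references` holds ordinary edges (a live symbol keeps what it
/// references). `keepIfLive` holds conditional edges: the key is kept only when
/// the value is live.
func liveSymbols(
    roots: Set<String>,
    references: [String: Set<String>],
    keepIfLive: [String: String]
) -> Set<String> {
    var live: Set<String> = []
    var worklist = Array(roots)
    while true {
        // Ordinary reachability from everything queued so far.
        while let symbol = worklist.popLast() {
            guard live.insert(symbol).inserted else { continue }
            worklist.append(contentsOf: references[symbol] ?? [])
        }
        // Conditional edges: queue records whose dependency has become live.
        for (record, dependency) in keepIfLive
        where live.contains(dependency) && !live.contains(record) {
            worklist.append(record)
        }
        if worklist.isEmpty { break }
    }
    return live
}

let live = liveSymbols(
    roots: ["main"],
    references: [
        "main": ["typeA_descriptor"],
        "conformance_A": ["witness_A"],
        "conformance_B": ["witness_B"],
    ],
    keepIfLive: [
        "conformance_A": "typeA_descriptor",
        "conformance_B": "typeB_descriptor",
    ]
)
print(live.contains("conformance_A"), live.contains("conformance_B")) // prints "true false"
```

In this model conformance_B and its witness are stripped because nothing reaches typeB_descriptor, while an unconditional @used on every conformance record would have kept both.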

You’ve also snuck in the idea of “guaranteed static initialization”. I think this is something plenty of people are interested in without section control, and it may deserve to be the centerpiece feature here!

jrose · June 28, 2023, 5:17pm

I agree, but at the same time section and segment names have different restrictions on every platform, so at least some of this will end up platform-dependent anyway. I’m not sure the concept even maps to Windows (it might, I’ve done very little Windows).
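
A small sketch of what "platform-dependent" could mean if the compiler validated the name. Everything here (ObjectFormat, SectionName, parseSectionName) is hypothetical. The Mach-O rule it encodes is that segment and section names are each at most 16 bytes; it ignores the optional type/attribute fields Clang accepts after the section name, and it treats an ELF name as a single opaque string.

```swift
enum ObjectFormat { case machO, elf }

struct SectionName: Equatable {
    var segment: String? // only meaningful for Mach-O
    var section: String
}

enum SectionNameError: Error {
    case missingSegment
    case emptyComponent
    case tooLong(String)
}

/// Hypothetical validation of the string passed to @section.
func parseSectionName(_ raw: String, for format: ObjectFormat) throws -> SectionName {
    switch format {
    case .machO:
        // Mach-O wants "segment,section", each component at most 16 bytes.
        let parts = raw.split(separator: ",", maxSplits: 1, omittingEmptySubsequences: false)
            .map(String.init)
        guard parts.count == 2 else { throw SectionNameError.missingSegment }
        for part in parts {
            guard !part.isEmpty else { throw SectionNameError.emptyComponent }
            guard part.utf8.count <= 16 else { throw SectionNameError.tooLong(part) }
        }
        return SectionName(segment: parts[0], section: parts[1])
    case .elf:
        // ELF has no segment component in the name; it is one string.
        guard !raw.isEmpty else { throw SectionNameError.emptyComponent }
        return SectionName(segment: nil, section: raw)
    }
}

print((try? parseSectionName("__DATA,mysection", for: .machO)) != nil) // prints "true"
print((try? parseSectionName("mysection", for: .machO)) != nil)        // prints "false"
```

Splitting the argument as suggested earlier in the thread would move the Mach-O half of this parsing into the attribute's signature, but the per-format length and naming rules would still need a check like this somewhere.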

ksluder · June 28, 2023, 5:25pm

This also seems much better from an identifier-explosion perspective.

fclout · June 28, 2023, 7:47pm

There are a few points I'd like to bring up:

@section overlaps with access modifiers, as evidenced by the fact @section("__DATA,__data") private let foo = 0 doesn't respect the spirit of private (people outside of the file can access the variable rather implicitly).
How do we enforce that @section members are all the same type?
For @section to make sense in a dynamically-linked world, it seems to me that we should somehow make sure people understand sections are per-binary/internal.
Does section only apply to variables in the global scope?
All symbols in the linker set need a name, which might be inconvenient if you want to stick an array of things in there without having to name all of them.
In the fullness of time, it could be nice if the requirement for participating in a linker set was that the type is final and has a constant-evaluatable init instead of leaving out reference types entirely. We already have reference types instantiated with static storage in Swift binaries. This does create new and exciting questions, though, like whether the linker set contains references to the objects or the objects themselves, and what that means for mutability if the segment permissions do not agree with let/var.

I think that as proposed, @section and @used are the bare minimum viable way people can build linker sets. I'd like to pitch the other extreme, linkerset as a first-class language construct:

linkerset ReflectionMetadataSet: ReflectionMetadata {
	let nsObjectMetadata = ReflectionMetadata(name: "NSObject", {...})
}

// in a different file...
extension ReflectionMetadataSet {
	let nsStringMetadata = .init(name: "NSString", {...})
	let _ = .init(name: "_anonymousClass0", {...})
}

func printAllClassNames() {
	for md in ReflectionMetadataSet.allValues {
		print(md.name)
	}
}

The visibility of linkerset can be anything, but extensions are only allowed in the module declaring the linkerset. If nobody uses it, it can be dead-stripped in its entirety (and if nobody uses allValues, entries not directly accessed can be removed as well).
The actual name of the section is decided by the compiler, but it would make sense to allow it to be overridden with @_silgen_name or a formalization of it.
You add new elements by extending the linker set with extension.
All elements in the set must be ReflectionMetadata objects, as specified in the "inheritance" clause, similarly to how enums work.
(ReflectionMetadata must be a struct for the purposes of this example, as there's a lot more to think for classes).
The let _ syntax creates an anonymous entry, that is an entry that exists in allValues but that can't be referred to by a symbol name, like ReflectionMetadataSet.nsObjectMetadata; private entries could serve the same purpose, but in some cases you might just not care to give the symbol a name at all.
It stands to reason that linkerset can be embedded in other namespaces (that are not generic types, for the same reasons Swift doesn't have stored static properties on generic types).

Of course, whether linker sets are important enough for actor-level first-classness is up for debate. I'm proposing this more for the value of the exercise in the maximalist direction and I imagine that what we actually need is somewhere in the middle. As a matter of fact, now that I've written all of this, linkerset feels like a special kind of enum, so there might be something interesting to look at in that space that is less work than a whole new thing and that has better outcomes than bare attributes.
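
For comparison, this is roughly the closest approximation I can write in today's Swift, using a caseless enum as the namespace. The types mirror the example above but are simplified (ReflectionMetadata only carries a name here), and allClassNames is a made-up helper. The important difference is that allValues is maintained by hand and cannot be extended from another file without editing this list.

```swift
struct ReflectionMetadata {
    let name: String
}

enum ReflectionMetadataSet {
    static let nsObjectMetadata = ReflectionMetadata(name: "NSObject")
    static let nsStringMetadata = ReflectionMetadata(name: "NSString")

    // Hand-maintained: a real linker set would collect these entries at link
    // time, including ones contributed by extensions in other files.
    static let allValues: [ReflectionMetadata] = [
        nsObjectMetadata,
        nsStringMetadata,
        ReflectionMetadata(name: "_anonymousClass0"), // stand-in for a `let _` entry
    ]
}

func allClassNames() -> [String] {
    return ReflectionMetadataSet.allValues.map { $0.name }
}

print(allClassNames()) // prints ["NSObject", "NSString", "_anonymousClass0"]
```

Everything the hand-written array does here (collecting entries, allowing anonymous ones, enforcing a single element type) is what the linkerset construct would take over.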

kubamracek · June 29, 2023, 3:59pm

I agree with your description that this pitch is a bare minimum option, and I actually do think that there should be a high-level API / construct for linker sets. I don't see these two options as conflicting, though; quite the contrary: The low-level attributes are meant as a path towards building a high-level solution that has the properties you seek (strict type checking). Two reasons why I think the low-level attributes are the right path forward:

There are more use cases for the attributes than just linker sets. On some platforms startup code or special CPU mode code must be in specific sections. Special platform data structures (interrupt vectors) must be in specific sections.
One of the major reasons why SE-0385 was returned for revisions was that the solution isn't generalizable, and suggests using macros on top of something more general.

With that, I don't actually expect (or want) widespread use of these low-level attributes directly. Just like e.g. @_silgen_name, it's an advanced mechanism only suitable for specialized needs, and most software projects shouldn't ever need it. (This isn't an argument that we shouldn't get the design of it right, of course. It's an argument that the audience I'm targeting is stdlib contributors and systems programmers, rather than authors of high-level projects.)

kubamracek · June 29, 2023, 4:03pm

Also let me CC some more folks that I think might find this pitch relevant: @Douglas_Gregor @Erik_Eckstein @hborla @rauhul

Dave_Lee · October 9, 2023, 8:09pm

We have a pitch that will need both @section and @used. See Pitch: Debug Description macro.

The pitch doesn't mention @used, but I presume it will be necessary to prevent the results of macro expansion from being subsequently stripped/elided.