TBD emission without whole-module optimisation: merge using external process, or emit during merge-modules?

huon · June 29, 2018, 4:43am

Background: a TBD file is a text-based stub that lists all the symbols of a dynamic library and can be linked against in lieu of the full dylib. This allows more parallelisation in the build system: libraries can be linked in parallel even if there are dependencies between them.

Currently swiftc supports generating TBD files with -emit-tbd, but this only works in whole-module optimisation mode: in an incremental build, a particular frontend invocation only sees the symbols in a single file, and thus cannot emit the complete TBD file. There are two possibilities for addressing this:

1. emit the TBD file only when the full AST module can be seen, meaning either during a WMO build, or during the merge-modules step of an incremental one
2. emit the TBD files per-file, and invoke an external tool (tapi) to merge them into the final output, in parallel with merge-modules and/or linking.
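
To make the second option concrete, here is a minimal sketch of what the merge step conceptually does: take the union of the symbols seen by each frontend invocation. This is only an illustration, not how tapi is implemented. The `mergeSymbolLists` function and the symbol names are hypothetical, and real TBD files are structured text documents rather than plain lists of strings.

```swift
// Hypothetical helper: merge the symbol lists produced for each source file
// into one de-duplicated, sorted list for the whole module.
func mergeSymbolLists(_ perFile: [[String]]) -> [String] {
    var merged = Set<String>()
    for symbols in perFile {
        merged.formUnion(symbols)   // a symbol listed by several files is kept once
    }
    return merged.sorted()          // stable ordering for reproducible output
}

// Placeholder symbol names, not real mangled names.
let fileA = ["_foo", "_bar"]
let fileB = ["_baz", "_foo"]

let merged = mergeSymbolLists([fileA, fileB])
print(merged)
```

Either way, the merge needs the output of every frontend invocation before it can finish, which is why it is a serialisation point in both approaches.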

The first is slightly less incremental, since the merging of TBD files and merging of modules cannot happen in parallel. However, emitting the TBD file is not an expensive step, even compared to the limited -merge-modules step, which is already a serialization point in an incremental build of a library (and, note that even merging them with the external tool as above is a serialization point too). For instance:

generating the TBD file (which contains ~12000 symbols) for the whole stdlib, in WMO mode, takes 35ms. The rest of the build takes 14s (WMO -Onone) or 60s (WMO -O).
generating the TBD file as part of -merge-modules for swiftpm's Basic module (which seems to be the largest in swiftpm) takes 3ms, while the whole -merge-modules invocation takes 210ms.

(These are with a release no-asserts swiftc, without LTO.)
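
As a quick sanity check on how small these costs are, the ratios can be computed directly from the timings quoted above. The `percent` helper is just for illustration.

```swift
// Size of one step relative to a larger reference time, as a percentage.
func percent(_ stepMs: Double, of totalMs: Double) -> Double {
    return stepMs / totalMs * 100
}

// Timings quoted above, converted to milliseconds.
let stdlibOnone = percent(35, of: 14_000)  // TBD emission vs. rest of the WMO -Onone build
let stdlibO = percent(35, of: 60_000)      // TBD emission vs. rest of the WMO -O build
let basicMerge = percent(3, of: 210)       // TBD emission vs. the whole -merge-modules invocation

print(stdlibOnone, stdlibO, basicMerge)
```

That works out to roughly 0.25% and 0.06% for the stdlib builds, and about 1.4% of the -merge-modules invocation for Basic.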

The second is similar to how each frontend invocation emits separate object files and swiftmodules, and then the driver invokes ld or swift -frontend -merge-modules after doing the normal build steps. However, unlike ld and swiftc, tapi is not in the path on Apple systems by default, and, AFAIK, doesn't exist on any non-Apple systems (although I don't know if any linker supports TBD files on non-Apple systems just yet). Going with this approach makes it more difficult to emit TBD files "by hand", such as in a non-Xcode build system, or when debugging a problem: the build system and the developer (respectively) have to reconstruct the appropriate path to tapi. Additionally, there's not a fundamental performance gain for incremental builds: it is still a serialisation point, and -merge-modules still exists and needs to run.

As such, I'm inclined towards avoiding the complexity of an external tool and going with approach 1. However, I'd like to just double check that I'm not missing something: are there downsides to the first approach other than adding slightly more work to a serial step? Are there upsides to the second approach beyond performance?

(Apple people: this is a public post, so if there's some internal reason please email me instead.)

David_Ungar2 · June 29, 2018, 8:08pm

I'm not that familiar with TBD files, so take the following with a grain of salt.
Because #2 would involve an external tool, I think it would have more potential for bugs, since the tool could get out of sync with changes to the compiler. So, I tend to favor approach 1, just as you do.

huon · July 2, 2018, 12:12am

I think there are quite a few reasons why that particular concern is not too large (e.g. there's a defined and versioned format that swiftc emits and tapi consumes, and tapi is essentially just pasting some text files together).

However, I take it you don't see there being any hidden hazards for incremental builds (e.g. maybe you folks thinking about this space have plans for -merge-modules)?

David_Ungar2 · July 2, 2018, 12:16am

I’ve not thought this through. Graydon may have.

jrose · July 2, 2018, 5:32pm

If all information about public symbols can be derived from the merge-modules step, approach #1 sounds good to me. I wouldn't be too surprised if that's not the case today, though. (Some tricky cases to think about: Clang decls in the bridging header, local classes that subclass NSObject, synthesized main functions.)

Slava_Pestov · July 3, 2018, 5:18am

I don't know about the others, but I think local types do not result in public symbols, even if they're @objc.

huon · July 13, 2018, 12:11am

What do you mean by this? I thought that we essentially didn't support defining symbols in the bridging header and that anything in C would be handled by tapi?

This seems to work, but thanks for prompting me to check.

The only problem I've noticed is that a property with an initialiser doesn't record that it has an initialiser into the temporary swiftmodules, meaning the merged one doesn't know it and that symbol is missing. It seems a bit unfortunate that that symbol is required, since subclasses in other modules will go through the superclass's init rather than calling property initialisers directly. I guess it is handling the case that a class init is inlined?
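
For reference, a small sketch of the situation described above. The `Base` and `Derived` names are made up, and both classes sit in one file here purely so the snippet runs on its own. In the real scenario `Base` would live in a library and `Derived` in a client module.

```swift
// Stands in for a class exported by a library.
open class Base {
    // Stored property with an initial-value expression: this is the
    // "property with an initialiser" whose symbol is being discussed.
    public var count: Int = 41 + 1

    public init() {}
}

// Stands in for a subclass defined in another module.
final class Derived: Base {
    var label: String = "derived"

    override init() {
        // The subclass does not evaluate Base's initial-value expression itself;
        // it initialises the inherited property by calling the superclass init.
        super.init()
    }
}

let d = Derived()
print(d.count, d.label)
```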

jrose · July 13, 2018, 12:52am

Anything with a tentative definition will still show up, including Objective-C protocols. That's probably fine for now, where we don't need to be exhaustive, but we'll need to be careful later.

I wouldn't actually expect that symbol to be required unless the struct is @_fixed_layout, and not at all for classes. @Slava_Pestov?

huon · July 17, 2018, 11:04pm

I chatted to Slava and the conclusion was that it is designed for inlineable inits, and that changing the behaviour isn't a great plan (e.g. the field init could reference private things, and so, in general, can't just be made inlineable itself).