Plan for module stability

jrose · July 17, 2018, 11:17pm

As you can probably tell, I've been saving a lot of discussions, waiting for Swift 4.2 to be in a pretty good place before we jump all in on Swift 5. This one's one of the main things I plan to be working on in the next year or so, so I wanted to make sure you all had the big picture. It is another massive Jordan Rose post, so if you want the TLDR, just read the bolded sentences in each section and then jump down to the plan at the end. Thanks!

Introduction

ABI stability means that an executable compiled against Swift 5 will work with the Swift 6 libraries, and that an executable compiled against Swift 6 will work with the Swift 5 libraries. A related concept is module stability, which says that the interface for a Swift 5 library will work with the Swift 6 compiler. (The opposite direction is less interesting.) More generally, the interface for a library should be forward-compatible with future versions of the compiler. This is useful in a number of ways:

- Can test a new compiler without rebuilding all of an app's dependencies.
- May overlap with work to make the debugger work across Swift versions.
- May help reduce incremental build time by better tracking cross-target dependencies.
- Support for general non-resilient binary frameworks. (More on what this means below.)

Proposed Solution

C accomplishes module stability through source stability, by using manually-written header files to represent a library's interface. Swift can do something similar by printing a type-checked AST to a textual form and including any extra information needed to reproduce the original compilation environment (such as the deployment target). To avoid the cost of loading this textual form, the compiler will keep a cache of library ASTs it has seen, serialized in the current binary format.

This interface will only contain the public and open parts of a library, plus any parts that are marked as inlinable or are made available as part of the library's ABI. (For libraries compiled from source, this will include the layout of structs and enums, for example.)
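To make the split concrete, here is a minimal sketch of a library source file, with comments marking which parts the paragraph above says would reach the interface. The `Temperature` type and its members are invented for illustration, and the comments describe the proposal as stated here, not the output of any existing tool.

```swift
// Hypothetical library source; every name here is made up for illustration.
public struct Temperature {
    // Public API: would appear in the interface.
    public var celsius: Double

    // Private stored property: not API, but it affects the struct's layout,
    // so under the proposal it would be recorded as an ABI detail.
    private var sampleCount: Int = 0

    public init(celsius: Double) {
        self.celsius = celsius
    }

    // Public but not inlinable: only the declaration would be needed.
    public func fahrenheit() -> Double {
        return scaled(celsius) + 32
    }

    // Internal helper: would be left out of the interface entirely.
    func scaled(_ value: Double) -> Double {
        return value * 9 / 5
    }
}

let boiling = Temperature(celsius: 100)
print(boiling.fahrenheit())
```

A client only ever needs the public declarations plus enough layout information to allocate a `Temperature`, which is why the body of `fahrenheit()` and the `scaled(_:)` helper can stay out of the interface.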

| | C-based Languages | Swift 4 | Swift + Module Stability |
|---|---|---|---|
| Source files | .c, .m, .cpp, ... | .swift | .swift |
| Interface files | .h | .swiftmodule | .swiftinterface (new) |
| Interface is | manually written | generated | generated |
| Interface contains | public API, inlinable function bodies | all API, inlinable function bodies | public API, inlinable function bodies (see below) |
| Distribution format | textual | binary | textual |
| Binary format for faster import | .pcm (in module cache) | N/A (already binary) | .swiftmodule (in module cache) |
| Language version | Chosen by client | Stored in interface | Stored in interface |
| Platform / deployment target | Chosen by client | Stored in interface | Stored in interface |
| Respects -D flags | Yes | No | No |
| Affected by search paths | Yes | Yes | Yes |

Inlinable Code

Like C, the plan for inlinable functions is to copy their bodies verbatim into the interface file. This isn't a perfect answer, since it leaves the interface more vulnerable to perturbations in type checking, but it does rely on the same source compatibility mechanisms Swift is already using, rather than forcing us to commit to a stable version of SIL (the intermediate representation used for high-level optimizations that's stored in swiftmodule files). When the textual interface is loaded, these inlinable functions will be compiled to SIL and cached.
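As a rough illustration of which bodies this applies to, the sketch below uses the `@inlinable` and `@usableFromInline` attributes from Swift 4.2. The `Counter` type is hypothetical, and the comments about what gets copied reflect the plan described above, not a finished implementation.

```swift
// Hypothetical library type; names are invented for illustration.
public struct Counter {
    // Internal storage that inlinable code is allowed to touch. Marking it
    // @usableFromInline makes it part of the library's ABI.
    @usableFromInline
    internal var value: Int

    public init() { value = 0 }

    // Inlinable: under the plan, this body would be copied into the
    // interface file as source and compiled to SIL when the interface loads.
    @inlinable
    public mutating func increment() {
        value += 1
    }

    // Inlinable computed property: its getter body would be copied too.
    @inlinable
    public var current: Int { return value }

    // Not inlinable: clients see only the declaration, so this body can
    // change freely between library versions.
    public mutating func reset() { value = 0 }
}

var counter = Counter()
counter.increment()
counter.increment()
let afterTwo = counter.current
counter.reset()
print(afterTwo, counter.current)
```

The trade-off is the one the paragraph describes: `increment()` can be optimized into the client, but its source must keep type-checking with future compilers.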

Configuration Conditions (`#if`)

C headers handle platform and user conditions by guarding sections of source with preprocessor macros. Swift, however, has to generate its textual interface, and configuration conditions are resolved well before type-checking. This implies that the generated interface file will be platform-specific. Rather than attempt to merge several platform-specific interface variants back into a single file, it's probably simpler to distribute a folder containing one interface per architecture, or possibly per OS/architecture combination. The former is how binary swiftmodule files work today.

Note that this is an outstanding issue for the generated Objective-C header. The Swift compiler is invoked once for each architecture, but only one header gets copied into the build product, chosen arbitrarily by Xcode. This is one of the reasons why this feature is not supported by the package manager.

Configuration conditions that are not based on the target don't really fit into this model, including user-defined conditions (-D) and Swift language version checks (swift(>=5)). These conditions are not required to be the same across a library and its clients, and thus should continue to be excluded from the interface. A developer who wishes to create different versions of their library using -D flags should give each version a different module name or ensure that they are never used in the same environment, as they must do today.
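The three kinds of condition discussed in this section look like this in source. This is only a sketch: `VERBOSE_LOGGING` is a made-up `-D` flag, the declarations are invented, and the comments paraphrase the behavior proposed above.

```swift
// Target-based condition: resolved before type-checking, so each generated
// interface would contain only the branch for its own platform. This is why
// the post suggests one interface file per architecture or OS/architecture.
#if os(macOS) || os(iOS)
public let platformFamily = "Apple"
#elseif os(Linux)
public let platformFamily = "Linux"
#else
public let platformFamily = "other"
#endif

// User-defined condition (hypothetical flag, passed as -D VERBOSE_LOGGING).
// It need not match between a library and its clients, so the condition
// itself would not be carried into the interface.
#if VERBOSE_LOGGING
public func debugLog(_ message: String) { print("[verbose] \(message)") }
#else
public func debugLog(_ message: String) { }
#endif

// Language version check: likewise not tied to the target.
#if swift(>=5)
public let languageMode = "5 or later"
#else
public let languageMode = "before 5"
#endif

debugLog("built for \(platformFamily), language mode \(languageMode)")
```

Building this file once with `-D VERBOSE_LOGGING` and once without yields two different libraries behind the same declarations, which is the situation the advice about distinct module names is meant to avoid.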

Library Evolution ("Resilience")

With ABI stability and module stability, most of the pieces will be in place to build distributable binary frameworks that aren't tied to a single compiler version. Compared to Objective-C frameworks, however, there's still one piece missing: support for library evolution, or resilience. This is the feature that allows you to change a framework in a backwards-compatible way without having to recompile a client application.

A good chunk of resilience has already been implemented in Swift 4.2, and is being tested in the standard library and SDK overlays on Apple platforms. Once the standard library is shipped with Apple OSs, Apple will need to be able to ship new versions of the stdlib without breaking existing apps. But it's not necessarily sufficient for libraries not shipped with an OS just yet:

- There's no tool that will tell a developer when an ABI-breaking change has been made.
- Features like @_frozen and @_fixed_layout that the standard library is using haven't been formalized for general use.
- Clients may want to check the version of a library to see if a particular feature is present, which implies having some version of @available that isn't tied to OS versions.
- The compiler, runtime, and debugger all still have known issues or unimplemented features when working with resilient libraries.

While the lack of full resilience support will not preclude making binary frameworks, developers who use them would need to recompile their apps when a new version of the framework comes out. There is also one tricky case: if binary framework ABCKit depends on binary framework XYZKit, and XYZKit changes, ABCKit will need to be recompiled as well. It would be great™ if there was a way to detect this mismatch when compiling or linking the downstream client. (I don't have any concrete ideas yet.)

Alternatives

(discussed here)

Use a binary format

It would be possible to use the existing binary format as the stable interface for a module (or something based on the existing format) rather than use a source-based format. This could be a lot simpler, since it's how the existing module-import code works. However, it has a number of downsides:

- More difficult to inspect, compare, and test (requires a dump-to-text step).
- More difficult to debug when things go wrong (because an invalid binary archive won't dump properly).
- Encodes implementation details of the AST.
- Requires establishing a stable subset of SIL or embedding the source of inlinable functions into the binary format.
- Still needs a "check" phase on import to ensure that dependencies haven't changed in an incompatible way.

In practice, it seems like a binary format would still require a fair amount of work while having unfortunate drawbacks, and it would be harder to maintain in the long-term.

Use a non-source format for function bodies

It's possible that some users would object to the source of their function bodies being displayed verbatim in the textual interface file—what about secrecy? However, this is equivalent to inline functions in header files in C: the function body needs to be serialized in some way in order for the client to inline it, and using the same format as regular source is what lets us lean on source compatibility for forward compatibility. (It's worth noting that only code the developer explicitly marks as "inlinable" or uses in a default argument will be included in the interface files.)

Eliminate swiftmodule files (except in the cache)

The existing binary swiftmodule format is still useful for a handful of reasons:

- It's still used for debug info, which requires information about all types in the module (even private and local types).
- The initial design for swiftinterface files only contains public APIs, so they're not sufficient for unit tests that use @testable import. This wouldn't be too hard to add to the design, but it still means larger textual interfaces when building with -enable-testing. It's therefore not an initial priority.
- Textual interfaces provide little advantage for libraries built from source in the same development environment (the common case for the Swift Package Manager). An eventual cross-module optimization mode would likely benefit from being able to share arbitrary, compiler-specific information across module boundaries, something that textual interfaces probably shouldn't be designed to do (at least at first).

High-level plan

1. Hook up ASTPrinter to a new command-line option, -print-interface or similar, to produce .swiftinterface files.
2. Teach Swift to "compile" interface files into the existing binary format, so that they can be loaded the same way they are today.
3. Turn the above into an on-demand "module cache" like Clang. (The "on-demand" part may not actually be a good idea, but it'll make it easier to test this without having to modify existing build systems too much.)
4. Add support for ABI details that aren't in the normal interface (like private struct fields).
5. Lots of testing against real projects.

jawbroken · July 18, 2018, 12:06am

The only question I have is whether “the plan for inlinable functions is to copy their bodies verbatim into the interface file” means that comments in the function body will be included as well. I think this is the most likely avenue for private information to accidentally leak, and I don't see any significant downside to stripping them out.

jrose · July 18, 2018, 12:08am

That sounds reasonable. It's a little annoying to implement because they can be anywhere in the source, but it's in line with stripping out the inactive branches of #if, which we'd probably want to do.

EDIT: But I might not do it in the first implementation, because there's always a client-side workaround. :-)

allevato · July 18, 2018, 12:15am

This sounds great from a usability/debugging point of view!

Similar to the question above, would the generated .swiftinterface contain the documentation comments for its declarations (perhaps in a structured representation more suitable for tool processing instead of the raw Markdown)? Since it would hold all public/open decls, it seems like the file could act in that capacity as well. And if so, could this subsume the .swiftdoc files that exist today?

jrose · July 18, 2018, 12:40am

I…remember discussing this with @Xi_Ge and @akyrtzi but I can't remember what the conclusion was. We certainly still want a compiled form like swiftdoc for quick access, but that could be part of the cache like swiftmodule.

I do remember that one of the reasons swiftdoc is separate from swiftmodule today, though, is so that changing the docs doesn't result in all the downstream sources having to be immediately rebuilt. That's actually a questionable decision if the downstream sources have docs too, since some of those docs might be inherited, but still.

akyrtzi · July 18, 2018, 1:05am

We were leaning towards keeping the swiftdoc as part of the stable module. To elaborate:

- The swiftdoc format is quite simple and has historically been stable (very, very few changes), so we believe we will be able to keep it in a stable format.
- Replacing it by putting the doc comments in source would be a disadvantage, because we would lose flexibility for things that swiftdoc contains now (and possibly in the future) that we would then have to find a way to represent in "source form".
  - For example, right now the swiftdoc keeps track of the groupings of the Swift StdLib. We were able to implement the feature without being forced to design and implement a way to define the groupings via source form.
  - In the future, a potential improvement is that a framework could have multiple swiftdoc files for multiple languages (or have each translation of the documentation comments in one swiftdoc).

In general, I think separating documentation info from the .swiftmodule and .swiftinterface files provides valuable flexibility. I'd prefer not to try to stick everything into source form in a .swiftinterface file.

jrose wrote: “since some of those docs might be inherited”

We don't copy documentation comments in downstream modules, an inherited doc-comment would show up as part of following the inheritance chain.

Chris_Lattner3 · July 18, 2018, 5:18am

As I think you know, I'm a huge fan of this approach. We've put a lot of work into source level stability, and this nicely leverages it. This also leverages the existing "generated interface" work nicely, and forces it to be round-trippable, which is also great.

I agree that there is a concern that the bodies of inlinable functions may not parse due to future language changes, but I think this is an acceptable risk. When parsing one of these interface files, I think it is reasonable to run the parser in a special mode that accepts and ignores bodies of functions that it doesn't understand (just skip to the end brace, perhaps emit a warning, then keep going). This makes a source break merely a performance hit, instead of outright breaking the separately compiled library.
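A very rough sketch of the "skip to the end brace" idea: count braces from the start of the body and resume after the matching close. The function name and sample text are made up, and this toy version ignores braces inside string literals and comments, which a real parser would have to handle through the lexer.

```swift
/// Returns the index just past the `}` that matches the `{` at `open`,
/// or nil if the braces never balance. Hypothetical helper, not compiler code.
func endOfBody(in source: String, openingBraceAt open: String.Index) -> String.Index? {
    precondition(source[open] == "{", "expected an opening brace")
    var depth = 0
    var index = open
    while index < source.endIndex {
        switch source[index] {
        case "{":
            depth += 1
        case "}":
            depth -= 1
            // Depth returns to zero at the brace that closes the body.
            if depth == 0 { return source.index(after: index) }
        default:
            break
        }
        index = source.index(after: index)
    }
    return nil
}

// The body of f() contains syntax this "parser" does not understand.
let interface = "func f() { if x { futureSyntax!! } } func g() {}"
let open = interface.firstIndex(of: "{")!
let rest = endOfBody(in: interface, openingBraceAt: open).map { String(interface[$0...]) }
print(rest ?? "unbalanced braces")
```

After the skip, parsing would continue with `func g()`, and `f` would simply be treated as if it had not been marked inlinable.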

AliSoftware · July 18, 2018, 9:46am

Interesting!

Also, if I understand correctly, this plan would also start the groundwork for enabling (in the long run / distant future) the possibility for a tool to auto-semver libraries & packages, by detecting unchanged vs purely-additional changes vs breaking changes in public APIs between current and last version of a module, and bump versions accordingly during a new release… right?

jrose · July 18, 2018, 4:28pm

I personally think a good auto-semver tool would want to run on a type-checked AST anyway, so that it can know, for example, that using a typealias instead of the underlying type is not a breaking change. But yes, you'd be able to run source tools over the swiftinterface file rather than having to get it out of the binary format.
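A small sketch of the typealias case, with two "versions" of a library modeled as namespaces so they fit in one file (all names are hypothetical). A purely textual diff of the two declarations shows a changed return type, while the type checker sees the same type.

```swift
// Version 1 of a hypothetical library API.
enum LibraryV1 {
    static func timeout() -> Double { return 30 }
}

// Version 2 spells the same return type through a typealias.
enum LibraryV2 {
    typealias Seconds = Double
    static func timeout() -> Seconds { return 30 }
}

// Client code written against version 1 still type-checks against version 2.
let before: Double = LibraryV1.timeout()
let after: Double = LibraryV2.timeout()

// The alias and the underlying type are the same type to the compiler.
let sameType = LibraryV2.Seconds.self == Double.self
print(before == after, sameType)
```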

sergiocampama · July 19, 2018, 1:52am

Could this swiftinterface file be faster to generate than a swiftmodule file? I ask because some build systems can parallelize the compilation of Swift modules across multiple machines, but we currently don't gain much from that, as the swiftmodule file is an artifact of a compilation step. Dependent modules therefore have to wait for their lower-level dependencies to finish compiling before they can start, which lengthens the critical path of the build.

With C-based languages we get around this because all the interfaces are header files that exist before compilation even starts. With Java it's possible to generate such an interface to increase parallelism.

If this effort to create a new swiftinterface file could be made in such a way that these are way faster to generate than compiling the sources, this could be a huge benefit for distributed build systems.

(Disclaimer: I know it's possible to decouple swiftmodule generation from compilation, and I haven't tested whether swiftmodule generation by itself could be faster, but according to Tony Allevato, it should not bring many benefits.)

huon · July 19, 2018, 2:10am

Yes, but such optimisations are orthogonal to the output format, as we could have a mode that makes swiftmodules (just as) fast to generate too, by skipping type checking etc. for the bodies of non-inlineable functions. This removes all interaction with the constraint system and expression type checker for such functions, and those can use a significant amount of CPU time, depending on the project. In either case, this would have to be opt-in, because it means errors in that code wouldn't be caught.

It is moderately faster, because it skips interactions with LLVM and everything after that (e.g. no low-level optimisations, no code generation, and no linking). The standard library's build system currently uses this scheme to provide more parallelism, by compiling the swiftmodule and the object files/dylib separately.

John_McCall · July 19, 2018, 2:21am

The swiftinterface file will have essentially the same information as a swiftmodule file, just in a stable, textual format. Since the swiftmodule format is less constrained than the swiftinterface format, it should always be at least as efficient to use: if it's ever more efficient to use the swiftinterface format than the binary format of a swiftmodule, we'll just abandon the binary format.

As Huon says, we can definitely find ways to speed up the generation of a module description, but that's independent of the format of that description.

allevato · July 19, 2018, 3:30am

To elaborate on this, my concern for Bazel's Swift support has been whether splitting module and object generation into separate actions—which would each need to parse and typecheck N sources (and where typechecking can sometimes have its own performance issues)—would end up being an improvement over what we're doing now: using a JSON output map to produce all the artifacts we want in a single action. But the latter, as @sergiocampama mentions, means that downstream targets need to wait for full codegen when they otherwise might have been able to start building as soon as their .swiftmodules were ready (stopping after typechecking and some SILGen).

@huon (and others in this thread), would you say that the overhead of optimization and codegen is high enough that we'd see benefits of increased parallelism by separating out the .swiftmodule-generating actions from the ones generating the objects? This could help us in distributed builds where the actions are executed on completely different machines, but I'm less sure whether it would help or hurt in the more core-constrained local development case.

If it is the case that parallelizing them yields improvements, then I imagine .swiftinterface files would add a slight boost on top of that because it would remove the SILGen step from those actions as well.

jrose · July 19, 2018, 4:34pm

swiftinterface doesn't actually remove the SILGen step, because we don't want to include code that has bugs in it (like failing to initialize a variable). But both serialization formats would allow you to only SILGen and check inlinable code, rather than all of it, which is an optimization we don't do today.

Karl · July 19, 2018, 5:05pm

+1 for this.

I started wondering about this a while ago when we were discussing removing the Playground Quicklook APIs from the standard library (SE-0198 — Playground QuickLook API Revamp).

I gather that with this solution, we should be able to generate a swiftinterface containing the Playground Quicklook types/protocols, which would be available at compile-time in all contexts (regardless of whether your environment actually has an implementation of the framework available to load at runtime).

virl · July 20, 2018, 2:46pm

@jrose So, instead of implementing a rock-solid and stable cross-library barrier/mechanism (like C DLLs, Objective-C frameworks, and Java libraries have), you decided to just dump the library's compilation settings into a text file?

In other words, instead of implementing an expandable ABI standard, you chose as your paradigm the constant creation of an undefined number of binary compatibility layers.

In my opinion, it is the worst decision possible for binary frameworks. Even implementing a constrained sub-syntax for a framework's public API would be better, because it would not break silently and for non-obvious reasons.

Including the source of inline functions in a framework's public API just shows that you chose a completely wrong and insane paradigm.
The whole point of binary frameworks (and a stable binary ABI) is to be INDEPENDENT of the compiler, not to require one in order to be runtime-linked with the app.

jrose · July 20, 2018, 3:55pm

I don't think there's a reason for that tone, but I'll address the concern anyway: modules are only used at compile time. As shown in the table above, both the binary swiftmodule files we have today and the proposed textual formats are analogous to C/Objective-C headers (which do contain inlinable code as source), not to the DLLs / dylibs. The effort for compatibility across compiled code versions is ABI stability, and that's well underway.

virl · July 20, 2018, 4:04pm

Yes, sorry for the tone.

I'm just disappointed to realise that the discussed solution will make a binary library ecosystem (like Maven) impossible for Swift.

The whole problem with it is that it solves "ABI stability" by clashing together an undefined number of old compilers just to make the linked libraries work.

It is to a proper ABI standard what Apple's Bitcode is to JVM bytecode.

C/Objective-C headers containing inlined functions are just an artefact of the C era. True library interoperability / stability in industry relies on dynamic linking, be it linking of platform-specific binaries (Windows DLLs, for example) or binaries for virtual machines (Java, C#).

Basically, the result of your proposal will be a Swift binary library ecosystem that is extremely fragile outside of Apple platforms and tools. Detecting ABI-breaking source code changes via linter/tooling is in line with that.

jrose · July 20, 2018, 4:48pm

I'm not sure how this solution is different from DLLs. Can you elaborate?

John_McCall · July 20, 2018, 5:08pm

I think you’ve completely misunderstood what’s being suggested here if you think it involves distributing old versions of the compiler and giving up on binary interoperation.