[Pitch] Module Aliasing

elsh · September 2, 2021, 11:11pm

As Swift libraries and packages become more widely distributed, module names sometimes clash. Because Swift has no module namespaces yet, libraries often have to be renamed or pinned to an older, non-conflicting version in such cases. This makes use cases such as the following challenging:

Adding a new dependency or upgrading an existing one, as either can introduce a collision: a new (or upgraded) module can have the same name as another module already in the dependency graph. The module name Logging is a common example.
Upgrading to a newer version of a package from an older version pinned by an upstream library: Consider a scenario where MyApp depends on module Lib, and Lib depends on module Logging. MyApp also depends on Logging. If Lib is pinned to Logging 1.0.0, MyApp is stuck with the same version 1.0.0.

Proposed Solution

We believe that module aliasing support will provide a more systematic way to address the above challenges, so that manual renaming and source code changes can be avoided.

In the above scenarios, module aliasing could rename the module Logging to a unique name, by potentially including its author name, e.g. VaporLogging, or by including a version number, e.g. Logging_1_0_0, so that a collision is resolved under the hood. Keep in mind that this will create new physical modules with different names but mapped to the original name.

This pitch focuses on the concept of module aliasing itself and the general end-to-end flow with new compiler flags designated for aliasing.

What’s Not Covered

The exact criteria for aliasing (including when it’ll be enabled, what name it will be set to, how to determine uniqueness of the name, etc.) will be discussed after the general flow has been determined.
Any potential new syntax such as import Lib as MyLib or modulealias Lib = MyLib is a topic for a separate discussion, and is not covered in this pitch.

Now consider the following example, where module MyLib imports module Logging.

[MyLib]

import Logging

func start(arg: Logging.MainLogger) { ... }

[Logging]

public struct MainLogger {
    public func log(_ arg: Logging.Verbosity) { ... }
}

public enum Verbosity { ... }

If there are multiple modules named Logging, module aliasing will (1) rename all (or just the newly added) modules named Logging and (2) build MyLib with one of them specified via a flag, as follows.

Rename Logging and Build

Logging will be built with a new name. Let’s call it RealLogging. A new flag will also be passed, as follows:

swiftc -module-name RealLogging -module-alias Logging=RealLogging

Note that the value for -module-name should be RealLogging, not Logging. This will be the name of the physical module on disk (e.g. path/to/RealLogging.swiftmodule).

A new flag -module-alias Logging=RealLogging is introduced. This will treat the left side value as an alias and the right side value as the underlying module, analogous to typealiases (e.g. typealias S = String). In this example, Logging will become an alias, and RealLogging will be the underlying module. This will allow any (typed) references to Logging in source code (e.g. Logging.Verbosity in the above example) to be mapped to RealLogging. Note that passing in values in the wrong order (-module-alias RealLogging=Logging) will result in an error.

When encountering Logging, name lookup should find RealLogging as the underlying module, and all mangled symbols should contain RealLogging as the module name.

Tools for compiling IB files, asset catalogs, etc., that are triggered by the build system will get the physical module name as their module name value (which is the value of -module-name above, so RealLogging).

The final built product should be path/to/RealLogging.swiftmodule (or .swiftinterface, .framework, etc).

The underlying module name RealLogging will appear in APIs in RealLogging.swiftinterface.

The above steps will be performed on the remaining modules named Logging (with different names).
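
As a rough illustration of the -module-alias semantics described above, here is a minimal sketch of how a single flag value could be parsed and checked. The type and function names (ModuleAlias, ModuleAliasError, parseModuleAlias) are hypothetical and not part of the compiler. The pitch says a reversed pair is an error but not how it is detected, so the check against the value of -module-name is an assumption.

```swift
// Hypothetical model of one `-module-alias <alias>=<underlying>` value.
// None of these names are compiler API; they exist only for illustration.
struct ModuleAlias: Equatable {
    let alias: String       // name written in source, e.g. "Logging"
    let underlying: String  // physical module name, e.g. "RealLogging"
}

enum ModuleAliasError: Error, Equatable {
    case malformed(String)  // not of the form "A=B" with two distinct, non-empty names
    case reversed(String)   // the alias side names the physical module
}

/// Parses a value such as "Logging=RealLogging".
/// `moduleName` is the value of `-module-name` when building the renamed module itself.
func parseModuleAlias(_ value: String, buildingModule moduleName: String? = nil) throws -> ModuleAlias {
    let parts = value.split(separator: "=", omittingEmptySubsequences: false).map(String.init)
    guard parts.count == 2, !parts[0].isEmpty, !parts[1].isEmpty, parts[0] != parts[1] else {
        throw ModuleAliasError.malformed(value)
    }
    // Assumed check: the left side (alias) must not be the physical module name,
    // which is what happens when the two values are passed in the wrong order.
    if let moduleName = moduleName, parts[0] == moduleName {
        throw ModuleAliasError.reversed(value)
    }
    return ModuleAlias(alias: parts[0], underlying: parts[1])
}
```

Under this sketch, parsing "Logging=RealLogging" while building RealLogging succeeds, whereas "RealLogging=Logging" throws ModuleAliasError.reversed.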

Build MyLib

When building module MyLib, an aliased module should be specified to indicate which one of the Logging modules should be imported into MyLib. If it’s RealLogging from the above example, the flag -module-alias should be passed in:

swiftc -module-name MyLib -module-alias Logging=RealLogging

This will treat Logging as an alias and RealLogging as the underlying module, similar to above.

The source code should only contain the alias, Logging; explicit use of RealLogging should result in an error. This makes mapping the alias to yet another name easier, e.g. -module-alias Logging=OtherLogging. Diagnostics and fix-its will also display messages containing the alias, Logging, for consistency.

Under the hood, however, references to Logging will be mapped to RealLogging as follows.

When resolving an import statement import Logging, RealLogging.swiftmodule will be loaded instead of Logging.swiftmodule. This requires the underlying module to physically exist on disk, e.g. path/to/RealLogging.swiftmodule.

When encountering Logging in source code, name lookup will find RealLogging (from an alias map created during module loading).

Mangling Logging.MainLogger will result in _$s11RealLogging10MainLogger instead of _$s7Logging10MainLogger.

The underlying module name will be stored in debug info and index-store, and treated as the source of truth.

Generated interface, MyLib.swiftinterface, will contain the underlying module name in import statements as well as the APIs, e.g. import RealLogging / func start(arg: RealLogging.MainLogger).
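
The mapping steps above can be sketched with a small alias map. This is only a toy model under stated assumptions: ModuleAliasMap and its methods are hypothetical names, and the mangling helper reproduces just the length-prefixed prefix shown in the pitch. Real mangled names carry additional suffixes that are omitted here.

```swift
// Hypothetical alias map, as if populated from `-module-alias Logging=RealLogging`.
struct ModuleAliasMap {
    private var underlyingByAlias: [String: String] = [:]

    mutating func add(alias: String, underlying: String) {
        underlyingByAlias[alias] = underlying
    }

    /// Returns the underlying module for an alias, or the name itself if it is not aliased.
    func resolve(_ name: String) -> String {
        underlyingByAlias[name] ?? name
    }

    /// File that `import <name>` would load in this sketch.
    func moduleFile(forImport name: String) -> String {
        resolve(name) + ".swiftmodule"
    }

    /// Simplified mangling prefix: "_$s", then the length-prefixed underlying
    /// module name, then the length-prefixed type name.
    func mangledPrefix(module: String, type: String) -> String {
        let real = resolve(module)
        return "_$s\(real.utf8.count)\(real)\(type.utf8.count)\(type)"
    }
}

var aliases = ModuleAliasMap()
aliases.add(alias: "Logging", underlying: "RealLogging")
print(aliases.moduleFile(forImport: "Logging"))                      // RealLogging.swiftmodule
print(aliases.mangledPrefix(module: "Logging", type: "MainLogger"))  // _$s11RealLogging10MainLogger
```

An unaliased name resolves to itself, so an empty map would give _$s7Logging10MainLogger for the same type, which is the collision-prone form the pitch aims to avoid.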

Caveats

There are some limitations as follows.

Module aliasing support will be limited to pure Swift modules only; no ObjC/C/C++ or @objc(some_name) as these symbols will collide.
It will also be limited to libraries built from the source (no distributed binaries) due to the impact on symbol mangling.
Runtime calls to convert String to module (direct or indirect) such as NSClassFromString("Logging.MainLogger") will fail and should be avoided.
There will be a higher chance of running into (existing) issues like the following:
- Retroactive conformance: this is not a recommended practice and should be avoided anyway (this adds yet another reason)
- Extension member “leaks” (example)
Code size increase will be more implicit; module aliasing will be opt-in and a size threshold could be set to provide a warning but users will need to be mindful of a potentially rapid growth.

allevato · September 3, 2021, 12:11am

In the case of this invocation:

swiftc -module-name RealLogging -module-alias Logging=RealLogging

This can only happen if you're rebuilding the module that you want to alias, but what if it isn't possible to rebuild that module from source later, at the time that you know what you want to alias it to? This could happen for a variety of reasons: the build system doesn't let you push information like that down the graph (Bazel), or it's a pre-built binary framework/module rather than a source-based package that can be rebuilt on demand. The first case is solvable (though not ideal) if you permit some global alias mapping to be provided, but in the second case, there's no way to rebuild the artifact if the source isn't present.

As Swift becomes used more in larger scale projects, its flat module namespace is definitely going to be problematic, so I agree that we need some capability like this. But I think to be workable, module aliasing needs to be something that can occur at higher levels of the build graph without requiring that the module being aliased is recompiled.

There are problems that this won't solve, of course; symbols could definitely collide at link time if you had two modules Foo with the same symbol/type declared in them, even if they were aliased at a higher level. I don't know a good way around that other than rebuilding everything from source with new names, unfortunately.

beccadax · September 3, 2021, 1:57am

allevato:

In the case of this invocation:
swiftc -module-name RealLogging -module-alias Logging=RealLogging
This can only happen if you're rebuilding the module that you want to alias, but what if it isn't possible to rebuild that module from source later, at the time that you know what you want to alias it to?

I think that's simply a limitation of the approach. Retroactively adding isolation to a module that wasn't built with it seems like it would be extremely difficult. You could, of course, design your build system to proactively alias all modules, adding version numbers or vendor prefixes to everything in case some particular client needs to link something that will conflict with it; if you did, this feature would suffice.

If the goal is to support multiple versions of the same module, symbol collision is not only possible, it is nearly inevitable, because two different versions of the same module will usually share a lot of API surface. So any approach that did not cause the two modules to have their names mangled differently is probably useless for the versioning use case.

(This is why the proposal specifically mentions retroactive conformances—not because duplicate retroactive conformances are a new problem, but because in the module versioning use case any retroactive conformance is likely to be duplicated, so any downcast involving that conformance is likely to be ambiguous.)

allevato · September 3, 2021, 2:16am

Absolutely—I suppose I'm thinking less about the situation of different versions of the same module, but instead complicated build graphs (via package dependencies or something else, like Bazel used in a large monorepo) where two subgraphs just happen to pull in unrelated modules with the same very generic name, like Utility. I think your proposal for public vs. non-public imports helps in those cases if the badly-named module doesn't need to be a public import, but there would still be the possibility of symbol collisions at link time (less commonly than two versions of the same module, though).

If the envisioned use case for this is to prevent conflicts between different versions of the same module, should we formalize that as a concept that could be encoded as part of the module and symbol mangling, rather than relying on free-form module name aliasing?

elsh · September 3, 2021, 10:24pm

allevato:

This can only happen if you're rebuilding the module that you want to alias, but what if it isn't possible to rebuild that module from source later, at the time that you know what you want to alias it to? This could happen for a variety of reasons: the build system doesn't let you push information like that down the graph (Bazel), or it's a pre-built binary framework/module rather than a source-based package that can be rebuilt on demand. The first case is solvable (though not ideal) if you permit some global alias mapping to be provided, but in the second case, there's no way to rebuild the artifact if the source isn't present.

We could have an aliasing attribute set per dependency. Modules (prebuilt or not) that are already built or cached in the dependency graph would not need an aliasing attribute; only newly added modules with the same name would. The new ones will have to be built from source once with aliases, and as long as the aliases are unique, they won’t need to be rebuilt often.

allevato:

If the envisioned use case for this is to prevent conflicts between different versions of the same module, should we formalize that as a concept that could be encoded as part of the module and symbol mangling, rather than relying on free-form module name aliasing?

How uniqueness is determined isn’t discussed in this pitch, but one idea is to use the author name as a prefix or to use a reverse-DNS style (similar to a bundle ID). We can also add versions to the name, since versioning is another use case. Combined, we’ll have something like GoogleLogging_1_0_0; this will most likely be computed from new attributes introduced in the manifest or build settings rather than provided manually.
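
To make the naming idea concrete, here is a minimal sketch of how such a name could be derived. The function uniqueModuleName and its parameters are hypothetical. The pitch does not define where the author and version would come from, so they are passed in directly.

```swift
/// Hypothetical helper: combines an author prefix, the module name, and a version
/// into a single identifier such as "GoogleLogging_1_0_0".
func uniqueModuleName(author: String, module: String, version: String) -> String {
    // Replace characters such as "." or "-" with "_" so the version can be
    // part of an identifier-like module name.
    let sanitizedVersion = String(version.map { $0.isLetter || $0.isNumber ? $0 : "_" })
    return "\(author)\(module)_\(sanitizedVersion)"
}

print(uniqueModuleName(author: "Google", module: "Logging", version: "1.0.0"))  // GoogleLogging_1_0_0
```

The result could then serve as the right-hand side of -module-alias, e.g. Logging=GoogleLogging_1_0_0.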