Improving import notation for very large and subdivided projects

I'm posting this in Development > Compiler rather than Evolution > Pitches because I'm not yet sure what shape the feature should take. I have a few competing ideas but nothing concrete enough to pitch, so I'd like to have a more focused implementation discussion first.

Motivation

Our use of Bazel encourages projects to be split up into many fine-grained modules instead of larger omnibus modules (not just for Swift, but historically for Objective-C dependencies as well). It's not uncommon for a single Swift source file to contain tens of import statements. Since we have a monorepo, we autogenerate module names by transforming the build system's target label into a module name. So for example,

# some/project/BUILD
swift_library(
  name = "my_library",
)

This target is referred to in Bazel as //some/project:my_library, and we give it the Swift module name some_project_my_library. This has two advantages: (1) users don't have to contrive module names for their many Swift/Obj-C targets, and (2) it avoids collisions between targets in the same dependency graph.
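
As a rough sketch of that derivation: the function name and the exact character set below are assumptions for illustration, not the build rules' actual code. Every character that isn't valid in an identifier is replaced with an underscore:

```swift
/// Hypothetical helper: derives a Swift module name from an absolute Bazel
/// label of the form "//package/path:target_name".
func deriveModuleName(fromLabel label: String) -> String {
    // Drop the leading "//" that marks an absolute label.
    let trimmed = label.hasPrefix("//") ? String(label.dropFirst(2)) : label
    // Replace every character outside [A-Za-z0-9_] with "_".
    let mapped = trimmed.map { ch -> Character in
        (ch.isASCII && (ch.isLetter || ch.isNumber)) || ch == "_" ? ch : "_"
    }
    return String(mapped)
}

print(deriveModuleName(fromLabel: "//some/project:my_library"))
// prints "some_project_my_library"
```

This sketch doesn't handle edge cases such as a package path that starts with a digit; a real scheme would need a rule for those.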

We also have additional tooling that tries to automatically maintain BUILD files, using source files as the source-of-truth. So if someone writes this:

import some_shared_library
import another_shared_library

We can keep the BUILD file automatically in sync:

# some/project/BUILD
swift_library(
  name = "my_library",
  deps = [
    "//another/shared:library",
    "//some/shared:library",
  ],
)

The problem here is that this means the "ugly" derived names are what users have to internalize and write, rather than the target labels that naturally fall out from the project's directory structure. I want to invert this because it would yield a much better developer experience.

Ideas I've Considered

1. Allow macros to expand to import declarations

#bazelImport("//some/project:my_library")
// expands to: import some_project_my_library

In this case, the macro would implement the same module name derivation logic as the build system (or, in a future where a macro could read an input file, we could have the build system provide the mapping that way).

I hacked up the compiler to allow this, but found that my import was never actually resolved. That makes sense; the compiler has to resolve imports first to know which macros are available, and even if I generate an import declaration from a macro, there's no opportunity to resolve it later. I don't know if this is feasible without opening a can of worms—you could have a macro that introduced imports that introduced new macros, which you'd expect to be expanded, but then those could introduce their own imports, ad nauseam.

2. Introduce a new notion of "module origins"

import from "//some/project:my_library"

My idea here was that we could provide a mapping (probably in the -explicit-module-map-file JSON manifest) from opaque "origin identifiers" to module names and allow the import declaration to be written in terms of that origin. In Bazel, we would just use the target label as the identifier.
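
To make the mapping concrete, here is a minimal sketch of how a tool might read it. The `origin` key is hypothetical and does not exist in any manifest format today; `moduleName` is what I'd expect an entry to carry, but treat the whole layout as an assumption.

```swift
import Foundation

// Hypothetical manifest entry. "origin" is invented for illustration and is
// not a real key in the explicit module map format.
struct ModuleEntry: Decodable {
    let moduleName: String
    let origin: String?
}

// Example manifest contents (assumed layout).
let manifest = """
[
  {"moduleName": "some_project_my_library", "origin": "//some/project:my_library"},
  {"moduleName": "Foundation"}
]
"""

/// Builds a lookup table from origin identifier to module name.
/// Entries without an origin are skipped.
func originTable(fromJSON json: String) throws -> [String: String] {
    let entries = try JSONDecoder().decode([ModuleEntry].self, from: Data(json.utf8))
    var table: [String: String] = [:]
    for entry in entries {
        if let origin = entry.origin {
            table[origin] = entry.moduleName
        }
    }
    return table
}

let table = try originTable(fromJSON: manifest)
print(table["//some/project:my_library"] ?? "unresolved")
// prints "some_project_my_library"
```

With a table like this, resolving an import written in terms of an origin would be a single dictionary lookup, and the identifier itself stays opaque to the compiler.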

This would probably be the most work, and has the disadvantage that it introduces new syntax. A possible advantage is that it might be able to serve the needs of the oft-requested package-import feature for Swift scripts. Just brainstorming, a Swift script could contain something like the following:

import from "https://github.com/apple/swift-argument-parser.git"

@main
struct MyCommand: ParsableCommand { ... }

and when the script was run under SPM, it could use a syntax scan to resolve the repositories, generate the mapping to module names, and pass that to the compiler. There are some details that would need to be worked out here though, like how to handle repositories that export multiple targets. (Import all of them? Let the user choose a single one, like import ArgumentParser from "..."?)
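
The syntax scan could be quite simple. Here's a line-based sketch under the assumption that the hypothetical `import from "..."` form always sits on one line. The function name is made up, and a real tool would need a proper parser to cope with comments and multi-line constructs.

```swift
/// Hypothetical scanner: collects the quoted origins from lines of the form
///     import from "<origin>"
/// Any text after the closing quote (such as an alias) is ignored.
func scanOrigins(in source: String) -> [String] {
    var origins: [String] = []
    let prefix = "import from \""
    for line in source.split(separator: "\n") {
        // Skip leading indentation.
        let trimmed = line.drop(while: { $0 == " " || $0 == "\t" })
        guard trimmed.hasPrefix(prefix) else { continue }
        let rest = trimmed.dropFirst(prefix.count)
        // Take everything up to the closing quote; skip unterminated strings.
        if let end = rest.firstIndex(of: "\"") {
            origins.append(String(rest[..<end]))
        }
    }
    return origins
}

let script = """
import from "https://github.com/apple/swift-argument-parser.git"
import Foundation

struct MyCommand {}
"""
print(scanOrigins(in: script))
// prints ["https://github.com/apple/swift-argument-parser.git"]
```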

3. Allow module names to contain non-identifier characters

import `//some/project:my_library`

SE-0275 would have allowed backtick-delimited identifiers to contain non-identifier characters, which ought to have allowed this. However, that proposal was rejected, and I'm not sure if a subset of it for module names would warrant revisiting it.

I think this would also work for Objective-C modules; to my knowledge Clang lets you write module "//some/project:my_library" { ... } in a modulemap file; we do this to implement layering checks in Bazel's C++ support so that the diagnostics show the target label, and I imagine ClangImporter could be updated to handle those names.

Separately, there's the question of how to deal with serialized module filenames for modules with non-identifier characters. Search-path-based imports require the module name to match the file name, which wouldn't work if we allow characters that can't appear in file names (such as /). We'd need to invent an encoding, or restrict it to modules imported using the -explicit-module-map-file JSON manifest.
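
For the sake of discussion, one such encoding might look like the sketch below. The scheme and function names are entirely made up: ASCII letters and digits pass through, and every other UTF-8 byte (including the underscore, so that the escape character stays unambiguous) becomes an underscore followed by two hex digits.

```swift
/// Hypothetical encoding of a module name into a file-name-safe string.
func encodeModuleFileName(_ moduleName: String) -> String {
    var out = ""
    for byte in moduleName.utf8 {
        let isAlnum = (byte >= 48 && byte <= 57)    // 0-9
            || (byte >= 65 && byte <= 90)           // A-Z
            || (byte >= 97 && byte <= 122)          // a-z
        if isAlnum {
            out.append(Character(Unicode.Scalar(byte)))
        } else {
            // Escape as "_" plus two uppercase hex digits.
            let hex = String(byte, radix: 16, uppercase: true)
            out += "_" + (hex.count == 1 ? "0" + hex : hex)
        }
    }
    return out
}

/// Reverses the encoding above; returns nil if an escape is malformed.
func decodeModuleFileName(_ fileName: String) -> String? {
    var bytes: [UInt8] = []
    var iterator = fileName.utf8.makeIterator()
    while let byte = iterator.next() {
        if byte == UInt8(ascii: "_") {
            // An escape must be followed by two hex digits.
            guard let hi = iterator.next(), let lo = iterator.next(),
                  let value = UInt8(String(decoding: [hi, lo], as: UTF8.self), radix: 16)
            else { return nil }
            bytes.append(value)
        } else {
            bytes.append(byte)
        }
    }
    return String(decoding: bytes, as: UTF8.self)
}

let encoded = encodeModuleFileName("//some/project:my_library")
print(encoded + ".swiftmodule")
// prints "_2F_2Fsome_2Fproject_3Amy_5Flibrary.swiftmodule"
```

The result is not pretty, but it round-trips, and the user would never have to type it; only the compiler and build system would see the encoded form.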

OTOH, a big advantage of this idea is that the module name is exactly what's written in the import. So, if the user needs to write it elsewhere in source (e.g., to fully qualify a name), the name is the same. The other options above would require the user to still be aware of the transformation scheme or for us to provide an affordance to retrieve it.

Other options I haven't thought of

Anything I've missed?

Wrap-up

This is something I'd really like to make progress on in the near future, but none of the options above feels totally satisfying yet. I'd love to hear other folks' thoughts!

maybe i’m overlooking something key here, but i’m having a hard time understanding what is gained by going from

import MongoDB_BSON_BSONInspection

to

import "//MongoDB/BSON:BSONInspection"

or some equivalent but more-decorated syntax. is there something i’m missing here?

As mentioned in my original post:

We also have additional tooling that tries to automatically maintain BUILD files, using source files as the source-of-truth.
[...]
The problem here is that this means the "ugly" derived names are what users have to internalize and write, rather than the target labels that naturally fall out from the project's directory structure.

In addition to the overhead on the user of manually mangling the name (not hard, but non-zero), every tool that scans Swift source for imports and updates build dependencies has to implement the same name-mangling scheme, or its reverse. And the scheme isn't actually reversible, since it's lossy.
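
To illustrate the lossiness with a simplified stand-in for the scheme (the `mangle` function here is an assumption for illustration, not our actual implementation), two distinct labels can produce the same module name, so a tool can't recover the label from the name alone:

```swift
/// Simplified, hypothetical mangling: strip the leading slashes, then turn
/// "/" and ":" into "_".
func mangle(_ label: String) -> String {
    let body = label.drop(while: { $0 == "/" })
    return String(body.map { (ch: Character) -> Character in
        ch == "/" || ch == ":" ? "_" : ch
    })
}

let first = mangle("//some/project:my_library")
let second = mangle("//some:project_my_library")
print(first == second)
// prints "true"
```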

IMO there is absolutely an advantage—especially at massive scale—to making it possible to use the canonical name that already exists for a target instead of inventing (or requiring the user to be aware of) a new one, and I'd like to explore strategies to make that possible.

since the build system generates the module names from the BUILD manifests, could the tooling generate a mapping of target names to module names and use that to update the manifests?

users would still have to internalize and write the derived module names whenever they need to use qualified references, the only difference is the module name wouldn’t be visible at the top of the file.

One major goal is to make it possible to analyze source code without having to invoke a build, when possible. Many of our tools depend on this to operate efficiently.

I mentioned this in my original post:

OTOH, a big advantage of this idea is that the module name is exactly what's written in the import. So, if the user needs to write it elsewhere in source (e.g., to fully qualify a name), the name is the same. The other options above would require the user to still be aware of the transformation scheme or for us to provide an affordance to retrieve it.

I'm aware of that limitation, and yes, in all but option #3 we would need to provide another affordance to retrieve the module name from a different representation. These are precisely the issues and trade-offs I want to discuss, in search of solutions that the language can help us with. (To be clear, nothing I would propose would be required; smaller-scale Swift codebases would have no need to use this or be aware of it, but it would be a large boon at massive scale.)

does bazel have an equivalent of swift package dump-package that can be used to get the build plan without doing a full build?

At the scale we're talking about, even that kind of analysis is costly; it's not efficient enough for other tooling that manipulates build files and performs other interactive operations in an IDE.

I appreciate the discussion of build-system-based options, but I assure you I've already explored options like the ones you're describing in my time working on Swift build infrastructure. I posted this thread specifically in this subforum to focus on language/compiler-based solutions to the problem, because after trying those other options, I believe that's where the most progress can be made, so I'd like for us to stay on that topic.

I’ll note that module names do have to be identifiers because they can appear inline in source: you can explicitly say Foundation.URL to disambiguate if need be. Whatever solution gets chosen here should deal with that in some way, whether it’s letting the user optionally provide an alias for a module imported by non-identifier, or…honestly that’s the best idea I have, the others are all worse in some way.

Additionally, they end up in mangled names, which can be relevant if your code ever does a run-time type lookup by name (mostly only relevant for nib/storyboard code). But that’s only a little worse than SwiftPM’s existing module renaming feature, I think.

Finally, I don’t think you mentioned it, but people have asked for implicit module imports via the command line for a long time. I personally don’t like that feature very much because it hides where names are coming from, but it does already exist in the compiler (mainly for things like playgrounds, not to mention the stdlib). A downside there would be that it’s not per-file: you’d have to be okay with all of the Bazel dependencies being imported into every file, even if that causes a conflict.

Right, that's what makes the backtick option appealing. Fully qualifying a name doesn't require any additional affordances:

`//some/project:my_library`.SomeType

would just work. (Ironically, it appears I can't write this as inline code using Discourse's Markdown!)

For the "module origin" option, we could address this with an ability to provide a different name for the imported module within the file that imports it:

import from "//some/project:my_library" as MyLibrary

let x = MyLibrary.SomeType()

Exactly; I don't think implicit imports would be a good solution, for the reasons you described. It's important that the reader be able to look at a source file in isolation and see what it uses, rather than having that provided transparently (and over-eagerly) by the build system.

(That being said, I did try using -Xfrontend -import-module to make my experimental import macro available without explicitly importing the module that defines it.)

Aside: the syntax for inline backticks in code is to use more backticks to quote the code: `` `Example`.Foo ``

Ah-ha, thanks! The problem was that I had to add a space between the opening double backticks and the single backtick I wanted at the beginning of the span.

i do like this idea, it could also benefit SPM projects that use target names that don’t match the module name.

import from "BSON+OrderedCollections" as BSON_OrderedCollections