Proposal: emitting a source information file during compilation

Xi_Ge · September 12, 2019, 11:03pm

Swift module files do not store the source locations (file:line:column) of their declarations. While this is by design, the lack of source location information has certain downsides:

A tool that runs during a build and processes Swift module files (like the ABI checker) cannot emit diagnostics that point to the source location where a declaration is written.
The compiler itself cannot emit diagnostics pointing to user code if the declaration was imported from another module.

Proposal:

To improve diagnostics on locally built modules, we propose emitting an additional file during compilation that keeps track of the source locations of decls (the .sourceinfo file). The file maps decl USRs to source locations in the local file system and is always emitted as a side product of Swift modules into a private sub-directory (swiftmodule/private). This ‘private’ sub-directory will be present in a local build but excluded when packaging a Swift module for client consumption. A source location is encoded as an absolute path to the source file plus a byte offset within that file.
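
To make the encoding concrete, here is a minimal sketch in Python (not the compiler's actual implementation) of a USR-keyed table and of turning a byte offset back into a line:column pair for a diagnostic. The names `EncodedLoc`, `line_and_column`, and `source_info`, as well as the USR string and path used as sample data, are made up for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class EncodedLoc:
    """Hypothetical record: one entry of the USR -> location table."""
    path: str         # absolute path to the source file
    byte_offset: int  # offset of the declaration within that file


def line_and_column(source: bytes, byte_offset: int) -> Tuple[int, int]:
    """Convert a byte offset into a 1-based (line, column) pair.

    The column is counted in bytes from the start of the line.
    """
    if not 0 <= byte_offset <= len(source):
        raise ValueError("byte offset lies outside the buffer")
    # Every newline before the offset starts a new line.
    line = source.count(b"\n", 0, byte_offset) + 1
    # rfind returns -1 when there is no earlier newline, so the line starts at 0.
    line_start = source.rfind(b"\n", 0, byte_offset) + 1
    return line, byte_offset - line_start + 1


# Illustrative table only; the key and path are invented sample data.
source_info = {"s:3Foo3baryyF": EncodedLoc("/src/Foo/Bar.swift", 12)}
```

Storing only the path and byte offset keeps each entry small; line and column can be recomputed from the file contents when a diagnostic is actually emitted.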

The handling of .sourceinfo files is entirely an implementation detail of the module importer. The module importer lazily loads and decodes the .sourceinfo file when source location info is requested from an imported decl. The module importer should also be responsible for converting the encoded source location to an instance of SourceLoc and opening the buffer that the SourceLoc points into. From a client's perspective, the only difference after this proposed change is that imported decls would now return valid SourceLoc instances when getLoc() is called on them.
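
The lazy-loading behavior can be sketched roughly as follows. This is an illustration in Python, not the importer's real code: the class `LazySourceInfo` and its methods are hypothetical, JSON stands in for the actual binary format, and treating a missing file as "no locations available" is an assumption based on the private sub-directory being absent from packaged modules.

```python
import json
from typing import Dict, Optional, Tuple


class LazySourceInfo:
    """Hypothetical loader that reads the table only on first request."""

    def __init__(self, path: str):
        self.path = path
        self._table: Optional[Dict[str, Tuple[str, int]]] = None
        self.load_count = 0  # kept only to make the laziness observable

    def _ensure_loaded(self) -> Dict[str, Tuple[str, int]]:
        if self._table is None:
            self.load_count += 1
            try:
                # JSON is a stand-in here; the proposed file is a binary format.
                with open(self.path, "r", encoding="utf-8") as f:
                    raw = json.load(f)
            except FileNotFoundError:
                # Assumption: a packaged module ships without the file,
                # so its decls simply have no source locations.
                raw = {}
            self._table = {
                usr: (entry["path"], entry["offset"]) for usr, entry in raw.items()
            }
        return self._table

    def location_for(self, usr: str) -> Optional[Tuple[str, int]]:
        """Return (absolute path, byte offset) for a decl USR, or None."""
        return self._ensure_loaded().get(usr)
```

Nothing is read from disk when the object is constructed, so a compilation that never asks an imported decl for its location pays no decoding cost.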

The .sourceinfo file piggybacks on the .swiftdoc file in several ways. First, its format is similar to that of .swiftdoc files: a binary format indexed by decl USRs. Second, .sourceinfo files will be generated and merged by the Swift frontend just like .swiftdoc files. The reason we don't extend the existing .swiftdoc files to also keep track of source information is that .swiftdoc files are shipped to library clients, whereas the .sourceinfo file is used only for local development by the compiler and other development tools. Adding source location information to .swiftmodule files isn’t ideal either, because source location changes alone shouldn’t trigger rebuilding of downstream dependencies the way changes to public symbols do.

Alternatives considered:

With @Adrian_Prantl's help, we experimented with using debug info to get the source locations of decls. The shortcoming of this approach is that debug info is highly flow-centered, so it may not contain source locations for every decl appearing in the Swift module, e.g. some protocol declarations. In contrast, we'd like to have a full picture of the source entities in users' source code for diagnostic purposes.

We've also considered using raw index data to get the source locations of decls. However, these data are per-file outputs for all files across the workspace, and there's no build step that merges them for a specific Swift module. It would be prohibitively expensive for a tool or the compiler to go through all the raw index data to find a specific decl during a build.

What do you think about this proposed change? Any feedback and comments are highly appreciated.
This post was drafted after several iterations with @akyrtzi and @jrose.
CC: @Douglas_Gregor @Slava_Pestov @ravikandhadai

allevato · September 12, 2019, 11:19pm

This sounds really helpful!

Could you go into a little bit more detail about this part, such as what the file layout will actually look like?

For build systems like Bazel, we'd want to capture these .sourceinfo files as outputs of remote compile actions and propagate them as inputs to upstream actions on other build machines in order to get the same useful diagnostics. This requires that we know ahead of time (without running the compiler) the expected path/name of the generated .sourceinfo file, so that we can put it in the right place where the compiler will find it later when it does the import.

Xi_Ge · September 13, 2019, 12:10am

Sure. When swiftmodule is a directory, we plan to put the .sourceinfo file into a sub-directory called Private inside the swiftmodule dir, something like:

Foo.swiftmodule/
    x86_64-apple-macos.swiftmodule
    x86_64-apple-macos.swiftinterface
    x86_64-apple-macos.swiftdoc
    Private/
        x86_64-apple-macos.swiftsourceinfo

When swiftmodule is a file, we plan to put the .sourceinfo file adjacent to it, something like:

Foo.swiftmodule
Foo.swiftdoc
Foo.swiftsourceinfo
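
For a build system that needs the path ahead of time, the two layouts above could be derived along these lines. This is only a sketch in Python of the layout as described in this post; `expected_sourceinfo_path` and its parameters are hypothetical names, and the final naming may still change.

```python
import posixpath
from typing import Optional


def expected_sourceinfo_path(
    swiftmodule_path: str, is_directory: bool, target: Optional[str] = None
) -> str:
    """Predict where the source info file would sit, without running the compiler.

    swiftmodule_path: path to Foo.swiftmodule (either a directory or a file).
    is_directory: passed explicitly because the path may not exist yet.
    target: file stem used inside a directory-style module,
            e.g. "x86_64-apple-macos".
    """
    if is_directory:
        if not target:
            raise ValueError("a target is required for a directory-style swiftmodule")
        # Directory layout: Foo.swiftmodule/Private/<target>.swiftsourceinfo
        return posixpath.join(swiftmodule_path, "Private", target + ".swiftsourceinfo")
    # File layout: the file sits next to Foo.swiftmodule with the same stem.
    stem, _ = posixpath.splitext(swiftmodule_path)
    return stem + ".swiftsourceinfo"
```

For example, `expected_sourceinfo_path("out/Foo.swiftmodule", False)` gives `out/Foo.swiftsourceinfo`.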

Xi_Ge · September 13, 2019, 12:13am

After discussing more with @akyrtzi, we plan to name this file .swiftsourceinfo to be more specific.

David_Sweeris · September 13, 2019, 12:18am

+1, seems like something that should be trackable when needed, and easily skipped (by not generating the file) when not.

Adrian_Prantl · September 13, 2019, 10:18pm

It seems odd to me to have two formats to encode the same information. Have you considered instead changing debug info generation to eagerly emit the source location info for the declarations that are missing?

Xi_Ge · September 13, 2019, 10:58pm

Yeah, we did consider that. Since it operates at the IRGen level, debug info seems like overkill for the general goal (getting source locations for diagnostic purposes). In addition, keeping the .swiftsourceinfo file as a standalone thing allows us to easily extend it in the future to contain more compiler- or source-tooling-specific data that could be serialized during compilation.