Adding explicit module support to sourcekitd

Hi,

I’m interested in contributing support for explicit modules for sourcekitd / Sourcekit-LSP. Currently, sourcekitd uses implicit modules (pcms built when sourcekitd loads an AST) with detailed pre-processing records enabled, which currently in ClangImporter forces the raw module format and the detailed preprocessing record flag.

Ideally, I’d like sourcekitd to be able to re-use explicit modules from compilation, including obj format, but I’m not sure if there’s an issue without the detailed pre-processing record flag being enabled - should we enable it for our compilations too? Otherwise, if sourcekitd needs this information, I think we’d be forced to have separate explicit module compilations for sourcekitd.

I’m happy to contribute this but could definitely use some pointers. Thanks!

3 Likes

Sorry for the late response, this would be wonderful to get done! Happy to help answer any questions.

AFAIK the detailed preprocessing record is enabled in other to allow us to share the same PCMs as indexing, as we enable it there to allow indexing macros. Although I should note that it looks like we are still able to index macros without it, but we'll only pick up the last definition.

I should note though that even if we changed that, there are other differences between the index and regular build that likely means we'll still end up with a separate set of built module dependencies. For example, we enable -experimental-allow-module-with-compiler-errors, -fallow-pcm-with-compiler-errors, and -fallow-pch-with-compiler-errors to allow modules with errors.

In terms of the work needed to adopt explicit modules, my recollection is that it comes down to the following:

  1. Ensuring that preparation works correctly with explicit modules

I don't have a good sense of what all the issues are here, though I believe we have previously encountered issues with the fact that we allow compiler errors here. For example, a PCH built with a missing header wouldn't get rebuilt when the header becomes available since it isn't recorded as an input.

  1. Propagating the explicit modules to SourceKit

SourceKit expects driver arguments (and is currently using the old driver [1]), which is a little problematic for explicit modules since we essentially need to split the driver invocation into the dependency scan and explicit module build (which should run during preparation), and the actual frontend invocation for SourceKit (which takes the set of explicit modules from that). It's possible we could hack this together by manually pulling out the -explicit-swift-module-map-file obtained from preparation and pass it as an -Xfrontend argument, but it might also be worth experimenting to see if SourceKit could take frontend arguments directly instead. The latter might not be entirely trivial to do since a lot of the infrastructure around SourceKit currently expects a single set of build arguments per target.

[1]: @ArtemC started work on adopting the new driver for SourceKit in Link `swift-ide-test` against the new (early)swift-driver and exercise its `getSingleFrontendInvocationFromDriverArguments` C API by artemcm · Pull Request #75840 · swiftlang/swift · GitHub.

1 Like

FWIW, in our use case (Bazel builds), we don't need the dependency scan step at all—we express all of the system/SDK modules as explicit dependencies and so we don't do any of the import-scanning work that produces the the dependency scan JSON.

Totally understand that that likely would be necessary in some fashion for SPM/Xcode use cases, but I want to make sure that we don't end up with an implementation that requires or assumes it either.

In Bazel builds we would love to adopt -explicit-swift-module-map-file to express our dependencies for regular compilation actions instead of passing the hundreds of -Xcc -fmodule-(map-)file arguments on the command line that we do today, but IIRC the fact that SourceKit is still using the old driver is blocking that. (Blocking in the sense that we can't just take the command line from the Bazel action and drop it into SourceKit; we'd have to manually convert the JSON manifest to the equivalent Clang flags.)

So if that issue were resolved, then that sounds like a great path forward for both of us: Xcode/SPM could do the prescan work to get that file, and for Bazel we would just generate it ourselves based on the BUILD graph and pass it directly to SourceKit as well.

1 Like

I don't actually know if this is true, certainly the old driver won't understand it if you pass it as a driver flag, but I think if you go behind its back and use -Xfrontend it might work, I haven't played about with this though. @ArtemC would know better. But in any case we really need to move SourceKit onto the new driver, assuming we want to continue passing driver arguments to it that is.

Yes that's exactly what I'm thinking, specifically preparation (in BSP this is buildTarget/prepare) is what should do the dependency scanning and explicit module building, and you can skip that for Bazel.

1 Like

I do think this ought to work.

And I broadly second the enthusiasm for this idea and the approach @hamishknight outlined above with 1) teaching the preparation stage to do the dependency scan and building the actual module dependencies, and 2) teaching SourceKit to actually execute compilation in a way that uses preparation results without repeating them, likely with some kind of specialised/massaged swiftc invocation.

I am not too familiar with what it would take to have SourceKit take frontend arguments directly though, and where they would be generated in that case.

1 Like

So for a swift-build BSP I think we should actually already have the frontend arguments in the build description (and I believe those would already have all the right flags to be able to use the explicit modules from preparation), so that would mostly be a case of refactoring the logic in LSP to not assume that a target has a single set of build arguments I think (or maybe it would be sufficient to teach it to stick -primary-file in the right place, perhaps SourceKit itself could even do that).

In cases where the BSP returns driver arguments (e.g for compile_commands.json, and I assume Bazel?), I'm thinking we could potentially have LSP link swift-driver and do the transform there. Ideally though it would be part of the BSP itself I think. I haven't fully thought through all of the implications of this though, it's possible we'll need to continue supporting driver arguments in SourceKit regardless (but we could still have a mode for passing direct frontend args). Really my hope here is to just avoid trying to reconstruct the frontend args in SourceKit that swift-build already has.

The way our Bazel integration works (@DavidGoldman can chime in with more details, he's deeper in it than I am) is that we scrape the actions that Bazel creates to spawn swiftc, so we're dealing with driver flags. We don't directly have any way to do fancy things like link to swift-driver and extract the frontend jobs from that—the only interface we have to the compiler is the driver executable via standard swiftc invocations.

We actually wrap the driver invocation in another process that does some command-line fudging for things that we can't do at the time that we create the action, so theoretically we could have that do something like dump an extra output file that's the moral equivalent of compile_commands.json with the individual frontend invocations by running swiftc -###, but I believe swift-driver today is over-aggressively resolving some relative paths to absolute paths before creating frontend jobs, so writing those paths to a file would produce nondeterministic outputs, harming caching unless we went to the trouble of sanitizing those paths first. (I need to track these down better and try to fix them upstream.)

It would be much easier for us if SourceKit could just take the driver flags expected by swift-driver and process them however it needs.

I see, interesting! Out of curiosity, how are you actually getting those settings into LSP, or are you using something completely different there that directly uses sourcekitd.framework? I'm wondering if it would be sufficient to convert the driver arguments in LSP itself.

Currently in our implementation we have a custom built-in build server to sourcekit-lsp as well as working on a new implementation that just uses a CompilationDatabase but we could also build a simple BSP server. We could definitely try that -Xfrontend trick, maybe that will work.

I should note though that even if we changed that, there are other differences between the index and regular build that likely means we'll still end up with a separate set of built module dependencies. For example, we enable -experimental-allow-module-with-compiler-errors, -fallow-pcm-with-compiler-errors, and -fallow-pch-with-compiler-errors to allow modules with errors.

Hmm yeah this part is more tricky, I believe currently in our custom indexing implementation we don’t use these flags and thus our integration will fail once you have a build error. I think we could trigger an IDE build itself with these flags and then sourcekit wouldn’t need to add them. Having this would mean it no longer shares with a primary build but I do still think this would be beneficial to support for other reasons.

Would we need multiple set of arguments support for sourcekitd/sourcekit-lsp? I would assume sourcekitd/sourcekit-lsp would always want to use the resilient form with -experimental-allow-module-with-compiler-errors, -fallow-pcm-with-compiler-errors, and -fallow-pch-with-compiler-errors if it can, otherwise use the standard flags? Or are there other indexing flags that will break things?

-fallow-pcm-with-compiler-errors and -fallow-pch-with-compiler-errors are Clang frontend flags, not something that's serialized into the actual AST, from what I can tell. So even if SourceKit's invocation uses them, that shouldn't prevent us from using PCMs built with Bazel that didn't use those flags.