Building .swiftmodules for SourceKit

marcrasi · August 19, 2019, 7:01pm

Background:

I'm working on adding SourceKit to Google's internal source tooling. The architecture is a server that runs one instance of SourceKit per user, and uses those instances of SourceKit to serve requests from IDEs.

To make this work on code that imports modules, I need to give SourceKit access to the appropriate ".swiftmodule" files.

One way to get these ".swiftmodule" files would be to have my server call out to Google's (remote, distributed) build system, which would produce these files. There are some obstacles to doing this:

1. The build system is a shared resource that is inappropriate to call every time the user makes a local modification to a module's source.
2. There is a pretty big startup latency for each call.
3. The Swift compiler on the build system is built and released separately from my SourceKit server, so the binary ".swiftmodule" files will often be incompatible.

Therefore, I'm thinking of making the server itself build necessary ".swiftmodule"s from sources. The server has access to a dependency graph of modules and access to the compiler flags for each module (all this information comes from the build system). The server also has access to all the source files. So I would need to implement some sort of "lite build system" that uses this information to trigger ".swiftmodule" (re)compilations when necessary.
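To make the "lite build system" idea concrete, here is a minimal sketch in Swift. Everything in it is hypothetical (`ModuleSpec`, `LiteBuildSystem`, and the `compile` callback are illustrative names, not Swift compiler or SourceKit APIs), and it assumes the module dependency graph is acyclic. Each module gets a cache key derived from its flags, its source version, and the keys of its dependencies, so an edit to one module invalidates exactly that module and its dependents.

```swift
// Hypothetical sketch: none of these names are Swift compiler or SourceKit APIs.

/// One module as described by the external build system.
struct ModuleSpec {
    let name: String
    let dependencies: [String]   // names of directly imported modules
    let flags: [String]          // compiler flags for this module
}

/// Tracks which modules are stale and recompiles them in dependency order.
final class LiteBuildSystem {
    private let specs: [String: ModuleSpec]
    private var sourceVersion: [String: Int] = [:]   // bumped on each source edit
    private var builtKey: [String: String] = [:]     // key of the cached build
    private let compile: (ModuleSpec) -> Void

    init(specs: [ModuleSpec], compile: @escaping (ModuleSpec) -> Void) {
        self.specs = Dictionary(uniqueKeysWithValues: specs.map { ($0.name, $0) })
        self.compile = compile
    }

    /// Record that a source file belonging to `module` changed.
    func sourcesChanged(in module: String) {
        sourceVersion[module, default: 0] += 1
    }

    /// Bring `module` and its transitive dependencies up to date.
    /// Returns the names of the modules that were recompiled, in build order.
    @discardableResult
    func ensureFresh(_ module: String) -> [String] {
        var rebuilt: [String] = []
        var visited: [String: String] = [:]
        _ = refresh(module, visited: &visited, rebuilt: &rebuilt)
        return rebuilt
    }

    /// Post-order traversal: dependencies are refreshed before their dependents.
    private func refresh(_ name: String,
                         visited: inout [String: String],
                         rebuilt: inout [String]) -> String {
        if let key = visited[name] { return key }
        guard let spec = specs[name] else { return "missing:\(name)" }
        var dependencyKeys: [String] = []
        for dependency in spec.dependencies {
            dependencyKeys.append(refresh(dependency, visited: &visited, rebuilt: &rebuilt))
        }
        let key = "\(name)@\(sourceVersion[name, default: 0])"
            + "{\(spec.flags.joined(separator: " "))}"
            + "[\(dependencyKeys.joined(separator: ","))]"
        if builtKey[name] != key {
            compile(spec)            // would produce the .swiftmodule
            builtKey[name] = key
            rebuilt.append(name)
        }
        visited[name] = key
        return key
    }
}
```

In this sketch a real server would replace the version counter with content hashes of the source files and would have `compile` store the resulting module wherever SourceKit can read it.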

Q1: Does a "lite build system" sound like a reasonable approach?

Are there any drawbacks I should consider? Any better ways to get my server working on code with dependencies?

Q2: Building .swiftmodules in memory

A constraint on my server is that I can't read or write to the local filesystem. So I can't simply spawn "swiftc" processes to build things.

Are there some Swift compiler functions that I can call from my server binary that let me do most of what "swiftc" would do, except that they would read inputs from a custom VFS and write outputs to memory?

Inspired by the subcompilation in ParseableInterfaceModuleLoader.cpp, I was able to prototype a working implementation like this:

1. Construct a CompilerInstance ci and set its filesystem to my custom vfs.
2. Construct a CompilerInvocation invok and set it up using swift::driver::getSingleFrontendInvocationFromDriverArguments and invok.parseArgs.
3. ci.setup(invok)
4. ci.performSema()
5. Set the module's serialize action to serializeToBuffers. (I had to add a new flag to serializeToBuffers that teaches it not to write to the filesystem at all.)
6. ci.performSILProcessing().

Does that seem like a reasonable, robust approach?

Q3: Putting this in master & collaborating

Ideally I'd put the "building .swiftmodules in memory" and the "lite build system" in master with tests so that Swift CI can keep it working and so that others can use and improve it.

Is anyone working on something similar? Would this be useful for anyone else?

Q4: Can we do something faster than compiling .swiftmodules?

It seems that, in principle, SourceKit shouldn't need imported modules to be fully compiled. It just needs to see their public interface, which could be parsed out of a module's sources much faster than the module can be compiled.

Is this something that could be done eventually? Has anyone contemplated doing this?

blangmuir · August 19, 2019, 11:12pm

Hi Marc,

I've been thinking about this a bit, along with other build-system-integration questions in relation to SourceKit-LSP. There have been some recent conversations on that topic, such as "SourceKit-LSP for other build systems" (post #12).

My preference would be for a design where the "build system" lives outside of sourcekitd, but sourcekitd can provide the compiler support that's needed to generate swiftmodules (and generated headers). Keeping the build system separate makes it easier for us to share context between sourcekitd and another language server like clangd, and lets sourcekitd focus on its core task of interfacing to the swift compiler.

Here's a sketch of what I have in mind:

- A higher level service such as SourceKit-LSP acts as the build system for sourcekitd. The implementation of this could be partly delegated to a build-system-specific plugin that can talk to your underlying build tools, and maybe cache results to avoid expensive requests.
- The build system service is responsible for (re)generating the inputs necessary for a request in a particular file, including the .swiftmodules.
  - This could be implemented by shelling out to swiftc if you have local file system access.
  - Or it could be implemented by calling back to sourcekitd, with a (hypothetical) request that writes a swiftmodule to memory, which we can use with the VFS layer to make available to other sourcekitd requests.

In a model like this, you could imagine also generating a "-Swift.h" header in-memory with sourcekitd and then feeding the header to clangd so that e.g. objc code could see the changes from upstream swift modules.

Another advantage of this model is that the build system has the flexibility to provide other kinds of build intermediates that sourcekitd has no way to generate itself. For example, if your build process generates swift files from gyb, you could theoretically teach the build system service how to do that in-memory and feed that back to sourcekitd as appropriate.

I haven't thought through the details of how you would do this generation in-memory - for example, whether it is a blob of data returned from the request that is then sent back to sourcekitd, or if it just gets written directly to an in-memory vfs in sourcekitd. I also haven't thought through whether sourcekitd needs to be able to request the generation of these intermediate inputs, or if this is something a higher-level service should do.

Thoughts?

[0] Nothing here is specific to sourcekit-lsp itself; any higher level service could play the role of build system to sourcekitd.

akyrtzi · August 19, 2019, 11:35pm

You could avoid this issue if you had your build system provide the .swiftinterface files instead (the other 2 issues may still be hard blockers though).

You still need to typecheck the public interface though. If by "compiling .swiftmodules" we can have the compiler generating the swift module/interface with only the minimal typechecking necessary (e.g. by avoiding typechecking function bodies), then I don't see how we can be more performant than that.

marcrasi · August 20, 2019, 7:06pm

Hi Ben,

Your sketch sounds very similar to what I have in mind, with the "higher level system" in my case being the Google-internal server (C++) that controls the sourcekitds.

Since our two higher level systems are different, and written in different languages, we may not be able to share any implementation between them. We can at least discuss the design though. Thanks for your thoughts about it!

> I haven't thought through the details of how you would do this generation in-memory - for example, whether it is a blob of data returned from the request that is then sent back to sourcekitd, or if it just gets written directly to an in-memory vfs in sourcekitd.

For my purposes, I wouldn't even need a sourcekitd request that compiles swiftmodules. I'm linking the Swift compiler into my server, so it can just be a C++ function that compiles swiftmodules in memory. I'll propose a PR with this C++ function soon. Then if someone has a use case for a sourcekitd request that compiles in memory, we can add a sourcekitd request that calls the function.

> I also haven't thought through whether sourcekitd needs to be able to request the generation of these intermediate inputs, or if this is something a higher-level service should do.

My initial thought is that the higher-level service should be completely responsible for making sure that the generated inputs are there for sourcekitd, because it seems simpler and I can't think of any advantages of the other approach that would outweigh the complexity.

- If the higher-level service is responsible, then when it receives a request, it can just ask the build system to make sure that the inputs for the relevant target are fresh enough, and then forward the request to sourcekitd.
- If sourcekitd were requesting inputs, it would have to figure out what the necessary inputs are (seems not completely trivial), and send a request to the higher-level system (would require a new interface).
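The first option can be sketched in a few lines of Swift. The protocols and type below are hypothetical stand-ins, not real sourcekitd or SourceKit-LSP interfaces. The point is only the ordering: the higher-level service maps the file to a target, asks the build system to refresh that target's inputs, and then forwards the request unchanged.

```swift
// Hypothetical interfaces; none of these are real sourcekitd or SourceKit-LSP APIs.

/// Whatever produces .swiftmodules and other generated inputs for a target.
protocol BuildSystem {
    /// Makes sure the inputs for `target` are fresh enough.
    func ensureInputsFresh(for target: String)
}

/// Stand-in for a sourcekitd instance that only answers requests.
protocol LanguageService {
    func handle(_ request: String, in file: String) -> String
}

/// The higher-level service owns the file-to-target mapping and the ordering:
/// refresh inputs first, then forward the request.
struct HigherLevelService {
    let buildSystem: BuildSystem
    let languageService: LanguageService
    let targetForFile: [String: String]

    /// Returns nil when the file does not belong to any known target.
    func handle(_ request: String, in file: String) -> String? {
        guard let target = targetForFile[file] else { return nil }
        buildSystem.ensureInputsFresh(for: target)
        return languageService.handle(request, in: file)
    }
}
```

With this shape, the language service never needs to know which inputs exist or how they were produced, which is the simplicity argument made above.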

Hi Argyrios,

> You could avoid this issue if you had your build system provide the .swiftinterface files instead (the other 2 issues may still be hard blockers though).

I did play around with this a bit, but the other 2 issues did seem insurmountable.

Interestingly, the ParseableInterfaceModuleLoader that loads .swiftinterface files kinda has a "lite build system" in it that tracks dependencies and (re)compiles .swiftinterface files into .swiftmodule files when necessary. It's quite similar to what I'm thinking of doing.

> You still need to typecheck the public interface though. If by "compiling .swiftmodules" we can have the compiler generating the swift module/interface with only the minimal typechecking necessary (e.g. by avoiding typechecking function bodies), then I don't see how we can be more performant than that.

Yeah, I can't think of anything that would speed things up more than avoiding typechecking function bodies. Intuitively, it seems like avoiding typechecking function bodies would make .swiftmodule compilation a lot faster. I haven't actually experimented with it. I think I'll play around with this some time.

allevato · September 6, 2019, 11:00pm

@harlanhaskins, this seems related to the work you're doing in https://github.com/apple/swift/pull/20420. Would it make sense to also have an option to skip type checking of all function bodies, even inlineable ones, for use cases where the eventual consumer of the swiftmodule is known to be something like SourceKit?

harlanhaskins · September 7, 2019, 12:05am

Ideally, with the proper amount of laziness, this will fall out of the Request Evaluator architecture. @Douglas_Gregor recently request-ified delayed function body parsing, so it wouldn't be that wild to teach the compiler not to walk into bodies until they're really needed. At that point SourceKit could choose not to need them, and the bodies wouldn't even be parsed.

marcrasi · September 19, 2019, 5:52pm

In addition to the proper amount of laziness, we'd need some way to communicate to the compiler that we're doing a compile for SourceKit. Right now, some external build system builds the .swiftmodules and then SourceKit reads them, so there is no way for the compiler to know that it doesn't need to put all the inlineable function SIL in the .swiftmodule. Maybe one of these ideas would be good:

- A compiler option for building .swiftmodules without inlineable function SIL. The external build system could use this option when it knows that it's building the .swiftmodules only for SourceKit.
- A "give me the public interface for this module" request that SourceKit can make when it needs the public interface.

On a different note, I just discovered "Update on implementation-only imports" which sounds like another exciting opportunity for reducing the amount of time it takes to build things for SourceKit. A compilation that only builds the public interface of imported modules shouldn't need any information about implementation only imports into those modules. If implementation only imports become the default (or if they're strongly encouraged), this could greatly cut down the amount of things we need to build.

blangmuir · September 19, 2019, 6:40pm

@jrose does the above suggestion work? I'm worried about code like the following:

@_implementationOnly import InternalModule
public class Foo {
  public var field = someFunctionFromInternalModule() // returns Int
}

The API doesn't use any types from the impl-only import, but it does depend on that module for determining the type. The .swiftinterface doesn't need to care, because it would add a type annotation.
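As a small illustration of the difference, the sketch below puts an inferred and an annotated property side by side. It uses a local stand-in function in place of the one from InternalModule so that it compiles on its own, and the class and property names are made up for the example. With the annotation, the property's type can be read off the declaration without resolving the initializer expression.

```swift
// Stand-in for a function that would come from the implementation-only import.
func someFunctionFromInternalModule() -> Int { 42 }

public class AnnotationExample {
    // Inferred: the compiler must resolve the call to learn that this is an Int.
    public var inferredField = someFunctionFromInternalModule()

    // Annotated, roughly as a generated interface would spell it: the type is
    // known from the declaration alone.
    public var annotatedField: Int = someFunctionFromInternalModule()

    public init() {}
}
```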

jrose · September 19, 2019, 11:28pm

Ben is correct, being an implementation-only import does not, under Swift's current rules, mean that you don't need to resolve the import in order to generate the interface. The inferred type case is the only 100% sticking point I can think of, though—everything else is "just" places where the compiler would have to be smart enough not to touch something because it doesn't know the type. (This also only works in library evolution mode, because in non-library-evolution mode the interface needs to contain enough information to do type layout, which isn't implemented yet.)

marcrasi · September 20, 2019, 11:22pm

That makes sense, thanks for pointing it out.

Since implementation only imports are not an official language feature yet, would it be possible to modify the rules of implementation only imports such that they cannot affect the public interface of the importing module? If indeed inferred types are the only sticking point, it seems like a pretty reasonable restriction to not allow inferred types from implementation only imports in the public interface. This would be very useful for running code intelligence on projects with large sets of transitive dependencies.

If that sounds like a sensible thing to do, I could try experimenting with a PR that adds the constraint and then tries to generate interfaces without resolving implementation only imports. It will be interesting to find out if the "places where the compiler would have to be smart enough not to touch something because it doesn't know the type" can be handled with a few simple changes or if they're a very big implementation hurdle...

blangmuir · September 20, 2019, 11:45pm

To retrofit this onto implementation-only, you need to track whether a type is affected by an impl-only import for all internal/private declarations as well to handle the transitive case[0]. To make the diagnostic clear, you probably also want to be able to track the chain of declarations involved so you can note how it happened. It would have been much cleaner if we required type annotations on all public API. Not sure if that's palatable at this point though.

[0] e.g.

@_implementationOnly import InternalModule
public class Foo {
  private var fieldInternal = someFunctionFromInternalModule() // returns Int
  public var field = fieldInternal // error: type affected by internal-only import
}
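
The transitive tracking described in this post can be modeled outside the compiler as a walk over "where did this declaration's type come from" links. The sketch below is a toy model under stated assumptions, not compiler code: `TypeSource`, `Declaration`, and `implementationOnlyChain` are hypothetical names, and each declaration is assumed to get its type from exactly one source. It returns the chain of declarations that a diagnostic could use to note how the dependency arose.

```swift
// Hypothetical model, not compiler code.

/// Where a declaration's type comes from.
enum TypeSource {
    case explicitAnnotation               // type written out in source
    case importedSymbol(module: String)   // inferred from a symbol in `module`
    case declaration(String)              // inferred from another declaration
}

struct Declaration {
    let name: String
    let isPublic: Bool
    let typeSource: TypeSource
}

/// Returns the chain of names leading from `name` to the implementation-only
/// module that determines its type, or nil if there is no such dependency.
/// An explicit annotation ends the walk; annotations that name a type from an
/// implementation-only module are assumed to be diagnosed separately.
func implementationOnlyChain(
    for name: String,
    in declarations: [String: Declaration],
    implementationOnlyModules: Set<String>
) -> [String]? {
    var chain: [String] = []
    var current = name
    var seen: Set<String> = []   // guards against cycles in malformed input
    while seen.insert(current).inserted, let decl = declarations[current] {
        chain.append(decl.name)
        switch decl.typeSource {
        case .explicitAnnotation:
            return nil
        case .importedSymbol(let module):
            return implementationOnlyModules.contains(module) ? chain + [module] : nil
        case .declaration(let next):
            current = next
        }
    }
    return nil
}
```

For the example above, looking up `field` would yield the chain `field`, `fieldInternal`, `InternalModule`, which is the information a "type affected by internal-only import" diagnostic would need for its notes.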