SourceKit-LSP for other build systems

richardh · July 18, 2019, 9:21pm

Hi there. We are looking at getting SourceKit-LSP integrated into our mobile repo and need a way to pass in the flags to the LSP using our existing build system. Using compilation databases is a possibility here, but it is a fair amount of work keeping them in sync for file creation / move and hard to make it scale to large repos as they all have to be generated up front without knowing which subtree to start at.

The solution I am currently sketching out is to provide a delegate build system class to the LSP that will call out to any build system to provide flags for a given file. The workspace would be configured with a sourcekit_config.json file at the root that looks something like:

{
    "indexStorePath": "path/to/index-store",
    "flagDelegate": "flag_script.sh"
}

When compiler flags are needed the LSP will call the provided script with the path to the file as its only argument, and the script returns the flags for that file, one per line. This makes it pretty trivial to implement support for LSP for any build system, eg Bazel / Buck / CMake. It also makes background indexing simpler as the index commands can be used to know where the user is currently working and which indexes need to be generated in the background.

Let me know your feedback please, if people are happy with this approach I can put up a PR shortly.

Thanks,
Richard

blangmuir · July 18, 2019, 10:49pm

Thanks for starting this discussion! I'm very interested in improving our ability to work with other build systems. Some design comments follow.

For some use cases, it is important to be able to monitor for changes to the build settings and update them as needed. For example, when you add a new file to a swiftpm target directory, or modify the Package.swift file, we should be able to (a) get the new settings immediately and (b) notify any consumers of build settings (e.g. an existing open document) that the settings have changed. We haven't implemented update notifications for any build system in sourcekit-lsp yet, but it's something I always imagined adding.

This sort of watch-and-notify model seems like it might work better with a build setting server rather than a one-off query. I suppose another approach would be for the command-line tool to be able to provide a list of files to watch and when any of them change to re-query. But that maybe more expensive, and could be limiting if the set of files to watch can change a lot over time. We don't necessarily need to build a full solution for keeping the settings up-to-date right away, but I think it's important to understand how we might get there, or at least how any build settings solution we build fits into this.

I like that this gives us a way to provide the "plugin" out of band such that it would work without needing editor-specific support. I think we could also support alternative ways to configure it, such as the initialization options in the protocol, or using command-line arguments if anyone were interested.

Can you expand on this? Are you talking about driving index data generation based on the queries being made? How would that work in practice - for example, you probably want to have more index data than just the files you've looked at.

DavidGoldman · July 18, 2019, 11:25pm

blangmuir:

For some use cases, it is important to be able to monitor for changes to the build settings and update them as needed. For example, when you add a new file to a swiftpm target directory, or modify the Package.swift file, we should be able to (a) get the new settings immediately and (b) notify any consumers of build settings (e.g. an existing open document) that the settings have changed. We haven't implemented update notifications for any build system in sourcekit-lsp yet, but it's something I always imagined adding.

This sort of watch-and-notify model seems like it might work better with a build setting server rather than a one-off query. I suppose another approach would be for the command-line tool to be able to provide a list of files to watch and when any of them change to re-query. But that maybe more expensive, and could be limiting if the set of files to watch can change a lot over time. We don't necessarily need to build a full solution for keeping the settings up-to-date right away, but I think it's important to understand how we might get there, or at least how any build settings solution we build fits into this.

This is similar to what I had in mind for Bazel - you could watch for modifications to BUILD files - then you would need to rerun Bazel in order to understand the changes that were made. Due to the nature of this, you wouldn't want many of these scripts firing off at once (each creating a Bazel command) - you'd really want some sort of server to manage it, whether it be in sourcekit-lsp itself or a separate process.

There's also the problem of generated file dependencies. More so for Bazel/Buck than other build systems, but it's entirely possible that the file that the user has open depends on some generated files. Once again, you might need some deep build system integration to handle this.

Also not sure what you mean by this; I would think that you really want a single process to understand all of the files that a user has open (effectively: what is the user trying to do? what needs to be built?) and index as needed. sourcekit-lsp currently has this information although currently it's not exposed to the build system (perhaps it should be in some fashion).

richardh · July 19, 2019, 12:32am

Also fine, although potentially a little more fiddly depending on editor. It made sense to me to use a config file as the other integrations are already going this way, but I'm fine with either.

I agree that you will want some sort of server process to handle these requests efficiently, but I was thinking of leaving that as an implementation detail for the specific build system (for instance Buck already has a daemon mode and watches files, so some of this work is handled for us, not sure if Bazel has something similar).

This would definitely be nice to have, how would you envision sending the notifications from an external build system? Adding an interface to the server instance to call or using Darwin notifications or something?

Basically I was suggesting this as a lazy persons insight into what the developer is working on without directly exposing the editor actions through the lsp. The idea would be that the script is called to gather flags for a file, but it could also evaluate whether or not the dependent swiftmodules and header maps need to be generated and so on and kick off builds in the background to make sure the index store for that subtree is populated. It would be more elegant to handle this more explicitly, but I think this handles the primary use case of making files buildable already.

blangmuir · July 19, 2019, 5:40pm

SGTM.

Would the Buck/Bazel build servers themselves be responsible for notifying us? For other build systems such as cmake-server we would need to figure out what the lifetime is, since they don't have a long-lived daemon the way Buck/Bazel do.

If we made the server aspect explicit, I would just expect this is part of the API. If we use an implicit model like you are proposing, something like DistributedNotificationCenter might work (I'm not an expert here) on macOS, but we'd need an answer for Linux (dbus?) and Windows. Or we could have the service tell sourcekit-lsp a path to watch for file changes.

Got it.

richardh · July 19, 2019, 6:26pm

Yes, I think thats the most sensible. If the build system did not notify the language server, at what point would the AST get updated?

Seems like adding something to the API is simplest then.

blangmuir · July 19, 2019, 6:27pm

I talked to @akyrtzi a bit about this, and he reminded me of a couple other pieces of information that we will want from the build system:

List of all known files, which we can use to kick of background indexing as needed
List of all output files, which is important for controlling visibility of index data; e.g. if you remove a source file, we need to know that its unit file in the index is now dead/hidden.

And we'll want to keep those up to date for changes as well.

It's not necessary that every build system we integrate with provide all of these features, but I bring it up so that we can design this in a way that extends to a fully featured build system interaction.

I think my biggest concern about using the simple command-line invocation is that it makes it harder for us to evolve the API and harder to build a fully featured solution in the future. Whereas an explicit server, while it is a more work upfront, makes it easier to evolve.

blangmuir · July 19, 2019, 6:30pm

The AST might get rebuilt because you changed the source file in the editor, but without being told by the build system about the changed settings, we would continue to use whatever command-line arguments we had already cached from the last time. So we wouldn't see the updated build settings until you re-opened the project, or perhaps if you closed the document and we happened to throw out our cached settings.

Note: there might be other interesting notifications from a build system. For example, if you kick off building .swiftmodules for dependencies in the background, knowing when they finish building would allow us to rebuild the AST to get updated diagnostics.

akyrtzi · July 19, 2019, 6:40pm

I'd recommend to take a look at this Build Server Protocol to see if it's something that is close to what we need.

richardh · July 19, 2019, 6:47pm

I would rather push the background indexing responsibility to the build system, wouldn't it be better placed to know what needs to be built and what doesn't?

The output files part is interesting, I had not considered the necessity to clean up the index of stale data. If we didn't what would be the outcome, the language server would get data for unit files with no corresponding source file? Couldn't we ignore the results and delete the stale data in that case?

This does ring true, maybe its worth listing all the interactions the language server would want from the build system and design something minimal around that instead? My only concern would be cost of entry for simpler build systems, but I guess theres not that many that would be using this facility.

Thanks, i'll take a look at this.

blangmuir · July 19, 2019, 6:53pm

We could theoretically detect a missing file during lookup, but it's harder to do that efficiently. Also, the source file being removed was just one example. Another is that a source file maps to a different output file path than it did before (e.g. because of a build configuration change), so we might end up with stale data and have no way to detect that.

blangmuir · August 27, 2019, 1:12am

Hey!

Thanks a lot @richardh (and @DavidGoldman) for continuing to push on the build system design! I wanted to respond to the design you proposed (PR and spec) holistically, as well as talk about some other build system integration points outside the scope of what you proposed. We don't need to implement all of this right away

BSP

Certain parts of your proposed design feel like they fit closely into the Build Server Protocol (BSP) design. Specifically, the discovery and initialization of the server, as well as the list of sources are very close, and I think a small extension to BSP would allow us to handle the output paths as well.

I think it would be premature for us to adopt the BSP wholesale, since it's unclear if we need most of the features. However, I think for these particular API, we should align with the BSP to take advantage of the experience of that community. It also leaves the door open for implementing the whole spec in the future. Concretely, I propose we adopt the following BSP API for our own build system integration:

Configuration file format, at least for the common fields. We could use a different file name to make it clear we don't currently implement all of BSP.
build/initialize
- And related shutdown, exit, etc. logic
workspace/buildTargets
buildTarget/didChange
buildTarget/sources
- These three requests together let us get the source files, as well as incrementally update them. It also provides target-level information that will be valuable to us in Xcode, and may give opportunities to subset only relevant data for situations where we think it is too expensive to communicate about all build targets all of the time.

As a small extension, we could add a request that piggy-backs on the target (and would be updated using buildTarget/didChange notifications).

buildTarget/outputPaths

Compiler Arguments

While the existing BSP extension for Scala uses a target-level query for compiler arguments, I prefer to use a file-level API similar to your proposal. That gives the build system the ability to minimize work for us, and also is a nicer API to work with for the language server.

With respect to the file settings API, I see that you added scheme/destination information to the queries. Argyrios and I discussed this today, and we’d like to propose that the build system be responsible for deciding how to compute file settings based on its own knowledge of the configuration (including scheme and destination) rather than exposing it in every request. We might need to expose scheme/destiantion information to the indexer for other reasons, but if we can keep it out of the file settings API, I think that would be an improvement. What do you think?

Other than removing the scheme/destination from the API, I propose the following names

textDocument/sourceKitOptions
- One-off request for settings for a given URL.
- The name is based on buildTarget/scalacOptions, but at the file level.
textDocument/registerForSourceKitOptions (and unregister)
- Registers a document to watch for settings changes.
textDocument/sourceKitOptionsChanged
- Notification of new settings

When Dependencies Change

One of the things we want the build system to be able to provide to the LSP server when possible, is the ability to efficiently generate the dependencies when they are not already built, or when they need updating (note: this may not be feasible for all build systems). To that end, I think we want to provide hooks for the language server to tell the build server about changes to files that are part of the project, as well as let the build server tell the language server when build intermediates like modules have been regenerated and we may want to refresh the document.

I propose the following API:

textDocument/didSave
- Parameters: URL
- Notify the build server that a source file from the workspace has changed, and allow it to compute what impact it has on the build graph of registered files.
workspace/dependenciesOutOfDate
- Parameters: Array
- Notification from the build system that the dependencies of the given URLs are out of date, for example because of source file changes to modules that are imported.
textDocument/updateDependencies
- Parameters: URL
- Notification from the language server to the build server to update the dependencies of the given document. By making this an explicit request we let the language server decide whether it is necessary, as you may have several documents registered for file settings changes, but not want to actively build all of their dependencies.
workspace/dependenciesUpdated
- Parameters: Array
- Notification from the build server that the dependencies of the given URLs have changed and that we may want to refresh them. This is a notification rather than a reply to the previous message because it allows the build server to tell us about dependency updates that came from a build rather than through a request from the language server.

Background Indexing

Another point of integration with the build system would be if we can offload some of the work of background indexing. I am not proposing a specific API here, but I wanted to write down some thoughts on this related topic. Argyrios already mentioned that we would like to add indexing based on the live AST, which should give us an efficient way to update the index for files the user is actively editing. However, there is still a niche for doing background indexing for files that are not built (so no data from indexing-while-building), and not actively edited. A common example is unit tests, which may not be built by default, or files that are part of inactive schemes in Xcode.

We have some experience doing background indexing where the language server is responsible for the whole process, but this is suboptimal in a number of ways. For one, we are doing the indexing one file at a time, but the build system for swift can use batch mode. For another, the build system already has a high quality dependency graph and scheduler that we don’t want to reimplement.

We think that an API where the language server requests indexing based on a list of sources may give us a good balance, as it allow the language server to avoid indexing files it already has data from or which it is uninterested in, while giving the build system a bulk operation that it can efficiently schedule and execute. In normal operation, we would expect incremental index updates to be done using the live AST, combined with data from indexing-while-building for the majority of code. Background indexing would kick in for files outside the build when opening a new project, or after e.g. a git-pull.

richardh · August 27, 2019, 6:01pm

Thanks for the details response @blangmuir. This all sounds pretty reasonable, I don't see any issues with the BSP setup endpoints and the document flags.

I'm fine with it not being a parameter to the flag settings interface, but we do still need a way for the editor to communicate scheme and destination information to the build system through the language server. There doesn't seem anything to borrow from the BSP for this, so I would propose adding a build/destinationUpdated notification which would include the platform, arch and destination UDID, and a build/schemeUpdated for the scheme identifier and targets.

I think that will work fine, yes. How would this be determined, based on the output paths of the available targets and looking to see what is missing?

blangmuir · August 27, 2019, 8:40pm

I would propose adding a build/destinationUpdated notification which would include the platform, arch and destination UDID, and a build/schemeUpdated for the scheme identifier and targets.

Works for me as long as it's optional for build systems that either don't have this notion, or where it is provided out-of-band.

How would this be determined, based on the output paths of the available targets and looking to see what is missing?

The index knows what source files are covered by the data.

DavidGoldman · August 28, 2019, 3:26pm

Thanks for the writeup! At a high-level that sounds good to me, with just a few minor nits:

This sounds good, but it's a little unclear what part of the Target information we will make use of / require. We can always start minimally and expand it in the future though. (I guess this is true of a lot of the BSP as a whole)

It sounds like the registration is also for the dependency changes below, correct? In which case you might want to rename this?

This seems reasonable to start, although from the above (assuming change notifications are limited in scope to open files), how will the LSP know what to index? Will it just try to index everything and assume the build system is smart enough to cache things?

blangmuir · August 28, 2019, 5:22pm

For now, I'm mostly thinking of using the target identifier itself in the other target-based APIs. But I think we will expand that over time. The tags can be used to mark test targets, which is something we can use as part of test discovery[0]. The languageIds can be used to skip unsupported targets. We can also add our own language-specific data here if we decide to in the future

[0] LSP itself doesn't say anything about tests, but in Xcode we provide structured information about unit test classes and method to allow you to run specific tests from the test navigator.

Ah, that's a good point. Do you have a suggestion? Is textDocument/register too generic?

I see two parts to this: how to choose the subset of targets to include, and how to choose the subset of files from those targets. I don't know that there is a single good answer to choosing the subset of targets to consider, and it probably depends on the kind of user model presented by a particular build system and/or editor. I think the right default absent any other information is to include all targets. If your build graph is really large, you probably need to figure out the right subset.

For choosing files within those target that need to be (re)indexed, the index itself can tell which files are covered by existing index data and whether it is up-to-date.

DavidGoldman · September 10, 2019, 8:59pm

Hmm, probably. Sure we can think of something though

How does it go about doing this (e.g. tell if it is up-to-date)?

blangmuir · September 11, 2019, 4:43pm

Here is a method that can mark unit data covering a given file out of date: https://github.com/apple/indexstore-db/blob/90cafe9dbbdbbfc840a76cedfe1cc730c8f94e27/lib/Index/IndexDatastore.cpp#L635

blangmuir · October 16, 2019, 7:12pm

Update

After a deeper investigation of build system integration, @akyrtzi raised some concerns about parts of the proposed build server protocol. Based on that, we have some proposed modifications.

A) The build system does not always have enough dependency information to determine for a given file change precisely which ASTs will be affected in the language server. In particular, we can make updating dependencies of the live AST more efficient if we build swiftmodules without typechecking function bodies, but then the dependencies will only contain a subset of the full AST dependencies.

We can solve this without a BSP change by noting that workspace/dependenciesUpdated can be conservative. For example, after a build we can say "all registered document" may be affected.
To avoid eagerly rebuilding lots of ASTs, we should (1) reduce the set of registered documents (see below), and (2) introduce a toolchain language server API to efficiently update an AST only if its dependencies actually changed (or if there was an import failure that might go away now that we have new files).

B) The proposal goes to some length to enable the language server to register for notifications but then avoid rebuilding all dependencies - for example, the handshake around didSave -> dependenciesOutOfDate -> updateDependencies. This is inefficient, not only in terms of back-and-forth between the language server and build server, but also for things like detecting changes to settings for files we aren't actively working with.

This can be simplified if we make it the responsibility of the language server to unregister for notifications eagerly.
We can remove didSave and dependenciesOutOfDate, and instead rely on the language server requesting updateDependencies when a file is saved, which is also easier to understand for cases like git pull - we want a single updateDependencies at the end, not to send a bunch of individual notifications.
The language server can use the heuristics to decide when to unregister, e.g. for files that haven't had a request in a while. We will need to remember to re-register if there is a new request.

Proposed API, Updated

Adopted from BSP

We adopt the following basics from the Build Server Protocol (BSP) design.

Configuration file format, at least for the common fields. We could use a different file name to make it clear we don't currently implement all of BSP.
build/initialize
- And related shutdown, exit, etc. logic
workspace/buildTargets
buildTarget/didChange
buildTarget/sources

Output Paths

As a small extension, we could add a request that piggy-backs on the target (and would be updated using buildTarget/didChange notifications).

buildTarget/outputPaths

Compiler Arguments

We add the following extensions to support tracking compile arguments for Swift/Clang files.

textDocument/sourceKitOptions
- One-off request for settings for a given URL.
- The name is based on buildTarget/scalacOptions, but at the file level.
textDocument/registerForChanges
- Registers or unregisters a document to watch for settings changes and for building dependencies (see "when dependencies change")
build/sourceKitOptionsChanged
- Notification of new settings

When Dependencies Change

One of the things we want the build system to be able to provide to the LSP server when possible, is the ability to efficiently generate the dependencies when they are not already built, or when they need updating (note: this may not be feasible for all build systems). To that end, I think we want to provide hooks for the language server to tell the build server about changes to files that are part of the project, as well as let the build server tell the language server when build intermediates like modules have been regenerated and we may want to refresh the document.

workspace/updateDependencies
- Parameters: [URL]?
- Notification from the language server to the build server to update the dependencies of the given documents (if nil, update all of them).
workspace/dependenciesUpdated
- Parameters: [URL]?
- Notification from the build server that the dependencies of the given URLs have changed and that we may need to refresh their ASTs. *If the array is nil, assume all of them are affected.

jvican · November 6, 2019, 9:32pm

Hey everyone! Co-author of BSP here

It's exciting to see SourceKit-LSP building on some of the foundations of BSP, even if you don't fully implement the protocol. Please let me know if you have any questions or you'd like some clarifications about its current design. There are some non-obvious concepts or endpoints that might need some context.

Out of curiosity, what is this endpoint needed for?

I believe you could encode this information as a JSON object in the data field of the build target. In the Scala extension, we have an object called ScalaBuildTarget which enriches the build target information to provide Scala-specific data.. It sounds like this would be a good fit for this endpoint too.

I don't know what you mean exactly by "not currently implement all of BSP" but you can use BuildServerCapabilities to tell clients that some features are not supported by the server and vice versa with BuildClientCapabilities. If there was a capability your server doesn't support, you can announce that and then clients know they cannot query you with certain endpoints. You can also use this same infrastructure to add your own experimental capabilities.

BTW, when the dust settles and the exploratory phase is over, I encourage you to upstream some of the stuff you've worked on to build-server-protocol/build-server-protocol. We could merge this work under a Swift extension and make the necessary changes to the base protocol.

We have tried to make BSP as generic as possible (with lots of concepts from build tools such as Bazel) but it's still quite opinionated and Scala-centric. I'd be really happy to change some parts of the protocol to fit your requirements and be more broadly useful