SourceKit-LSP and the optimal index store size for efficient go-to-definition latency?

Hello, a quick question, since I haven't been able to find an answer to this elsewhere.

Is there an optimal, or maximum supported, index store size for efficient lookups? I know one of the limitations in Xcode is that it can't support indexing / go-to-definition for huge monorepos (hence the use of focus targets). I am currently using VS Code, but the project's index is huge if I build all of the targets. My underlying assumption / theory is that lookups can only be efficient up to an index store of some size X. E.g. if my --index-store-path is set to /foo/bar/indexstore and that path contains about 10 gigabytes of data, I'm assuming general LSP requests like go-to-definition will face a decent amount of latency. My questions are:

  1. Is there a known "max size" for an index store beyond which go-to-definition lookups stop being efficient?
  2. With a very large index store, does some underlying mechanism in sourcekit-lsp keep using extra CPU? I've found that its CPU usage seems to keep increasing, and I'm wondering if it is somehow retaining a lot of data because of the large index store.

Thanks in advance!

Also, for additional context: with index store sizes of 11 to 17 GB, I've noticed that something in sourcekit-lsp seems to cause swap paging to increase tremendously. Is this expected behavior with large index stores?

It gets so bad that my machine instance begins to freeze up.
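To check whether it is sourcekit-lsp itself that is being pushed into swap, Linux exposes per-process resident and swapped-out memory in /proc/<pid>/status (the VmRSS and VmSwap fields). A minimal bash sketch, assuming a Linux /proc filesystem; swap_usage is a hypothetical helper name, and its optional second argument (a path override for testing against a saved file) is an illustration-only convenience:

```shell
# Hypothetical helper: print resident (VmRSS) and swapped-out (VmSwap)
# memory, in kB, for one process. Assumes Linux /proc/<pid>/status.
# The optional second argument overrides the status file path.
swap_usage() {
  local status_file="${2:-/proc/$1/status}"
  awk '/^VmRSS:/ { rss = $2 }
       /^VmSwap:/ { swap = $2 }
       END { printf "rss_kb=%s swap_kb=%s\n", rss, swap }' "$status_file"
}
```

For example, running swap_usage "$(pgrep -n -x sourcekit-lsp)" every few seconds while the machine starts to page would show whether the VmSwap value of sourcekit-lsp grows along with system-wide swap usage.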

In general, there shouldn’t be a limit on the supported index store size. One thing that might be worth noting is that the index has multiple levels:

  • The data is stored in record files inside index/store/v5/records. Having many potentially old record files in this directory should not pose an issue.
  • These record files are referenced by unit files in index/store/v5/units. Only the record files referenced by unit files should be relevant for the index.
  • And then there’s indexstore-db which effectively forms an index on top of the index store (aka the record + unit files) and allows us to look up which unit and record files contain references to which USR etc.

The indexstore-db is built by scanning through the index store when you launch your editor and the database is memory-mapped, IIRC. This raises the following questions:

  • Are all the record files referenced by unit files or are there dead files in there?
  • How big is the indexstore-db? I could imagine that you might run into surprising behavior if the indexstore-db approaches the available memory on your system.
  • Could you provide a sample of SourceKit-LSP while it’s using CPU (run sample sourcekit-lsp)?
  • Does SourceKit-LSP eventually calm down (I would expect some initial CPU usage as it builds the indexstore-db but for it to calm down once that is done and the database persisted to disk from that point onwards).
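On Linux, where sample is not available, one way to see whether SourceKit-LSP calms down is to sample its CPU time from /proc/<pid>/stat over time. A minimal bash sketch, assuming Linux /proc and an integer polling interval; cpu_ticks and cpu_monitor are hypothetical helpers, not part of any tool:

```shell
# Hypothetical helpers, assuming Linux /proc. cpu_ticks prints the user +
# system CPU time of a process in clock ticks (fields 14 and 15 of
# /proc/<pid>/stat). The optional second argument overrides the file path.
cpu_ticks() {
  local stat_file="${2:-/proc/$1/stat}" sep=') ' rest
  [[ -r "$stat_file" ]] || return 1
  rest="$(<"$stat_file")"
  # The command name (field 2) may contain spaces, so drop everything up to
  # the last ") "; the remaining text starts at field 3 (state), which puts
  # utime and stime at positions 12 and 13.
  rest="${rest##*"$sep"}"
  awk '{ print $12 + $13 }' <<<"$rest"
}

# Print approximate CPU usage, as a percentage of one core, every INTERVAL
# seconds (must be a positive integer) until the process exits.
cpu_monitor() {
  local pid="$1" interval="${2:-5}" hz prev cur
  hz="$(getconf CLK_TCK)"
  prev="$(cpu_ticks "$pid")" || return 1
  while sleep "$interval" && cur="$(cpu_ticks "$pid")"; do
    printf '%s cpu=%d%%\n' "$(date +%T)" $(( (cur - prev) * 100 / (hz * interval) ))
    prev="$cur"
  done
}
```

For example, cpu_monitor "$(pgrep -n -x sourcekit-lsp)" 10 prints one line every 10 seconds; about 100% means one core is fully busy, and values above 100% mean several threads are busy at once.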
  • Are all the record files referenced by unit files or are there dead files in there?

The files are quite fresh, so there should be almost no dead files in there.

  • How big is the indexstore-db? I could imagine that you might run into surprising behavior if the indexstore-db approaches the available memory on your system.

Is there a separate indexstore-db size? The 17 GB figure comes from summing up all the files in index/store, so it includes v5/units and v5/records.

  • Could you provide a sample of SourceKit-LSP while it’s using CPU (run sample sourcekit-lsp)?

Unfortunately this is not on macOS, so I can't run sample. Is there a different tool whose output you would like to see?

  • Does SourceKit-LSP eventually calm down (I would expect some initial CPU usage as it builds the indexstore-db but for it to calm down once that is done and the database persisted to disk from that point onwards).

After I open a Swift file, it runs for ~10 minutes with a huge amount of IO, dies down, but then starts up again later for some reason. It registers almost 200 GB in the write_bytes field of /proc/${pid}/io, but I'm not really sure where that's going.
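To narrow down where those writes go, the /proc/<pid>/io counters can be sampled over time and compared with the files the process has open. Note that write_bytes counts bytes the process caused to be sent to the storage layer, while wchar counts bytes passed to write-like system calls, so the two can differ. A minimal bash sketch, assuming Linux /proc and permission to read the process's entries (same user or root); io_field and open_files are hypothetical helpers:

```shell
# Hypothetical helpers, assuming Linux /proc.
# io_field FIELD PID [FILE]: print one counter from /proc/<pid>/io, e.g.
# write_bytes or wchar. The optional FILE overrides the path.
io_field() {
  local field="$1" io_file="${3:-/proc/$2/io}"
  awk -v key="$field:" '$1 == key { print $2 }' "$io_file"
}

# open_files PID: list the distinct files the process currently has open,
# which hints at where the writes are going.
open_files() {
  local fd target
  for fd in /proc/"$1"/fd/*; do
    # An fd may be closed between the glob and readlink; skip it if so.
    target="$(readlink "$fd" 2>/dev/null)" || continue
    printf '%s\n' "$target"
  done | sort -u
}
```

For example, calling io_field write_bytes "$pid" twice, a minute apart, gives a rough write rate, and open_files "$pid" | grep -i index shows which index or database files are open at that moment.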

Is your project large? Probably millions of lines of source code?

Yup, the project is large.

How do you provide build settings to SourceKit-LSP? That determines where the indexstore-db is saved. A BSP server sets the indexstore-db location using indexDatabasePath in the initialize request; for compile commands, the index database is in IndexDatabase next to the index store; and for SwiftPM, it’s in .build/debug/index/db. That size would be interesting to know as well.
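With those locations in hand, the three levels of the index (record files, unit files, and the indexstore-db) can be measured separately, which helps tell whether a large total comes from the store or from the database. A minimal bash sketch; index_report is a hypothetical helper, and it assumes the index store root is the directory that contains v5/:

```shell
# Hypothetical helper: report disk usage and file count for each index level.
# STORE is the index store root (the directory containing v5/); DB is the
# indexstore-db directory, whose location depends on how build settings
# are provided.
index_report() {
  local store="$1" db="$2" dir
  for dir in "$store/v5/records" "$store/v5/units" "$db"; do
    printf '%s\t%s files\t%s\n' \
      "$(du -sh "$dir" | cut -f1)" "$(find "$dir" -type f | wc -l)" "$dir"
  done
}
```

For a SwiftPM project this would be index_report .build/debug/index/store .build/debug/index/db; for compile commands or a BSP server, substitute the index store path and the IndexDatabase or indexDatabasePath directory.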

There is an issue that caused huge CPU usage and that might affect you; it is addressed by "[6.0] Use an `AtomicInt32` to count `pendingUnitCount` instead of using `AsyncQueue`" (swiftlang/sourcekit-lsp pull request #1744, by ahoppen). That issue is fixed in recent Swift 6.0 and main development snapshots but hasn’t made its way into a release yet. Could you check whether you see the same CPU usage with a toolchain snapshot from Swift.org (Install Swift)?

If you’re still seeing the issue in the development snapshots, do you know if there’s an equivalent to sample on Linux? Or, could you build sourcekit-lsp from source and attach a debugger? I assume that we’re busy building the indexstore-db from the index store but that would be good to confirm.
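On Linux, perf is a common stand-in for sample: it can attach to a running process and record call stacks for a fixed time. A minimal bash sketch; profile_pid is a hypothetical wrapper, while the perf options it uses (-F, -g, -p, -o for recording; -i, --stdio, --no-children for reporting) are standard perf flags. Depending on the kernel.perf_event_paranoid setting, it may need root:

```shell
# Hypothetical helper: record call stacks of a running process with perf
# for a fixed time, then print the top of a plain-text report.
profile_pid() {
  local pid="$1" seconds="${2:-60}" out="${3:-sourcekit-lsp.perf.data}"
  # -F 99: sample at 99 Hz; -g: record call graphs; -p attaches to the
  # existing process, and `-- sleep N` bounds how long recording lasts.
  perf record -F 99 -g -p "$pid" -o "$out" -- sleep "$seconds" || return 1
  perf report -i "$out" --stdio --no-children | head -n 100
}
# A debugger-based alternative, if the binary has symbols:
#   lldb -p "$pid" --batch -o 'thread backtrace all'
```

Swift symbols in the report may appear mangled; piping the report through swift demangle makes them readable.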

This is using the perf command on Linux. I believe we are actually using sourcekit-lsp from the main development snapshot. I didn't include the entire report for obvious reasons, and I only recorded for a little over a minute, but some of these numbers seem very high in terms of CPU usage (or would you say they are expected)?

Also, FWIW, sourcekit-lsp gets initialized when I open a Swift file.
I've monitored the db size, and it stops at ~3.1 GB while the index store is ~14 GB.
However, as I open more random Swift files, the indexstore-db seems to grow by ~0.1 to ~0.2 GB. Is this expected behavior? I was under the impression that it scans through the index store to build the db, and that the db shouldn't keep growing if nothing changes in the index store. For example, I've seen the indexstore-db grow to 4.5 GB (and keep growing sporadically).
FWIW, I decide that it has finished when the size of the db doesn't change for a while, so this may not be the most accurate method. Please advise if there is a better way.
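One way to make the "size stopped changing" check less ad hoc is to poll automatically and require several consecutive unchanged readings. This is still the same heuristic, only scripted: a long pause in indexing looks identical to completion, so it is worth pairing with a check that sourcekit-lsp's CPU usage has also dropped. A minimal bash sketch, assuming GNU du (for -b); wait_until_stable is a hypothetical helper:

```shell
# Hypothetical helper: poll the apparent size of a directory (GNU du -b)
# and return once it has been unchanged for NEEDED consecutive polls.
wait_until_stable() {
  local dir="$1" interval="${2:-30}" needed="${3:-10}" prev="" cur same=0
  [[ -d "$dir" ]] || return 1
  while :; do
    cur="$(du -sb "$dir" | cut -f1)"
    if [[ "$cur" == "$prev" ]]; then
      same=$((same + 1))
      (( same >= needed )) && break
    else
      # Size changed (or first reading): restart the count.
      same=0
      prev="$cur"
    fi
    sleep "$interval"
  done
  printf '%s stable at %s bytes\n' "$dir" "$cur"
}
```

For example, wait_until_stable "$DB_PATH" 60 10, where DB_PATH is the indexstore-db directory, returns after roughly ten minutes without a size change.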

I've attached another perf capture; it seems pretty consistent in terms of CPU usage and the symbols matched.


Also, FWIW, I noticed something like this today:

$ perf record -F 99 -p 1721559 -g
WARNING: Ignored open failure for pid 1774882
WARNING: Ignored open failure for pid 1774971
WARNING: Ignored open failure for pid 1775026
WARNING: Ignored open failure for pid 1775028
WARNING: Ignored open failure for pid 1775036
WARNING: Ignored open failure for pid 1775039
WARNING: Ignored open failure for pid 1775040
WARNING: Ignored open failure for pid 1775041
WARNING: Ignored open failure for pid 1775042
WARNING: Ignored open failure for pid 1775043
WARNING: Ignored open failure for pid 1775044
WARNING: Ignored open failure for pid 1775050
WARNING: Ignored open failure for pid 1775051
WARNING: Ignored open failure for pid 1775054
WARNING: Ignored open failure for pid 1775055
WARNING: Ignored open failure for pid 1775118
WARNING: Ignored open failure for pid 1775120

but I haven't been able to track down what those PIDs represent, as ps aux | grep <pid> doesn't return any matching processes.
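One plausible explanation, which is an assumption and not confirmed in this thread, is that those IDs belong to threads rather than processes: perf record -p attaches to every thread of the target process, and a short-lived thread that exits before perf opens its counter would produce such a warning. ps aux lists only processes, so thread IDs never show up there. A minimal bash sketch for listing threads, assuming Linux /proc; list_threads is a hypothetical helper:

```shell
# Hypothetical helper, assuming Linux /proc: list the threads of a process
# as "TID<TAB>name". `ps -T -p PID` shows similar information.
list_threads() {
  local task name
  for task in /proc/"$1"/task/*; do
    # A thread may exit between the glob and the read; skip it if so.
    name="$(cat "$task/comm" 2>/dev/null)" || continue
    printf '%s\t%s\n' "${task##*/}" "$name"
  done
}
```

If IDs like the ones in the warnings show up in list_threads 1721559 during a later recording, that would support the thread explanation.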

Edit: I've also noticed that the sourcekit-lsp PID had changed when I checked the machine later. I hadn't done anything with the machine in the meantime, and the indexstore-db is now at 5.4 GB. I checked whether the sourcekit-lsp trace says anything, but nothing seems to indicate that it crashed (notice the timestamps), so I'm not sure why the PID would change.

[Trace - 2:10:54 PM] Received response 'textDocument/diagnostic - (116)' in 621ms.
Result: {
    "items": [],
    "kind": "full"
}


[Trace - 5:31:19 PM] Sending request 'textDocument/documentHighlight - (118)'.
Params: {
    "textDocument": {
        "uri": "file:some/file/here"
    },
    "position": {
        "line": 44,
        "character": 29
    }
}

This is the perf report at this point

Edit 2: This is the state at 6.6 GB of indexstore-db, but it took a very long time to get to this point (at least 3+ hours, it seems). I guess I was wrong initially about the size stopping at ~3.1 GB: it was still in the process of building the db, even without me opening any Swift files.

This is currently the steady state; the db doesn't seem to grow anymore. Is the sourcekit-lsp CPU usage normal at this point?

An extra update: yes, it does seem like sourcekit-lsp crashes. It gets reinitialized and begins to rewrite the indexstore-db.

For example, in one case I observed the indexstore-db at about 5.1 GB. Soon after, my machine froze for a bit, and I noticed the message about initializing sourcekit-lsp, which most likely indicates that it had crashed. When I checked the size of the indexstore-db again, it was at 384 MB, so it had removed the previously written indexstore-db and started writing a new one from scratch. Is this a configuration issue, or is resuming the indexstore-db build from the point where it stopped not supported?
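To confirm the crash-and-restart pattern and line it up with the freezes, the sourcekit-lsp PID can be logged with timestamps whenever it changes. A minimal bash sketch, assuming procps pgrep; watch_restarts is a hypothetical helper, and its polls argument exists only so the loop can be bounded:

```shell
# Hypothetical helper: poll for the newest process named NAME and log a
# timestamped line whenever its PID changes (a restart) or it disappears.
# POLLS limits the number of polls; 0 means run until interrupted.
watch_restarts() {
  local name="${1:-sourcekit-lsp}" interval="${2:-10}" polls="${3:-0}"
  local prev="(none yet)" cur i=0
  while (( polls == 0 || i < polls )); do
    # pgrep exits non-zero when nothing matches; treat that as "none".
    cur="$(pgrep -n -x "$name")" || cur=""
    if [[ "$cur" != "$prev" ]]; then
      printf '%s %s pid=%s\n' "$(date '+%F %T')" "$name" "${cur:-none}"
      prev="$cur"
    fi
    i=$((i + 1))
    sleep "$interval"
  done
}
```

Running watch_restarts sourcekit-lsp 10 > restarts.log in a separate terminal gives timestamps that can be compared with the editor's trace and with the indexstore-db size.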

This also makes me want to ask: is there a way to tell sourcekit-lsp not to build the indexstore-db if one already exists? (The use case: we have already processed the entire index store and built the indexstore-db for a machine, and want to import it later.)