Semantic token highlighting

maciekp · June 10, 2020, 11:41am

Hey, i'm doing a follow up after initial review of my semantic token highlighting implementation here : Implementation of semantic highlighting by prostakm · Pull Request #279 · apple/sourcekit-lsp · GitHub.
I'm going to use now response captured in handleDocumentUpdate to extract semantic tokens, but the issue is that textDocument/semanticTokens request is, well, a request while notification_documentupdate is a notification, this generates a lot of asynchronicity to handle, and can lead to desynchronization of what is used to generate semantic tokens and current document state. For now i'm using simple caching mechanism by storing semantic tokens whenever handleDocumentUpdate is triggered, and then getting those results when textDocument/semanticTokens is requested to populate full semantic tokens request response (i've noticed that notification_documentupdate came before textDocument/semanticTokens request [except for the first one] - is that always a rule?). It seems to work well for now, but I've got few questions, as this solution seems suboptimal and error prone:

Is notification_documentupdate guaranteed to trigger before textDocument/semanticTokens request from client?
Is there any better way to synchronize notification_documentupdate with textDocument/semanticTokens request? I was thinking about partial results, but I'm not sure if this is a case for that (Redirecting…)
Any other ways to handle that? Ideally, there could be a way to capture all needed data (syntax & semantic tokens) with single ad-hoc, sourcekit query whenever textDocument/semanticTokens is received.

blangmuir · June 11, 2020, 5:20pm

That sounds like the right solution to me. The tokens you receive here will be correct for whatever the current state of the document is at the point you retrieve them, i.e. they will match the document contents in self.documentManager, which are also the contents visible in sourcektid.

No, after opening a document (textDocument/didOpen) we will open the document in sourcekitd, and kick off building an AST. We get the notification_documentupdate asynchronously when the AST finishes. I don't think anything prevents the semanticTokens request from coming in before that. If you're using a small test project to test, building the AST is probably just really fast.

Interesting, is there no way in the proposed API for the server to notify the client when new semantic tokens are available?

Some possible approaches

The simplest implementation is to immediately return whatever the currently cached semantic tokens are.
- Pro: Simple, fast.
- Con: if we're in the middle of building the AST, tokens may not be updated until the next request. I don't know how VSCode schedules these requests, but presumably it will be when you make the next edit to the file.
You could try to keep track of "in-progress AST builds", and if you get a semanticTokens request, block until the notification for the most recent AST comes back, if necessary.
- Pro: tokens are always up to date in the response
- Con: it's unclear to me what effect this will have on the UI if semanticTokens is blocked. Also, if there are more edits, will the previous request be canceled? We don't currently implement cancelation.
- Con: keeping track of "in-progress" is much harder than it looks, and probably would need new APIs in sourcekitd to do it accurately. There is no guarantee that every edit will get its own AST build (they could be coalesced), and if something goes wrong (e.g. we couldn't compute compiler arguments) you might never get a notification at all.

My recommendation would be to use (1), which sounds like what you are doing so far, and see how well it behaves under load. If you want to see how the latency effects the user experience, you could modify the code so that when notification_documentupdate is received it does a queue.asyncAfter(...) instead of retrieving the state immediately. By changing the delay you can see what it would be like if the AST took longer to build.

@Nathan_Hawes, any thoughts?

Ideally we want to provide deltas instead of the full set of tokens as much as possible, so keeping track of the current state seems important anyway. I'm not sure how this would help.

akyrtzi · July 28, 2020, 10:47pm

Hi @maciekp, thank you for your work so far! I was wondering if you still have time to pursue semantic highlighting further? If not we may consider continuing on top of your open PR.

maciekp · July 30, 2020, 8:51am

Hi @akyrtzi, I was a little busy lately but I had a working implementation based on Ben's comments on my local copy, I've got stuck a little bit on testing though (it become a little bit non trivial to write tests with async code + dependency to notifications), I will try to push it this week. I really want to see it through to the end but I'm not sure about my availability . If that's ok, in case I won't be able to spare any more time continuing this, I will let you know on beginning of next week

akyrtzi · July 30, 2020, 4:10pm

Thanks for the update!