I'm looking to optimize performance of indexstore-db based on two FIXMEs that appear in the code. I'm looking for advice on how to implement them (or for help implementing them).
Goal: I would like to be able to find callers - transitively - of a given set of symbols, to identify impacted codepaths for a change.
The Problem I'm Trying to Solve: Linear performance of forEachSymbolOccurrence. (For example, a shared singleton that appears 200 times will take roughly 200 times longer than an accessor that's called just once.)
The first FIXME is a comment suggesting libcache (StoreSymbolRecord.cpp:196, in StoreSymbolRecord::doForData). The second notes that a loop could terminate earlier if it weren't for multiple symbols appearing with differing roles (about 120 lines below the first).
Is libcache still a valid approach to optimize this code? Which class would host the cache? (Either a symbol record or the index implementation?)
Can we resolve the "other roles" problem by looking up all roles for a USR at once?
What other ideas/suggestions might folks have to improve performance of this code?
Thanks!
What I've tried
I implemented foreachSymbolCallOccurrence in Swift.
I exposed the C++ implementation of foreachSymbolCallOccurrence to Swift, to rule out performance problems in my own implementation.
I profiled the Swift-C interop layer and began exploring withoutActuallyEscaping.
I profiled my code that calls into indexstore-db hundreds of times, with various signposts. I have at least a few dozen runs that show a bottleneck in forEachSymbolOccurrence.
I tried recursively calling foreachOccurrence(relatedToUSR:, but the related symbols didn't seem to contain the relations I was looking for. Performance wasn't any better either.
I built my own caching and added various optimizations in my code that calls into indexstore-db.
There's a noCache argument in a call to Reader.foreachSymbol in StoreSymbolRecord, line 281. Setting it to false gave a minor improvement, but not nearly enough.
None of these changes overcame the bottlenecks I was encountering. Then I noticed these comments and began to wonder if either or both of them could help.
I'm not sure we're going to be able to do much better than linear here, unless we can terminate the loop early. But if I understand your use case correctly, we cannot (since you want the full transitive closure).
> Is libcache still a valid approach to optimize this code? Which class would host the cache? (Either a symbol record or the index implementation?)
The cache might help with subsequent requests accessing the same record (depending on what we actually cache), but not within a single request. So it depends on whether your actual goal is a single request or many.
libcache here refers to macOS's cache.h, so we'd also need another solution on other platforms. Rather than libcache, it could probably just be something like an LRU cache.
I would also assume that in these cases the record is already in-memory much of the time anyway, so I'm not sure how much this would actually end up helping with performance.
> Another comment that notes that a loop could terminate earlier if not for multiple symbols appearing with differing roles. (120 lines below the first one.)
Are you seeing foreachSymbolOccurrenceByUSR in your traces? I wouldn't expect this to contribute much (compared to the loop over occurrences).
> What other ideas/suggestions might folks have to improve performance of this code?
I assume you're already limiting your occurrences lookup to the relevant roles.
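On the earlier question about looking up all roles for a USR at once: index roles are typically bitflags, so passing the union of the roles of interest lets a single pass over occurrences replace several per-role passes. The Role enum below is a stand-in, not IndexStoreDB's actual SymbolRole values:

```cpp
#include <cstdint>
#include <initializer_list>

// Stand-in role bitflags; the real IndexStoreDB SymbolRole values differ.
enum class Role : uint32_t {
  Reference   = 1u << 0,
  Call        = 1u << 1,
  CalledBy    = 1u << 2,
  ContainedBy = 1u << 3,
};

// Build a single mask from several roles of interest.
constexpr uint32_t roleMask(std::initializer_list<Role> Roles) {
  uint32_t Mask = 0;
  for (Role R : Roles)
    Mask |= static_cast<uint32_t>(R);
  return Mask;
}

// One pass with the union of roles replaces N per-role passes: an
// occurrence matches if it carries any of the requested roles.
constexpr bool matches(uint32_t OccurrenceRoles, uint32_t Requested) {
  return (OccurrenceRoles & Requested) != 0;
}
```

Filtering with one combined mask doesn't change the asymptotic cost of the occurrence loop, but it avoids re-walking the same record once per role.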
@bnbarham I saw that you’ve stepped away from the IndexStoreDB project, so thank you for your work on it. In the interest of closing the loop, here’s a reply that you deserved a long time ago:
The benefit is that I’m not only computing a full transitive closure for a set of symbols, but multiple transitive closures that overlap with each other. A cache enables early termination when symbols are related to each other (methods contained by the same class will overlap along the “contained by” path, but differ along the “called by” or “overridden by” paths).
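That overlap is what makes a shared cache pay off: once one closure has expanded a USR, later closures hit the cache instead of re-querying the index. A rough sketch of that client-side memoization follows; DirectCallers stands in for an indexstore-db "called by" query, and none of these names come from the library itself.

```cpp
#include <functional>
#include <set>
#include <string>
#include <unordered_map>
#include <vector>

// Overlapping transitive-caller closures sharing one cache. The cache
// lets a later closure skip any USR an earlier closure already expanded.
class CallerClosure {
  std::function<std::vector<std::string>(const std::string &)> DirectCallers;
  std::unordered_map<std::string, std::vector<std::string>> Cache;
  size_t UncachedQueries = 0;

public:
  explicit CallerClosure(
      std::function<std::vector<std::string>(const std::string &)> Fn)
      : DirectCallers(std::move(Fn)) {}

  // Direct callers of a USR, memoized across closure computations.
  const std::vector<std::string> &callers(const std::string &USR) {
    auto It = Cache.find(USR);
    if (It != Cache.end())
      return It->second;
    ++UncachedQueries; // a real implementation would hit the index here
    return Cache[USR] = DirectCallers(USR);
  }

  // All transitive callers of `USR` (excluding `USR` itself, absent cycles).
  std::set<std::string> transitiveCallers(const std::string &USR) {
    std::set<std::string> Seen;
    std::vector<std::string> Worklist{USR};
    while (!Worklist.empty()) {
      std::string Cur = Worklist.back();
      Worklist.pop_back();
      for (const auto &C : callers(Cur))
        if (Seen.insert(C).second) // only expand newly discovered USRs
          Worklist.push_back(C);
    }
    return Seen;
  }

  size_t uncachedQueries() const { return UncachedQueries; }
};
```

With a caller graph where two leaves share the same chain upward, the second closure only pays for its own starting USR; everything above it comes from the cache.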
In my case, there are many requests.
I definitely saw these in my traces, but it was the sheer number of calls that amplified the relatively small weight of each individual call.
I’ve added caching in the client/application layer, and in my testing that helped significantly. It’s been a while, so I don’t recall the exact numbers, but it was the difference between minutes and several seconds.