[Pitch] Speed up protocol conformance checks on first launch using bloom filters

Joe_Groff · March 6, 2024, 1:37am

I wonder if we could at least dedicate a translation unit to producing the protocol conformance lookup table (at least in non-WMO mode; in WMO mode, we should supposedly have whole-module information enough to build the whole conformance table in one go). We might still need a secondary data structure to indicate the span of the module's conformance table within the image section, since someone could still link multiple modules into one image, but being able to binary-search or do something even better at a module level would still likely be an improvement, even if we have to linear-scan for the module first (since there should generally be an order or magnitude fewer modules in a binary than conformances, I would think).

jrose · March 6, 2024, 3:18am

With mergeable libraries (and just plain static linking) multiple modules in an image becomes the common case. And since we don’t know which module to find a conformance in, there’s still gonna be a scan there. But yeah, a linear check in the number of modules-containing-conformances is still better than a linear check in all conformances.

tera · March 6, 2024, 8:25pm

As for using bloom filters themselves I have to bump up this question:

benpious · March 7, 2024, 5:49am

When there’s a false positive the existing linear search would be executed for that segment. The filter would have been an optimization to potentially avoid this work, nothing more.

tera · March 7, 2024, 2:10pm

Can you expand on how exactly is this done?
I have this pseudo code in mind but I don't see how to cope with false positives:

func swift_conformsToProtocol(thing: ...., ...) -> Bool {
    if bloomFilter.contains(thing) {
        // fast case, no need to the linear search
        // but... there could be false positives 🤔
    } else {
        // slow case, need to do the linear search
        // update bloomFilter
    }
}

bbrk24 · March 7, 2024, 3:08pm

This code assumes the bloom filter is either “probable no” or “definite yes.” AIUI, it’s the other way around — it gives you “definite no” or “probable yes.”

benpious · March 7, 2024, 10:07pm

Yeah. The code should be

if !bloomFilter.probablyContains(type) {
       // do the linear search. It may be unnecessary. 
} 
// no else; if we didn't enter the search for this segment, we move on to the next one and repeat the process.

noahsmartin · March 29, 2024, 6:05pm

Great to see all the discussion around protocol conformances here! I'm the author of the blog post that was linked in the original message. At try! swift last weekend I was talking with @ktoso about this proposal and figured I’d leave my 2 cents here.

To start with here’s how I see the current state:

There are a few fast-paths for protocol conformances.

The runtime cache which lasts the lifetime of the process and ensures the same type/protocol lookup doesn't do a linear scan more than once.
The dyld shared cache which only works for some protocols. I’m not sure about the details here, like if this cache is only for types defined in system dylibs or if it gets invalidated after an app update. This cache is only used on Apple platforms with dyld.
The dyld on disk cache which was added in iOS 16 as a way to fix the slow protocol conformances. It’s built on the first launch after an app install in O(n) time, and makes any successful lookup (when the conformance is found) take O(1) time. The cache is built on the first launch but not used until the process is restarted. It’s also only on Apple platforms.

In most cases, these optimizations prevent the O(n^2) slowdown that I talked about in the first blog post. However, there are still some times where the slow path is taken:

The first launch of an app. In this case the dyld on disk cache is built before the app’s main function is called, but the cache doesn’t get used during that launch.
When a looked up conformance is not found. This won't have a result in the dyld on-disk cache and therefore need one of the other caches. In practice I’ve seen this cost only be paid on the first launch, so I think the dyld shared cache must be handling these.
The first lookup during a process lifetime on a platform without dyld, and on simulators where the dyld cache is disabled.

My main takeaway from this current state is there are a lot of different paths taken to solve the same problem, and it would be nice if the problem could be solved as few times as possible. It gets tricky to reason about performance with so many different possible states.

So here are some things to consider as alternatives to improve performance:

Update the dyld on disk cache to work during first launch. The cache is already built when the app first launches, so it can be used. It's just an implementation convenience in dyld that it doesn’t get used. I know dyld features are outside the scope of this forum, I filed a feedback report(FB11955902) for this during WWDC 23. I have also made a proof of concept of this myself as part of a test branch in ZConform. This uses fishhook to intercept _dyld_find_protocol_conformance_on_disk and redirect it to a cache I build on app launch. Essentially the same as the dyld hash table but available immediately on first launch. I use this in simulators when running UI tests on apps to speed up the test execution time (since the dyld cache is disabled on simulators). This is for my company’s Snapshot Testing feature. Theoretically it could be used in production as well if you’re comfortable including fishhook.
Support _dyld_protocol_conformance_result_kind_definitive_failure in the dyld implementation. Right now the dyld cache doesn’t get used if the conformance is not found, but if the cache could prove the conformance is not possible it could speed up this case. I have also implemented some of this in ZConform by supporting a definitive failure if the class has no superclass or if the type is a value type. If you check for a conformance with ZConform and no record is found, the fast path is still taken. However it can be further improved by supporting generic constraints that make conformances not possible. I also have a feedback report for this(FB11954947).
Support faster lookups without dyld. I don’t think this is the main motivation for this post because the original use case is an iOS app, but it is one that came up at try! swift for Wasm apps. I think the model used by dyld (the runtime providing a hook to customize protocol lookups) is a convenient way to support these kinds of improvements. Could this hook be generalized for non-dyld platforms to provide their own fast paths? This would allow the performance improvements to be ported outside Apple platforms without anything extra added to the runtime that would be unused on iOS.

So just a few other ideas, overall great to see discussion on this point! I think it’s not fully solved yet and it would be great to tie up these loose ends!