[Pitch] Speed up protocol conformance checks on first launch using bloom filters

:wave: Great to see all the discussion around protocol conformances here! I'm the author of the blog post that was linked in the original message. At try! swift last weekend I was talking with @ktoso about this proposal and figured I’d leave my 2 cents here.

To start with here’s how I see the current state:

There are a few fast-paths for protocol conformances.

  • The runtime cache which lasts the lifetime of the process and ensures the same type/protocol lookup doesn't do a linear scan more than once.

  • The dyld shared cache which only works for some protocols. I’m not sure about the details here, like if this cache is only for types defined in system dylibs or if it gets invalidated after an app update. This cache is only used on Apple platforms with dyld.

  • The dyld on disk cache which was added in iOS 16 as a way to fix the slow protocol conformances. It’s built on the first launch after an app install in O(n) time, and makes any successful lookup (when the conformance is found) take O(1) time. The cache is built on the first launch but not used until the process is restarted. It’s also only on Apple platforms.

In most cases, these optimizations prevent the O(n^2) slowdown that I talked about in the first blog post. However, there are still some times where the slow path is taken:

  • The first launch of an app. In this case the dyld on disk cache is built before the app’s main function is called, but the cache doesn’t get used during that launch.

  • When a looked up conformance is not found. This won't have a result in the dyld on-disk cache and therefore need one of the other caches. In practice I’ve seen this cost only be paid on the first launch, so I think the dyld shared cache must be handling these.

  • The first lookup during a process lifetime on a platform without dyld, and on simulators where the dyld cache is disabled.

My main takeaway from this current state is there are a lot of different paths taken to solve the same problem, and it would be nice if the problem could be solved as few times as possible. It gets tricky to reason about performance with so many different possible states.

So here are some things to consider as alternatives to improve performance:

  • Update the dyld on disk cache to work during first launch. The cache is already built when the app first launches, so it can be used. It's just an implementation convenience in dyld that it doesn’t get used. I know dyld features are outside the scope of this forum, I filed a feedback report(FB11955902) for this during WWDC 23. I have also made a proof of concept of this myself as part of a test branch in ZConform. This uses fishhook to intercept _dyld_find_protocol_conformance_on_disk and redirect it to a cache I build on app launch. Essentially the same as the dyld hash table but available immediately on first launch. I use this in simulators when running UI tests on apps to speed up the test execution time (since the dyld cache is disabled on simulators). This is for my company’s Snapshot Testing feature. Theoretically it could be used in production as well if you’re comfortable including fishhook.

  • Support _dyld_protocol_conformance_result_kind_definitive_failure in the dyld implementation. Right now the dyld cache doesn’t get used if the conformance is not found, but if the cache could prove the conformance is not possible it could speed up this case. I have also implemented some of this in ZConform by supporting a definitive failure if the class has no superclass or if the type is a value type. If you check for a conformance with ZConform and no record is found, the fast path is still taken. However it can be further improved by supporting generic constraints that make conformances not possible. I also have a feedback report for this(FB11954947).

  • Support faster lookups without dyld. I don’t think this is the main motivation for this post because the original use case is an iOS app, but it is one that came up at try! swift for Wasm apps. I think the model used by dyld (the runtime providing a hook to customize protocol lookups) is a convenient way to support these kinds of improvements. Could this hook be generalized for non-dyld platforms to provide their own fast paths? This would allow the performance improvements to be ported outside Apple platforms without anything extra added to the runtime that would be unused on iOS.

So just a few other ideas, overall great to see discussion on this point! I think it’s not fully solved yet and it would be great to tie up these loose ends!

14 Likes