Eagerly load protocol conformances

Hi! While investigating app startup time I noticed hundreds of milliseconds spent in protocol conformance checks, specifically in swift_conformsToProtocol. The call stack usually traces back to swift_getTypeByMangledName or swift_dynamicCastImpl. Looking at the source for swift_conformsToProtocolMaybeInstantiateSuperclasses, there are three paths it can take:

  1. A cached result is found in _dyld_find_protocol_conformance
  2. A cached result is found in the ConformanceCache
  3. A scan of all conformances in all dylibs

For some of the apps I’m looking at there are 100k+ conformances, and the third path takes ~2ms. I also measured that the first check of every type/protocol pair results in a full conformance scan. Only repeated lookups of the same pair hit the ConformanceCache, and I never saw a hit in the dyld cache. My assumption was that the dyld cache is used across multiple launches of the app so the conformance scan doesn’t have to be repeated, but this isn’t what I’m seeing in practice (based on placing a symbolic breakpoint in the ConformanceCache).

Given this behavior, there still seems to be room for improvement, especially when developers know which conformance check patterns their code requires. For example, an app might perform many conformance checks against a single protocol during launch. That pattern could benefit from spending time upfront to scan all conformances once and build a cache of every type conforming to the given protocol, instead of repeating the scan on each conformance check.
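To make the idea concrete, here is a rough sketch of what such a per-protocol cache could look like from application code. Everything in it is hypothetical: Trackable, the screen types, and EagerConformanceCache are names made up for illustration, and the candidate list is assumed to be supplied by the app. Note that each `is` check while building the cache still goes through the normal runtime lookup, so this only models the shape of the cache and moves the cost to a time of the app's choosing. Building it in a single pass over all conformances would need runtime support.

```swift
// Hypothetical protocol and types, for illustration only.
protocol Trackable {}
struct HomeScreen: Trackable {}
struct SettingsScreen {}
struct ProfileScreen: Trackable {}

/// Precomputed conformance answers for one protocol over a fixed list of types.
struct EagerConformanceCache {
    private let conforming: Set<ObjectIdentifier>
    private let known: Set<ObjectIdentifier>

    /// `conforms` performs the real check once per candidate, at build time.
    init(candidates: [Any.Type], conforms: (Any.Type) -> Bool) {
        var conforming = Set<ObjectIdentifier>()
        var known = Set<ObjectIdentifier>()
        for candidate in candidates {
            let id = ObjectIdentifier(candidate)
            known.insert(id)
            if conforms(candidate) {
                conforming.insert(id)
            }
        }
        self.conforming = conforming
        self.known = known
    }

    /// Returns true or false for a type that was in the candidate list,
    /// and nil for a type the cache has never seen.
    func conforms(_ type: Any.Type) -> Bool? {
        let id = ObjectIdentifier(type)
        guard known.contains(id) else { return nil }
        return conforming.contains(id)
    }
}

// Build once, e.g. early during launch or on a background queue.
let trackableCache = EagerConformanceCache(
    candidates: [HomeScreen.self, SettingsScreen.self, ProfileScreen.self],
    conforms: { $0 is Trackable.Type }
)
```

After construction, lookups are plain set membership tests and never enter the runtime's conformance machinery. A nil result means the type was outside the candidate list, so the caller would fall back to a regular check.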

Based on this thread there isn’t much I can do about getTypeByMangledName calls, but as? casts could be avoided entirely with this kind of eagerly generated cache. From application code it might be easiest to store only negative entries, and use the cache to skip as? casts that we know will fail.
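A minimal sketch of the negative-entry approach, assuming a single protocol of interest. Refreshable, FeedView, BannerView, and NegativeCastCache are all hypothetical names, and the class is not thread-safe as written. Types can be seeded as known failures ahead of time (for example from an eagerly built cache), and any cast that fails at runtime is remembered so it is not attempted again for that type.

```swift
// Hypothetical protocol and types, for illustration only.
protocol Refreshable {
    var refreshKey: String { get }
}
struct FeedView: Refreshable {
    let refreshKey = "feed"
}
struct BannerView {}

/// Remembers which dynamic types are known NOT to conform to Refreshable,
/// so the `as?` cast can be skipped for them.
final class NegativeCastCache {
    private var knownFailures: Set<ObjectIdentifier>
    /// Counts how many real `as?` casts were performed (for illustration).
    private(set) var castAttempts = 0

    /// `seededFailures` are types already known not to conform.
    init(seededFailures: [Any.Type] = []) {
        knownFailures = Set(seededFailures.map { ObjectIdentifier($0) })
    }

    func asRefreshable(_ value: Any) -> Refreshable? {
        let id = ObjectIdentifier(type(of: value))
        if knownFailures.contains(id) {
            return nil  // known failure: skip the dynamic cast entirely
        }
        castAttempts += 1
        if let result = value as? Refreshable {
            return result
        }
        knownFailures.insert(id)  // remember the failure for next time
        return nil
    }
}

let castCache = NegativeCastCache(seededFailures: [BannerView.self])
let banner = castCache.asRefreshable(BannerView())  // no cast performed
let feed = castCache.asRefreshable(FeedView())      // one real cast
```

Storing only negative entries keeps the cache simple: a successful cast still has to run to produce the existential value, so there is nothing to gain from caching positives here.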

In general there seem to be many ways application developers could use the specifics of their apps to optimize this frequently used language feature. Another example would be creating separate collections of conformances for common protocols, to avoid scanning unrelated conformances. Is this what the dyld cache is meant to solve, and if so is there any way to enable/test it now? I saw that Swift 5.4 included improvements to the cache hit case, but as far as I can tell the uncached case still requires the O(n) lookup. Is there any reason it can’t be further optimized, or are there other tricks people use to improve its performance?