Hi all! I wanted to get folks' thoughts on a potential change that could improve the performance of swift_conformsToProtocol
on the first launch of an application, at the cost of a relatively small increase in memory and code size.
I’ve not prototyped this at all, mainly because I think it would be pointless to start working without first getting the thoughts of folks on the core team and in the community. I want to make sure there is no better approach we haven’t considered, and nothing that prevents a data structure like a Bloom filter from being employed here, such as not being able to generate an appropriate hash for a type.
I'm also not sure if this makes more sense in the Development section, feel free to move it there if that is the case.
Motivation
For context on why this is an issue, this blog post gives a detailed description. The TL;DR is that Swift does a linear search over all conformance records whenever it needs to look up a protocol conformance for a type.
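To make the shape of the problem concrete, here is a much-simplified sketch. `ProtocolDescriptor` and `ConformanceRecord` are illustrative stand-ins, not the real runtime structures (the actual runtime is C++, with records spread across every loaded image):

```swift
// Illustrative stand-ins for runtime structures; the real runtime is
// C++, and the records are spread across every loaded image.
struct ProtocolDescriptor: Equatable { var name: String }
struct ConformanceRecord {
    var typeName: String
    var proto: ProtocolDescriptor
}

// The shape of today's lookup, much simplified: on a cache miss, every
// conformance record is visited until a match is found, i.e. O(n) over
// all 100,000+ records.
func findConformance(
    ofTypeNamed typeName: String,
    to proto: ProtocolDescriptor,
    in records: [ConformanceRecord]
) -> ConformanceRecord? {
    records.first { $0.typeName == typeName && $0.proto == proto }
}
```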
My employer’s applications have 100,000+ protocol conformance records in one of our binaries. While we’re aware that there is a dyld cache of the conformances, the cadence of our updates means that the majority of users will have a poor experience once a week. We believe that many other applications which update on a weekly cadence may also be experiencing this issue.
Furthermore, improving the speed of the lookup would also benefit engineers profiling their applications in Instruments. Many engineering teams may be wasting time trying to understand why their application spends so much time in swift_conformsToProtocol, when in reality this cost is only paid on the first launch after updating. Conversely, engineers who do know this may ignore real performance issues because their application running slowly during development is “normal.”
Improvements could also be beneficial on non-Apple platforms, such as running Swift on a Linux server. While there may not be many conformances in a microservice, the frequency with which it is updated and deployed likely makes relying solely on a dyld-style cache impractical, even if one were available on Linux.
High-level approach
For every N conformance records (let’s say every 10,000, for now), the Swift compiler would emit a Bloom filter in addition to the metadata. So an app with 100,000 conformance records would have 10 filters emitted.
When the Swift runtime receives a conformance lookup request, instead of naively walking all the records, it would check each segment’s Bloom filter first. This would enable it to skip large chunks of the conformances entirely, turning a 100,000+ iteration loop into, in the best case, 10 filter checks plus a scan of a single 10,000-record segment.
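Here is a rough sketch of what that lookup could look like, building on the stand-in types above. `BloomFilter`, `Segment`, and `conformanceKey` are hypothetical names, not existing runtime API; the filter here is a toy with two derived probes per key, and the hash is sketched under the next paragraph:

```swift
// Toy Bloom filter: a "no" answer is definitive, a "yes" answer may
// be a false positive.
struct BloomFilter {
    var bits: [Bool]
    func mightContain(_ key: UInt64) -> Bool {
        let m = UInt64(bits.count)
        return bits[Int(key % m)] && bits[Int((key &* 0x9e3779b97f4a7c15) % m)]
    }
}

struct Segment {
    var filter: BloomFilter?           // nil if the image was built without one
    var records: [ConformanceRecord]   // ~10,000 records per segment
}

func findConformance(
    ofTypeNamed typeName: String,
    to proto: ProtocolDescriptor,
    in segments: [Segment]
) -> ConformanceRecord? {
    // Using the type's name as a stand-in for its mangled name;
    // `conformanceKey` is sketched under the next paragraph.
    let key = conformanceKey(typeMangledName: typeName,
                             protocolMangledName: proto.name)
    for segment in segments {
        // Skip the entire segment when the filter definitively says "no";
        // fall back to a full scan when no filter is present.
        if let filter = segment.filter, !filter.mightContain(key) { continue }
        if let match = segment.records.first(where: {
            $0.typeName == typeName && $0.proto == proto
        }) {
            return match
        }
    }
    return nil
}
```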
This does assume that it is possible to generate a stable hash for the type, which I have yet to fully investigate, and which may be the thing that dooms this proposal.
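For illustration only, one conceivable keying scheme would hash the mangled names of the type and the protocol, both of which the compiler knows when it emits the record. Whatever hash is used would have to be deterministic so the compiler and the runtime agree (Swift’s `Hasher` is randomly seeded per process, so something like FNV-1a would be needed instead):

```swift
// FNV-1a over both mangled names: deterministic, so the value the
// compiler puts in the filter matches what the runtime computes.
func conformanceKey(typeMangledName: String, protocolMangledName: String) -> UInt64 {
    var hash: UInt64 = 0xcbf29ce484222325                 // FNV-1a offset basis
    for byte in typeMangledName.utf8 {
        hash = (hash ^ UInt64(byte)) &* 0x100000001b3     // FNV-1a prime
    }
    for byte in protocolMangledName.utf8 {
        hash = (hash ^ UInt64(byte)) &* 0x100000001b3
    }
    return hash
}
```

Whether the runtime can recover those names cheaply enough at lookup time is exactly the open question above.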
Memory/code size impact
Let’s say that we are targeting a false-positive rate of 30%, and segmenting every 10,000 records. According to Wikipedia’s Bloom filter sizing formula, we need (10,000 * 1.44 * log2(1/0.3)) / 8 ≈ 3,126 bytes per segment. At 100,000 records, we need about 31 KB in total.
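For anyone who wants to play with the numbers, the arithmetic is just the standard sizing formula:

```swift
import Foundation

// m = 1.44 * n * log2(1/p) bits, where n is records per segment and
// p is the false-positive rate; divide by 8 for bytes.
func filterSizeInBytes(recordsPerSegment n: Double, falsePositiveRate p: Double) -> Double {
    (1.44 * n * log2(1 / p)) / 8
}

filterSizeInBytes(recordsPerSegment: 10_000, falsePositiveRate: 0.3)  // ≈ 3,126 bytes
// Ten such segments for 100,000 records ≈ 31 KB total.
```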
While this may not be acceptable for some applications, if your application’s code size is already measured in the tens or hundreds of megabytes, it is a drop in the ocean compared to the resulting performance benefits.
Therefore, this could be an opt-in change; for ABI compatibility, it would probably have to be opt-in regardless. If the filter for a particular segment is not present, the runtime could simply traverse that segment as it does today.
One could also imagine vendors like Apple treating the filters as a dynamic resource instead of embedding them in the binary, and deleting them from the system once the dyld cache was sufficiently populated.