10 years on, what would you change about Swift?

Joe_Groff · May 10, 2024, 5:02pm

Lest another aspiring language designer look at Swift and see no hope in the separately-compiled generics model, there are a lot of opportunities for major improvement within Swift's implementation, and a from-scratch implementation could do a lot better informed by some of the tradeoffs we made in our implementation. The essential overhead of an unspecialized generic is not much different from a class with virtual methods—you pass a pointer to the value, along with a vtable containing function pointers to all of the methods for the protocols the generic value is required to implement. Swift introduces overhead on top of that basic model for a number of reasons, including:

We require globally-unique metadata records for every type and protocol conformance, out of a combination of needing to interoperate with Objective-C's similar model for class objects and our own desire to support richer reflection for all Swift types. Since these records need to be globally unique, they require coordination through the runtime to instantiate, which can be expensive, and they need to contain every bit of information a Swift program could conceivably ever ask about the type. In the common case, though, if you're just invoking other protocol methods on the value, you don't need most of that metadata, so defaulting to a less-reflectable model for generics might've been a better choice. The need for globally unique metadata also complicates our ability to pre-instantiate metadata records even when we know statically what generic types will be used, since the runtime needs to be aware of the pre-instantiations in order to register them. It would be worth experimenting to see whether the overall system memory usage cost of non-unique metadata records (and the added overhead of metatype equality and other operations) is worth the savings of not having to maintain unique records.
We never specialize protocol witness tables for generic types, so as soon as you hit unspecialized generics at any level, you pay for unspecialization at every level. Even if we know you passed a concrete Set<Array<(Int, Int)>> as a Sequence, through the abstract Sequence interface we operate on an unspecialized Set<T: Hashable>, which in turn deals with an abstract Array<T> through the Hashable abstract interface, which in turn forwards to the abstract (repeat each T) implementation of Hashable. This ties in somewhat with the uniqueness requirement above: it would be better to instantiate witness tables for the specialized instance at point of use.
(Until recently) every Swift type was implicitly copyable, movable, and destroyable, and the compiler implicitly uses these operations. Our initial ARC optimization approach was informed by the ObjC ARC optimizer, but it arguably should've been ownership-based from the start. Although we've since switched to "OSSA" SIL for most types, we can't be as aggressive as we'd like in some cases because existing code relies on implicitly-extended lifetimes, and unspecialized generics still don't get to benefit from OSSA at all. Having a better optimizer, and maybe better user control over where implicit copies are allowed to occur, would help with that overhead.
Representing the core copy/move/destroy operations as open-coded functions with a "value witness table" to dispatch them is also a major code size cost paid for every type, for code that is somewhat fundamentally going to be pretty slow. We've been working on an alternative "value witness bytecode" which represents the type layout abstractly as a string, which is interpreted by the runtime to know where to retain/release pointers and do other copy/move/destroy work; not only is this much smaller, but it's also actually faster in a lot of cases in our experiments.
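
To make the pieces above concrete, here is a rough hand-written model of what an unspecialized generic call passes around: a pointer to the value, a table of functions for the protocol requirements, and a table of functions for copying and destroying the value. Every name here (`ValueWitnesses`, `ShapeWitnesses`, `doubledAreaUnspecialized`, and so on) is made up for illustration and does not match the real runtime's data structures or calling convention; treat it as a sketch of the idea only.

```swift
// Hypothetical model only: these types are NOT the Swift runtime's real layouts.
protocol Shape {
    func area() -> Double
}

struct Square: Shape {
    var side: Double
    func area() -> Double { side * side }
}

// The source-level generic we are pretending to lower by hand.
func doubledArea<T: Shape>(_ x: T) -> Double {
    x.area() + x.area()
}

// Stand-in for a value witness table: how to copy and destroy a value
// whose concrete layout the callee does not know.
struct ValueWitnesses {
    let size: Int
    let alignment: Int
    let copy: (_ from: UnsafeRawPointer, _ to: UnsafeMutableRawPointer) -> Void
    let destroy: (UnsafeMutableRawPointer) -> Void
}

// Stand-in for a protocol witness table: one function per requirement.
struct ShapeWitnesses {
    let area: (UnsafeRawPointer) -> Double
}

// Builds the open-coded copy/destroy functions for a concrete type.
func makeValueWitnesses<T>(for type: T.Type) -> ValueWitnesses {
    ValueWitnesses(
        size: MemoryLayout<T>.size,
        alignment: MemoryLayout<T>.alignment,
        copy: { from, to in
            to.bindMemory(to: T.self, capacity: 1)
                .initialize(to: from.load(as: T.self))
        },
        destroy: { ptr in
            _ = ptr.assumingMemoryBound(to: T.self).deinitialize(count: 1)
        }
    )
}

// Roughly what `doubledArea` sees when unspecialized: an opaque pointer
// plus the two tables, with every operation going through indirection.
func doubledAreaUnspecialized(
    _ value: UnsafeRawPointer,
    _ vwt: ValueWitnesses,
    _ shape: ShapeWitnesses
) -> Double {
    // Make a temporary copy through the value witnesses, standing in for
    // the implicit copies the compiler may introduce.
    let temp = UnsafeMutableRawPointer.allocate(
        byteCount: vwt.size, alignment: vwt.alignment)
    defer { temp.deallocate() }
    vwt.copy(value, temp)
    defer { vwt.destroy(temp) }  // runs before the deallocate above
    return shape.area(value) + shape.area(UnsafeRawPointer(temp))
}

// The caller knows the concrete type, so it supplies the tables.
func callDoubledArea(_ square: Square) -> Double {
    let vwt = makeValueWitnesses(for: Square.self)
    let shape = ShapeWitnesses(area: { $0.load(as: Square.self).area() })
    return withUnsafePointer(to: square) { ptr in
        doubledAreaUnspecialized(UnsafeRawPointer(ptr), vwt, shape)
    }
}
```

Calling `callDoubledArea(Square(side: 3))` should give the same answer as the ordinary `doubledArea(Square(side: 3))`, just by way of the indirect path. Note how even this toy version has to carry open-coded `copy` and `destroy` closures for every type, which is the code size cost the last point is about.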

We still have room within the existing ABI to eventually realize a lot of these gains, but it does take longer having to retrofit them within the existing system. I don't think allowing implicitly unspecialized generics on ABI boundaries was necessarily the wrong default, but there are definitely a lot of things about the implementation we can do better. We should also generally have a more robust cross-module optimization model for source-based projects that don't care about ABI.
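
As a small illustration of the module-boundary point, here is how a library author can opt a generic function into cross-module specialization today with the `@inlinable` attribute. The function names are made up, and a single file can't show the module boundary itself; assume both functions live in a library that a separate client module imports.

```swift
// Assume this file is part of a library module used by other modules.

// Not inlinable: clients only see the signature, so calls from another
// module generally go through the unspecialized generic entry point.
public func sumOpaque<S: Sequence>(_ values: S) -> Int where S.Element == Int {
    var total = 0
    for v in values { total += v }
    return total
}

// `@inlinable` makes the body part of the module's public interface, so
// the optimizer in a client module is allowed to specialize or inline it
// for the concrete sequence type at the call site.
@inlinable
public func sumInlinable<S: Sequence>(_ values: S) -> Int where S.Element == Int {
    var total = 0
    for v in values { total += v }
    return total
}
```

Both functions behave identically; the only difference is what the compiler is permitted to see from outside the module. The tradeoff is that an `@inlinable` body becomes something clients may have compiled into their own binaries, which is exactly the kind of commitment that matters for ABI-stable libraries and matters much less for source-based projects.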