SIL has a lot of incidental complexity arising from representational issues in how we handle the types of polymorphically-callable functions such as protocol witnesses, class methods, and closures. Here is a proposed design for substituted function types, which tries to address some of these issues.
The problem
To implement a polymorphic interface, such as a protocol method or class method, the different implementations have to share a common generic machine-level calling convention, even though each implementation has a different concrete function type derived from the specific parent type it provides an implementation for. For example, if you have a protocol like:
protocol P {
associatedtype A
func foo(x: Self) -> A
}
then every implementation of foo
has to be callable like a <Self> (self: Self, x: Self) -> Self.A
function. A specific type that implements the protocol, such as:
struct S<T, U>: P {
func foo(x: S<T, U>) -> U
}
has an implementation of foo which in isolation has the type <T, U> (self: S<T, U>, x: S<T, U>) -> U
, but which needs to . In SIL, these types are currently represented as:
// For a generic implementation of P.foo
@convention(witness_method: P) <Self: P> (@in_guaranteed Self, @in_guaranteed Self) -> @out Self.A
// For the S<T, U> implementation of P.foo
@convention(witness_method: P) <T, U> (@in_guaranteed S<T, U>, @in_guaranteed S<T, U>) -> @out U
In order for protocol dispatch to work, these two types need to share a calling convention. For the value arguments, although the concrete types differ between the two types, they still match up one-to-one after substitution, and the @in_guaranteed
and @out
SIL attributes indicate that they're passed and returned indirectly by pointer, and this is sufficient to ensure the parameter ABIs line up despite the type differences. However, we also need to pass along generic arguments in a uniform way between these two functions, but there's no direct correspondence in the SIL types to capture the relationship between the single <Self: P>
parameter to the generic interface and the concrete generic environment of the implementation.
We don't want these two functions to have exactly the same type, because the implementation of S.foo
needs the substituted type information in order to be able to express its implementation in terms of concrete operations on S
, but we do need some way to establish the common ABI relationship between them. Today, SIL indirectly encodes that relationship in an ad hoc way. The special witness_method: P
convention decrees that, regardless of its stated generic signature, the function at the machine level receives a single Self: P
generic argument from which it derives all its other arguments.
The witness_method
hack addresses the problem of providing a consistent machine-level calling convention for the different witness methods, but ideally, we'd also have a uniformly-callable representation in SIL. Optimization passes like generic specialization and devirtualization want to be able to substitute types in a uniform way, and to be able to replace dynamic dispatch with direct calls to concrete implementations, and the mismatch between the generic signatures of different protocol method implementations makes the devirtualization part complicated. For example, an unspecialized generic function that calls P.foo
might look like this in simplified SIL:
// Look up the implementation of P.foo for T
%foo = witness_method $T, #P.foo : $@convention(witness_method: P) <Self: P> (@in_guaranteed Self, @in_guaranteed Self) -> @out Self.A
// Invoke it
%result = apply %foo<T>(%self, %x)
After specializing for something like T == S<Int, String>
, we'll have:
// Look up the implementation of P.foo for T
%foo = witness_method $S<Int, String>, #P.foo : $@convention(witness_method: P) <Self: P> (@in_guaranteed Self, @in_guaranteed Self) -> @out Self.A
// Invoke it
%result = apply %foo<S<Int, String>>(%self, %x)
Now we'll want to devirtualize that witness_method
call and replace it with the concrete implementation. Because the generic signature of the concrete implementation in SIL doesn't match the common generic signature of the virtual call, though, we can't just replace the %foo
instruction with a direct reference; we have to also rewrite any apply
instructions that reference the instruction to match the new generic signature:
// Statically reference the implementation of P.foo for S
%foo = function_ref @S.foo : @convention(witness_method: P) <T, U> (@in_guaranteed S<T, U>, @in_guaranteed S<T, U>) -> @out U
// Invoke it. Notice that we have to simultaneously change the generic arguments
%result = apply %foo<Int, String>(%self, %x)
This is tricky to do in the general case, and also means that, if there happen to be any non-apply
uses of the witness_method
instruction, then we can't fully replace the witness_method
instruction, because we don't know how to rewrite arbitrary uses of the value that expect the original type.
This problem also manifests in class methods of generic class hierarchies, because replacing a method reference dispatched from a base class similarly requires mapping base class generic arguments to the arguments on the derived class:
class B<T, U: P, V: Q> {
func bar()
}
class D<W: P & Q> : B<Int, W, W> {
override func bar()
}
// Devirtualizing this:
%bar = class_method %object, #B.bar
apply %bar<Int, String, String>(%object)
// Requires transforming it into:
%bar = function_ref @D.bar
apply %bar<String>(%object)
Proposed solution: substituted function types
Ideally, function types for methods would be able to preserve their uniform interface type while also specifying their specific type information relative to that uniform type. We can extend the representation of SILFunctionType
to support what I'll call substituted function types. A SIL function type would consist of the following bits of data (with the new fields for substituted types bolded):
- The calling convention,
- The interface generic signature,
- The parameters and results, and additionally,
- The substituted generic arguments.
The interface generic signature specifies the calling convention for the function's generic arguments, and the parameters and results are specified in terms of that generic signature. The function can provide substituted generic arguments to note that the generic parameters are bound to more specific types. As a strawman notation, I'll notate this in SIL examples here by specifying the substituted generic arguments after the function type, in angle brackets after a for
token. So if a protocol method has the type <Self: P> (Self) -> Self
(leaving out convention attributes for simplicity), then the type of the implementation given by a concrete type X
would be spelled <Self: P> (Self) -> Self for <X>
. This means that the function is always called like a <Self: P> (Self) -> Self
generic function, even though the arguments are known to always statically be X
. This makes devirtualization straightforward, since the generic and nongeneric functions can now have compatible types:
// Before specialization or devirtualization
%method = witness_method $T, #P.method : $<Self: P> (Self) -> Self for <T>
apply %method<T>(%x)
// After specializing T == X
%method = witness_method $X, #P.method : $<Self: P> (Self) -> Self for <X>
apply %method<X>(%x)
// After devirtualization
%method = function_ref @X.method : $<Self: P> (Self) -> Self for <X>
apply %method<X>(%x)
Inside a function implementation, the entry point arguments and return values use the substituted concrete types:
sil @X.method : $<Self: P> (Self) -> Self for <X> {
// Inside the definition, the arguments are substituted:
entry(%self : $X):
...
// As are the returns
return %self : $X
}
For an implementation on a generic type, the substituted generic arguments need to be able to be specified in terms of another generic signature, so generic declarations also need to be able to specify a substituted generic signature to define their generic parameters as seen from the implementation side. (Thankfully, SIL types themselves don't need this extra information, since they should always be expressible in terms of context types.) For a generic type S<T, U>
conforming to the same protocol, its definition could be notated like this, with the substituted generic signature expressed next to the symbol name in the declaration:
// Implementation function for S<T, U>.method
// The substituted generic signature precedes the `:` for the type. All of its
// declared parameters must be bound in the substituted generic arguments of the
// type.
sil @S.method<T, U> : $<Self: P> (Self) -> Self for <S<T, U>> {
// Inside the implementation, the parameter and result types are still
// substituted.
entry(%self : $S<T, U>):
// The parameters of the substituted generic signature are
// open and can be used in other types
%t = alloc_stack $T
%u = alloc_stack $U
...
return %self : $S<T, U>
}
// ...Elsewhere, to invoke the function, arguments are passed according to the
// interface generic signature
%f = function_ref @S.bar : $<Self: P> (Self) -> Self where <S<String, Int>>
apply %f<S<String, Int>>(%x)
Closures and pointer authentication
Closure arguments to generic functions have the same high-level ABI concern as methods do; a function <T, U> ((T) -> U) -> ()
needs to be able to accept (Int) -> String
, (String) -> Int
, or other matching function types with a uniform ABI and calling convention. Unlike generic methods, Swift doesn't support generic closure arguments, for the most part, this is handled successfully by SIL's support for abstraction patterns and reabstraction to emit closure implementations with the right calling convention and thunk between calling conventions when necessary.
Reabstraction has proven sufficient to ensure that the entry points for different generic closures are calling convention compatible, but if we wanted to change the representation of the closure value itself based on its type, we would still need a way to relate concrete function types to the generic type they're trying to be ABI-compatible with. This is precisely the situation we find ourselves in with pointer authentication on iOS, where we would like to harden the feature by using different discriminator codes to sign function pointers of different function types. With our current SIL representation, we're limited in our ability to do so. If a generic function takes a closure parameter, then any function that can possibly be passed as an argument has to be signed the same way in order to be ABI compatible. For instance, this means that (Any) -> Any
and (Date) -> Date
values must be signed the same way, because they pass their arguments and returns indirectly, and so either could be passed as a generic (T) -> U
argument. This significantly weakens the effectiveness of pointer authentication as a security mitigation for Swift code.
Addressing this representation problem requires addressing the same basic issue that we have with methods. For a closure SIL function type, we want to capture both the common interface type the function value is trying to conform to, as well as the specific type a particular value has. We can apply the same substituted function type concept to how we lower function argument types. Given a function of Swift type <T, U> ((T) -> U) -> ()
, which we currently lower to:
sil @foo : $<T, U> (@callee_guaranteed (@in T) -> @out U) -> ()
we can lower as this instead:
sil @foo : $<T, U> (@callee_guaranteed <A, B> in (@in A) -> @out B for <T, U>) -> ()
When we pass an argument of concrete type to this function of in SIL code elsewhere, we can insert an explicit conversion from the concrete type to the generic substituted type:
%foo = function_ref @foo : $<T, U> (@callee_guaranteed <A, B> in (@in A) -> @out B for <T, U>) -> ()
%arg = function_ref @arg : $(@in Date) -> @out Any
// Explicitly convert the function here. This lets us know we have to sign
// with a more generic discriminator
%arg' = convert_function %arg to $<A, B> in (@in A) -> @out B for <Date, Any>
apply %foo(%arg)
By making the distinction between the concrete function type and the generic type, we're freed up to use a more specific discriminator when we know we're working with the concrete function, because we can change the discriminator to a more generic one at well-defined times when we know re-signing is necessary.
One subtle difference from substituted method function types is that the <A, B>
generic signature is not part of the machine calling convention of the closure, because they will always be bound to specific types in context. The in
keyword after the generic signature is trying to communicate that distinction between an "implied" generic signature and one that has a concrete representation in the machine calling convention.
Discussion
Hopefully that gives an overview of the set of problems we're trying to solve with this change. It is a particularly pervasive change to the SIL type system model that will impact how a good proportion of function types are represented, so I hope this clearly motivates the change. I'm also open to suggestions on how to improve the notation to try to keep the printed representation of SIL as understandable as possible. There may be other approaches to solving these problems that we haven't thought of as well that solve these problems in a more straightforward way. Thank you all for reading and for any feedback you're able to provide!