I have a design sketch but haven't done anything regarding an implementation. Here's what I was thinking...
SE-0409: Hiding implementation details from the compiler
As noted in SE-0409, one of the challenges of internal
imports is that the compiler currently depends on transitively loading any internal
imports of a loaded non-resilient module to generate code. For example, consider these three modules:
// module Utility
public struct X {
  var x, y: Int
  public init() { ... }
}
// module Library
internal import Utility
public struct Y {
  var x: X
  public init() { ... }
}
// module Client
import Library
func dup(_ y: Y, count: Int) -> [Y] {
  Array(repeating: y, count: count)
}
Within module Client, the compiler needs to have information about the layout of Y to be able to allocate space for it, perform copies, destroy instances, and so on. For non-frozen types in a resilient module, this information is evaluated at runtime using the value witness table. For frozen types and those in a non-resilient module (the common case), the compiler depends on having information about all of the instance properties of the type to generate this code. This requires the compiler to “see” the property Y.x in Library (even though it is not visible to the user in module Client) and to look through the internal import of Utility to determine the storage of X.
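To make the dependency concrete, here is a minimal single-file sketch. The two structs stand in for Utility.X and Library.Y (the module split is omitted so the snippet is self-contained), and MemoryLayout reports the values the compiler has to know in order to generate code for dup:

```swift
// Stand-ins for Utility.X and Library.Y, collapsed into one file for illustration.
struct X { var x, y: Int }
struct Y { var x: X }

// The layout of Y is entirely determined by the layout of X,
// which is why a client of Library ends up needing to see Utility.
print(MemoryLayout<Y>.size, MemoryLayout<Y>.alignment, MemoryLayout<Y>.stride)

// Allocating and copying Y values, as in `dup`, relies on that layout.
let copies = Array(repeating: Y(x: X(x: 1, y: 2)), count: 3)
```

The printed numbers depend on the platform's Int width, so they are not listed here.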
Hiding implementation details
The difficulty with internal imports comes from the need to hide the implementation details of transitive dependencies from the compiler. To that end, we propose to include abstract layout information for all used types as part of a compiled Swift module file. Abstract layout information provides sufficient information to reason about and manipulate the storage of a type even when it is impossible to reason about the contents of the type directly.
For presentation purposes, we’ll express abstract layout information as annotated Swift code, although it could be any representation produced and consumed by the compiler. For each public type within a library, we can annotate it with abstract layout information that shows all of its structure. Let’s consider a module that depends on both a C type and a type from an internally-imported Swift module:
// Utility.h
typedef struct {
  int x, y;
} X;
// Module A
public struct Y {
  var name: String
}
// Module B
// -internal-import-bridging-header Utility.h
internal import A
public struct Z {
  private var x: X
  var y: Y
  public var weight: Double
}
The abstract layout information for Z would be as follows:
@layout(size: 32, alignment: 8, stride: 32, bitwiseCopyable: false)
public struct Z {
  @layout(offset: 0)
  private var x: @_hiddenType("$s3__C1XV")
  @layout(offset: 8)
  var y: @_hiddenType("$s1A1YV")
  @layout(offset: 24)
  public var weight: Double
}
The @layout attribute describes known layout information for the various types and fields. For types, it provides the size, alignment, and stride that are needed to correctly allocate storage for an instance of the type. It can also include other characteristics that can affect code generation, such as whether the type is bitwise-copyable, and we could expand this set of information over time.
Note that private and internal fields are represented in the abstract layout that is exposed to clients. However, the types of these fields are abstracted away via the @_hiddenType attribute, because clients cannot necessarily resolve the actual types. @_hiddenType uses a string representation (here, a mangled name) to provide that level of indirection while still maintaining a notion of type identity.
Hidden types
Hidden types that are needed to describe the representation of public types are also emitted, transitively. These use the same abstracted structure, but are identified by their string representation:
@layout(size: 8, alignment: 4, stride: 8, bitwiseCopyable: true)
struct $s3__C1XV {
  @layout(offset: 0)
  var x: CInt
  @layout(offset: 4)
  var y: CInt
}
@layout(size: 16, alignment: 8, stride: 16, bitwiseCopyable: false)
struct $s1A1YV {
  @layout(offset: 0)
  var name: String
}
Note that these representations are emitted as part of module B: they restate knowledge that B has about its internal dependencies, but in a manner that is abstracted so its clients can reason about the layout without loading B’s internal dependencies directly.
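As a rough illustration of why per-field size and alignment are enough for a client, the following sketch recomputes the offsets of Z from the layouts of its fields alone. FieldLayout, AggregateLayout, roundUp, and layOut are hypothetical names invented for this example, and the field sizes are assumptions for a 64-bit target (8 bytes for the C struct with two 4-byte ints, 16 bytes for a struct wrapping a String, 8 bytes for Double):

```swift
/// Hypothetical model of one stored field as described by an abstract layout.
struct FieldLayout {
  let name: String
  let size: Int
  let alignment: Int
}

/// Hypothetical result: aggregate size/alignment/stride plus per-field offsets.
struct AggregateLayout {
  var size = 0
  var alignment = 1
  var stride = 1
  var offsets: [String: Int] = [:]
}

/// Rounds `value` up to the next multiple of `alignment`.
func roundUp(_ value: Int, toMultipleOf alignment: Int) -> Int {
  (value + alignment - 1) / alignment * alignment
}

/// Lays out fields in declaration order, padding each one to its alignment.
func layOut(_ fields: [FieldLayout]) -> AggregateLayout {
  var result = AggregateLayout()
  var cursor = 0
  for field in fields {
    cursor = roundUp(cursor, toMultipleOf: field.alignment)
    result.offsets[field.name] = cursor
    cursor += field.size
    result.alignment = max(result.alignment, field.alignment)
  }
  result.size = cursor
  result.stride = max(1, roundUp(cursor, toMultipleOf: result.alignment))
  return result
}

// The fields of Z, described only by size and alignment (assumed values).
let z = layOut([
  FieldLayout(name: "x", size: 8, alignment: 4),       // hidden C struct X
  FieldLayout(name: "y", size: 16, alignment: 8),      // hidden Swift struct Y
  FieldLayout(name: "weight", size: 8, alignment: 8),  // Double
])
```

Nothing in this computation needs the actual declarations of X or Y. This is a simplified model of struct layout, not the compiler's actual algorithm.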
Dynamic layout and resilient types
The examples above use types that are hidden, but whose underlying storage is still fixed at compile time and known to the compiler. When a type uses a resilient type as part of its storage, the underlying storage is no longer of fixed size. As such, many aspects of its layout are dynamic. For example, Foundation.URL is a resilient type on platforms using ABI stability. Consider a variant of the prior example that embeds a URL:
// Module B2
// -internal-import-bridging-header Utility.h
internal import Foundation
public struct Z2 {
  private var x: X
  var y: URL
  public var weight: Double
}
Now, most of the aspects of the layout of Z2 are unknown at compile time. For exposition purposes, we represent this in @layout with the dynamic keyword:
@layout(size: dynamic, alignment: 8, stride: dynamic, bitwiseCopyable: dynamic)
public struct Z2 {
  @layout(offset: 0)
  private var x: @_hiddenType("$s3__C1XV")
  @layout(offset: dynamic)
  var y: @_hiddenType("$s10Foundation3URLV")
  @layout(offset: dynamic)
  public var weight: Double
}
The dynamic layout can also be used for members of generic types where the layout itself depends on the instantiated types. The compiler itself may be able to reason about the layout of a particular specialization (e.g., Pair<Int, String>) if it knows the layouts of the argument types.
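The dependence of a generic type's layout on its arguments can be observed directly with MemoryLayout. The Pair type below is a stand-in written for this sketch (the post names Pair<Int, String> without defining it), and storageSize is a hypothetical helper:

```swift
// A stand-in generic type; its layout is only known per specialization.
struct Pair<First, Second> {
  var first: First
  var second: Second
}

// Unspecialized generic code has to ask for the layout at run time.
func storageSize<T>(of type: T.Type) -> Int {
  MemoryLayout<T>.size
}

// Two specializations of the same declaration with different layouts.
let small = storageSize(of: Pair<Int8, Int8>.self)
let large = storageSize(of: Pair<Int64, Int64>.self)
```

Inside storageSize the size is a dynamic property of T, while at each call site the compiler knows the concrete argument types and can compute the answer statically.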
For URL itself, we’ll have an empty definition whose layout is specified as “opaque”:
@layout(opaque, size: dynamic, alignment: dynamic, stride: dynamic)
struct $s10Foundation3URLV { }
For resilient types, the layout cannot statically be known to clients, because it could change without recompiling clients. The opaque designator specifies that the compiler should generate code that uses the value witness table to allocate and manipulate instances of the type. This is already the case with uses of resilient types within clients: the purpose of the opaque layout is to indicate which types need this treatment.
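For readers unfamiliar with value witness tables, the following is a hand-written analogy in plain Swift, not the runtime's actual data structure. ValueWitnesses, makeWitnesses, and copyOpaque are hypothetical names; the point is only that code holding such a table can allocate, copy, and destroy a value without knowing its type statically:

```swift
/// Hypothetical stand-in for a value witness table: layout numbers plus
/// type-specific operations captured as closures.
struct ValueWitnesses {
  let size: Int
  let alignment: Int
  let stride: Int
  let initializeWithCopy: (_ dest: UnsafeMutableRawPointer, _ src: UnsafeRawPointer) -> Void
  let destroy: (_ object: UnsafeMutableRawPointer) -> Void
}

/// Built where the concrete type is known (the defining side).
func makeWitnesses<T>(for type: T.Type) -> ValueWitnesses {
  ValueWitnesses(
    size: MemoryLayout<T>.size,
    alignment: MemoryLayout<T>.alignment,
    stride: MemoryLayout<T>.stride,
    initializeWithCopy: { dest, src in
      // Loading produces a proper copy (retaining any references it holds).
      _ = dest.initializeMemory(as: T.self, repeating: src.load(as: T.self), count: 1)
    },
    destroy: { object in
      _ = object.assumingMemoryBound(to: T.self).deinitialize(count: 1)
    }
  )
}

/// Used where the concrete type is not known (the client side).
func copyOpaque(_ source: UnsafeRawPointer, using witnesses: ValueWitnesses) -> UnsafeMutableRawPointer {
  let dest = UnsafeMutableRawPointer.allocate(byteCount: witnesses.size, alignment: witnesses.alignment)
  witnesses.initializeWithCopy(dest, source)
  return dest
}

let original = "a string long enough that it is not stored inline"
let witnesses = makeWitnesses(for: String.self)
let copy = withUnsafePointer(to: original) { copyOpaque(UnsafeRawPointer($0), using: witnesses) }
let copiedValue = copy.load(as: String.self)
witnesses.destroy(copy)
copy.deallocate()
```

Every operation in copyOpaque goes through an indirect call, which is the code-size and performance cost that the opaque designator implies.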
Note that the use of a mangled name as the string identifying the layout type is now significant, because it provides the basis for emitting calls to retrieve the value witness table. For example, given the name $s10Foundation3URLV, we can form a call to the type metadata accessor $s10Foundation3URLVMa, which then provides the value witness table, and we can do this without ever resolving the type Foundation.URL in the client.
We could use a different string to identify the type, and embed the mangled name in the opaque layout description, e.g.:
@layout(opaque("$s10Foundation3URLV"), size: dynamic, alignment: dynamic, stride: dynamic)
struct some_mangled_name_for_Foundation_URL { }
Opaque non-resilient types
In some cases, it can be valuable to get the compiler to treat types as being opaque even when they are non-resilient. This can be important when the type has some aspects that are known at compile time (e.g., size and alignment, offsets of various fields) but has semantics that the client cannot replicate without deep knowledge of the types involved. For example, this can occur with uses of C++ types that have non-trivial special member functions:
// MyCppLibrary.hpp
#include <string>
class MyString {
  std::string stored;
};
// module CppAdapter
// -internal-import-bridging-header MyCppLibrary.hpp
public struct W {
  var name: MyString
  var weight: Double
}
The C++ MyString type has a nontrivial copy constructor, move constructor, destructor, and so on. A client of the CppAdapter module therefore cannot copy, move, or destroy an instance of W without being able to generate calls into those C++ special member functions to operate on the name field. And calling those C++ functions requires the client both to be building with C++ interoperability enabled and to have imported MyCppLibrary.hpp itself, which doesn’t work.
Instead, we emit an opaque layout for MyString along with supporting functions inside CppAdapter. The abstract layouts look like this:
@layout(size: 32, alignment: 8, stride: 32, bitwiseCopyable: false)
public struct W {
  @layout(offset: 0)
  var name: @_hiddenType("$s5__Cxx8MyString")
  @layout(offset: 24)
  var weight: Double
}
@layout(opaque("$s10CppAdapter$s5__Cxx8MyString"), size: 24, alignment: 8, stride: 24, bitwiseCopyable: false)
struct $s5__Cxx8MyString { }
Note that the abstract layout of the struct W is fully determined, because the sizes of MyString and Double are both known at compile time. Indeed, the abstract layout for MyString specifies size, alignment, and stride because its layout is known at compile time as well.
However, MyString is marked opaque because only the CppAdapter module we are in is guaranteed to be built with C++ interoperability and able to form calls to its special member functions. When compiling CppAdapter, the Swift compiler will need to emit metadata for MyString that is similar to what would be emitted for a resilient type, e.g., a value witness table or type metadata accessor to retrieve that value witness table. These symbols need to be publicly accessible, because a client of CppAdapter that needs to emit a copy of a W instance will have to reference them as part of copying the name field.
Note that many modules in a program may have an include of MyCppLibrary.hpp and a use of MyString that produces an abstract layout. To prevent symbol collisions when those modules are part of the same program, the opaque designator provides an alternative mangled name that includes both the name of MyString and the name of the module that is using it. Therefore, the set of emitted symbols for accessing the value witness table will be unique across the program. There are future optimizations here where some central module could opt to provide the value witness table or accessors explicitly, so that clients need not emit redundant copies.
Opaque layouts can also be used to “cut off” recursion into the structure of internally imported types. For example, we could choose to use an opaque layout for every type that is imported from an internal import, which reduces the number of abstract layouts we need to emit, but can degrade the quality of the generated code because the compiler needs to go through the value witness table for opaque types. However, the information behind opaque types can be recovered in some cases, as will be discussed in the next section.
Recovering hidden type information
One of the challenges highlighted earlier in the document is that clients may have to emit larger or less-efficient code to copy values when the details of that value’s type have been hidden. For example, the copy of the type W in a client needs to call through the opaque value witness table for MyString rather than directly calling the C++ copy constructor (which could have been inlined).
The abstract layout mechanism allows clients to recover the “hidden” type information when they have access to the type that has been hidden. For example, let’s reconsider one of our earlier examples containing three Swift modules:
// module Utility
public struct X {
  var name: String
}
// module Library
internal import Utility
public struct Y {
  var x: X
}
// module Client
import Library
func dup(_ y: Y, count: Int) -> [Y] {
  Array(repeating: y, count: count)
}
The abstract layouts in Library could look like this:
@layout(size: 16, alignment: 8, stride: 16, bitwiseCopyable: false)
public struct Y {
  @layout(offset: 0)
  var x: @_hiddenType("$s7Utility1XV")
}
@layout(opaque, size: 16, alignment: 8, stride: 16, bitwiseCopyable: false)
struct $s7Utility1XV { }
Within Client, copying an instance of Library.Y means generating code to fetch and use the value witness table of Utility.X, which is less efficient (both in code size and run time) than if we knew the layout non-opaquely.
However, if the Client for some other reason were to import Utility, it has knowledge of the type Utility.X that it could use to emit better code. This is the other reason to use some form of mangled name for hidden types: $s7Utility1XV uniquely identifies the type Utility.X in a Swift program. If the Utility module is available, the Swift compiler can resolve the hidden type $s7Utility1XV to the actual type Utility.X and implement the more efficient code path. The same approach can be applied to imported C++ types when in a module that itself has enabled C++ interoperability, subject to the C++ One Definition Rule (ODR).
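The decision the client-side compiler makes here can be modeled as a lookup keyed by mangled name. This is only a toy model: CopyStrategy, copyStrategy, and the importedTypes dictionary are hypothetical names for this sketch, and the accessor name is formed by appending Ma to the mangled type name, as in the URL example earlier:

```swift
/// Hypothetical description of how a client copies a field of hidden type.
enum CopyStrategy: Equatable {
  /// The hidden type resolved to a type the client can see; copy it directly.
  case direct(resolvedType: String)
  /// The type stays hidden; go through the value witness table.
  case throughValueWitnessTable(metadataAccessor: String)
}

/// `importedTypes` maps mangled names to the types visible through the
/// client's own imports (an assumption made for this model).
func copyStrategy(forHiddenType mangledName: String,
                  importedTypes: [String: String]) -> CopyStrategy {
  if let resolved = importedTypes[mangledName] {
    return .direct(resolvedType: resolved)
  }
  return .throughValueWitnessTable(metadataAccessor: mangledName + "Ma")
}

// Client imports only Library: Utility.X stays opaque.
let withoutImport = copyStrategy(forHiddenType: "$s7Utility1XV", importedTypes: [:])

// Client also imports Utility: the hidden type can be resolved.
let withImport = copyStrategy(forHiddenType: "$s7Utility1XV",
                              importedTypes: ["$s7Utility1XV": "Utility.X"])
```

Because the key is the mangled name, the lookup identifies the same type no matter which module's abstract layout mentioned it.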
This optimization approach allows clients to mitigate the code size and runtime performance cost of hiding implementation details by explicitly introducing import dependencies on the modules that define these “hidden” types. The introduction of this import (or #include in a bridging header) might uncover modularization-related issues that require fixing to get the optimization back.
Doug