[Pitch] Allow non-resilient modules to hide dependencies from clients

Background

SE-0409 (access-level modifiers on import declarations) specifically called out that hiding dependencies from the clients of modules built without resilience (that is, without -enable-library-evolution) is “possible in theory” but would require restricting the available API. The compiler needs to know the memory layout of imported types, which may involve sibling internal types and transitively imported types that are not visible to those clients. Relaxing this restriction for modules built without resilience would benefit at least three groups of developers. If you can think of more, comment below:

  1. Apps built with many modules

Building with resilience lets the compiler insert abstraction thunks into the binary that materialize layout information for non-public types at runtime. This comes at the expense of binary size, which is less than ideal for apps built from many modules.

  2. Apps using cxx interop

As @Alex_L clarified here, using the -cxx-interoperability-mode flag is viral. Consumers who don’t want to leak this implementation detail, especially when building apps composed of large modules (see 1), would benefit from being able to consume modules built with cxx interop without having to enable it for the rest of the dependency graph.

  3. SPM packages that would like to hide implementation details from consumers without using resilience

I’m less familiar with SPM myself, but as @jrose mentioned here, I gather that it is not supported with -enable-library-evolution.

These use cases have previously relied on @_implementationOnly imports, but those have produced warnings since Swift 5.10 (cc @xymus). Unfortunately this commit somewhat contradicts Jordan’s comment linked above, but the compiler is not able to generate safe code without the proposed feature.

Proposal

So I’m proposing new compiler flags that help diagnose potentially leaking API in code that currently uses @_implementationOnly imports, with the eventual goal of moving to a new “fragile” resilience mode for this use case, which would enable these diagnostics as well.

Implementers could follow the same path: enable the diagnostic flag to clean up leaking API, then enable fragile libraries, which would actually hide those dependencies. In practice this means that fragile libraries would require public structs to have stored properties only of publicly visible types, either publicly imported by the defining module or public sibling types of the struct. The same rule applies to fixed-layout classes, but I assume those are rarer.

Add diagnostics

Add a flag to diagnose potentially leaking interfaces: -diagnose-escaping-implementation-only-properties

  1. Public structs
     1. Error: internal, private, or fileprivate stored properties of privately imported types
     2. Error: internal, private, or fileprivate stored properties of sibling internal or private types
     3. Okay: internal, private, or fileprivate stored properties of existential type, or of an opaque type whose underlying concrete type is privately imported, because they are accessed through runtime unboxing
  2. Public fixed_layout classes
     1. Same rules as structs above
  3. Private usableFromInline functions or types used in public types
     1. Parameter and return types must have the same access level as the inlining function
  4. Note: public enums with associated values are handled by access control

The flag can be used with the default or resilient resilience strategies.
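As a rough single-file illustration of the struct rules listed above, the sketch below uses only sibling types. The names `AnyDetail`, `InternalDetail`, `LeakyBox`, and `SafeBox` are hypothetical, and the comments describe what the proposed flag is intended to report, not output from an existing compiler.

```swift
// Hypothetical sibling types; none of these names come from the pitch.
protocol AnyDetail { var value: Int { get } }

// Internal sibling type whose layout clients would otherwise need to see.
struct InternalDetail: AnyDetail { var value: Int }

// Sibling-internal-type rule: an internal stored property of an internal
// sibling type. The proposed flag is meant to report an error here, because
// the layout of `LeakyBox` depends on the layout of `InternalDetail`.
public struct LeakyBox {
    var detail: InternalDetail
    public init() { detail = InternalDetail(value: 1) }
}

// Existential rule: the same data stored behind an existential. The layout
// of `SafeBox` is that of the existential container, so it no longer
// depends on the layout of `InternalDetail`.
public struct SafeBox {
    private var detail: any AnyDetail
    public init() { detail = InternalDetail(value: 1) }
    public var value: Int { detail.value }
}

print(MemoryLayout<LeakyBox>.size, MemoryLayout<SafeBox>.size)
```

Both declarations compile today; the difference is only in what a client would need to know to lay out each struct.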

Hide dependencies

Once memory layout details are diagnosed with -diagnose-escaping-implementation-only-properties, internal and private imports can be treated as implementation-only de jure. That is, their imports are not serialized into the generated swiftmodule or swiftinterface files as required imports, and as long as the dependent modules' objects are available at link time, a binary can be built with those implementation details.

-enable-fragile-library

Mirrors -enable-library-evolution but for non-resilient modules. This flag should be used by developers who own and recompile all of their code when building an app made of many modules. Hiding dependencies improves build times, trims dependency graphs, and encapsulates implementation details.

  1. When these dependencies are actually hidden, any leaked dependence on missing memory layout information will cause an unrecoverable error. This is the main desired behavior change over the existing (lack of) warnings from implementation-only imports

  2. Adds fragile resiliency strategy to swift::ResilienceStrategy in AST/Module.h

  3. Naming inspired from the archived Resilience.rst document and struct FragileFunctionKind from AST/DeclContext.h

  4. Prevent writing of privately (internal or private) imported modules to generated module files (.swiftinterface and .swiftmodule) when building with fragile resiliency

  5. Changes in lib/Serialization/ModuleFileSharedCore.cpp
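The import-filtering step in the list above can be modelled in a few lines of Swift. This is only a sketch of the idea: the real change would live in the compiler's C++ serialization code, and every name here (`ImportAccess`, `ResilienceStrategy`, `ImportRecord`, `importsToSerialize`) is hypothetical. Behavior for the non-fragile strategies is deliberately left unchanged in the sketch.

```swift
// Hypothetical model types; not the compiler's actual data structures.
enum ImportAccess { case publicImport, internalImport, privateImport }
enum ResilienceStrategy { case defaultStrategy, resilient, fragile }

struct ImportRecord: Equatable {
    let module: String
    let access: ImportAccess
}

/// Returns the imports that would be written to the generated module file.
/// Under the proposed fragile strategy, only public imports are kept.
func importsToSerialize(_ imports: [ImportRecord],
                        strategy: ResilienceStrategy) -> [ImportRecord] {
    guard strategy == .fragile else { return imports }
    return imports.filter { $0.access == .publicImport }
}

let example = [
    ImportRecord(module: "Foundation", access: .publicImport),
    ImportRecord(module: "Hidden", access: .internalImport),
]
print(importsToSerialize(example, strategy: .fragile).map(\.module))
```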

Future:

Collapse -enable-library-evolution and -enable-fragile-library into a -resilience-mode flag, but given the amount of code out there using -enable-library-evolution, that flag may need to exist for a while.

Remove @_implementationOnly entirely. With resilient modules already warning to use internal imports, non-resilient modules could emit diagnostics to enable fragile resilience strategy and a fixit to move to internal import as well (or a naked import if building with InternalImportsByDefault)


Implementation here: "prevent modules from leaking internal and @_implementationOnly imported fields in public structs" by AdamCmiel (swiftlang/swift Pull Request #77194)


Hey Adam, thanks for working on this future direction of SE-0409.

If I understand your proposal correctly, it leverages the existing type checker infrastructure for diagnosing unmet requirements for @frozen types and, in a new special mode enabled by -enable-fragile-library, diagnoses all types that are exposed to clients of the module as if they were @frozen, therefore requiring that all layout affecting properties come from exposed dependencies.

I think this is a clever way to enable some Swift libraries to opt in to avoiding exposure of their non-public dependencies when compiled non-resiliently. However, it doesn't feel to me like a viable approach for Swift to adopt officially because it creates a resilience-mode-based language dialect. I don't think it would be acceptable for the language to conditionally reject this code based on the presence/absence of -enable-library-evolution or any other similar, official flag:

internal import Hidden

public struct Exposed {
  // Since this property satisfies access control restrictions, it should
  // be accepted regardless of compilation mode.
  internal var x: Hidden.SomeType
}

Maybe I'm wrong and the language steering group does have some appetite for that sort of thing, but I think there are also alternative approaches that don't involve new mode-dependent restrictions.

For example, I believe @Douglas_Gregor has sketched some approaches to making the code above compile without inherently exposing Hidden to clients. A serialized non-resilient module could opaquely encode all layout information needed by clients so that they don't need to be aware of the hidden dependencies in order to work with the public types of the library. The library could also include opaque entry points for operations that need to be abstracted because there is no way to avoid using symbols from the hidden dependency. This would effectively be a more limited form of resilience, deployed only where it is needed for abstraction.


Yeah pretty much.

I like the idea of resilience-if-needed on these types and would like some way to warn if it's being used, but this could be done as a linter instead.

I totally agree that diverging the language based on diagnostic flags is less than desirable, but that's what happens today with -enable-library-evolution: it is a stricter subset of features for a specific purpose. That said, the way this implementation was done in the type checker is, yes, restrictive, but it is behind a flag, and making it more permissive with this resilience-if-needed approach could allow the language features permitted under those resilience "modes" to converge.

This is my goal anyway, so I could definitely run with those ideas, however concrete they are. @Douglas_Gregor, is there a vision doc or partial implementation to share, or should we just email on the side?

I have a design sketch but haven't done anything regarding an implementation. Here's what I was thinking...

SE-0409: Hiding implementation details from the compiler

As noted in SE-0409, one of the challenges of internal imports is that the compiler currently depends on transitively loading any internal imports of a loaded non-resilient module to generate code. For example, consider these three modules:

 // module Utility
 public struct X {
   var x, y: Int
   
   public init() { ... }
 }
 
 // module Library
 internal import Utility
 public struct Y {
   var x: X
 
   public init() { ... }
 }
 
 // module Client
 import Library
 
 func dup(_ y: Y, count: Int) -> [Y] {
   Array(repeating: y, count: count)
 }

Within module Client, the compiler needs to have information about the layout of Y to be able to allocate space for it, perform copies, destroy instances, and so on. For non-frozen types in a resilient module, this information is evaluated at runtime using the value witness table. For frozen types and those in a non-resilient module (the common case), the compiler depends on having information about all of the instance properties of the type to generate this code. This requires the compiler to “see” the property Library.Y.x (even though it is not visible to the user in module Client) and to see through the internal import of Utility to determine the storage of X.
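To make the dependency concrete, here is a small single-file sketch that collapses the three modules into one file (the stand-in declarations of `X` and `Y` are assumptions for illustration). It prints the quantities the client needs and checks that array elements are spaced by the stride of `Y`, which in turn depends on the storage of `X`.

```swift
// Single-file stand-ins for Utility.X and Library.Y from the example above.
struct X { var x = 0, y = 0 }
struct Y { var x = X() }

// What a client needs before it can allocate, copy, or destroy a `Y`.
let size = MemoryLayout<Y>.size
let alignment = MemoryLayout<Y>.alignment
let stride = MemoryLayout<Y>.stride

// `Array(repeating:count:)` stores its elements `stride` bytes apart.
let copies = Array(repeating: Y(), count: 3)
let spacing = copies.withUnsafeBufferPointer { buffer -> Int in
    let first = UnsafeRawPointer(buffer.baseAddress!)
    let second = UnsafeRawPointer(buffer.baseAddress! + 1)
    return first.distance(to: second)
}

print("size:", size, "alignment:", alignment, "stride:", stride, "spacing:", spacing)
```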

Hiding implementation details

The difficulty with internal imports comes from the need to hide the implementation details of transitive dependencies from the compiler. To that end, we propose to include abstract layout information for all used types as part of a compiled Swift module file. Abstract layout information provides sufficient information to reason about and manipulate the storage of a type even when it is impossible to reason about the contents of the type directly.

For presentation purposes, we’ll express abstract layout information as annotated Swift code, although it could be any representation produced and consumed by the compiler. For each public type within a library, we can annotate it with abstract layout information that shows all of its structure. Let’s consider a module that depends on both a C type and a type from an internally imported Swift module:

// Utility.h
typedef struct {
  int x, y;
} X;

// Module A
public struct Y {
  var name: String
}

// Module B
// -internal-import-bridging-header Utility.h
internal import A

public struct Z {
  private var x: X
  var y: Y
  public var weight: Double
}

The abstract layout information for Z would be as follows:

@layout(size: 32, alignment: 8, stride: 32, bitwiseCopyable: false)
public struct Z {
  @layout(offset: 0)
  private var x: @_hiddenType("$s3__C1XV")
  
  @layout(offset: 8)
  var y: @_hiddenType("$s1A1YV")
  
  @layout(offset: 24)
  public var weight: Double
}

The @layout attribute describes known layout information for the various types and fields. For types, it provides the size, alignment, and stride that are needed to correctly allocate storage for an instance of the type. It can also include other characteristics that can affect code generation, such as whether the type is bitwise-copyable, and we could expand this set of information over time.

Note that private and internal fields are represented in the abstract layout that is exposed to clients. However, the types of these fields are abstracted away via the @_hiddenType attribute, because clients cannot necessarily resolve the actual types. @_hiddenType uses a string representation (here, a mangled name) to provide that level of indirection while still maintaining a notion of type identity.

Hidden types

Hidden types that are needed to describe the representation of public types are also emitted, transitively. These use the same abstracted structure, but are identified by their string representation:

@layout(size: 8, alignment: 4, stride: 8, bitwiseCopyable: true)
struct $s3__C1XV {
  @layout(offset: 0)
  var x: CInt
  
  @layout(offset: 4)
  var y: CInt
}

@layout(size: 16, alignment: 8, stride: 16, bitwiseCopyable: false)
struct $s1A1YV {
  @layout(offset: 0)
  var name: String
}

Note that these representations are emitted as part of module B: they restate knowledge that B has about its internal dependencies, but in a manner that is abstracted so its clients can reason about the layout without loading B’s internal dependencies directly.
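The numbers in such a layout record can be cross-checked with `MemoryLayout`. The sketch below is a single-file approximation: `X` stands in for the C struct (two `Int32` fields), `Y` and `Z` mirror the example, and the `FieldLayout` and `AbstractLayout` record types are hypothetical, not part of any proposed format. Exact values depend on the target; on a typical 64-bit platform the offsets should match the ones quoted for Z.

```swift
// Stand-ins for the C struct X and for the Swift types Y and Z.
struct X { var x: Int32 = 0; var y: Int32 = 0 }
struct Y { var name = "" }
struct Z {
    var x = X()
    var y = Y()
    var weight = 0.0
}

// Hypothetical record types modelling what an @layout annotation carries.
struct FieldLayout { let offset: Int; let type: String }
struct AbstractLayout {
    let size: Int, alignment: Int, stride: Int
    let fields: [FieldLayout]
}

// Build the record for Z from what the compiler already knows.
let zLayout = AbstractLayout(
    size: MemoryLayout<Z>.size,
    alignment: MemoryLayout<Z>.alignment,
    stride: MemoryLayout<Z>.stride,
    fields: [
        FieldLayout(offset: MemoryLayout<Z>.offset(of: \Z.x)!, type: "hidden(X)"),
        FieldLayout(offset: MemoryLayout<Z>.offset(of: \Z.y)!, type: "hidden(Y)"),
        FieldLayout(offset: MemoryLayout<Z>.offset(of: \Z.weight)!, type: "Double"),
    ]
)

print("size:", zLayout.size, "alignment:", zLayout.alignment, "stride:", zLayout.stride)
for field in zLayout.fields { print(field.offset, field.type) }
```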

Dynamic layout and resilient types

The examples above use types that are hidden, but whose underlying storage is still fixed at compile time and known to the compiler. When a type uses a resilient type as part of its storage, the underlying storage is no longer of fixed size. As such, many aspects of its layout are dynamic. For example, Foundation.URL is a resilient type on platforms using ABI stability. Consider a variant of the prior example that embeds a URL:

// Module B2
// -internal-import-bridging-header Utility.h
internal import Foundation

public struct Z2 {
  private var x: X
  var y: URL
  public var weight: Double
}

Now, most of the aspects of the layout of Z2 are unknown at compile time. For exposition purposes, we represent this in @layout with the dynamic keyword:

@layout(size: dynamic, alignment: 8, stride: dynamic, bitwiseCopyable: dynamic)
public struct Z2 {
  @layout(offset: 0)
  private var x: @_hiddenType("$s3__C1XV")
  
  @layout(offset: dynamic)
  var y: @_hiddenType("$s10Foundation3URLV")
  
  @layout(offset: dynamic)
  public var weight: Double
}

The dynamic layout can also be used for members of generic types where the layout itself depends on the instantiated types. The compiler itself may be able to reason about the layout of a particular specialization (e.g., Pair<Int, String>) if it knows the layouts of the argument types.
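A quick way to see why generic members need a dynamic layout is to compare specializations of one declaration. The `Pair` type and the `describe` helper below are illustrative assumptions, not part of the design.

```swift
struct Pair<First, Second> {
    var first: First
    var second: Second
}

/// Formats the layout of any type; the generic declaration of `Pair`
/// has no single answer, but each specialization does.
func describe<T>(_ type: T.Type) -> String {
    let layout = MemoryLayout<T>.self
    return "\(type): size \(layout.size), alignment \(layout.alignment), stride \(layout.stride)"
}

print(describe(Pair<Int8, Int8>.self))
print(describe(Pair<Int8, Int16>.self))
print(describe(Pair<Int, String>.self))
```

The offset of `second` also changes with the type arguments, which is what `offset: dynamic` expresses for the unspecialized declaration.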

For URL itself, we’ll have an empty definition whose layout is specified as “opaque”:

@layout(opaque, size: dynamic, alignment: dynamic, stride: dynamic)
struct $s10Foundation3URLV { }

For resilient types, the layout cannot statically be known to clients, because it could change without recompiling clients. The opaque designator specifies that the compiler should generate code that uses the value witness table to allocate and manipulate instances of the type. This is already the case with uses of resilient types within clients: the purpose of the opaque layout is to indicate which types need this treatment.

Note that the use of mangled name as the string identifying the layout type is now significant, because it provides the basis for emitting calls to retrieve the value witness table. For example, given the name $s10Foundation3URLV, we can form a call to the type metadata accessor $s10Foundation3URLVMa, which then provides the value witness table, and we can do this without ever resolving the type Foundation.URL in the client.
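The name derivation described above is plain string manipulation, sketched below. The `mangledStructName` helper is a deliberate simplification that only handles a top-level struct in a named module (real Swift mangling has many more cases, such as substitutions, nesting, and generics), and both function names are hypothetical.

```swift
/// Simplified mangling for a top-level struct:
/// "$s" + <length><module> + <length><name> + "V".
/// Assumption: no substitutions, nesting, or generics are involved.
func mangledStructName(module: String, name: String) -> String {
    "$s\(module.utf8.count)\(module)\(name.utf8.count)\(name)V"
}

/// Appending "Ma" to a mangled type name gives the symbol of its
/// type metadata accessor, as in the Foundation.URL example above.
func metadataAccessorSymbol(forMangledType mangled: String) -> String {
    mangled + "Ma"
}

let url = mangledStructName(module: "Foundation", name: "URL")
print(url, metadataAccessorSymbol(forMangledType: url))
```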

We could use a different string to identify the type, and embed the mangled name in the opaque layout description, for example:

@layout(opaque("$s10Foundation3URLV"), size: dynamic, alignment: dynamic, stride: dynamic)
struct some_mangled_name_for_Foundation_URL { }

Opaque non-resilient types

In some cases, it can be valuable to get the compiler to treat types as being opaque even when they are non-resilient. This can be important when the type has some aspects that are known at compile time (e.g., size and alignment, offsets of various fields) but has semantics that the client cannot replicate without deep knowledge of the types involved. For example, this can occur with uses of C++ types that have non-trivial special member functions:

// MyCppLibrary.hpp
#include <string>
class MyString {
  std::string stored;
};

// module CppAdapter
// -internal-import-bridging-header MyCppLibrary.hpp

public struct W {
  var name: MyString
  var weight: Double
}

The C++ MyString type has a nontrivial copy constructor, move constructor, destructor, and so on. A client of the CppAdapter module therefore cannot copy, move, or destroy an instance of W without being able to generate calls into those C++ special member functions to operate on the name field. And calling those C++ functions requires the client to both be building with C++ interoperability enabled and to have imported MyCppLibrary.hpp itself, which doesn’t work.

Instead, we emit an opaque layout for MyString along with supporting functions inside CppAdapter. The abstract layouts look like this:

@layout(size: 32, alignment: 8, stride: 32, bitwiseCopyable: false)
public struct W {
  @layout(offset: 0)
  var name: @_hiddenType("$s5__Cxx8MyString")
  
  @layout(offset: 24)
  var weight: Double
}

@layout(opaque("$s10CppAdapter$s5__Cxx8MyString"), size: 24, alignment: 8, stride: 24, bitwiseCopyable: false)
struct $s5__Cxx8MyString { }

Note that the abstract layout of the struct W is fully-determined, because the size of MyString and Double are both known at compile time. Indeed, the abstract layout for MyString specifies size, alignment, and stride because its layout is known at compile time as well.

However, MyString is marked opaque because only the CppAdapter module we are in is guaranteed to be built with C++ interoperability and able to form calls to its special member functions. When compiling CppAdapter, the Swift compiler will need to emit metadata for MyString that is similar to what would be emitted for a resilient type, e.g., a value witness table or type metadata accessor to retrieve that value witness table. These symbols need to be publicly accessible, because a client of CppAdapter that needs to emit a copy of a W instance will have to reference them as part of copying the name field.
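To illustrate what going through a value witness table means for the client, here is a sketch in which closures stand in for witness entries. The `ValueWitnesses` type and `opaqueCopy` function are hypothetical and much simpler than the runtime's real table, and `String` stands in for a type with nontrivial copy and destroy operations such as MyString.

```swift
/// Hypothetical, simplified stand-in for a value witness table.
struct ValueWitnesses {
    let size: Int
    let alignment: Int
    let stride: Int
    /// Copy-initializes `dest` from the value stored at `source`.
    let initializeWithCopy: (_ dest: UnsafeMutableRawPointer, _ source: UnsafeRawPointer) -> Void
    /// Destroys the value stored at `object`, leaving the memory uninitialized.
    let destroy: (_ object: UnsafeMutableRawPointer) -> Void

    /// Built by the module that knows the concrete type.
    static func of<T>(_ type: T.Type) -> ValueWitnesses {
        ValueWitnesses(
            size: MemoryLayout<T>.size,
            alignment: MemoryLayout<T>.alignment,
            stride: MemoryLayout<T>.stride,
            initializeWithCopy: { dest, source in
                _ = dest.initializeMemory(as: T.self, repeating: source.load(as: T.self), count: 1)
            },
            destroy: { object in
                _ = object.assumingMemoryBound(to: T.self).deinitialize(count: 1)
            }
        )
    }
}

/// Client side: copies a value knowing only its witnesses, not its type.
func opaqueCopy(of source: UnsafeRawPointer, using witnesses: ValueWitnesses) -> UnsafeMutableRawPointer {
    let dest = UnsafeMutableRawPointer.allocate(byteCount: witnesses.size,
                                                alignment: witnesses.alignment)
    witnesses.initializeWithCopy(dest, source)
    return dest
}

var original = "a string long enough that it is stored out of line"
let witnesses = ValueWitnesses.of(String.self)
let copy = withUnsafePointer(to: &original) { pointer in
    opaqueCopy(of: UnsafeRawPointer(pointer), using: witnesses)
}
let recovered = copy.load(as: String.self)  // inspect the copy for the demo
witnesses.destroy(copy)
copy.deallocate()
print(recovered == original)
```

The indirection through closures is the cost mentioned above: the client cannot inline the copy because it never sees the concrete type.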

Note that many modules in a program may have an include of MyCppLibrary.hpp and a use of MyString that produces an abstract layout. To prevent symbol collisions when those modules are part of the same program, the opaque designator provides an alternative mangled name that includes both the name of MyString and the name of the module that is using it. Therefore, the set of emitted symbols for accessing the value witness table will be unique across the program. There are future optimizations here where some central module could opt to provide the value witness table or accessors explicitly, so that clients need not emit redundant copies.

Opaque layouts can also be used to “cut off” recursion into the structure of internally imported types. For example, we could choose to use an opaque layout for every type that is imported from an internal import, which reduces the number of abstract layouts we need to emit, but can degrade the quality of the generated code because the compiler needs to go through the value witness table for opaque types. However, the information behind opaque types can be recovered in some cases, as will be discussed in the next section.

Recovering hidden type information

One of the challenges highlighted earlier in the document is that clients may have to emit larger or less-efficient code to copy values when the details of that value’s type have been hidden. For example, the copy of the type W in a client needs to call through the opaque value witness table for MyString rather than directly calling the C++ copy constructor (which could have been inlined).

The abstract layout mechanism allows clients to recover the “hidden” type information when they have access to the type that has been hidden. For example, let’s reconsider one of our earlier examples containing three Swift modules:

 // module Utility
 public struct X {
   var name: String
 }
 
 // module Library
 internal import Utility
 public struct Y {
   var x: X
 }
 
 // module Client
 import Library
 
 func dup(_ y: Y, count: Int) -> [Y] {
   Array(repeating: y, count: count)
 }

The abstract layouts in Library could look like this:

@layout(size: 16, alignment: 8, stride: 16, bitwiseCopyable: false)
public struct Y {
  @layout(offset: 0)
  var x: @_hiddenType("$s7Utility1XV")
}

@layout(opaque, size: 16, alignment: 8, stride: 16, bitwiseCopyable: false)
struct $s7Utility1XV { }

Within Client, copying an instance of Library.Y means generating code to fetch and use the value witness table of Utility.X, which is less efficient (both in code size and run time) than if we knew the layout non-opaquely.

However, if the Client for some other reason were to import Utility, it has knowledge of the type Utility.X that it could use to emit better code. This is the other reason to use some form of mangled name for hidden types: $s7Utility1XV uniquely identifies the type Utility.X in a Swift program. If the Utility module is available, the Swift compiler can resolve the hidden type $s7Utility1XV to the actual type Utility.X and implement the more efficient code path. The same approach can be applied to imported C++ types when in a module that itself has enabled C++ interoperability, subject to the C++ One Definition Rule (ODR).

This optimization approach allows clients to mitigate the code size and runtime performance cost of hiding implementation details by explicitly introducing import dependencies on the modules that define these “hidden” types. The introduction of this import (or #include in a bridging header) might uncover modularization-related issues that require fixing to get the optimization back.
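The decision a client makes for each hidden type can be sketched as a lookup against its own import set. The `HiddenType` record, the `CopyStrategy` enum, and the `copyStrategy` function are hypothetical names used only to illustrate the recovery rule described in this section.

```swift
/// Hypothetical record: a hidden type is known by its mangled name and
/// by the module that defines it.
struct HiddenType {
    let mangledName: String
    let definingModule: String
}

enum CopyStrategy: Equatable {
    case direct             // type resolved; layout known, copy inline
    case viaValueWitnesses  // type stays hidden; use the opaque path
}

/// A client can take the efficient path only if it has itself imported
/// the module that defines the hidden type.
func copyStrategy(for type: HiddenType, importedModules: Set<String>) -> CopyStrategy {
    importedModules.contains(type.definingModule) ? .direct : .viaValueWitnesses
}

let hiddenX = HiddenType(mangledName: "$s7Utility1XV", definingModule: "Utility")
print(copyStrategy(for: hiddenX, importedModules: ["Library"]))
print(copyStrategy(for: hiddenX, importedModules: ["Library", "Utility"]))
```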

Doug


Since the appetite of the language steering group was mentioned here, we did indeed discuss this pitch and in particular conditionally rejecting code based on -enable-library-evolution or a similar flag.

Today, we have one place where -enable-library-evolution changes the semantics of the language in a very specific way, and it has caused an outsized amount of pain and confusion. Moving forward, we want to avoid introducing more such features.

As pitched, enabling library evolution mode (or similar) would make the language more permissive than without the mode, and we don't think that users should have to opt into such a mode in order to write reasonable code.

Therefore, since we do think there's at least one--hard, but implementable--alternative design which doesn't require creating a new language dialect, we would concur with feedback that the pitched design wouldn't be our preferred approach to the problem--although of course we do certainly appreciate all of the effort and motivation!

-- Xiaodi, on behalf of the language steering group


This to me seems like a decent trade-off of code size (not emitting recursive internal imports) vs complexity (going through value witness tables) but if this is the case, why emit sibling internal types as well and not just make any non-public interface opaque, a sort of "partial resilience"?

At the module boundary a public type with all public members would be fragile (known layout, size, etc) but with internal-access members would be opaque. Your first example would become:

@layout(opaque, size: dynamic, alignment: dynamic, stride: dynamic, bitwiseCopyable: dynamic)
public struct Z {
  // internal members not emitted

  @layout(offset: dynamic)
  public var weight: Double
}

Is the idea that those internal-but-declared-in-the-same-module types would have the same or similar binary cost but are then effectively opaque to the importer (I believe you called them hidden types), so that we don't need to go through the runtime cost of metadata access to resolve those types?

Or am I conflating module serialization and IRGen, and what actually changes here is when and how clients emit calls to library metadata accessors based on how complete the layouts are from a given library's module interface?


Need this like water.

It sounds like the design as-pitched isn’t going to pass muster with the language steering group. It also sounds like from this reply that there’s an alternative design. Where is that alternative design? This particular issue bites us pretty hard, and we have some time to be able to work on it. I’d rather not go down a design path that’s doomed from the start.

Oh it’s this one. That’s super-confusing. Your reply looked like a reply to this proposal. I’ll be quiet now. :)

Hi @Douglas_Gregor,

I work with Adam and I’ve been working on implementing something similar to what you describe for a while now, and I’d like to outline my plan and check that my approach seems reasonable.

We want to hide implementation details to improve incremental build speed, but so far I have tried not to regress the quality of the generated code in doing so. My goal has been to identify the minimal information we need to relay to clients of a module such that they can generate correct and efficient IR, but no more. For example, I don’t want the “abstract layout information” as you call it to include field names, and I don’t want to emit abstract layout information about internal types that aren’t actually leaking into the module’s ABI.

I have a draft PR here in which I am working on making _implementationOnly safe. Once that is complete, I plan to use the same mechanisms for the internal imports introduced in SE-0409 and drop the requirement that internally imported swiftmodules be provided to client modules.

What information should go in abstract layout information?

I’ve been reading the compiler source code to learn more about exactly what information is needed for the client to generate correct code for various kinds of types and this is what I’ve gathered. For a given type, starting from the typechecked AST node, we produce a type lowering to guide SILGen for operations manipulating the type, and a TypeInfo class to guide lowering that SIL to IR.

My approach so far has been going through TypeInfo subclasses and, if appropriate, introducing a hidden analogue for each and a serializable swiftmodule representation.

For example, for LoadableStructTypeInfo I have introduced HiddenLoadableStructTypeInfo and a serializable representation; for ReferenceTypeInfo, I have introduced HiddenReferenceTypeTypeInfo and, again, a serializable representation.

When producing the swiftmodule for a given module, for each _implementationOnly imported type whose ABI details are leaking, I fetch the TypeInfo for the type, fetch the hidden analogue, and serialize that into the swiftmodule.

I know there will be opportunities for consolidating what information really needs serializing, but for now I've found this to be the best starting point for systematically working towards the minimal information needed for correct codegen for each kind of hidden type.

Swift module representation and recovering hidden type information:

My high-level idea for how to represent hidden type layout information in swiftmodule files is that hidden types be used as fallback representations when XREF resolution fails. When we serialize a regular “visible” struct, the structure looks roughly like this:

  1. STRUCT_DECL points to fields represented as VAR_DECL records

  2. VAR_DECL records point to the associated type via TYPE_ID

  3. TYPE_ID identifies something like a NOMINAL_TYPE record

  4. NOMINAL_TYPE records point to the declaration defining the type via DECL_ID

  5. That DECL_ID can refer to a record defined in the same module, or it can refer to an XREF record

  6. An XREF record is a path identifying the declaration to load from another module (i.e., ModuleBar.TypeFoo)

If resolution of ModuleBar.TypeFoo fails because the client is unable to load ModuleBar, _implementationOnly can currently generate incorrect code. I have modified the XREF representation with a nullable “fallback local representation id” field. If the client fails to load ModuleBar, but a fallback local hidden representation is available, we use that instead.

In this way, recovery of hidden type information as you describe is achieved “for free”, by re-using the same cross module declaration reference system already used. We don’t have to use some new scheme where we look up hidden types by mangled name for recovery. If the client happens to import ModuleBar, the full representation of TypeFoo will be loaded automatically.

Hidden layout representations follow a similar structure and may refer to each other, as well as fully visible types. They do this in the same way, via TypeID -> … -> XREF. If we succeed in resolving the XREF, great, if not we use the fallback local representation.
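The fallback rule can be sketched as follows. This is a toy model, not the swiftmodule format: `XRef`, `HiddenLayout`, `ResolvedType`, `ResolutionError`, and `resolve` are hypothetical names, and the "local representations" are just a dictionary keyed by the fallback id.

```swift
/// Hypothetical stand-in for a serialized hidden layout record.
struct HiddenLayout: Equatable {
    let size: Int
    let alignment: Int
}

/// Hypothetical stand-in for an XREF record with the nullable
/// "fallback local representation id" field.
struct XRef {
    let module: String
    let typeName: String
    let fallbackLocalID: Int?
}

enum ResolvedType: Equatable {
    case full(module: String, typeName: String)  // normal cross-module load
    case hidden(HiddenLayout)                    // fallback representation
}

enum ResolutionError: Error { case unresolvable(String) }

func resolve(_ xref: XRef,
             loadedModules: Set<String>,
             localHiddenLayouts: [Int: HiddenLayout]) throws -> ResolvedType {
    // Prefer the full declaration whenever the defining module is loadable.
    if loadedModules.contains(xref.module) {
        return .full(module: xref.module, typeName: xref.typeName)
    }
    // Otherwise fall back to the locally serialized hidden layout, if any.
    if let id = xref.fallbackLocalID, let layout = localHiddenLayouts[id] {
        return .hidden(layout)
    }
    throw ResolutionError.unresolvable("\(xref.module).\(xref.typeName)")
}
```

Because the full declaration is always tried first, a client that happens to import ModuleBar gets the complete type without any extra lookup scheme.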

Restrictions on use of hidden types by the client:

I want to call out that my design relies upon the client being restricted in the operations it can perform on a hidden type. In particular, the client should only be able to generate, initialize, copy, move and destroy operations on a hidden type. Generating code that accesses a particular field of a hidden type for example should be forbidden.

Practically, what this means is I want to ban the use of hidden types in @inlinable functions. This already seems to be the case for _implementationOnly imported types, as well as for types from internal imports by default. The proposed but, I believe, not yet implemented @usableFromInline aims to change that. I propose that we either don’t implement @usableFromInline or, if need be, require that clients load the entire internal module if @usableFromInline is in play.

Use cases for recovering hidden type information:

So far the hidden representations I’ve introduced don’t produce worse, more indirect code, and so don’t benefit from hidden type recovery. However, I plan to produce more indirect VWT-style code at least for imported C++ types.

I still have a lot of work to do and many edge cases to test, but I wanted to check in and make sure the approach is sound before I spend too much more time. Happy to take any feedback.

-- Nuri
