[Pitch] Fixing member import visibility

tshortli · April 24, 2024, 12:05am

Hi folks,

I've been working on a proposal that aims to address an annoying inconsistency in the behavior of name lookup in Swift and I'd like to get your initial feedback.

Member import visibility

Bug: apple/swift#46493
Implementation: apple/swift#72974, apple/swift#73063
Experimental Feature Flag: MemberImportVisibility

Introduction

In Swift, there are rules dictating whether the name of a declaration in another module is considered in scope. For example, if you have a program that uses the swift-algorithms package and you want to use the global function chain() then you must write import Algorithms in the file that references that function or the compiler will consider it out of scope:

// Missing 'import Algorithms'
let chained = chain([1], [2]) // error: Cannot find 'chain' in scope

The visibility rules for a member declaration, such as a method declared inside of a struct, are different though. When resolving a name to a member declaration, the member is in scope even if the module introducing the member is only transitively imported. A transitively imported module could be imported directly in another source file, or it could be a dependency of some direct dependency of your program. This inconsistency may be best understood as a subtle bug rather than an intentional design decision, and in a lot of Swift code it goes unnoticed. However, the import rules for members become more surprising when you consider the members of extensions, since an extension and its nominal type can be declared in different modules.

This proposal unifies the behavior of name lookup by changing the rules to consistently require direct module imports in order to bring both top-level declarations and members into scope.

Motivation

Suppose you have a program depends on a library named RecipeKit. The library interface looks like this:

// RecipeKit interface

public struct Recipe { /*...*/ }

extension String {
  /// Returns the recipe represented by this string.
  public func parse() -> Recipe?
}

To start, your program contains a single source file main.swift that imports RecipeKit:

// main.swift
import RecipeKit

let recipe = "2 slices of bread, 1.5 tbs peanut butter".parse()

Later, you decide to integrate with a new library named GroceryKit which happens to also declares its own parse() method in an extension on String:

// GroceryKit interface
public struct GroceryList { /*...*/ }

extension String {
  /// Returns the grocery list represented by this string.
  public func parse() -> GroceryList?
}

You add a second file that imports GroceryKit:

// Groceries.swift
import GroceryKit

var groceries = GroceryList()
// ...

Surprisingly, now that GroceryKit is a transitive dependency of main.swift, there's a new compilation error:

// main.swift
import RecipeKit

let recipe = "2 slices of bread, 1.5 tbs peanut butter".parse()
// error: Ambiguous use of 'parse()'

Before the new file was added, parse() could only refer to the extension member from RecipeKit. Now that it might also reference the extension member in GroceryKit the compiler considers the use of parse() to be ambiguous. To resolve the ambiguity, the developer must add a type annotation to the declaration of the variable recipe to give the compiler the additional context it needs to disambiguate:

let recipe: Recipe = "2 slices of bread, 1.5 tbs peanut butter".parse() // OK

This example demonstrates why "leaky" member visibility is undesirable. Although the fix for the new error is relatively simple in this code, providing disambiguation context to the compiler is not always so straightforward. Additionally, the fact that some declarations from GroceryKit are now visible in main.swift contradicts developer expectations, since visibility rules for top level declarations do not behave this way. This idiosyncrasy in Swift's import visibility rules harms local reasoning and results in confusing errors.

Proposed solution

In a future language version, or whenever the MemberImportVisibility feature is enabled, the Swift compiler should only consider members from directly imported modules to be in scope.

Detailed design

Excluding references to members from transitively imported modules in the local scope will prevent surprising ambiguities like the one detailed earlier. However, this change of behavior will also break some existing code that used to be accepted. The Swift compiler will now emit an error for member references that used to successfully resolve to a declaration in a transitively imported module:

// In this example, RecipeKit is imported in another file

// note: add import of module 'RecipeKit'

let recipe = "1 scoop ice cream, 1 tbs chocolate syrup".parse()
// error: instance method 'parse()' is inaccessible due to missing import of defining module 'RecipeKit'

Rather than simply indicating that the member's name is not in-scope, though, the compiler can identify the declaration you likely meant to use and suggest importing the module that defines it. An IDE may also offer a fix-it to add the missing module import to the file.

Source compatibility

The suggested change in behavior is source breaking because it adds stricter requirements to name lookup. There is much existing Swift code that will need to be updated to adhere to these new requirements, either by introducing additional import statements in some source files or by reorganizing code among files. This change in behavior therefore must be opt-in, which is why it should be limited to a future language mode with an upcoming feature identifier that allows opt-in with previous language modes.

ABI compatibility

This change does not affect ABI.

Implications on adoption

Adopting this feature ought to be straightforward because the compiler can identify module imports that are missing under the new rules and guide the developer to add an explicit import to resolve the error. Furthermore, although enabling this feature would be source breaking for many packages, it's important to note that source code that conforms to the new import requirements will be backwards compatible with older compilers and language modes that don't enforce this requirement. This feature does not require any new syntax or introduce rules that contradict rules of previous language modes.

Future directions

Add module qualification syntax for extension members

This proposal seeks to give developers explicit control over which members are visible in a source file because this control can be used to prevent and resolve ambiguities that arise when different modules declare conflicting members in extensions. With this proposal implemented, if an extension member ambiguity still arises in a source file then the developer has the option of curating the imports in that file to resolve the ambiguity. This may work in some situations, but in others it may be awkward to refactor code in order to avoid importing a module that introduces a conflict. For these cases it would be useful to have a syntax that unambiguously identifies the desired extension member at the use site. For example, here's a hypothetical syntax for explicitly calling the parse() method declared in the module RecipeKit:

let recipe = "...".RecipeKit::parse()

Alternatives considered

Introduce module qualification syntax for extension members instead

One alternative approach to the problem would be to rely exclusively on a new syntax for disambiguation of extension members (as discussed in Future directions). The limitation of that approach is that alone it only empowers the developer to solve conflicts reactively. On the other hand, the solution provided by this proposal is preventative because it stops unnecessary conflicts from arising in the first place. In the fullness of time, it would be best for both solutions to be available simultaneously.

jrose · April 24, 2024, 1:16am

“directly imported” really means “the transitive closure of directly imported modules and everything they re-export, plus bridging header / framework module imports”, but otherwise this seems good, bringing members more in line with top-level names. I’m glad to hear that the compiler will help with diagnostics when an import is missing; may I suggest that that diagnostic also kick in when an overload doesn’t match? Otherwise it’s too easy for the missing import to get missed when there’s something else “in the way”.

Jon_Shier · April 24, 2024, 2:50am

I've never understood how this worked. Most of the time, in my experience at least, dependencies of dependencies still require explicit import. When would they get automatically imported that wasn't @_exported from the initial dependency?

xwu · April 24, 2024, 4:31am

Well...not source code that takes advantage of the new rules in order to resolve ambiguities that would be present for older compilers, such as given in the motivating example. Which should be understood by this point in the text but, read in isolation, the backward compatibility story here is not as sanguine as presented.

Can you comment on whether these changes will include/affect any other import visibility warts, such as operators?

gwendal.roue · April 24, 2024, 6:29am

I think that this is an example of implicit imports:

// File1.swift
import Lib

func makeLibValue() -> Lib.Value { ... }

// File2.swift
// Compiles despite the lack of `import Lib`
makeLibValue().frobnicate()

Another slightly related example that compiles today:

// File1.swift
import Lib

// Allow other files to use `Lib.Value` without importing Lib
typealias LocalValue = Lib.Value
func useLibValue(value: LocalValue) { }

// File2.swift
// Compiles despite the lack of `import Lib`
func useLibValueTwice(value: LocalValue) {
  useLibValue(value)
  useLibValue(value)
}

I do use the second technique today, when I want to hide an implementation detail. It's handy to define a typealias. If I had to define a wrapper struct, for clients to not need to perform an extra import, I would be slightly disappointed.

Jon_Shier · April 24, 2024, 2:19pm

Right, but my question was how the first example worked in the first place. Has it always worked like that?

allevato · April 24, 2024, 2:32pm

(Excluding things like extensions and operators for the moment, since they're known to be "weird"...) Swift only requires that you import a symbol to write its name in your source file. It's possible to use a type's members without spelling it directly in your source code because Swift doesn't do any dependency pruning; in order to import a module, all of its transitive dependencies must also be loaded.^[1]

The following is totally fine:

// Module1
public struct MyType {
  public func myMethod() {}
}

// Module2
import Module1
public func makeMyType() -> MyType {}

// Module3
import Module2
let x = makeMyType()
x.myMethod()

Today, myMethod can be called from Module3 without importing Module1 directly because the compiler must load Module1 as part of loading Module2, and the type checker has all the information it needs about what type makeMyType() actually returns because it's serialized in Module2. But since you're not writing MyType directly in the source of Module3, you don't need to import it. If Module1 is already a transitive dependency somewhere, then the only difference you get from importing it in your code is that its top-level names become available in that file.

This also means there's a subtle difference depending on whether you use an explicit type annotation or type inference. The following would not work unless you imported Module1, because now you're spelling the name:

let x: MyType = makeMyType()

Again, I might be glossing over some edge cases, but AFAIK this is generally how Swift behaves today.

...excluding unsupported features like @_implementationOnly import, which does allow for dependencies to be pruned. But even in that case, we're talking about public APIs, and you would not be able to prune a dependency used by a public API. ↩︎

tshortli · April 24, 2024, 3:29pm

Yes, I noticed that problem in a few spots while testing this with existing code. This problem already exists with top level declarations that get "in the way," as you say. Trying harder to find the exact match and providing a more helpful diagnostic seems valuable for both top level and member declarations, though.

tshortli · April 24, 2024, 3:48pm

Sure, I can be more explicit about this. It does obfuscate the fact that older tools would require disambiguation, but I don't think that counts as creating a new backward compatibility problem.

Do you have any examples or links to issues handy? I'm not familiar with the problems so I'll need to do some research. I did write some tests for operators declared as static members of an extension, but those tests appeared to demonstrate that operator lookup behaved the same as lookup for top level declarations. With and without this feature enabled, the operators were only considered in scope if the module defining them was imported. I wouldn't be surprised if there were more interesting operator test cases to consider, though.

allevato · April 24, 2024, 3:51pm

Now that I've gone back and digested the write-up and looked at some of the toolchain tests a bit more, I'm fond of this idea in general—leaky extensions are a big problem. My team also works on static analysis tooling, one of which is an "import what you spell" tool for Swift, and we've had to come up with interesting workarounds for many situations like the ones you've highlighted in the pitch.

Request for a clarification: Most of the tests I saw deal with extensions, enum cases inferred via type inference from something declared in another module, and so forth. But can you clarify whether the fairly simple example I used above would also be affected?

// Module1
public struct MyType {
  public func myMethod() {}
}

// Module2
import Module1
public func makeMyType() -> MyType {}

// Module3
import Module2
let x = makeMyType()
x.myMethod()          // <<< HERE

Would enabling MemberImportVisibility cause the reference to myMethod() to fail to resolve unless Module3 imported Module1? The name of the flag makes me think "yes", but I'm still unsure. Requiring the import would be quite a departure from Swift today, but I'm typically a fan of just about anything that requires users to be more aware of the things they're actually depending on.

tshortli · April 24, 2024, 3:58pm

Yes, with MemberImportVisibility enabled the compiler would require an import of Module1 to be added to the source file representing Module3. I'm realizing this is probably one of the more controversial aspects of the proposal and should be called out explicitly. I think it's worth it, but I can see how some may find the requirement pedantic in this example after relying on the existing behavior for years.

allevato · April 24, 2024, 4:11pm

Thanks for confirming this!

I don't want to get too far into the weeds here, but if clients have to be more explicit about what they're using, can this lead us down a path where we can more aggressively trim dependencies, even for public dependencies? Large build graphs could benefit greatly if they only had to pass the modules that were explicitly included to the compiler, instead of the entire transitive closure.

I guess one thing that I'm not sure about is whether codegen would need to know MyType's layout to compile the following, where Module3 does nothing with MyType from Module1 except pass it around as an opaque value:

import Module2
let x = makeMyType()  // function declared in Module2
takeMyType(x)         // function also declared in Module2

But we've discussed before the need to do some kind of layout abstraction in order to trim private dependencies from non-resilient modules, so maybe that's relevant here as well?

tshortli · April 24, 2024, 4:37pm

allevato:

I guess one thing that I'm not sure about is whether codegen would need to know MyType's layout to compile the following, where Module3 does nothing with MyType from Module1 except pass it around as an opaque value:
import Module2
let x = makeMyType()  // function declared in Module2
takeMyType(x)         // function also declared in Module2

We do need type information for the type of x in order to compile this code. The proposal would not require the module defining the type to be imported in this example, and I don't think it should. The way I see it is that import statements are mainly a tool for bringing names into scope to influence name look up and codegen is necessarily a separate concern. I'm also interested in ways that we can exercise control over dependencies for complex build graphs, but I think that solutions to that problem ought to operate on modules as the fundamental unit in the graph, rather than source files.

taylorswift · April 24, 2024, 4:49pm

i think as a practical matter this will cause problems in DSLs and other code that uses a lot of nested closures because when members disappear the entire block fails to typecheck and it takes a really long time and a lot of effort to figure out what went wrong.

we know this because today operators work essentially the way this proposal envisions that named methods should work and when you move operator definitions between modules and don’t remember to update the imports the result builders and DSL closures that use them stop compiling.

obviously even without this proposal it is still a problem because if you remove the last import in the build tree you get the same could not type check in a reasonable time problem. but this would cause you to run into that problem a lot more frequently which does not make me enthusiastic about this change. in a way the compiler’s lax approach towards member visibility mitigates a lot of headaches that would otherwise result from using DSLs in Swift today.

jrose · April 24, 2024, 4:58pm

Just to be clear, operators do not work like this either. Both operator decls and operator functions have (distinct) unholy monstrosities in their lookup implementations that are not the same as either top-level names or members, or at least they were five years ago. There may be some analogies but they should probably be kept concrete because of the major implementation differences.

xwu · April 24, 2024, 6:35pm

It's been a while since I've thought about this topic, so admittedly my thinking about it is a bit jumbled. I was probably conflating @jrose's points #2 and #3 from way back when:

We've addressed #1, of course, with visibility modifiers on import statements. It's fair to fix #2 alone without tackling #3, and probably good enough for these purposes that operators implemented as static members work consistently as you've verified. (That said, if ensuring that behavior involves tackling #3 more holistically, would be nice to get it all done.)

jrose · April 24, 2024, 6:37pm

I was being very explicit saying “operator declarations” then and now—that’s declarations with the operator keyword. Operator functions are also bad because they have to do top-level lookup and member lookup, in multiple places. So there are four things, really.

pyrtsa · April 24, 2024, 7:54pm

I like the general direction of this idea, but I don't think it alone is sustainable in practice unless the compiler is also taught to warn about unused (private) imports in files.

Without that, source code will be full of imports (as is already possible) no one's sure if they're in use, and the only way to tell involves a series of unproductive whack-a-mole exercises with the compiler.

rockbruno · May 22, 2024, 8:45am

I plus one on the concept of teaching the compiler about unused imports. It seems most people who bump into this issue do so because they are looking for a way to accurately detect unused dependencies in Swift modules.

Douglas_Gregor · May 30, 2024, 7:37pm

I'd love to see this warning, but I don't think it belongs in an evolution proposal. This is something that the compiler (or a sufficiently-smart linter) could just go ahead and implement whenever.

Doug