[Discussion] `#if hasSymbol(Module.Symbol)`, a compile time check for a module symbol's availability

Jumhyn · June 21, 2022, 12:57am

Beyond fundamental layering problems, there's another concern I have about the compile-time version of this feature, related to the issues that have been discussed around disambiguation, and also somewhat touched on by this comment:

I think what's been bugging me is that in the failure condition, i.e., the case where the symbol in question is not present, it's really difficult to talk about symbol identity.

Suppose someone is writing a program and compiling against platforms X and Y, which have (presently) divergent APIs. Platform X supports doSomething:

struct Foo {
  func doSomething(with bar: Bar)
}

whereas platform Y does not:

struct Foo {
  // nothing
}

so, the author uses the #if hasSymbol feature:

#if hasSymbol(Foo.doSomething(with:))
Foo().doSomething(with: .init())
#endif

next year, after hard work on platform Y, the developers release a new version of the library:

struct Foo {
  // Note: do *not* use 'doSomething' with a default-initialized 'UnsafeBar'
  // This is a serious programming error
  func doSomething(with unsafeBar: UnsafeBar)
}

The author of our original program has now inadvertently introduced a use of an unsafe API, and an incorrect use at that!
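
To make the hazard concrete, here is a minimal sketch in today's Swift. The two SDKs are modeled as hypothetical namespaces (`PlatformX` and `PlatformYNextYear`, names invented for illustration) rather than real modules, and the methods return strings only so the difference is observable. The call-site text after the type name is identical, yet `.init()` resolves to a different type on each side:

```swift
// Hypothetical stand-ins for the two SDKs; a real program would see only one of them.
enum PlatformX {
  struct Bar {}
  struct Foo {
    func doSomething(with bar: Bar) -> String { "safe Bar API" }
  }
}

enum PlatformYNextYear {
  struct UnsafeBar {}
  struct Foo {
    // Same base name and argument label, but a different parameter type.
    func doSomething(with unsafeBar: UnsafeBar) -> String { "unsafe UnsafeBar API" }
  }
}

// `.doSomething(with: .init())` is spelled the same way in both lines.
let onX = PlatformX.Foo().doSomething(with: .init())
let onY = PlatformYNextYear.Foo().doSomething(with: .init())
print(onX)
print(onY)
```

A check keyed only on the name `Foo.doSomething(with:)` would accept both declarations, which is exactly the problem described above.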

Ostensibly, when they wrote #if hasSymbol(Foo.doSomething(with:)), they only wanted to take the #if branch if platform Y introduced the 'same' API as platform X, and not a different one. Perhaps this has a clear meaning in the common cases (like Apple's platforms), and this issue wouldn't really arise in practice.

But in the edge cases, I don't know how to talk about whether two different symbols, on platforms which may have almost entirely divergent API surfaces, are 'really' the same underlying symbol. Should we require type-based disambiguation for all symbols passed as an argument to hasSymbol, even those that are not today ambiguous? Is type-name-based identity even the right tool for specifying a specific symbol which doesn't currently exist at all on the platform being compiled against?

I know you have mentioned that you want to keep the runtime version of this feature to a separate thread, so I won't go into my thoughts on that part of the feature too extensively, except to say that the problem seems much more tractable when we are able to resolve specific symbol identity at the time that the check is first written.

tshortli · June 21, 2022, 4:56pm

Hm, yeah, I suppose the unique difficulty is that the ambiguity in the situation you have described is undetectable by the compiler in the first place. If two doSomething(with:) APIs are present in the source files but only ever active under mutually exclusive compilation conditions, then the compiler simply has no way to help either the library author or the client. We've so far focused on matching API functions in the examples, but I think there may be additional challenges with other kinds of declarations, too. For example, if you wanted to match on a class Foo existing (perhaps to use Foo.self), there'd potentially be a similar ambiguity with enum Foo or struct Foo being defined under different conditions, right? Use Foo.self as an argument to a function that takes Any and you've again got a potentially dangerous ambiguity.
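
A small sketch of the `Foo.self` case, with the two mutually exclusive build conditions modeled as hypothetical namespaces (`VariantA`, `VariantB`) and a hypothetical `describe` function standing in for any API that accepts `Any`. Both calls compile, so nothing at the call site reveals which kind of `Foo` was picked up:

```swift
// Hypothetical namespaces standing in for two mutually exclusive build conditions.
enum VariantA { final class Foo {} }
enum VariantB { struct Foo {} }

// A consumer that takes Any cannot rely on the compiler to reject the "wrong" Foo;
// it can only find out what it received by inspecting the value at run time.
func describe(_ value: Any) -> String {
  if value is AnyClass { return "class metatype" }
  if value is Any.Type { return "non-class metatype" }
  return "not a metatype"
}

let fromA = describe(VariantA.Foo.self)
let fromB = describe(VariantB.Foo.self)
print(fromA)
print(fromB)
```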

Perhaps we need a predicate syntax that is something more like this (pretend that we have the fully qualified module name syntax that uses ::):

#if hasClass(FooKit::Foo)

#if hasClassMethod(FooKit::Foo.bar(_:) as (Swift::Int) -> Swift::String)

#if hasEnumCase(FooKit::Result.success)

Edit: I updated the hasClassMethod example above to qualify Int and String as types from the Swift module and then realized that even that might not be enough - do we need to qualify the kind of declaration those types have in the coercion as well?

0x41c · June 21, 2022, 5:39pm

I'm not liking the casting in the compiler directive, especially implementation-wise. Introducing it there leaves us stuck in a chicken-and-egg situation. But I like how you added the fully-qualified name syntax here, along with the specific directives hasClass, hasClassMethod, etc. I believe, though, that wrapping the symbol would help disambiguate the kind of symbol outside of the directive itself. (You'll see more examples of this below.)

#if hasSymbol(classMethod(FooKit::Foo.bar(_:)))

The one thing I'm beginning to realize is that, yes a compiler directive #if hasSymbol would be powerful, but I don't think it would make sense to have this as the main directive seeing all the issues that could come up.

It would only work in restricted circumstances and would possibly require other features that are not available at that point in the compilation. For example, it would be a lot easier if there was a way to define preprocessor conditions in source. That way, this could be cleared up:

#define barType (struct(Swift::Int)) -> struct(Swift::String)
#if hasSymbol(classMethod(FooKit::Foo.bar(_:) as barType))

But alas, we don't, and that would be another pitch in itself. Because of that, we're left with this monster:

#if hasSymbol(classMethod(FooKit::Foo.bar(_:) as (struct(Swift::Int)) -> struct(Swift::String)))

I mean, yes, it provides all the information necessary to properly identify a symbol without the need for magic disambiguation on behalf of an almost non-existent type inference model, but it's not ergonomic and introduces a lot of design changes in how if-configs are parsed.

With all that said, I don't see another way for this to be implemented. These changes frankly aren't the worst. People using this would already know what needs to be done and won't have a problem supplying the type information for it. There definitely is a need for this feature, but it would be better implemented at runtime, forgoing the optimizations that could be applied by having it as a compiler directive.

EDIT: Shouldn't Int and String follow the fully-qualified name syntax and be extended in reference from their modules through ::? I edited my examples to show this.

tshortli · June 21, 2022, 6:13pm

Yeah, I forgot to make those consistent, I'll update the post.

tshortli · June 21, 2022, 6:26pm

And now that post is gone, flagged by the spam bot

0x41c · June 21, 2022, 7:31pm

Once again, that is seriously unfortunate since there were replies. I'm not sure how I feel about it taking 12 hours to restore either.

EDIT: I see it took less time than it did for me which is good!

tshortli · June 21, 2022, 8:52pm

Let me know if I'm misinterpreting what you're saying here, but I want to clarify that the problem I want to solve cannot be fully solved with runtime checks unless we pursue a very different approach to the problem. In the case that motivated me to think about this design, a build time query in the preprocessor specifically allows a source file to reference some recently introduced declarations but be structured in such a way that it still compiles against an SDK that does not yet have those declarations. Code referencing this potentially missing declaration will of course not typecheck without the module's declaration, and that's why we've been looking for a preprocessor based solution.
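
For comparison, the build-time query that exists today works at module granularity rather than declaration granularity. A minimal sketch, assuming Foundation may or may not be present in the SDK being compiled against; `currentTimestamp` is a hypothetical helper invented for the example:

```swift
#if canImport(Foundation)
import Foundation

// Only type-checked when the Foundation module can be imported.
func currentTimestamp() -> String {
  String(Date().timeIntervalSince1970)
}
#else
// Fallback so the file still compiles against an SDK without Foundation.
func currentTimestamp() -> String {
  "unavailable"
}
#endif

print(currentTimestamp())
```

The same file compiles in both situations because the branch that mentions `Date` is never type-checked when the module is missing. The feature discussed in this thread would need the same property for a single declaration.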

However, the discussion we've been having is leading me to revisit another approach that I had initially discarded but now looks more appealing given how problematic the preprocessor approach may be. We could make it possible to "forward declare" declarations, like in C-like languages:

// Locally declare the FooKit API that may be missing locally
@forwardDeclaration
extension FooKit::Foo {
  func bar(_ x: Int) -> String
}

// Use the API, which is known to be potentially unavailable due to `@forwardDeclaration`
if #hasSymbol(FooKit::Foo.bar(_:)) { ... }

(This is not meant to be the real syntax; just a sketch for demonstration purposes).

The advantage of this approach is that it leverages existing Swift parsing and typechecking to give the type checker the information it needs to type check code using bar(_:) whether or not it is declared in the original module. Additionally, it ought to be possible for the compiler to diagnose ambiguities when they do arise, solving one of the problems we were having without quite as much unnatural verbosity. I had originally dismissed this idea because it seems like a bigger addition to the language, but after talking through the difficulties of the preprocessor approach I'm not sure it is.

It might make sense to have some additional syntax that scopes a forward declaration to a specific module for clarity:

forwardDeclare FooKit {
  extension Foo { ... }
}

You could even imagine taking this farther and allowing polyfills of missing APIs so that it is always possible to call the function but your implementation is substituted whenever the API is not defined in the SDK or unavailable at runtime. However, that idea is really only practical for functions so it's probably not a general enough solution.
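
The closest approximation of such a polyfill today is a hand-maintained compilation condition. The sketch below assumes a hypothetical flag `FOOKIT_HAS_BAR` that the client would have to pass manually (for example with `-D FOOKIT_HAS_BAR`), and uses a local `Foo` in place of the real FooKit type so that it is self-contained:

```swift
// Local stand-in for FooKit's Foo so the sketch compiles on its own.
struct Foo {}

#if FOOKIT_HAS_BAR
// Assumed: the SDK's own Foo.bar(_:) is visible here, so no polyfill is compiled.
#else
extension Foo {
  // Fallback implementation, compiled only when the flag is absent.
  func bar(_ x: Int) -> String { String(x) }
}
#endif

print(Foo().bar(7))
```

As written, this only builds without the flag, since there is no real FooKit behind it. The point is the shape: nothing verifies that the flag actually matches the SDK's contents, which is the gap a compiler-supported forward declaration or polyfill would close.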

xwu · June 22, 2022, 1:25am

tshortli:

We could make it possible to "forward declare" declarations, like in C-like languages:
// Locally declare the FooKit API that may be missing locally
@forwardDeclaration
extension FooKit::Foo {
  func bar(_ x: Int) -> String
}

// Use the API, which is known to be potentially unavailable due to `@forwardDeclaration`
if #hasSymbol(FooKit::Foo.bar(_:)) { ... }
(This is not meant to be the real syntax; just a sketch for demonstration purposes).

Reminiscent of a feature we already have for internal Swift use only (albeit discouraged):

https://github.com/apple/swift/blob/main/docs/StandardLibraryProgrammersManual.md#using-_silgen_name-to-call-c-from-swift

tshortli · June 30, 2022, 9:21pm

Just wanted to give you all an update about where I've ended up landing in my investigation. We already have an underscored version of canImport() which solves the SDK evolution problem:

#if canImport(FooKit, _version: 42.1)
  // Use APIs that were introduced at or before version 42.1 of FooKit
#endif

Module owners can specify the version of their module by supplying the -user-module-version flag to the frontend. We may want to make this feature official by bringing it through evolution; I'm curious if folks think it is widely applicable enough to do so.

So while my problem is solved by this, I realize it does not solve the problem that motivated this thread. Decls that are platform/SDK specific might be omitted from the distributed module in one SDK while being present in another but the module version number could easily be the same in both SDKs.

The canImport(Module, _version: ...) syntax does give me another idea that's an iteration on one I suggested earlier in the thread, though. Suppose module owners were able to specify a "variant" (better name to be bikeshedded) for a build of a module with some kind of string identifier. You could imagine allowing clients to use a syntax like this then:

#if canImport(FooKit, variant: Linux)
  // Use APIs that are only present in the "Linux" builds of the module
#endif

In the simplest version of this proposal, the variant is an arbitrary string, there is just one variant for a given built module, and the possible values are mutually exclusive. The module owner would need to document when the presence of an API depends on the variant. Conceptually these variants are similar to #if os(...), but it's up to each library to define what a variant means for it.

I can see potential problems with this design as libraries evolve over time and need to introduce new variants, but it feels like there might be something there. The big advantage I see is that it's both conceptually simple and straightforward to implement and seems like a useful building block.

A slightly more complex version of this would be to allow multiple of these strings to be supplied by the module owner (maybe we would call them "capabilities" in this model). Then you could have a capability per API or set of related APIs that is either present or not present. This is essentially the same idea as exporting -D compile time conditions from a module but the syntax makes it clear that these capabilities are scoped to a module. This is more flexible and is probably more friendly to library evolution than the variant idea.
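
For reference, this is roughly what the unscoped `-D` equivalent looks like today. `FOOKIT_STREAMING` is a hypothetical condition name, and `transferMode` is a hypothetical client function; nothing ties the flag to the FooKit module, which is the part a module-scoped capability would add:

```swift
// The client must define FOOKIT_STREAMING itself (e.g. `-D FOOKIT_STREAMING`);
// the compiler cannot check that it matches how the module was actually built.
func transferMode() -> String {
  #if FOOKIT_STREAMING
  return "streaming"
  #else
  return "buffered"
  #endif
}

print(transferMode())
```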

With both ideas, you could even print the decls that are protected by a certain variant/capability in the .swiftinterface with a guard surrounding it so that all the APIs and which capabilities they require are fully documented by the interface.

ksluder · June 30, 2022, 9:57pm

As far as I am aware, Swift also lacks a way to detect if a weakly-imported C symbol is null at runtime. I would like to think that this deserves a true symbol-specific check, the syntax for which would also work for Swift symbols. For example:

if @hasSymbol(WeakCSymbol) {
  // runtime check for weakly imported symbol
}

#if hasSymbol(SomePlatformSpecificSymbol)
  // compile time check for conditionally available symbol
#endif
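
Until a dedicated runtime check exists, one workaround on platforms with a POSIX dynamic loader is to look the symbol up by name with `dlsym`. This is only a sketch under that assumption: `isSymbolLoaded` is a hypothetical helper, the lookup is string-based and carries no type information, and it is not the syntax proposed above:

```swift
#if canImport(Darwin)
import Darwin
#elseif canImport(Glibc)
import Glibc
#endif

// Returns true if the named C symbol can be resolved in the running process.
func isSymbolLoaded(_ name: String) -> Bool {
  // Passing nil asks for a handle covering the main program and its loaded dependencies.
  guard let handle = dlopen(nil, RTLD_NOW) else { return false }
  defer { _ = dlclose(handle) }
  return dlsym(handle, name) != nil
}

print(isSymbolLoaded("malloc"))
print(isSymbolLoaded("surely_not_a_real_symbol_12345"))
```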

tshortli · June 30, 2022, 10:02pm

Yes, I'm separately looking into the design of a runtime check similar to what you've shown there. The compile time check is discussed earlier in this thread and has serious challenges, unfortunately. You need reliable ways to solve ambiguities and that implies some amount of type system integration, making the layering problematic given that this would be implemented during parsing.

ksluder · July 1, 2022, 6:34am

IIRC, Swift currently parses both branches of #if blocks. Does that mean evaluation of the condition and stripping of the dead branch can happen later?

tshortli · July 14, 2022, 5:23pm

(Sorry for taking so long to reply, it took me a while to get back to this since I went on vacation).

Yes, the compiler parses both branches and the conditional is represented in the AST. However, I think there's still a cycle in the hypothetical implementation of this because type checking involves being able to walk the AST, and in order to walk the nodes in the AST representing these conditions you need to know whether the condition is active or not, which again requires type checking. I wouldn't be surprised if this were solvable but I do think it makes the endeavor tricky and may imply that there are some important limitations or edge cases to consider.
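
The current behavior is easy to observe with a condition that needs no type information. In this sketch (`NoSuchType` and `noSuchFunction` are deliberately undeclared names), the inactive branch has to be syntactically valid, but it is never type-checked:

```swift
#if false
// This branch must parse, but it is never type-checked,
// so the undeclared names below do not cause an error.
let status: NoSuchType = noSuchFunction()
#else
let status = "type-checked branch"
#endif

print(status)
```

A `hasSymbol` condition differs from `#if false` in that deciding which branch is active would itself require name lookup and type information, which is where the cycle described above comes from.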

cmcgee1024 · July 31, 2024, 12:58pm

I think that something like this is going to be needed as Swift gets used in more places where C is used currently. There's a ton of C code that's guarded by ifdefs that are set based on autoconf/cmake checks for specific fields, structs, or even functions in the libraries it depends on. Being able to put that check directly into Swift code could be compelling for C developers who currently have to rely on the preprocessor in conjunction with a configuration tool to accomplish this.