[Discussion] `#if hasSymbol(Module.Symbol)`, a compile time check for a module symbol's availability

Type annotations and type coercion (someFunction as () -> Int) would work fine to disambiguate these.
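For context, this is what the coercion-based disambiguation looks like in today's Swift (the overloads are invented for illustration):

```swift
// Two overloads distinguished only by return type.
func value() -> Int { 42 }
func value() -> String { "forty-two" }

// A bare reference to `value` is ambiguous; a coercion picks an overload:
let f = value as () -> Int
// ...and so does an explicit type annotation:
let g: () -> String = value
```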

I think the question of ambiguity becomes trickier when one considers how to distinguish Foo.bar as implemented as an extension in module Bar versus the exact same member with the exact same signature implemented as an extension in module Baz.

4 Likes

The spam filter just removed my post, I hope to see it brought back soon.

Yeah, this is the idiomatic type-based disambiguation strategy in Swift. It did occur to me whether there could potentially be some ordering issue here, implementation-wise (can conditional compilation conditions depend on the type-checking machinery?), but I can't say for sure. Same goes for the mangling-based strategy, in the other direction—does the compiler frontend have knowledge about the presence or absence of specific mangled symbols?

That's most unfortunate, seeing as I just replied to it, clearly providing a signal that it's not spam. Mods?

3 Likes

So, I also felt this concern when explaining disambiguation methods and decided that this strategy wouldn't work because of the aforementioned reasons. Because of that, I'm not entirely sure what the best strategy would be in the case of the compiler directive.

The only solution that came to mind resembles @_silgen_name, because it uses the mangled name to look up the signature and symbol. Unlike @_silgen_name, which disregards the declared signature entirely (if the function interface doesn't match the type signature, there is no warning or error), it would use the type information and cross-reference it with the known symbols matching the provided name and type signature. It's not the most ergonomic solution and requires knowing the mangled name, so I discounted it, but I offered it as an example anyway.
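For readers who haven't encountered it, @_silgen_name is an underscored, unofficial attribute that binds a Swift declaration to a symbol by its raw linker name, with no verification of the signature — which is exactly the hazard mentioned above:

```swift
// Binds this Swift function to the C symbol "getpid". If the declared
// signature here were wrong (say, `() -> Double`), the compiler would emit
// no warning or error; the mismatch would only surface at runtime.
@_silgen_name("getpid")
func c_getpid() -> Int32
```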

My post has been reviewed and restored by a moderator if anyone still has residual thoughts about it. I'm beginning to work on a proposal along with an implementation. I also want to thank you all for the questions about it and your thoughts on the matter!

Yeah, this is what I would expect it to look like to use the runtime query. I think the tough part of designing the feature, though, will be working out how type checking might work. It probably ought to behave similarly to the way existing availability checking works where the compiler knows which declarations are potentially unavailable and requires you to prove that they are available before using them to avoid crashing. But then we will also need to consider whether it should be possible to limit which declarations are considered potentially unavailable and also whether additional syntax should be available to indicate that entire declarations require a symbol to be available (think about how you can use @available on an entire struct to limit its use to OS versions with a certain API that is fundamental to its implementation). There's a lot of design space to consider and I think it probably deserves a separate thread as the concepts are complementary but not entirely dependent on each other.
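To make the analogy concrete, a runtime query with availability-style type checking might look something like this (all of the syntax here is hypothetical):

```swift
// Hypothetical statement form, modeled on `if #available(...)`:
if #hasSymbol(FooKit.Foo.doSomething(with:)) {
  // Within this block the compiler treats the symbol as present,
  // so the call is proven safe.
  Foo().doSomething(with: bar)
} else {
  // Fallback for SDKs/OSes where the symbol is absent.
}

// Hypothetical declaration-level form, analogous to marking a whole
// struct with @available when a fundamental API is required:
@requiresSymbol(FooKit.Foo.doSomething(with:))
struct SomethingDoer { /* ... */ }
```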

The type coercion approach does seem like the right syntax for consistency but the implementation feasibility question is an important one that I don't know the answer to yet.

I also considered mangled names as an option for identifying declarations when I was thinking about approaches for this but I believe that using them would lead to a poor developer experience since deriving the mangled name associated with a declaration would be a hassle for the writer and also isn't very readable. The topic also raises another question, which is whether this preprocessor query is semantically meant to match symbols in the library's binary or to match Swift declarations (some declarations yield more than one symbol). I think matching declarations is a more intuitive model overall.

If we needed to abandon the idea of matching declarations due to fundamental layering problems, one option to consider would be to allow the definition of a kind of preprocessor condition directly in source, similar to #define in C-like languages. I'm imagining something similar to what you get by passing a -DSOME_CONDITION flag to the compiler but declared in the source instead. Lookup would probably need to be scoped by module to avoid collisions. This is not my first choice for a lot of reasons (one being that discussion might immediately get bogged down by the question of whether Swift ought to have hygienic macros instead) but it feels a bit cleaner to me than using mangled names.
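For comparison, here is the flag-based mechanism that exists today, followed by a sketch of the hypothetical in-source equivalent (the #declareCondition spelling is invented):

```swift
// Today: pass `-DHAS_NEW_FOO_API` to the compiler, then guard on it.
#if HAS_NEW_FOO_API
  // ...
#endif

// Hypothetical: declare the condition in source instead, scoped by module.
// #declareCondition HAS_NEW_FOO_API
// #if FooKit::HAS_NEW_FOO_API
//   ...
// #endif
```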

4 Likes

Beyond fundamental layering problems, there's another concern I have about the compile-time version of this feature, related to the issues that have been discussed around disambiguation, and also somewhat touched on by this comment:

I think what's been bugging me is that in the failure condition, i.e., the case where the symbol in question is not present, it's really difficult to talk about symbol identity.

Suppose someone is writing a program and compiling against platforms X and Y, which have (presently) divergent APIs. Platform X supports doSomething:

struct Foo {
  func doSomething(with bar: Bar)
}

whereas platform Y does not:

struct Foo {
  // nothing
}

so, the author uses the #if hasSymbol feature:

#if hasSymbol(Foo.doSomething(with:))
Foo().doSomething(with: .init())
#endif

next year, after hard work on platform Y, the developers release a new version of the library:

struct Foo {
  // Note: do *not* use 'doSomething' with a default-initialized 'UnsafeBar'
  // This is a serious programming error
  func doSomething(with unsafeBar: UnsafeBar)
}

The author of our original program has now inadvertently introduced a use of an unsafe API, and an incorrect use at that!

Ostensibly, when they wrote #if hasSymbol(Foo.doSomething(with:)), they were in some sense only wanting to take the #if branch if platform Y introduced the 'same' API as platform X, and not a different one. Perhaps this has a clear meaning in the common cases (like Apple's platforms), and this issue wouldn't really arise in practice.

But in the edge cases, I don't know how to talk about whether two different symbols, on platforms which may have almost entirely divergent API surfaces, are 'really' the same underlying symbol. Should we require type-based disambiguation for all symbols passed as an argument to hasSymbol, even those that are not today ambiguous? Is type-name-based identity even the right tool for specifying a specific symbol which doesn't currently exist at all on the platform being compiled against?

I know you have mentioned that you want to keep the runtime version of this feature to a separate thread, so I won't go into my thoughts on that part of the feature too extensively, except to say that the problem seems much more tractable when we are able to resolve specific symbol identity at the time that the check is first written.

4 Likes

Hm, yeah, I suppose the unique difficulty is that the ambiguity in the situation you've described is undetectable by the compiler in the first place. If two doSomething(with:) APIs are present in the source files but only ever active under mutually exclusive compilation conditions, then the compiler simply has no way to help either the library author or the client. We've so far focused on matching API functions in the examples, but I think there may be additional challenges with other kinds of declarations, too. For example, if you wanted to match on a class Foo existing (perhaps to use Foo.self), there'd potentially be a similar ambiguity with an enum Foo or struct Foo being defined under different conditions, right? Use Foo.self as an argument to a function that takes Any and you've again got a potentially dangerous ambiguity.

Perhaps we need a predicate syntax that is something more like this (pretend that we have the fully qualified module name syntax that uses ::):

#if hasClass(FooKit::Foo)
#if hasClassMethod(FooKit::Foo.bar(_:) as (Swift::Int) -> Swift::String)
#if hasEnumCase(FooKit::Result.success)

Edit: I updated the hasClassMethod example above to qualify Int and String as types from the Swift module and then realized that even that might not be enough - do we need to qualify the kind of declaration those types have in the coercion as well?

2 Likes

I don't like the casting in the compiler directive, especially implementation-wise. By introducing it there, we are stuck in a chicken-and-egg situation. But I like how you added the fully-qualified name syntax here, along with the specific directives hasClass, hasClassMethod, etc. I believe, though, that wrapping the symbol would help disambiguate the kind of symbol outside of the directive itself. (You'll see more examples of this below.)

#if hasSymbol(classMethod(FooKit::Foo.bar(_:)))

The one thing I'm beginning to realize is that, yes, a compiler directive #if hasSymbol would be powerful, but I don't think it would make sense to have this as the main directive, given all the issues that could come up.

It would only work in restricted circumstances and might require other features that are not available at that point in the compilation. For example, it would be a lot easier if there were a way to define preprocessor conditions in source. That way, this could be cleaned up:

#define barType (struct(Swift::Int)) -> struct(Swift::String)
#if hasSymbol(classMethod(FooKit::Foo.bar(_:) as barType))

But alas, we don't, and that would be another pitch in itself. Because of that, we're left with this monster:

#if hasSymbol(classMethod(FooKit::Foo.bar(_:) as (struct(Swift::Int)) -> struct(Swift::String)))

I mean, yes, it provides all the information necessary to properly identify a symbol without relying on magic disambiguation from a type inference model that barely exists at that stage, but it's not ergonomic and would introduce a lot of design changes in how #if configs are parsed.

With all that said, I don't see another way for this to be implemented. These changes frankly aren't the worst: people using this would already know what needs to be done and wouldn't have a problem supplying the type information. There is definitely a need for this feature, but it may be better implemented at runtime, forgoing the optimizations that could be applied by having it as a compiler directive.

EDIT: Shouldn't Int and String also follow the fully-qualified name syntax and be referenced from their module through ::? I edited my examples to show this.

Yeah, I forgot to make those consistent, I'll update the post.

And now that post is gone, flagged by the spam bot :frowning:

1 Like

Once again, that is seriously unfortunate since there were replies. I'm not sure how I feel about it taking 12 hours to restore either.

EDIT: I see it took less time than it did for me which is good!

Let me know if I'm misinterpreting what you're saying here, but I want to clarify that the problem I want to solve cannot be fully solved with runtime checks unless we pursue a very different approach to the problem. In the case that motivated me to think about this design, a build-time query in the preprocessor specifically allows a source file to reference some recently introduced declarations but be structured in such a way that it still compiles against an SDK that does not yet have those declarations. Code referencing this potentially missing declaration will of course not typecheck without the module's declaration, and that's why we've been looking for a preprocessor-based solution.
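To illustrate why (FooKit and bar(_:) are stand-ins here): against an SDK whose FooKit predates the declaration, this fails at compile time, before any runtime check could ever execute:

```swift
import FooKit

func useNewAPI(_ foo: Foo) -> String {
  // Against the older SDK this line fails to typecheck with an error
  // along the lines of "value of type 'Foo' has no member 'bar'".
  foo.bar(42)
}
```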

However, the discussion we've been having is leading me to revisit another approach that I had initially discarded but now looks more appealing given how problematic the preprocessor approach may be. We could make it possible to "forward declare" declarations, like in C-like languages:

// Locally declare the FooKit API that may be missing locally
@forwardDeclaration
extension FooKit::Foo {
  func bar(_ x: Int) -> String
}

// Use the API, which is known to be potentially unavailable due to `@forwardDeclaration`
if #hasSymbol(FooKit::Foo.bar(_:)) { ... }

(This is not meant to be the real syntax; just a sketch for demonstration purposes).

The advantage of this approach is that it leverages existing Swift parsing and typechecking to give the type checker the information it needs to type check code using bar(_:) whether or not it is declared in the original module. Additionally, it ought to be possible for the compiler to diagnose ambiguities when they do arise, solving one of the problems we were having without quite as much unnatural verbosity. I had originally dismissed this idea because it seems like a bigger addition to the language, but after talking through the difficulties of the preprocessor approach I'm not sure it is.

It might make sense to have some additional syntax that scopes a forward declaration to a specific module for clarity:

forwardDeclare FooKit {
  extension Foo { ... }
} 

You could even imagine taking this farther and allowing polyfills of missing APIs so that it is always possible to call the function but your implementation is substituted whenever the API is not defined in the SDK or unavailable at runtime. However, that idea is really only practical for functions so it's probably not a general enough solution.
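A rough sketch of what that polyfill idea could look like (the attribute and the :: syntax are entirely invented):

```swift
// Hypothetical: if FooKit::Foo.bar(_:) is missing from the SDK, or present
// but unavailable at runtime, calls are routed to this local substitute.
@polyfill(FooKit::Foo.bar(_:))
extension Foo {
  func bar(_ x: Int) -> String {
    String(x) // fallback implementation
  }
}
```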

1 Like

Reminiscent of a feature we already have for internal Swift use only (albeit discouraged):

3 Likes

Just wanted to give you all an update about where I've ended up landing in my investigation. We already have an underscored version of canImport() which solves the SDK evolution problem:

#if canImport(FooKit, _version: 42.1)
  // Use APIs that were introduced at or before version 42.1 of FooKit
#endif

Module owners can specify the version of their module by supplying the -user-module-version flag to the frontend. We may want to make this feature official by bringing it through evolution; I'm curious if folks think it is widely applicable enough to do so.

So while my problem is solved by this, I realize it does not solve the problem that motivated this thread. Decls that are platform/SDK specific might be omitted from the distributed module in one SDK while being present in another but the module version number could easily be the same in both SDKs.

The canImport(Module, _version: ...) syntax does give me another idea that's an iteration on one I suggested earlier in the thread, though. Suppose module owners were able to specify a "variant" (better name to be bikeshedded) for a build of a module with some kind of string identifier. You could imagine allowing clients to use a syntax like this then:

#if canImport(FooKit, variant: Linux)
  // Use APIs that are only present in the "Linux" builds of the module
#endif

In the simplest version of this proposal, the variant is an arbitrary string, there is just one variant for a given built module, and the possible values are mutually exclusive. The module owner would need to document when the presence of an API depends on the variant. Conceptually these variants are similar to #if os(...), but it's up to each specific library to define what a variant means for them.

I can see potential problems with this design as libraries evolve over time and need to introduce new variants, but it feels like there might be something there. The big advantage I see is that it's both conceptually simple and straightforward to implement and seems like a useful building block.

A slightly more complex version of this would be to allow the module owner to supply multiple of these strings (maybe we would call them "capabilities" in this model). Then you could have a capability per API, or per set of related APIs, that is either present or not present. This is essentially the same idea as exporting -D compile-time conditions from a module, but the syntax makes it clear that these capabilities are scoped to a module. This is more flexible and is probably more friendly to library evolution than the variant idea.

With both ideas, you could even print the decls that are protected by a certain variant/capability in the .swiftinterface with a guard surrounding it so that all the APIs and which capabilities they require are fully documented by the interface.
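Sketching what such a guarded .swiftinterface might contain (the capability: label and the API shown are hypothetical):

```swift
// Excerpt of a hypothetical FooKit.swiftinterface:
public struct Foo { /* ... */ }

#if canImport(FooKit, capability: ExtendedAttributes)
extension Foo {
  public func extendedAttribute(named name: String) -> Data?
}
#endif
```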

1 Like

As far as I am aware, Swift also lacks a way to detect if a weakly-imported C symbol is null at runtime. I would like to think that this deserves a true symbol-specific check, the syntax for which would also work for Swift symbols. For example:

if @hasSymbol(WeakCSymbol) {
  // runtime check for weakly imported symbol
}

#if hasSymbol(SomePlatformSpecificSymbol)
  // compile time check for conditionally available symbol
#endif

Yes, I'm separately looking into the design of a runtime check similar to what you've shown there. The compile time check is discussed earlier in this thread and has serious challenges, unfortunately. Reliably resolving ambiguities implies some amount of type system integration, which makes the layering problematic given that this would be implemented during parsing.

1 Like

IIRC, Swift currently parses both branches of #if blocks. Does that mean evaluation of the condition and stripping of the dead branch can happen later?

(Sorry for taking so long to reply, it took me a while to get back to this since I went on vacation).

Yes, the compiler parses both branches and the conditional is represented in the AST. However, I think there's still a cycle in the hypothetical implementation of this because type checking involves being able to walk the AST, and in order to walk the nodes in the AST representing these conditions you need to know whether the condition is active or not, which again requires type checking. I wouldn't be surprised if this were solvable but I do think it makes the endeavor tricky and may imply that there are some important limitations or edge cases to consider.