Syntactic protocols (cf. function builders)

@davedelong's review of SE-0289 Function Builders got me thinking about all the other "syntactic transforms that turn into normal function calls / member accesses":

Even without designing a full macro system, even if they're still hardcoded in the compiler, it feels like it would be nice in several ways to have a way to describe these entry points in the language itself:

  • Reusable logic for code completion, when implementing such a type
  • Reusable logic in the compiler for making the transformations
  • A place to put doc comments

If, for now, we set aside ambitions of anything but doc comments, what might this look like?

syntactic protocol MainEntryPoint {
  static func main()
}

syntactic protocol Callable {
  // '*' for "any arguments" to avoid confusion with existing variadics
  // when there are other required arguments.
  func callAsFunction(*)
}

syntactic protocol CaseMatching {
  // A little funny since it does a normal operator lookup here,
  // not just into the type.
  static func ~= <Input>(_ expected: Self, _ actual: Input) -> Bool
}

syntactic protocol DynamicMemberLookup {
  subscript<Name: ExpressibleByStringLiteral, Value>(
    dynamicMember member: Name
  ) -> Value { get set }

  subscript<Key: AnyKeyPath, Value>(
    dynamicMember keyPath: Key
  ) -> Value { get set }
}

syntactic protocol DynamicCallable {
  // 'optional' in that the compiler will use something else
  // if the client doesn't implement this, instead of just an error
  optional func dynamicallyCall<Arguments: ExpressibleByArrayLiteral, Result>(
    withArguments: Arguments
  ) -> Result

  func dynamicallyCall<Arguments: ExpressibleByDictionaryLiteral, Result>(
    withKeywordArguments: Arguments
  ) -> Result where Arguments.Key: ExpressibleByStringLiteral
}

syntactic protocol StringInterpolation where Self: StringInterpolationProtocol {
  func appendInterpolation(*)
}
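
For readers who haven't used every one of these features, here is a small sketch of what the compiler already rewrites today for a few of them. The type names (Adder, MultipleOf, Settings, Joiner) are made up for illustration; only the entry-point names come from the language.

```swift
// Callable: `add(3)` is rewritten to `add.callAsFunction(3)`.
struct Adder {
    let base: Int
    func callAsFunction(_ value: Int) -> Int { base + value }
}
let add = Adder(base: 10)

// CaseMatching: an expression pattern calls `~=`, found by normal operator lookup.
struct MultipleOf { let divisor: Int }
func ~= (pattern: MultipleOf, value: Int) -> Bool {
    value % pattern.divisor == 0
}
func classify(_ n: Int) -> String {
    let three = MultipleOf(divisor: 3)
    switch n {
    case three: return "fizz"   // evaluated as `three ~= n`
    default: return "other"
    }
}

// DynamicMemberLookup: `settings.theme` becomes `settings[dynamicMember: "theme"]`.
@dynamicMemberLookup
struct Settings {
    var storage: [String: String] = [:]
    subscript(dynamicMember member: String) -> String? {
        get { storage[member] }
        set { storage[member] = newValue }
    }
}
var settings = Settings()
settings.theme = "dark"

// DynamicCallable: unlabeled calls use `withArguments:`,
// labeled calls use `withKeywordArguments:`.
@dynamicCallable
struct Joiner {
    func dynamicallyCall(withArguments args: [String]) -> String {
        args.joined(separator: ",")
    }
    func dynamicallyCall(
        withKeywordArguments args: KeyValuePairs<String, String>
    ) -> String {
        args.map { "\($0.key)=\($0.value)" }.joined(separator: ",")
    }
}
let join = Joiner()
```

With these definitions, `add(3)`, `classify(9)`, `settings.theme`, `join("a", "b")`, and `join(x: "1", y: "2")` all compile down to ordinary calls to the members above, which is the "shape" the syntactic protocols are meant to document.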

And the big one:

syntactic protocol FunctionBuilder {
  optional func buildExpression<Input, Output>(_ expression: Input) -> Output
  func buildBlock<Output>(*) -> Output
  optional func buildFinalResult<Input, Result>(_ component: Input) -> Result
  func buildOptional<Input, Output>(_ input: Input?) -> Output
  optional func buildEither<Input, Output>(first: Input) -> Output
  optional func buildEither<Input, Output>(second: Input) -> Output
  func buildArray<Input, Output>(_ components: [Input]) -> Output
  optional func buildLimitedAvailability<Input, Output>(_ component: Input) -> Output
}
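
To make the function builder entry points concrete, here is a minimal sketch of a builder that collects strings. It assumes the attribute spelling that eventually shipped in Swift 5.4 (`@resultBuilder`); StringListBuilder, makeList, and greetings are hypothetical names. Note that in a real builder the build methods are static, which the pseudo-syntax above glosses over.

```swift
@resultBuilder
enum StringListBuilder {
    // Wraps each bare expression statement in the closure.
    static func buildExpression(_ expression: String) -> [String] {
        [expression]
    }
    // Combines the results of all statements in a block.
    static func buildBlock(_ components: [String]...) -> [String] {
        components.flatMap { $0 }
    }
    // Used for `if` without `else`.
    static func buildOptional(_ component: [String]?) -> [String] {
        component ?? []
    }
    // Used for the two branches of `if`/`else`.
    static func buildEither(first component: [String]) -> [String] {
        component
    }
    static func buildEither(second component: [String]) -> [String] {
        component
    }
    // Used for `for`...`in` loops.
    static func buildArray(_ components: [[String]]) -> [String] {
        components.flatMap { $0 }
    }
}

func makeList(@StringListBuilder _ content: () -> [String]) -> [String] {
    content()
}

func greetings(formal: Bool, names: [String]) -> [String] {
    makeList {
        "start"
        if formal { "Good day" } else { "Hi" }
        for name in names { name }
    }
}
```

This builder leaves out buildFinalResult and buildLimitedAvailability entirely, and the closure still compiles because it never uses the syntax that would call them.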

There's a lot to explore here, so, thoughts?

Just to be clear on the application of this, were it implemented, I would conform my main entry point to MainEntryPoint rather than decorating it with @main to get the same behavior?

I wasn't really thinking that, no. At least for now, these would just be what the compiler would use to generate those calls. Users wouldn't know about them at all except that they'd be documented along with the features themselves. (Remember, some of these features don't require any annotations at all beyond the existence of the function.)

So the idea here is to have a place to hang the documentation, to deobfuscate the “magic” of these conventions?

That's the most immediately visible effect, but it also means there's not custom code completion logic for each of these features (or any more that get added), and maybe diagnostics would be better too (since there would be a standard set of diagnostics for all of them, leaving compiler contributors time to focus on just what's different about each).

I think this could also be the groundwork for generalized transformations, longterm, that don't just generate arbitrary unchecked expressions. But even without that it could still be useful.

The other giant "syntactic transform" that's come up before (but is not yet implemented) is regular expressions. Could this be used to define a compiler-optimized regex type?

I commented on this madness a year ago. We need to pin down the meta-protocol concept that's still trying to kick its way out.


Instead of a "*," which is only an identifier-class punctuator, maybe we could use something more precise.

syntactic protocol Callable {
    func callAsFunction<static variadic T: labeled Any>(T)
}

Where "<T>" is the standard generic specifier. The "variadic" part means that the parameter is a variadic generic parameter, a syntax used before. The "static" part means that instead of implementing a generic method, it represents a family of plain ol' methods (or restricted generic methods) and full coverage isn't required. The "labeled" part means the types in the parameter list can have the same attachments as function parameters, like "@escaping," "...," or "inout."

(Hmm, the "static" part may need to be implemented as an attribute instead, IDK.)

You don't need to be able to describe the entry points in Swift itself to achieve these goals, because the reusable logic and place for doc comments can just be inside/alongside the compiler itself (like the implementation is). I'm not sure what the benefit would be of inventing a syntax for this in Swift without any way of providing an implementation. It locks down the syntax and reserves some keywords without a clear idea of how a theoretical future macro system would work.

An important scoping point is that this idea doesn't, on its own, do anything to provide transformations; it only defines the shape of what the transformation "turns into". I would hope that other features could then come about to use these as targets (macros, regex literals, etc), and I'd want that to be a design constraint, but this part of the design is meant to be agnostic to how the calls get generated.

EDIT: But I have tried to design a regex syntax in my spare time that handles extraction (i.e. "match this pattern and then assign the matched groups to variables with these names") and it's really hard! The interspersing of names with pattern syntax makes it a lot harder to see the pattern, and if we want to support general interpolation (i.e. "here's a value or a sub-regex to match") then that needs to be distinguished from the names being bound ("are you trying to substitute x in or make a new binding called x"? i.e. why you need let in case matches).
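
The binding-versus-substitution ambiguity is the same one Swift's existing patterns resolve with `let`. A small sketch (describe and expectedX are hypothetical names):

```swift
func describe(_ point: (Int, Int), expectedX: Int) -> String {
    switch point {
    // `expectedX` is substituted in: it is compared against the first element.
    // `let y` introduces a new binding for the second element.
    case (expectedX, let y):
        return "x matches, y bound to \(y)"
    // Here `let x` binds instead of comparing.
    case (let x, _):
        return "x bound to \(x)"
    }
}
```

Without the `let`, there would be no way to tell whether a bare name in the pattern means "compare against this value" or "capture into a new variable", and a regex syntax with named groups plus interpolation would need an equivalent marker.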

I'm sorry I missed that at the time. :-) Thanks for bringing it up then. I agree that for better or worse the language is embracing these things and that it's better to formalize them than to have them ad hoc and slightly different each time. I forgot about property wrappers!

syntactic protocol PropertyWrapper {
  init(*)
  init<Value>(wrappedValue: Value, *)
  // hm, not only did I introduce a generic property syntax,
  // but this Value is supposed to be the same as the initializer's
  // Maybe reused generic params should be assoc types?
  var wrappedValue<Value>: Value { get set }
}

Two things here:

  • I wouldn't want to step on a real variadic generics syntax, so I think we should go with something extremely minimal that could eventually be deprecated.

  • I've deliberately understated these requirements by a lot. There's nothing about permitting throws or mutating, or that separate requirements are nearly always independent (i.e. if you don't use a feature you can just not implement it). Since all of these are targets of a syntactic transformation, they could still be semantically invalid at the end of that, the same way a source file that parses may still not be valid Swift code.

    In a fully general system we may want to have a way to enforce some additional constraints, but for once I'd rather start lax and simple.

This is a very good point: the pseudo-syntax here could just as easily be a table included in the compiler, not parsed in any way. Maybe if there's enough agreement that this abstraction is a good idea I'll take it over to the Development/Compiler category. I do still think it's interesting to discuss what we would want out of such a format in the language itself, but without a definite application like a macro system it shouldn't be something I/we try to turn into a proposal.

What about

syntactic protocol PropertyWrapper {
    associatedtype Posed
    template<1...> init(wrappedValue: Posed, #template)
    var wrappedValue: Posed { get set }
}

The "template" keyword can go in front of an initializer, subscript, or method to indicate that we are making the member generic only on the base name. The bracketed value or range indicates how many versions of the member there can be. The "1..." is the default. In the parameter list, the "#template" is where the user-defined parameters can go.

(A generic member can't be used to satisfy a template unless the maximum count is unlimited, or unless the compiler can prove there will be no more instantiations than the maximum allows, which may be difficult.)

(Wait, do property wrappers have to have at least one non-Wrapping initializer?)

Trying to reuse more code within the compiler might also be a nice way to start to understand the common functionality required and think about how some subset could be exposed as a future Swift feature.

This reminds me of what Nim is capable of, with its ability to let users inspect and modify the syntax tree. I believe the most general solution could be for protocols to allow inspecting the Swift AST. The applications of this feature are vast indeed. I can imagine how a function could be tagged with an AlwaysTerminateable protocol to prove certain aspects of a program's behaviour.
A little side note: maybe the feature you propose should rather take the form of an attribute (@syntaxInspector or something). Whatever the approach, the language's core should clearly have dedicated helpers for users to work with the AST.

Another Trial Design

For now, no "syntactic" protocols, just extensions to protocol. They will be "optional" and "template". The first one already exists for Objective-C protocols. For the 1.0 version, use of template for user-defined protocols is banned outside the standard library, but users can conform to such protocols. The same restriction applies to optional, unless the protocol is marked as @objc, refines NSObjectProtocol, or has a class requirement of something that conforms to NSObjectProtocol (like NSObject, NSProxy, or a derived class).

The "template" declaration modifier can only appear for members of a protocol declaration, and only for methods, initializers, and subscripts. It can use the "#any" placeholder for function parameters, opaque types for generic parameters, or both.

protocol MyProtocol1 {
    template init<some T>(wrapped: T, #any)
}

Types implementing the above protocol must have at least one initializer with at least one parameter, where the first parameter has the label "wrapped," and can have any number of parameters following it. The generic parameter isn't a real one, in that the implementing member doesn't need to be generic:

struct MyType1: MyProtocol1 {
    init(wrapped: Int) {}
}

There can be more than one match:

struct MyType2: MyProtocol1 {
    init(wrapped: Int, somethingElse: Double) {}
    init(wrapped: String) {}
}

A placeholder generic can be restricted:

protocol MyProtocol2 {
    template func myFunc<some T: SignedNumeric>(type: T.Type) -> T
}

struct MyType3: MyProtocol2 {
    func myFunc(type: Int.Type) -> Int {/*...*/}
    func myFunc(type: String.Type) -> String {/*...*/}
    func myFunc() -> Bool {/*...*/}
}

The String overload doesn't match the template method (String isn't SignedNumeric), but templates don't care. And they definitely don't care about the similarly named method with a Bool return.

Oh, and just because templates avoid the need for actual generics doesn't mean we can't implement a template as one:

struct MyType4: MyProtocol2 {
    func myFunc<U: BinaryInteger>(type: U.Type) -> U {/*...*/}
}

The versions of myFunc for a type that conforms to SignedNumeric will match the template, but any other BinaryInteger-conforming type will not (and won't cause an error).

You can also define a method with both opaque and conventional generic parameter types. The implementing methods need to be generic at least on the conventional generic parameters.

protocol MyProtocol3 {
    template func myFunc2<some T: BinaryInteger, U: RandomAccessCollection>(_ x: U) -> T
}

struct MyType5 {
    func myFunc2<V: Collection>(_ y: V) -> UInt { return numericCast(y.count) }
}

These protocols are restricted to the standard library for now because there are no witness tables for them; only the compiler and runtime code can look for matches and/or exploit them (like for function builders and such). A template member can also be optional.

If syntactic / template protocols are being formalized, it seems to me that they should also cover the ObjC-style XCTest naming convention:

// I think this is it?
protocol TestCase {
  template func `test ## #any`() throws
}

This is absolutely the most general solution, but I'm not sure it's what we'll want to do. There's a huge range of space between "manipulates tokens" (like C) and "literally just a funny way to spell a function call" (not actually a useful macro system), with "inspects the syntax tree" and "inspects the type-checked AST" being particularly powerful points in between. Whether or not we want to make that power a feature in Swift depends on the use cases (or on deciding that we want to make the language that compile-time flexible).

Still, this design doesn't actually say anything about where calls to these members come from; it just says "some feature will generate calls to declarations with this form". That could be macros, or built-in compiler features, or…well, those might be the only categories, I guess.

I think one important note here is that in none of the existing examples do you have to implement every requirement; you only have to implement the ones that actually get used. Property wrappers don't technically have to implement any initializers because you can assign them directly in init (from a factory, say). But the compiler can generate calls that use wrappedValue (if you provide an initial value), as well as calls that do not use wrappedValue (if you don't provide any value at all).
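
As a rough illustration of both cases with property wrappers, here is a sketch using a hypothetical Clamped wrapper and Volume type. One property supplies an initial value, so the compiler generates an initializer call that passes `wrappedValue:`; the other supplies nothing, so no call is generated and the backing storage is assigned by hand in init.

```swift
@propertyWrapper
struct Clamped {
    private var value: Int
    let range: ClosedRange<Int>

    init(wrappedValue: Int, range: ClosedRange<Int>) {
        self.range = range
        self.value = min(max(wrappedValue, range.lowerBound), range.upperBound)
    }

    var wrappedValue: Int {
        get { value }
        set { value = min(max(newValue, range.lowerBound), range.upperBound) }
    }
}

struct Volume {
    // Initial value provided: the compiler generates
    // Clamped(wrappedValue: 15, range: 0...10).
    @Clamped(range: 0...10) var level = 15

    // No initial value and no arguments: the compiler generates no
    // initializer call, so the storage must be set in init.
    @Clamped var balance: Int

    init() {
        _balance = Clamped(wrappedValue: -20, range: -5...5)
    }
}
```

Clamped never declares a no-argument `init()`, and nothing complains, because no use site in Volume needs one.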

Another important point is that these are purely compile-time concepts: you can't use these "protocols" as generic bounds or protocol-typed values (existentials), because you can't generate calls to these functions at run time. I suppose you could make the "syntactic" marker live on the members rather than the protocol itself (closer to your second example, @CTMacUser), but then people would have to deal with different requirements behaving differently, and a protocol with all syntactic requirements might still have some run-time overhead (due to the conformance being recorded in the binary, just in case anyone wants to check it). So I think it's better if it's something distinct from regular protocols (which might mean not even using protocol as an introducer, long-term).

Mm, we could. I'd rather go in the direction of attributes on test methods rather than naming conventions myself. There's also nothing right now that generates calls to these methods at compile-time, but I suppose test discovery mechanisms on Linux do look up these methods at run time, and that's a related thing.

I once came across a thread where people wanted to provide a default implementation, or rather conformance synthesis, for some type. This feature would be a perfect way to give users a clear path to do that, instead of baking a few select protocols into the compiler implementation.

I can't follow you on this one, Jordan; could you clarify?

I suppose it could be useful for speccing the behaviour of external code generation tools too.

I don’t know where the magic lives for synthesized implementations for things like Codable, CaseIterable, etc. but if we’re doing syntactic protocol definitions, should those things go there, or at least be identified there for clarity’s sake?
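
For reference, this is the kind of synthesis being asked about, sketched with hypothetical Direction and Point types. Neither type writes `allCases`, `==`, or `hash(into:)`; the compiler derives them from the declaration.

```swift
// CaseIterable: `allCases` is synthesized for enums without associated values.
enum Direction: CaseIterable {
    case north, south, east, west
}

// Equatable and Hashable: `==` and `hash(into:)` are synthesized
// when every stored property is itself Equatable / Hashable.
struct Point: Equatable, Hashable {
    var x: Int
    var y: Int
}

let origin = Point(x: 0, y: 0)
let uniquePoints: Set<Point> = [origin, Point(x: 0, y: 0), Point(x: 1, y: 2)]
```

These differ from the entry points earlier in the thread in that the compiler generates the member bodies rather than calls to them, so whether they belong in the same description format is an open question.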