[Pitch] Swift Predicates

Hi all,

@dgoldsmith and I have drafted a pitch outlining new APIs to express predicates natively in Swift code. We hope that this will provide a more robust and ergonomic experience for defining and using predicates in Swift. We plan to introduce this as part of the FoundationEssentials package and we'd love to hear your feedback.

Introduction

A predicate is a construct that performs a true/false test on a provided set of input values. It is very common for developers to need to construct predicates that can be sent across concurrency and process boundaries for later evaluation. Additionally, predicates are commonly converted to external formats such as SQL and other query languages for native evaluation in databases. Predicates are already used to pass a filter across software boundaries: through an API, to another process, or across the network. Apple platforms currently use NSPredicate for this purpose, but it has some deficiencies:

  1. It isn't type safe
  2. It doesn't work with autocomplete in an IDE
  3. It has its own syntax, different from Swift
  4. It isn't extensible to new expressions or types
  5. It is difficult to parse

We propose creating a new value type, Predicate, as part of the FoundationEssentials package, that addresses these problems. These new constructions of predicates will be expressed using standard Swift syntax elements and are fully type-checked by the compiler. This allows us to design Predicate to be type safe, readily archivable and Sendable, and integrated with Swift development environments.


I've posted the full pitch here as a gist. Feel free to check out the details, and please let us know what you think!

50 Likes

Awesome pitch!

One question I have is about how you intend to support ?. in the context of dynamicMemberLookup on Variable, which, if I understand correctly, is the type of the callee in #predicate. I tried to write a similar but much less ambitious library a year or two ago, and the difficulty I encountered was that I couldn't use the ?. operator with dynamic member lookup. Does this Just Work now, or is there work that needs to be done specifically to support this?

Specific details on how Predicate will be securely Codable and the specifics for its archiving design will be addressed in a future proposal

Very much looking forward to seeing this.

2 Likes

I haven't read through the full proposal yet, but it's not clear to me from the Motivation section what the purpose is -- why would you use a predicate over a (T) -> Bool? The only time I've ever used NSPredicate is when an Objective-C API requires it, like XCUIElementQuery.

4 Likes

Amazing Pitch, extremely useful in both Server-Side Swift and App development. This would greatly improve the way the Fluent and Meow ORMs are written, and also any SQLite wrappers for iOS would greatly benefit from this. I'd imagine it leads to (some form of) a general API between ORMs in Swift, assuming we get this right.

I think that the extent to which this is useful relies heavily on the method of en/decoding these predicates. I really like the parameters being passed in during evaluation as well.

My one issue is that in databases, you can find yourself comparing a value inside an entity to another value in the same entity. This wouldn't normally happen in simple models, but could happen in a model that was spawned as the result of a left-join between siblings.

struct Organisation: Codable {
  let id: UUID
  var name: String
  let creator: Reference<User>
  var admins: Set<Reference<User>>
  var members: Set<Reference<User>>
}

Now in this example, I've structured members to be exhaustive, so it includes the admins. I want all members that are not an admin. Set aside table optimisations.

The following model is pseudocode for a join relationship in my hypothetical ORM:

struct Join<Parent, Child>: Codable {
  let parent: Parent
  let child: Child
}

After querying a left-join on all members, I'd now need to do a filter where !parent.admins.contains(child.id). If I understand it correctly, these types of operations (between two fields in the same model) would not currently be possible in this draft.

3 Likes

Overall, great pitch! It's great to see what macros enable here. I have two unrelated comments:

  1. Why are the various build_XXX methods named with an underscore? This looks odd to me.
  2. You mentioned this being part of the FoundationEssentials module, which is going to be part of an open source Foundation package. This opens up a new set of constraints on APIs, because as a package this is not compiled with library evolution mode. One such constraint is that you can never add new cases to enums without breaking API. In your proposal, PredicateError is an enum. I would caution against using an enum here unless you are sure you will never introduce new cases. (In general, this is something larger to consider for the OSS Foundation efforts.)
6 Likes

Thanks for catching this! The macro processor should break the KeyPath around the ? operator, along the lines of ((a.b)?).c.d. The ? operator in turn will be handled via a flatMap-style PredicateExpression. We have suffix ? in the implementation but having it in the middle of a KeyPath was a detail we'd missed.
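To illustrate the splitting described above, here is a minimal sketch in plain Swift. It does not use the pitch's PredicateExpression types; the structs A, B and C are hypothetical, and Optional.flatMap stands in for the flatMap-style expression. It only shows that an optional chain in the middle of a path behaves like a key path up to the ?, followed by a flatMap over the remainder.

```swift
// Hypothetical model types, for illustration only.
struct C { var d: String? }
struct B { var c: C }
struct A { var b: B? }

let present = A(b: B(c: C(d: "value")))
let missing = A(b: nil)

// What the user writes: `?` in the middle of the member path.
let chained: String? = present.b?.c.d

// The split form: the key path before `?`, then flatMap over the key path after it.
let prefixPath: KeyPath<A, B?> = \A.b
let suffixPath: KeyPath<B, String?> = \B.c.d
let split: String? = present[keyPath: prefixPath].flatMap { $0[keyPath: suffixPath] }

// When the optional in the middle is nil, both forms short-circuit to nil.
let chainedNil: String? = missing.b?.c.d
let splitNil: String? = missing[keyPath: prefixPath].flatMap { $0[keyPath: suffixPath] }
```

flatMap rather than map matters here because the remainder of the path may itself produce an optional, and the result should not become doubly optional.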

The builder functions contain an underscore to segment the name of the function between the "build" prefix and the name of the pre-macro expansion function that was called. For example, a call to contains(where:) is translated to build_contains(_:where:) because we felt that buildcontains(_:where:) was confusing given it didn't follow the camel case style. Rather than trying to dynamically re-capitalize the function name during macro expansion and need to worry about collisions, we decided to use an underscore to make the builder function name a bit clearer and ensure there is a direct mapping between pre-expansion and post-expansion function names.

That's a great point, thank you for calling this out! You're right that in the OSS package this enum might be quite limited to avoid breaking API. Instead, we'll likely need to change this to a struct with static members like some other APIs have done.
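For readers unfamiliar with the pattern, a struct with static members could look roughly like the sketch below. The type name SketchPredicateError and its three codes are hypothetical placeholders, not the declaration from the pitch.

```swift
// Hypothetical stand-in for an extensible error type; not the pitch's actual PredicateError.
struct SketchPredicateError: Error, Hashable {
    // Private storage keeps the set of codes open-ended: a later release can add
    // another static member without changing anything clients switch over exhaustively.
    private let code: String
    private init(code: String) { self.code = code }

    static let undefinedVariable = SketchPredicateError(code: "undefinedVariable")
    static let forceUnwrapFailure = SketchPredicateError(code: "forceUnwrapFailure")
    static let forceCastFailure = SketchPredicateError(code: "forceCastFailure")
}

// Clients can still pattern match, but the compiler requires a `default` branch,
// so codes added later land there instead of breaking the build.
func describe(_ error: SketchPredicateError) -> String {
    switch error {
    case .undefinedVariable: return "undefined variable"
    case .forceUnwrapFailure: return "force unwrap of nil"
    default: return "unknown predicate error"
    }
}
```

The trade-off is that clients lose exhaustiveness checking, which is exactly what makes adding a case to a public enum a source-breaking change outside library evolution mode.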

1 Like

Great question - Predicate will provide quite a few benefits over a (T) -> Bool closure. With a predicate, you'll be able to encode/decode the expression in order to send it to another process via XPC (something that cannot be safely done with arbitrary closures) and you'll also be able to walk the tree of expressions to convert it to an external format like a SQL query (which is not quite possible with closures). If you only need to evaluate the result in-process and in-memory, then using a standard closure might fit your needs, but Predicate adds these extra capabilities that wrappers around external services and databases might need.
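As a rough illustration of the difference, the sketch below uses a hypothetical, heavily simplified expression tree (the Expr enum, its cases, and the column names are made up for this example and are not the pitch's PredicateExpression design). Because the expression is data rather than an opaque closure, it can be encoded, decoded, and walked to produce a SQL-like string.

```swift
import Foundation

// Hypothetical, heavily simplified expression tree; not the pitch's design.
indirect enum Expr: Codable, Equatable {
    case column(String)
    case int(Int)
    case string(String)
    case less(Expr, Expr)
    case equal(Expr, Expr)
    case and(Expr, Expr)
}

// Walk the tree and render a SQL-like condition.
func sql(_ expr: Expr) -> String {
    switch expr {
    case .column(let name): return name
    case .int(let value): return String(value)
    case .string(let text): return "'" + text.replacingOccurrences(of: "'", with: "''") + "'"
    case .less(let lhs, let rhs): return "(\(sql(lhs)) < \(sql(rhs)))"
    case .equal(let lhs, let rhs): return "(\(sql(lhs)) = \(sql(rhs)))"
    case .and(let lhs, let rhs): return "(\(sql(lhs)) AND \(sql(rhs)))"
    }
}

// Encode the tree and decode it again, as if it had crossed a process boundary.
func roundTrip(_ expr: Expr) throws -> Expr {
    let data = try JSONEncoder().encode(expr)
    return try JSONDecoder().decode(Expr.self, from: data)
}

let tree: Expr = .and(
    .less(.column("content_length"), .int(140)),
    .equal(.column("sender_first_name"), .string("Jeremy"))
)
let received = try roundTrip(tree)
let query = sql(received)
// query is "((content_length < 140) AND (sender_first_name = 'Jeremy'))"
```

Neither step is possible with a (T) -> Bool closure, since a closure's body cannot be inspected or serialized at runtime.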

15 Likes

I might be missing something here but I think your example should be expressible like so:

let filter = #predicate<Join> { !$0.parent.admins.contains($0.child.id) }

Predicate should be capable of expressing most filters on single (or with variadic generics, multiple) objects, including relationships. Does what you're trying to express here not fit into that category?
If what you're talking about is relationships between two different instances of the same entity, that has to be expressed as a predicate where the relationship is part of the model, similar to SUBQUERY in NSPredicate (which corresponds to filter in Predicate). Does this address your concern?

That opens up a related discussion though (which is off-topic here, but I want to mention it): it would be fantastic to have library evolution mode available cross-platform too, and it would be an interesting discussion point whether it makes sense to enable it for some projects (e.g. the new cross-platform Foundation would be an obvious candidate). Just wanted to mention it as I think it is an important missing piece right now.

Related issues I've created that together might make that a viable alternative:

and

Also, to echo others: very exciting pitch, still reading and digesting. One initial concern I have is verifying the efficiency of the implementation. In our experience building similar infrastructure, performance is a key requirement for this functionality (we'd often use it to filter large amounts of data, or data with a significant inbound rate), so for it to be useful, performance and good benchmark tests are critical. It would be great if it were easy for others to contribute benchmarks in some systematic way, if you agree that it is important.

Very excited for what this might mean for server-side Swift. I'd like to see what the folks maintaining Vapor and Fluent have to say about it.

I really like this proposal but I'm still not convinced about the use of macros. The alternatives considered section says operator overloads will lead to exponentially longer build times. Could you elaborate on where exactly these overloads would be defined?

1 Like

If we had gone with the operator overloading approach, a predicate's construction would look quite similar, something along the lines of:

let predicate = Predicate<Message> {
    $0.content.count < limit && $0.sender.firstName == "Jeremy"
}

However, in order to support just the operations present in this predicate we would have to add 9 new overloads combined to the <, &&, and == operators (each new allowed operator would have 3 overloads: one for a PredicateExpression on both sides, one for a PredicateExpression on the LHS with a constant on the RHS, and one for a constant on the LHS with a PredicateExpression on the RHS). These operators would be defined in Foundation and would return a built PredicateExpression rather than the result of the operator (like a Bool). In addition to the inability to produce meaningful compilation error messages and represent other operators such as conditionals and casting, it is unfortunately a known issue in Swift that adding new operator overloads can drastically increase the time it takes to type check an expression. We felt that adding a very large number of new operator overloads to an almost universally imported framework would lead to regressions in build times that were untenable. If you're curious about the build-time issues here, you can search around for other forum posts and GitHub issues about the compiler being unable to type check an expression in a reasonable time (such as this post for example), or @hborla can provide some more context on that.
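To make the "3 overloads per operator" point concrete, here is a minimal sketch for == alone. The Expression protocol and the Constant and Equal types are hypothetical stand-ins invented for this example, not the types from the pitch.

```swift
// Hypothetical stand-ins for the expression types; not the pitch's API.
protocol Expression {
    associatedtype Output
    func evaluate() -> Output
}

struct Constant<T>: Expression {
    let value: T
    func evaluate() -> T { value }
}

struct Equal<L: Expression, R: Expression>: Expression
where L.Output == R.Output, L.Output: Equatable {
    let lhs: L
    let rhs: R
    func evaluate() -> Bool { lhs.evaluate() == rhs.evaluate() }
}

// Overload 1: expression == expression
func == <L: Expression, R: Expression>(lhs: L, rhs: R) -> Equal<L, R>
where L.Output == R.Output, L.Output: Equatable {
    Equal(lhs: lhs, rhs: rhs)
}

// Overload 2: expression == constant
func == <L: Expression>(lhs: L, rhs: L.Output) -> Equal<L, Constant<L.Output>>
where L.Output: Equatable {
    Equal(lhs: lhs, rhs: Constant(value: rhs))
}

// Overload 3: constant == expression
func == <R: Expression>(lhs: R.Output, rhs: R) -> Equal<Constant<R.Output>, R>
where R.Output: Equatable {
    Equal(lhs: Constant(value: lhs), rhs: rhs)
}

// Each line below builds an expression value instead of producing a Bool.
let both = Constant(value: 3) == Constant(value: 4)      // overload 1
let rightConstant = Constant(value: "Jeremy") == "Jeremy" // overload 2
let leftConstant = "Jeremy" == Constant(value: "Jeremy")  // overload 3
```

Every one of these functions joins the global == overload set that the type checker has to consider for any == expression in a file importing the module, which is where the build-time cost comes from.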

4 Likes

Just did my first read of the pitch and it looks great. A Swift replacement for NSPredicate is a very welcome addition. And the fact that we can see a real use of new capabilities (macros) is very interesting. It may be worth mentioning this in other posts, since its being in the "Development" category may make some folks on evolution miss it :)

Thank you for your thorough response! I asked how you envisioned operators, to see if you have standard types like String and Int conform to PredicateExpression. I agree that this approach would lead to awful compile times, but I was thinking we could have Value("Jeremy") or .value("Jeremy") instead of the bare string. I don't think we expect Equatable or other conformances in PredicateExpression types, so there shouldn't be any overloading.

This also creates problems in result builders, but by adding @available(*, unavailable, message: ""), API developers can somewhat customize diagnostics. I imagine we could use that to tell the user why they can't compare to expressions that don't have the same type, for example. Let me know if you had other cases in mind that this approach wouldn't satisfy.

I think overloading these as operators would be a nice addition for other DSLs too, but I'm not convinced these are common enough to warrant a macro system.

I know there are many complaints about Foundation's chunkiness in terms of binary size, so we should definitely preserve fast-to-compile APIs.


The reason I'm against macros is because of their complexity. On the other hand, operators provide the same expressive power (excluding conditionals and type-casting), and are a well recognized, easy-to-understand feature. Macros also tend to be harder to debug, while the operator directly produces a type you can inspect. So given all their issues, I don't think macros are worth it just to be able to write "Jeremy" instead of .value("Jeremy").

1 Like

I may have more thoughts later but I have one bit of feedback and a couple questions now.

First, the fact that this proposal uses macros while that feature isn't even under review but just a pitch makes it clear that the eventual review of that proposal will be a formality. I'd ask, once again, for the evolution process to be amended to allow Apple to clearly label pitches and proposals that will be accepted as such so the community can set expectations accordingly.

My question is, what are the performance goals of this feature? Obj-C's predicates were rather unpredictable in the performance arena given that they couldn't do many of the things db query optimizers could do, while the syntax practically begged users to treat them like a fully fledged db. So what testing has been done so far, and what kind of performance is expected? What optimizations, if any, can be performed?

Also, how are we going to debug these? Are we able to breakpoint through the predicate as it evaluates? Can we set column breakpoints to see the intermediate results of the query?

Oh, and for the pitch itself, I'd suggest naming all of your closure properties to make them more readable.

9 Likes

Oh, and will these APIs be offered from Foundation as extensions to Collection or anything? It would be nicer to use these as if they were instance functions rather than globals. That is

let result = collection.applying(predicate: #predicate(...)) // Not sure what the verb would be here.

would feel much nicer than

let predicate = #predicate(...)
let result = predicate.evaluate(collection)

If that sort of functionality isn't offered it seems like something that will be added pretty frequently, for predicate users at least. Best to nip that in the bud right away.

I'm also not sure this should be in Foundation, but... :man_shrugging:

Operators are resolved like global functions, so any new version of + or == or other standard operator is considered an overload, even if they operate on completely new types. That's the real pitfall of global operators -- adding new operators over types that are not used anywhere can still break existing code with type checker timeouts.

The problem with diagnostics in the operator overloading approach is specifically that the error messages will mention types that are specific to the predicate representation, which the programmer doesn't really care about. As the expression gets more complex, so does the type that attempts to encode the structure of the expression tree. You'd end up getting error messages that mention highly nested generic types, such as PredicateExpressions.Equal<PredicateExpressions.Value<String>, PredicateExpressions.Variable<Int>>, and the programmer has to mentally parse this giant type to figure out the types that they wrote that didn't match were String and Int.

Operators don't provide all of the same expressivity as macros. For example, a macro allows you to represent syntax in an expression tree that cannot be overloaded by operators, such as coercions and the ternary operator. Custom operators also rely on overload resolution and bidirectional type inference, which are incredibly difficult features to understand. With the compile-time performance and diagnostics implications, I personally don't think there's any usability win for the operator approach over macros.

To my mind, the beauty of the macro approach here is that the code in a #predicate closure will type check against the existing operator overloads that programmers are already familiar with, not against custom operators over predicate-specific types. So, the semantics of a predicate expression use concepts that the programmer is already familiar with. If and when the programmer wants to understand how predicates are represented, e.g. to write custom operations, only then do they need to dig deeper into the PredicateExpression representation.

A macro expansion may also have an opportunity to produce more actionable diagnostics that are domain-specific, which is something that has been a frequent pain point of using library-defined DSLs with standard Swift type checker diagnostics; the @available trick is not powerful enough to identify many common mistakes. If we go down the route of semantic macros, a macro will have more than enough information to produce errors about API misuse, possibly even with custom fix-its that are specific to the library. This is something that I would love to see explored more as part of the macro evolution discussions!

3 Likes

There is already one extension to Sequence in the pitch, pretty much what you suggest:

extension Sequence {
    public func filter(_ predicate: Predicate<Element>) throws -> [Element]
}

We can add others as needed.
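As a sketch of how such an extension could sit on top of per-element evaluation, the example below uses a hypothetical SketchPredicate type that simply wraps a closure. This is only to show the shape of the call site; the pitched Predicate stores an expression tree, not a closure, and its evaluation API may differ.

```swift
// Hypothetical closure-backed stand-in; the pitched Predicate is not implemented this way.
struct SketchPredicate<Input> {
    let test: (Input) throws -> Bool

    func evaluate(_ input: Input) throws -> Bool {
        try test(input)
    }
}

extension Sequence {
    // Same shape as the extension quoted above, written against the stand-in type.
    func filter(_ predicate: SketchPredicate<Element>) throws -> [Element] {
        // Forwards to the standard library's closure-based filter.
        try filter { try predicate.evaluate($0) }
    }
}

let isEven = SketchPredicate<Int> { $0 % 2 == 0 }
let evens = try [1, 2, 3, 4, 5, 6].filter(isEven)
// evens is [2, 4, 6]
```

With an overload like this, the call reads as an instance method on the collection rather than a call on the predicate.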