[Pitch] Swift Predicates

jmschonfeld · December 12, 2022, 9:43pm

Hi all,

@dgoldsmith and I have drafted a pitch outlining new APIs to express predicates natively in Swift code. We hope that this will provide a more robust and ergonomic experience for defining and using predicates in Swift. We plan to introduce this as part of the FoundationEssentials package and we'd love to hear your feedback.

Introduction

A predicate is a construct that performs a true/false test on a provided set of input values. It is very common for developers to need to construct predicates that can be sent across concurrency and process boundaries for later evaluation. Additionally, predicates are commonly converted to external formats such as SQL and other query languages for native evaluation in databases. Predicates are already used to pass a filter across software boundaries: through an API, to another process, or across the network. Apple platforms currently use NSPredicate for this purpose, but it has some deficiencies:

It isn't type safe
It doesn't work with autocomplete in an IDE
It has its own syntax, different from Swift
It isn't extensible to new expressions or types
It is difficult to parse

We propose creating a new value type, Predicate, as part of the FoundationEssentials package, that addresses these problems. These new constructions of predicates will be expressed using standard Swift syntax elements and are fully type-checked by the compiler. This allows us to design Predicate to be type safe, readily archivable and Sendable, and integrated with Swift development environments.

I've posted the full pitch here as a gist; feel free to check out the details and please let us know what you think!

benpious · December 12, 2022, 11:26pm

Awesome pitch!

One question I have is about how you intend to support ?. in the context of dynamicMemberLookup on Variable, which, if I understand correctly, is the type of the callee in #predicate. I tried to write a similar but much less ambitious library a year or two ago, and the difficulty I encountered was that I couldn't use the ?. operator with dynamic member lookup. Does this Just Work now, or is there work that needs to be done specifically to support this?

Specific details on how Predicate will be securely Codable and the specifics for its archiving design will be addressed in a future proposal

Very much looking forward to seeing this.

bbrk24 · December 13, 2022, 3:50am

I haven't read through the full proposal yet, but it's not clear to me from the Motivation section what the purpose is -- why would you use a predicate over a (T) -> Bool? The only time I've ever used NSPredicate is when an Objective-C API requires it, like XCUIElementQuery.

Joannis_Orlandos · December 13, 2022, 8:56am

Amazing Pitch, extremely useful in both Server-Side Swift and App development. This would greatly improve the way the Fluent and Meow ORMs are written, and also any SQLite wrappers for iOS would greatly benefit from this. I'd imagine it leads to (some form of) a general API between ORMs in Swift, assuming we get this right.

I think the extent to which this is useful relies heavily on the method of en/decoding these predicates. I also really like that parameters are passed in during evaluation.

My one issue is that in databases, you can find yourself comparing a value inside an entity to another value in the same entity. This wouldn't normally happen in simple models, but could happen in a model that was spawned as the result of a left join between siblings.

struct Organisation: Codable {
  let id: UUID
  var name: String
  let creator: Reference<User>
  var admins: Set<Reference<User>>
  var members: Set<Reference<User>>
}

Now in this example, I've structured members to be exhaustive, so it includes the admins. I want all members that are not admins (setting aside table optimisations).

The following model is pseudocode for a join relationship in my hypothetical ORM:

struct Join<Parent: Codable, Child: Codable>: Codable {
  let parent: Parent
  let child: Child
}

After querying a left join on all members, I'd now need to do a filter where !parent.admins.contains(child.id). If I understand correctly, these types of operations (between two fields in the same model) would not currently be possible in this draft.

FranzBusch · December 13, 2022, 9:34am

Overall, great pitch! It's great to see what macros enable here. I have two unrelated comments:

Why are the various build_XXX methods named with an underscore? This looks odd to me.
You mentioned this being part of the FoundationEssentials module, which is going to be part of an open source Foundation package. This imposes a new set of constraints on APIs, because a package is not compiled in library evolution mode. One such constraint is that you can never add new cases to an enum without breaking API. In your proposal, PredicateError is an enum. I would caution against using an enum here unless you are sure you will never introduce new cases. (In general, this is something larger to consider for the OSS Foundation efforts.)

dgoldsmith · December 13, 2022, 5:28pm

Thanks for catching this! The macro processor should break the KeyPath around the ? operator, along the lines of ((a.b)?).c.d. The ? operator in turn will be handled via a flatMap-style PredicateExpression. We have suffix ? in the implementation but having it in the middle of a KeyPath was a detail we'd missed.
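The decomposition described above can be illustrated in plain Swift: for `a.b?.c.d`, apply the key path up to the `?`, then apply the remaining key path inside `flatMap`, so a nil prefix short-circuits the rest. This is a sketch under assumptions: the model types and the `optionalChain` helper are hypothetical, and this is not the macro's actual expansion.

```swift
// Hypothetical model types, used only to illustrate splitting a key path at `?`.
struct City {
    var name: String
    var nickname: String?
}

struct Address {
    var city: City
}

struct Person {
    var address: Address?
}

// Evaluates `root.<prefix>?.<suffix>`: apply the prefix key path, then run the
// suffix key path inside `flatMap`, so a nil prefix yields nil without
// evaluating the suffix, and an optional suffix result is not double-wrapped.
func optionalChain<Root, Middle, Value>(
    _ root: Root,
    _ prefix: KeyPath<Root, Middle?>,
    _ suffix: KeyPath<Middle, Value?>
) -> Value? {
    root[keyPath: prefix].flatMap { $0[keyPath: suffix] }
}

let resident = Person(address: Address(city: City(name: "Cupertino", nickname: "HQ")))
let nomad = Person(address: nil)

// Equivalent to `resident.address?.city.nickname` and `nomad.address?.city.nickname`.
print(optionalChain(resident, \Person.address, \Address.city.nickname) as Any) // Optional("HQ")
print(optionalChain(nomad, \Person.address, \Address.city.nickname) as Any)    // nil
```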

jmschonfeld · December 13, 2022, 5:36pm

The builder functions contain an underscore to segment the name of the function between the "build" prefix and the name of the pre-macro expansion function that was called. For example, a call to contains(where:) is translated to build_contains(_:where:) because we felt that buildcontains(_:where:) was confusing given it didn't follow the camel case style. Rather than trying to dynamically re-capitalize the function name during macro expansion and need to worry about collisions, we decided to use an underscore to make the builder function name a bit clearer and ensure there is a direct mapping between pre-expansion and post-expansion function names.

That's a great point, thank you for calling this out! You're right that in the OSS package this enum would be quite limiting, since adding cases later would break API. Instead, we'll likely need to change this to a struct with static members, like some other APIs have done.
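A minimal sketch of the struct-with-static-members pattern, assuming two illustrative error kinds. The member names here are hypothetical and not taken from the pitch; the point is that clients cannot switch over the type exhaustively, so new static members can be added later without breaking their code.

```swift
// Hypothetical error type modeled as a struct with static members instead of an enum.
struct PredicateError: Error, Hashable, CustomStringConvertible {
    // The private enum can gain cases freely, because only this type can see it.
    private enum Code: Hashable {
        case undefinedVariable
        case invalidInput
    }

    private let code: Code

    private init(_ code: Code) {
        self.code = code
    }

    static let undefinedVariable = PredicateError(.undefinedVariable)
    static let invalidInput = PredicateError(.invalidInput)

    var description: String {
        // Exhaustive switching is only possible inside the type itself.
        switch code {
        case .undefinedVariable: return "undefinedVariable"
        case .invalidInput: return "invalidInput"
        }
    }
}

// Clients compare against known values instead of switching exhaustively.
func describe(_ error: Error) -> String {
    if let predicateError = error as? PredicateError, predicateError == .invalidInput {
        return "bad input"
    }
    return "other error"
}

do {
    throw PredicateError.invalidInput
} catch {
    print(describe(error)) // bad input
}
```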

jmschonfeld · December 13, 2022, 5:38pm

Great question - Predicate will provide quite a few benefits over a (T) -> Bool closure. With a predicate, you'll be able to encode/decode the expression in order to send it to another process via XPC (something that cannot be safely done with arbitrary closures) and you'll also be able to walk the tree of expressions to convert it to an external format like a SQL query (which is not quite possible with closures). If you only need to evaluate the result in-process and in-memory, then using a standard closure might fit your needs, but Predicate adds these extra capabilities that wrappers around external services and databases might need.
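To make the difference concrete, here is a deliberately tiny, hypothetical expression tree. `Expr`, `Value`, `evaluate`, and `toSQL` are made-up names for illustration, not the pitch's `PredicateExpression` types. Because the predicate is a value rather than an opaque closure, the same tree can be evaluated in memory or walked to produce a SQL fragment.

```swift
// Hypothetical, minimal expression tree (illustration only).
enum Value: Equatable {
    case int(Int)
    case string(String)
    case bool(Bool)
}

typealias Row = [String: Value]

indirect enum Expr {
    case field(String)          // a named column / property of the input row
    case intLiteral(Int)
    case stringLiteral(String)
    case less(Expr, Expr)
    case equal(Expr, Expr)
    case and(Expr, Expr)
}

// In-memory evaluation; returns nil if a field is missing or the types don't match.
func evaluate(_ expr: Expr, _ row: Row) -> Value? {
    switch expr {
    case .field(let name):
        return row[name]
    case .intLiteral(let i):
        return .int(i)
    case .stringLiteral(let s):
        return .string(s)
    case .less(let l, let r):
        guard case .int(let a)? = evaluate(l, row),
              case .int(let b)? = evaluate(r, row) else { return nil }
        return .bool(a < b)
    case .equal(let l, let r):
        guard let a = evaluate(l, row), let b = evaluate(r, row) else { return nil }
        return .bool(a == b)
    case .and(let l, let r):
        guard case .bool(let a)? = evaluate(l, row),
              case .bool(let b)? = evaluate(r, row) else { return nil }
        return .bool(a && b)
    }
}

// Walks the same tree to produce a SQL WHERE fragment
// (string literals are quoted, with embedded single quotes doubled).
func toSQL(_ expr: Expr) -> String {
    switch expr {
    case .field(let name):
        return name
    case .intLiteral(let i):
        return String(i)
    case .stringLiteral(let s):
        let escaped: String = s.map { $0 == "'" ? "''" : String($0) }.joined()
        return "'" + escaped + "'"
    case .less(let l, let r):
        return "(\(toSQL(l)) < \(toSQL(r)))"
    case .equal(let l, let r):
        return "(\(toSQL(l)) = \(toSQL(r)))"
    case .and(let l, let r):
        return "(\(toSQL(l)) AND \(toSQL(r)))"
    }
}

// Roughly `$0.count < 10 && $0.firstName == "Jeremy"`, written out by hand.
let predicate: Expr = .and(.less(.field("count"), .intLiteral(10)),
                           .equal(.field("firstName"), .stringLiteral("Jeremy")))
let row: Row = ["count": .int(3), "firstName": .string("Jeremy")]

print(evaluate(predicate, row) == Value.bool(true)) // true
print(toSQL(predicate)) // ((count < 10) AND (firstName = 'Jeremy'))
```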

dgoldsmith · December 13, 2022, 6:11pm

I might be missing something here but I think your example should be expressible like so:

let filter = #predicate<Join> { !$0.parent.admins.contains($0.child.id) }

Predicate should be capable of expressing most filters on single (or with variadic generics, multiple) objects, including relationships. Does what you're trying to express here not fit into that category?
If what you're talking about is relationships between two different instances of the same entity, that has to be expressed as a predicate where the relationship is part of the model, similar to SUBQUERY in NSPredicate (which corresponds to filter in Predicate). Does this address your concern?
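For comparison, the in-memory counterpart of that filter is an ordinary test over a single `Join` value that compares two of its fields. This sketch assumes simplified, hypothetical models: `Reference<User>` from the earlier post is replaced with plain integer user IDs so the example is self-contained.

```swift
// Simplified, hypothetical models (Reference<User> replaced with Int user IDs).
struct User {
    let id: Int
    let name: String
}

struct Organisation {
    let id: Int
    var admins: Set<Int>
    var members: Set<Int>   // exhaustive: includes the admins
}

struct Join<Parent, Child> {
    let parent: Parent
    let child: Child
}

let org = Organisation(id: 1, admins: [10], members: [10, 11, 12])
let users = [User(id: 10, name: "Ada"), User(id: 11, name: "Bo"), User(id: 12, name: "Cy")]

// The rows a join of the organisation with its members would produce.
let rows = users
    .filter { org.members.contains($0.id) }
    .map { Join(parent: org, child: $0) }

// Same shape as `#predicate<Join> { !$0.parent.admins.contains($0.child.id) }`,
// but as a plain closure evaluated in memory.
let nonAdmins = rows.filter { !$0.parent.admins.contains($0.child.id) }
print(nonAdmins.map(\.child.name)) // ["Bo", "Cy"]
```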

hassila · December 14, 2022, 9:01am

That opens up a related discussion though (which is off-topic here, but I want to mention it): it would be fantastic to have library evolution mode available cross-platform too, and it would be an interesting discussion point whether it makes sense to enable it for some projects (e.g. the new cross-platform Foundation would be an obvious candidate). Just wanted to mention it, as I think it is an important missing piece right now.

Related issues I've created that together might make that a viable alternative:

github.com/apple/swift-package-manager

Dynamic library support on Linux with library evolution

opened 10:45AM - 09 Aug 22 UTC by hassila · enhancement

Description

Background: We have an enterprise-style application where we have a large number of libraries (swift packages) that we'd like to be able to evolve independently using library evolution and to dynamically link them to a large number of consumers. Basically adding binary dependency support to a dynamic library generated using SPM, similar to XCFrameworks. I've read https://github.com/apple/swift-evolution/blob/main/proposals/0305-swiftpm-binary-target-improvements.md and would like to suggest a first step could be to cater to this kind of 'enterprise' deployment scenario rather than taking on the full 'many linuxes support everywhere'; we have full control over deployment operating system and swift toolchain (we can mandate for our customers what they must have to run the software and would do builds for a small number of deployment environments that we support). The artefact bundle format seems that it would suffice for such a scenario.

So trying to break down what's needed (perhaps missing something):

- Support for binary dependencies also for Linux (preferably using `artifactbundle` format so we can have both macOS and Linux in a single dependency if putting things in the right places)
- Support for enabling library evolution for SPM packages generating dynamic libraries on both macOS and Linux (accepting the requirement for same toolchain on Linux)

I think this single-linux-distribution as a first step could be very useful not only for us, but also for others, and help as a stopgap deployment measure.

See also related discussions / issues:
https://github.com/apple/swift/issues/60458
https://forums.swift.org/t/use-a-dynamic-library-in-a-swift-package-on-linux/59510
https://forums.swift.org/t/availability-when-using-library-evolution-resilience-for-third-party-libraries/59341/3
etc.

Also recently added: https://github.com/apple/swift/issues/66156

and

github.com/apple/swift

Availability annotations for third party libraries when using Library evolution/ resilience

opened 10:27AM - 09 Aug 22 UTC by hassila · new feature

Is your feature request related to a problem? Please describe.

We've ended up realizing we'd like to use the library evolution features for resilient API surfaces for our product (yes, needed for an enterprise solution where clients of libraries must be able to be replaced independently and don't have source access. Both macOS and Linux. Aware that toolchain impedance matching on Linux is required; that's ok). The question is: how can we evolve our API with availability annotations? Current availability annotations are tied to platform or Swift version only, but we'd want to tie them to our product's version numbers instead. Not sure of the best way to express it, but where a typical Swift attribute would be `@available(swift 5.1)`, we'd want something like `@available(ourProductName 2.0)`, which we could use with e.g.

    if #available(ourProductName 2.0) { return x } else { return y }

or e.g. `@available(ourProductName, deprecated: 3, renamed: "newNameForThis")`

Describe the solution you'd like

Add support for third-party availability annotations to allow tying availability to our product releases instead of platform/Swift versions.

Describe alternatives you've considered

As a stop-gap we considered tying it to Swift releases instead, which is not really a good workaround, but AFAWU it's fundamentally the only option we have (for it to work on both Linux and macOS), and not very good conceptually.

Additional context

Some background discussion here: https://forums.swift.org/t/availability-when-using-library-evolution-resilience-for-third-party-libraries/59341

hassila · December 14, 2022, 9:10am

Also, to echo others: very exciting pitch, still reading/digesting. One initial concern I'd have is verifying the efficiency of the implementation. In our experience building similar infrastructure, performance is a key requirement for this functionality (we'd often use it to filter large amounts of data, or data arriving at a significant inbound rate), so for it to be useful, good performance and benchmark tests are critical. It would be great if it were easy for others to contribute benchmarks in some systematic way, if you agree it is important?

davdroman · December 14, 2022, 10:45am

Very excited for what this might mean for server-side Swift. I'd like to see what the folks maintaining Vapor and Fluent have to say about it.

filip-sakel · December 14, 2022, 11:43am

I really like this proposal, but I'm still not convinced about the use of macros. The Alternatives Considered section says operator overloads would lead to exponentially longer build times. Could you elaborate on where exactly these overloads would be defined?

jmschonfeld · December 14, 2022, 5:55pm

If we had gone with the operator overloading approach, a predicate's construction would look quite similar, something along the lines of:

let predicate = Predicate<Message> {
    $0.content.count < limit && $0.sender.firstName == "Jeremy"
}

However, in order to support just the operations present in this predicate, we would have to add 9 new overloads across the <, &&, and == operators (each operator would need 3 overloads: one with a PredicateExpression on both sides, one with a PredicateExpression on the LHS and a constant on the RHS, and one with a constant on the LHS and a PredicateExpression on the RHS). These operators would be defined in Foundation and would return a built PredicateExpression rather than the result of the operator (like a Bool). In addition to the inability to produce meaningful compilation error messages and to represent other operators such as conditionals and casting, it is unfortunately a known issue in Swift that adding new operator overloads can drastically increase the time it takes to type-check an expression. We felt that adding a very large number of new operator overloads to an almost universally imported framework would lead to untenable build-time regressions. If you're curious about the build-time issues here, you can search around for other forum posts and GitHub issues about the compiler being unable to type-check an expression in reasonable time (such as this post, for example), or @hborla can provide some more context on that.
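To make the overload count concrete, here is a hypothetical sketch of the three `==` overloads the operator approach would need (expression/expression, expression/constant, constant/expression); `<` and `&&` would each need a similar trio. The protocol shape and the names `KeyPathExpression`, `Constant`, and `Equal` are assumptions for illustration and do not reflect the pitch's actual types.

```swift
// Hypothetical protocol shape: an expression evaluated against a single root value.
protocol PredicateExpression {
    associatedtype Root
    associatedtype Output
    func evaluate(_ root: Root) -> Output
}

// Reads a property of the input via a key path.
struct KeyPathExpression<Root, Output>: PredicateExpression {
    let keyPath: KeyPath<Root, Output>
    func evaluate(_ root: Root) -> Output { root[keyPath: keyPath] }
}

// Wraps a captured constant.
struct Constant<Root, Output>: PredicateExpression {
    let value: Output
    func evaluate(_ root: Root) -> Output { value }
}

// The node an `==` overload returns instead of a Bool.
struct Equal<L: PredicateExpression, R: PredicateExpression>: PredicateExpression
where L.Root == R.Root, L.Output == R.Output, L.Output: Equatable {
    typealias Root = L.Root
    typealias Output = Bool
    let lhs: L
    let rhs: R
    func evaluate(_ root: L.Root) -> Bool { lhs.evaluate(root) == rhs.evaluate(root) }
}

// Overload 1: expression == expression
func == <L, R>(lhs: L, rhs: R) -> Equal<L, R>
where L: PredicateExpression, R: PredicateExpression,
      L.Root == R.Root, L.Output == R.Output, L.Output: Equatable {
    Equal(lhs: lhs, rhs: rhs)
}

// Overload 2: expression == constant
func == <L>(lhs: L, rhs: L.Output) -> Equal<L, Constant<L.Root, L.Output>>
where L: PredicateExpression, L.Output: Equatable {
    Equal(lhs: lhs, rhs: Constant(value: rhs))
}

// Overload 3: constant == expression
func == <R>(lhs: R.Output, rhs: R) -> Equal<Constant<R.Root, R.Output>, R>
where R: PredicateExpression, R.Output: Equatable {
    Equal(lhs: Constant(value: lhs), rhs: rhs)
}

struct Message {
    var sender: String
}

let sender = KeyPathExpression(keyPath: \Message.sender)
let fromJeremy = sender == "Jeremy"     // resolves to overload 2
let alsoFromJeremy = "Jeremy" == sender // resolves to overload 3

print(fromJeremy.evaluate(Message(sender: "Jeremy")))  // true
print(alsoFromJeremy.evaluate(Message(sender: "Ben"))) // false
// Prints the nested generic type that encodes the tree's structure.
print(type(of: fromJeremy))
```

Every one of these overloads is a new candidate the type checker must consider for every `==` in any file that imports the module, which is the build-time concern described above.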

Alejandro_Martinez · December 14, 2022, 6:36pm

Just did my first read of the pitch and it looks great. A Swift replacement for NSPredicate is a very welcome addition. And the fact that we can see a real use of new capabilities (macros) is very interesting. It may be worth mentioning this in other posts, since being in the "Development" category may cause some folks following Evolution to miss it :)

filip-sakel · December 14, 2022, 9:04pm

Thank you for your thorough response! I asked how you envisioned the operators to see whether you planned to have standard types like String and Int conform to PredicateExpression. I agree that this approach would lead to awful compile times, but I was thinking we could have Value("Jeremy") or .value("Jeremy") instead of the bare string. I don't think we expect Equatable or other conformances on PredicateExpression types, so there shouldn't be any overloading.

This also creates problems in result builders, but by adding @available(*, unavailable, message: ""), API developers can somewhat customize diagnostics. I imagine we could use that to tell the user why they can't compare two expressions that don't have the same type, for example. Let me know if you had other cases in mind that this approach wouldn't satisfy.

I think overloading these operators would be a nice addition for other DSLs too, but I'm not convinced these cases are common enough to warrant a macro system.

I know there are many complaints about Foundation's chunkiness in terms of binary size, so we should definitely preserve fast-to-compile APIs.

The reason I'm against macros is their complexity. On the other hand, operators provide the same expressive power (excluding conditionals and type-casting), and are a well recognized, easy-to-understand feature. Macros also tend to be harder to debug, while the operator directly produces a type you can inspect. So given all their issues, I don't think macros are worth it just to be able to write "Jeremy" instead of .value("Jeremy").

Jon_Shier · December 15, 2022, 1:44am

I may have more thoughts later but I have one bit of feedback and a couple questions now.

First, the fact that this proposal uses macros while that feature isn't even under review but just a pitch makes it clear that the eventual review of that proposal will be a formality. I'd ask, once again, for the evolution process to be amended to allow Apple to clearly label pitches and proposals that will be accepted as such so the community can set expectations accordingly.

My question is, what are the performance goals of this feature? Obj-C's predicates were rather unpredictable in the performance arena given that they couldn't do many of the things db query optimizers could do, while the syntax practically begged users to treat them like a fully fledged db. So what testing has been done so far, and what kind of performance is expected? What optimizations, if any, can be performed?

Also, how are we going to debug these? Are we able to breakpoint through the predicate as it evaluates? Can we set column breakpoints to see the intermediate results of the query?

Oh, and for the pitch itself, I'd suggest naming all of your closure properties to make them more readable.

Jon_Shier · December 15, 2022, 1:48am

Oh, and will these APIs be offered from Foundation as extensions to Collection or anything? It would be nicer to use these as if they were instance functions rather than globals. That is

let result = collection.applying(predicate: #predicate(...)) // Not sure what the verb would be here.

would feel much nicer than

let predicate = #predicate(...)
let result = predicate.evaluate(collection)

If that sort of functionality isn't offered it seems like something that will be added pretty frequently, for predicate users at least. Best to nip that in the bud right away.

I'm also not sure this should be in Foundation, but...

hborla · December 15, 2022, 5:35am

Operators are resolved like global functions, so any new version of + or == or other standard operator is considered an overload, even if they operate on completely new types. That's the real pitfall of global operators -- adding new operators over types that are not used anywhere can still break existing code with type checker timeouts.

The problem with diagnostics in the operator overloading approach is specifically that the error messages will mention types that are specific to the predicate representation, which the programmer doesn't really care about. As the expression gets more complex, so does the type that attempts to encode the structure of the expression tree. You'd end up getting error messages that mention highly nested generic types, such as PredicateExpressions.Equal<PredicateExpressions.Value<String>, PredicateExpressions.Variable<Int>>, and the programmer has to mentally parse this giant type to figure out the types that they wrote that didn't match were String and Int.

Operators don't provide all of the same expressivity as macros. For example, a macro allows you to represent syntax in an expression tree that cannot be overloaded by operators, such as coercions and the ternary operator. Custom operators also rely on overload resolution and bidirectional type inference, which are incredibly difficult features to understand. With the compile-time performance and diagnostics implications, I personally don't think there's any usability win for the operator approach over macros.

To my mind, the beauty of the macro approach here is that the code in a #predicate closure will type check against the existing operator overloads that programmers are already familiar with, not against custom operators over predicate-specific types. So, the semantics of a predicate expression use concepts that the programmer is already familiar with. If and when the programmer wants to understand how predicates are represented, e.g. to write custom operations, only then do they need to dig deeper into the PredicateExpression representation.

A macro expansion may also have an opportunity to produce more actionable diagnostics that are domain-specific, which is something that has been a frequent pain point of using library-defined DSLs with standard Swift type checker diagnostics; the @available trick is not powerful enough to identify many common mistakes. If we go down the route of semantic macros, a macro will have more than enough information to produce errors about API misuse, possibly even with custom fix-its that are specific to the library. This is something that I would love to see explored more as part of the macro evolution discussions!

dgoldsmith · December 16, 2022, 2:01am

There is already one extension to Sequence in the pitch, pretty much what you suggest:

extension Sequence {
    public func filter(_ predicate: Predicate<Element>) throws -> [Element]
}
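A sketch of how that extension could sit on top of ordinary closure-based filtering. Here `Predicate` is a hypothetical stand-in that simply wraps a throwing closure; the pitched type instead stores an expression tree built by the macro, so only the shape of the `filter` extension is being illustrated. An `applying(predicate:)` spelling like the one suggested above would be a one-line variant.

```swift
// Hypothetical stand-in for the pitched Predicate: just a wrapped throwing closure.
struct Predicate<Input> {
    let body: (Input) throws -> Bool

    func evaluate(_ input: Input) throws -> Bool {
        try body(input)
    }
}

extension Sequence {
    // Same signature as quoted above; forwards to the standard closure-based filter.
    func filter(_ predicate: Predicate<Element>) throws -> [Element] {
        try filter { try predicate.evaluate($0) }
    }
}

let isEven = Predicate<Int> { $0 % 2 == 0 }
let evens = try [1, 2, 3, 4, 5, 6].filter(isEven)
print(evens) // [2, 4, 6]
```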

We can add others as needed.