[Pitch] Swift Predicates

Muescha · December 17, 2022, 8:41pm

let predicate = Predicate<Message> {} //<--- better
let predicate = #predicate<Message> {}

I think the first line looks better.

Should the design of a language and its features really depend on possible compile-time issues?

I think compilers evolve, and compile time can be optimized under the hood in the future.

Geordie_J · December 19, 2022, 12:35am

My biggest concern is that this has any relation to Foundation. I’m not familiar with FoundationEssentials but I would greatly appreciate any new proposals and code being as far away from legacy monolithic constructs like Foundation as possible.

jmschonfeld · December 19, 2022, 4:19pm

FoundationEssentials is a new effort regarding the open source version of Foundation's Swift implementations. You can read more about this here if you're curious about the details, but this effort does involve breaking up the Foundation module into various smaller components to avoid the monolithic structure we currently have. Our idea is for the predicate APIs to land in the core/smaller FoundationEssentials package, but other Foundation APIs related to XML, networking, internationalization, etc. will be broken out into separate packages.

esummers · January 25, 2023, 1:33pm

There is a typo in one of the examples. An extra quote is in the middle of this string.

NSPredicate(format: "SUBQUERY(recipients, $recipient, recipient.firstName == sender.firstName").@count > 0")

jmschonfeld · January 27, 2023, 8:07pm

Hi all, thank you everyone for your feedback so far! We've made some small updates to the pitch, and I've updated the pitch document linked in this post (the document here for reference). The main changes include:

Adding the full definitions of each expression operator
Renaming #predicate to #Predicate to align with capitalized names typically used for invoking a type's initializer
Substituting a new build_KeyPath function invoked by the macro for previous uses of dynamic member lookup
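To make the key-path change more concrete, here is a toy model (not the real macro expansion) in which a property access becomes an explicit node wrapping a `KeyPath`, rather than something reached through dynamic member lookup. All type names (`ToyExpression`, `Variable`, `KeyPathNode`, `Constant`, `EqualNode`) are hypothetical and far simpler than the actual PredicateExpressions types.

```swift
// Hypothetical toy protocol: each node evaluates to an Output given the predicate's input.
protocol ToyExpression {
    associatedtype Input
    associatedtype Output
    func evaluate(_ input: Input) -> Output
}

// Hypothetical leaf node standing in for the closure parameter.
struct Variable<Input>: ToyExpression {
    typealias Output = Input
    func evaluate(_ input: Input) -> Input { input }
}

// Hypothetical key-path node: applies a KeyPath to the value produced by its root.
struct KeyPathNode<Root: ToyExpression, Output>: ToyExpression {
    typealias Input = Root.Input
    let root: Root
    let keyPath: KeyPath<Root.Output, Output>
    func evaluate(_ input: Root.Input) -> Output {
        root.evaluate(input)[keyPath: keyPath]
    }
}

// Hypothetical constant node.
struct Constant<Input, Output>: ToyExpression {
    let value: Output
    func evaluate(_ input: Input) -> Output { value }
}

// Hypothetical equality node comparing two sub-expressions over the same input.
struct EqualNode<LHS: ToyExpression, RHS: ToyExpression>: ToyExpression
where LHS.Input == RHS.Input, LHS.Output == RHS.Output, LHS.Output: Equatable {
    typealias Input = LHS.Input
    typealias Output = Bool
    let lhs: LHS
    let rhs: RHS
    func evaluate(_ input: LHS.Input) -> Bool {
        lhs.evaluate(input) == rhs.evaluate(input)
    }
}

struct Person { var firstName: String }

// Roughly the shape a closure like `{ person in person.firstName == "Ada" }` could take.
let tree = EqualNode(
    lhs: KeyPathNode(root: Variable<Person>(), keyPath: \Person.firstName),
    rhs: Constant<Person, String>(value: "Ada")
)
print(tree.evaluate(Person(firstName: "Ada")))  // true
```

Because the key path is stored as a value in the tree, a walker can inspect it later (for example, to map it to a column name), which is harder when member access is resolved dynamically.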

We appreciate all of your input, and we'd love any feedback you may have with these new revisions. Feel free to let us know if you have any comments or questions!

Jon_Shier · January 27, 2023, 9:14pm

I don't think we've established macro naming as part of the guidelines, but this seems like an odd choice to me. If it starts with #, it's a macro, whether or not that macro initializes a type under the hood. I'd expect macros to be lowercased. If you want to initialize a type, can't you make a normal Predicate type that takes the macro closure in one of its initializers? Frankly, I'd prefer that to seeing # hanging around so much.

mpangburn · January 27, 2023, 9:17pm

This is very interesting; having an example of multiple cutting-edge language features (variadic generics, macros) working in unison to tighten type-safety is compelling.

I'm curious how the set of operations supported by the Predicate macro system will be understood by developers. I see a couple avenues where this may arise:

Diagnostics, when a developer attempts to use an operation not supported by the macro transform;
Autocomplete support when typing a macro, as stated in goal (2) of the proposal's Motivation section.

For example, suppose a developer is interested in achieving the semantics of this example from the proposal:

let predicate = #Predicate<Message> { message in
    message.recipients.filter {
        $0.firstName == message.sender.firstName
    }.count > 0
}

but, seeing what looks like freely-written Swift in the provided closure, instead attempts to write:

let predicate = #Predicate<Message> { message in
    message.recipients.map(\.firstName).filter {
        $0 == message.sender.firstName
    }.count > 0
}

where map is an example of an unsupported operation (and presumably this fails to compile).

Can diagnostics & autocomplete inform the developer that, e.g., the only supported operations on sequences are filter(_:), contains(_:), contains(where:), allSatisfy(_:), etc.?
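For reference, the intent of the map-then-filter closure above can be expressed with only the listed sequence operations. The sketch below is plain Swift (no macro involved) with hypothetical `Person` and `Message` types; it checks that `contains(where:)` gives the same answer as the `filter { ... }.count > 0` form from the proposal.

```swift
// Hypothetical model types mirroring the proposal's example.
struct Person {
    var firstName: String
}

struct Message {
    var sender: Person
    var recipients: [Person]
}

// The form used in the proposal: filter, then count.
func viaFilterCount(_ message: Message) -> Bool {
    message.recipients.filter {
        $0.firstName == message.sender.firstName
    }.count > 0
}

// Same semantics using only contains(where:), with no map step.
func viaContains(_ message: Message) -> Bool {
    message.recipients.contains { $0.firstName == message.sender.firstName }
}

let message = Message(
    sender: Person(firstName: "Ada"),
    recipients: [Person(firstName: "Grace"), Person(firstName: "Ada")]
)
print(viaFilterCount(message), viaContains(message))  // true true
```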

If supported in autocomplete, how is that relationship between macros and the IDE communicated? I'm guessing the macro definition itself isn't enough, but if that relationship is described in one of the macro proposals and I've missed it, my apologies.

(I recognize some developer experience-type questions may be outside the scope of this pitch, but thought I'd ask!)

jmschonfeld · January 27, 2023, 9:52pm

You're right, we haven't quite established naming as part of the guidelines (tagging @Douglas_Gregor since we've briefly discussed this). I don't think dropping the # altogether is a direction we want to go towards. While the semantics of the pre- and post-expansion code are the same, there's quite a bit of heavy lifting going on in the macro here that we'd like to be clearly evident to the developer by writing #Predicate at the construction site rather than hiding the macro invocation in the declaration of the initializer. Given the choice between something like #predicate and #Predicate, we felt that #Predicate looked more natural to indicate that the macro initializes a type, rather than something like #assert which acts more like a function call. Doug might have some more thoughts here.

jmschonfeld · January 27, 2023, 10:00pm

Potentially a bit out of scope for the pitch here, but still an important question nonetheless! Currently diagnostics are our main tool here, and in fact this is one of the compelling reasons why we'd like to use macros instead of a solution like operator overloading. The macro will produce diagnostics for this case that tell the developer that the function is not able to be used in the context of the predicate they are creating. For common functions, we also have the opportunity to provide fix-its or suggestions as applicable within these diagnostics, but in general the diagnostic will just alert the developer that this function can't be used. Currently, our main source of truth for developers to see a list of supported operations would be the documentation (for example, predicates support all expressions that conform to StandardPredicateExpression). Macros don't currently have a way to influence the autocomplete results, but that is an avenue that could be interesting to bring up on the macro proposal.

Jon_Shier · January 27, 2023, 10:05pm

This is a much larger discussion that doesn't really need to be here, so I've threaded it.

Macro Naming

One simple objection is, why is a macro that returns a type (or otherwise looks like an init) different from a global or other function that does the same thing? If we had a global `predicate` factory, by this logic shouldn't it be `Predicate` (barring the inevitable collision with the real type)? Also, given that result builder closures are unmarked (despite feedback in the review, IIRC), why wouldn't we allow or expect macro closures to also be unmarked? If we're really that concerned, adding a `#{}` form for macros (where the actual `#predicate` macro is inferred) seems logical, so we can have `Predicate {}` and `Predicate #{}` rather than `Predicate {}` and `#Predicate {}`. But then I may be in the minority that really doesn't want yet another marker / syntax for macros.

hassila · January 30, 2023, 4:25pm

This is an interesting pitch which provides a lot of useful functionality!

So I have one fundamental question - what would be the approach for allowing for a UI that edits a representation of a predicate and transforms it back to the serialised entity that can be sent over to another process?

Specifically, we'd like to use predicates as a generic filter mechanism, where e.g. a frontend can edit a predicate (including values that are used as part of the expressions) and get a serialised representation that can be sent over the network to another process that then applies it (either directly or using the described support for transform to e.g. SQL).

It seems the needed mechanisms are halfway there ("Tree walking" section), but it is not (yet) quite clear to me if this would be flexible enough of a hook to go from serialised predicate -> UI. The reverse (going from some flexible/dynamic internal representation that allows a human being to interact with it, rather than coding stuff, over to the Predicate format) seems out of scope currently (but would be a critical piece to make this deeply more useful for many use cases).

Perhaps I'm missing something, but I would be interested to hear your ideas here.
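One way to picture the "flexible/dynamic internal representation" described above is a separate Codable tree that a UI edits, serialises, and sends to another process. The sketch below is an assumption-laden illustration: `Filter` and its cases are hypothetical, records are modelled as string dictionaries, and converting such a tree into a real Predicate (by building PredicateExpression values from it) is not shown.

```swift
import Foundation

// Hypothetical, UI-editable filter representation (not Foundation's Predicate).
indirect enum Filter: Codable, Equatable {
    case equals(field: String, value: String)
    case and(Filter, Filter)
    case or(Filter, Filter)
    case not(Filter)

    // Evaluates against a record modelled as a field-name -> value dictionary.
    func matches(_ record: [String: String]) -> Bool {
        switch self {
        case let .equals(field, value):
            return record[field] == value
        case let .and(lhs, rhs):
            return lhs.matches(record) && rhs.matches(record)
        case let .or(lhs, rhs):
            return lhs.matches(record) || rhs.matches(record)
        case let .not(inner):
            return !inner.matches(record)
        }
    }
}

// A frontend builds the filter and serialises it; another process decodes and applies it.
let filter = Filter.and(
    .equals(field: "team", value: "core"),
    .not(.equals(field: "status", value: "archived"))
)
let data = try JSONEncoder().encode(filter)
let decoded = try JSONDecoder().decode(Filter.self, from: data)
print(decoded == filter)  // true
print(decoded.matches(["team": "core", "status": "active"]))  // true
```

Keeping the editable form separate from the compiled predicate means the UI only has to understand a small, closed set of node kinds.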

dgoldsmith · January 30, 2023, 6:25pm

What do you see as the obstacles to doing what you propose? I think it's an entirely reasonable goal we want to support.

hassila · January 31, 2023, 2:20pm

Not sure there are obstacles, I'm just trying to understand how to do a few things based on the pitch description (it's harder when you can't play with it). Maybe two questions to begin with:

To confirm: to programmatically (dynamically at runtime) generate a custom predicate tree, we'd use the low-level building blocks outlined in the section "Macro processing" (which shows how the macros are expanded)?
The "Tree walking" section outlines how to add custom predicate processing, but how do you actually trigger a tree walk of myPredicate? Is it by simply calling evaluate? In that case, will a full evaluation always be done of all nodes in the tree, or can logic be short-circuited before all nodes in the tree are traversed?

dgoldsmith · January 31, 2023, 7:31pm

Yes, you'd construct the tree using the public initializers on the various PredicateExpression types.
As the tree walking section shows, this is typically done by defining a protocol, and then conforming Predicate and all the PredicateExpression types you want to handle to that protocol via extensions. If the protocol is (e.g.) ParseToResult like so:

protocol ParseToResult {
    func parse() -> Result  // Result: placeholder for your walk's output type, not Swift.Result
}

after writing extensions to conform Predicate et al. to ParseToResult, you'd then do this:

let myResult = aPredicate.parse()
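As a self-contained illustration of this protocol-plus-extensions pattern, the sketch below walks a hypothetical miniature expression tree (`Field`, `Literal`, `Equal`, `And`) instead of Foundation's types, producing a SQL-like string. The `SQLConvertible` protocol and all type names are assumptions made for the example.

```swift
// Hypothetical miniature expression nodes (stand-ins for PredicateExpression types).
struct Field { let name: String }
struct Literal { let value: Int }
struct Equal<L, R> { let lhs: L; let rhs: R }
struct And<L, R> { let lhs: L; let rhs: R }

// The walking protocol; each node type opts in through an extension.
protocol SQLConvertible {
    func sql() -> String
}

extension Field: SQLConvertible {
    func sql() -> String { name }
}

extension Literal: SQLConvertible {
    func sql() -> String { String(value) }
}

// Conditional conformances: a composite node is convertible if its children are.
extension Equal: SQLConvertible where L: SQLConvertible, R: SQLConvertible {
    func sql() -> String { "\(lhs.sql()) = \(rhs.sql())" }
}

extension And: SQLConvertible where L: SQLConvertible, R: SQLConvertible {
    func sql() -> String { "(\(lhs.sql()) AND \(rhs.sql()))" }
}

let tree = And(
    lhs: Equal(lhs: Field(name: "age"), rhs: Literal(value: 42)),
    rhs: Equal(lhs: Field(name: "score"), rhs: Literal(value: 7))
)
print(tree.sql())  // (age = 42 AND score = 7)
```

Here the node types are concrete generics, so no cast is needed; the cast discussed next is only required when the tree is reached through an existential or otherwise opaque `expression` property.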

Note that the parse function could have an argument that would act as global state that could be passed down to component PredicateExpression values. While the example works if you have your own Predicate type, to do it for standard Predicate you need to cast:

extension Predicate: ParseToResult {
    func parse() -> Result {
        // The expression's concrete type is generic, so cast to reach the walking protocol.
        return (expression as! ParseToResult).parse()
    }
}

If you wanted to allow for the case where you don't handle every PredicateExpression type, you could make parse() return an optional and use as? instead.
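A sketch of that optional variant, again with hypothetical toy nodes rather than Foundation's types: children are stored as `Any` (mirroring the cast on `expression` above), and `as?` lets the walk return `nil` when it meets a node type that never adopted the protocol.

```swift
// Walk protocol returning nil when a subtree is not supported.
protocol SQLConvertible {
    func sql() -> String?
}

struct Field: SQLConvertible {
    let name: String
    func sql() -> String? { name }
}

struct Literal: SQLConvertible {
    let value: Int
    func sql() -> String? { String(value) }
}

// Hypothetical node type that this walker deliberately does not handle.
struct Regex {
    let pattern: String
}

// Children are stored as Any, so each one is checked with `as?` at walk time.
struct Equal: SQLConvertible {
    let lhs: Any
    let rhs: Any
    func sql() -> String? {
        guard let l = (lhs as? SQLConvertible)?.sql(),
              let r = (rhs as? SQLConvertible)?.sql() else { return nil }
        return "\(l) = \(r)"
    }
}

let supported = Equal(lhs: Field(name: "age"), rhs: Literal(value: 42))
let unsupported = Equal(lhs: Field(name: "name"), rhs: Regex(pattern: "^A"))
print(supported.sql() ?? "unsupported")    // age = 42
print(unsupported.sql() ?? "unsupported")  // unsupported
```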

evaluate() is not involved in tree walking. You only use that if you wish to supply input to the predicate and evaluate it.
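On the short-circuiting question: in a custom walk, each node's method decides whether to recurse into its children, so stopping early is up to the walker. The toy sketch below (hypothetical types, not a description of Foundation's evaluate()) also passes a shared state object down the tree, as suggested above, and uses it to count how many leaves are visited when an `And` node stops after a false left-hand side.

```swift
// Hypothetical shared walk state passed down to every node.
final class WalkStats {
    var leavesVisited = 0
}

protocol Evaluatable {
    func eval(_ stats: WalkStats) -> Bool
}

struct Leaf: Evaluatable {
    let value: Bool
    func eval(_ stats: WalkStats) -> Bool {
        stats.leavesVisited += 1
        return value
    }
}

struct And<L: Evaluatable, R: Evaluatable>: Evaluatable {
    let lhs: L
    let rhs: R
    // Swift's && only evaluates rhs when lhs is true, so the right subtree can be skipped.
    func eval(_ stats: WalkStats) -> Bool {
        lhs.eval(stats) && rhs.eval(stats)
    }
}

let stats = WalkStats()
let tree = And(lhs: Leaf(value: false), rhs: And(lhs: Leaf(value: true), rhs: Leaf(value: true)))
print(tree.eval(stats), stats.leavesVisited)  // false 1
```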

hassila · February 1, 2023, 7:55am

Thanks for the clarification @dgoldsmith! We missed parse() somehow; it was the missing piece for us. (EDIT: in fact, looking at the pitch, I can't find it. Perhaps it could be added to the Tree Walking section as an example? It would be clearer than spotlightQuery, at least for us.) We've discussed the pitch with our team internally, and overall we think it would be a great addition that we would definitely make heavy use of. It looks really promising.

Our only remaining major concern (which is obviously hard to judge from a pitch, so please view this as an open question) would, as mentioned, be the performance of evaluate(), specifically its use of keypaths. SE-0161 has the following note on performance:

The performance of interacting with a property/subscript via KeyPaths should be close to the cost of calling the property directly.

However, there are a few reports so far of (quite significant) performance issues with the current keypath implementation, e.g.:

github.com/apple/swift

[SR-9323] KeyPaths are quite slow (opened 22 Nov 2018 by weissi; labels: bug, performance, compiler; duplicate: SR-11983)

The issue benchmarks 100M increments of an Int property (Swift 5.0-dev, macOS 10.14, Intel Core i9). Direct `thing.x += 1` took a median of about 0.024 s, while `thing[keyPath: \SomeStruct.x] += 1` took about 4.7 s, roughly 200x slower. A hand-written struct holding getter and setter closures was about 47x slower than direct access, and one holding two `@convention(thin)` function pointers about 10x slower.

and

I understand that keypath optimization is driven by a different set of priorities; I just wanted to point it out in case there is anything that can be done to minimize the possible impact.
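For anyone who wants to check the current state on their own toolchain, a rough micro-benchmark in the spirit of SR-9323 is sketched below. It times with `DispatchTime`; results depend heavily on compiler version and optimization level (build with `-O` for meaningful numbers), so no expected timings are given. The iteration count is an arbitrary choice.

```swift
import Dispatch

struct SomeStruct {
    var x = 0
}

// Times a closure and returns its result along with the elapsed seconds.
func measure(_ body: () -> Int) -> (result: Int, seconds: Double) {
    let start = DispatchTime.now().uptimeNanoseconds
    let result = body()
    let end = DispatchTime.now().uptimeNanoseconds
    return (result, Double(end - start) / 1_000_000_000)
}

let iterations = 1_000_000

// Baseline: direct property access.
let direct = measure {
    var thing = SomeStruct()
    for _ in 0..<iterations { thing.x += 1 }
    return thing.x
}

// Same work through a WritableKeyPath subscript.
let viaKeyPath = measure {
    var thing = SomeStruct()
    let keyPath = \SomeStruct.x
    for _ in 0..<iterations { thing[keyPath: keyPath] += 1 }
    return thing.x
}

print("direct:   \(direct.seconds) s")
print("key path: \(viaKeyPath.seconds) s")
```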

I guess it's only tangential to the pitch, but I wanted to at least call it out, as the performance of evaluate() is critical to the usability of Predicates (at least for us). The pitch overall really looks great, and we'd be super happy to use it (as long as performance is OK).

So, anyway - big +1 for the pitch overall.

filip-sakel · February 1, 2023, 11:59pm

Thanks for the explanation! I played around with expression macros and (although I got quite a few errors and couldn’t run anything) I understand why they’re used in the pitch. My only concern about macros in general, which does extend to this pitch, is the risk of duplicating common functionality across different projects. For example, even after the advent of macros, property wrappers are still a great way of adding behavior to properties in a way most Swift programmers understand. I think this risk applies to the proposal’s macros, as they are mainly used for custom operators. However, this behavior is not unique to predicates: the power assert discussed in the expression-macro threads is another great example, and I imagine scoped operators being used for numerical computing too. In other words, you made a great case for why predicates would benefit from scoped operators instead of overloading, but we should generalize this feature to extend beyond Foundation. Otherwise, the Swift ecosystem will become fragmented, with each library author choosing their own version of scoped operators. The following is a simple, generic design we could use for this feature:

macro Predicate<R>(body: () -> R) = #scopedOperators(
  OperatorDescriptor(
    infix: "==",
    implementation: PredicateExpressions.Equal.init
  ),
  body
)
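For contrast, here is a minimal sketch of the operator-overloading approach in plain Swift, using hypothetical `Column` and `Expr` types. Comparing a column to an `Int` builds an expression node instead of a `Bool`, but the overloads live at global scope and are considered wherever the type checker resolves `==` or `&&`, which relates to the compile-time concerns raised earlier in the thread.

```swift
// Hypothetical column reference and expression tree built by overloaded operators.
struct Column {
    let name: String
}

indirect enum Expr: Equatable {
    case equal(column: String, value: Int)
    case and(Expr, Expr)
}

// Global overloads: comparing a Column to an Int builds a node instead of a Bool.
func == (lhs: Column, rhs: Int) -> Expr {
    .equal(column: lhs.name, value: rhs)
}

func && (lhs: Expr, rhs: Expr) -> Expr {
    .and(lhs, rhs)
}

let age = Column(name: "age")
let score = Column(name: "score")
// Parses as (age == 42) && (score == 7) using the standard operator precedences.
let expr = age == 42 && score == 7
print(expr)
```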

dnadoba · February 2, 2023, 11:17am

Quite an interesting idea. A similar idea, but for result builders, was recently pitched here:

Maybe a more general feature could solve both problems. I guess namespaces, and automatically using namespaces, could be nice, but of course they would be a lot of work to design and implement. I imagine this would also make autocomplete work more seamlessly.

filip-sakel · February 2, 2023, 11:50am

I actually hadn’t considered a unified result-builder and operator namespacing feature; it’s a great idea!

The design would definitely be time-consuming. For one, there’s the question of whether operators inside the namespace merely take priority over global ones, or whether operators outside it are prohibited entirely (e.g. there’s no bitshift operator in SwiftPredicates). However, the implementation, at least for the prototype in my previous post, was actually quite simple: you just parse the operators given to the macro, visit all operator nodes, and substitute the correct implementation.
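The substitution step can be illustrated without SwiftSyntax. The sketch below uses a hypothetical toy syntax tree and a table mapping operator spellings to implementation names (`PredicateExpressions.Equal` and `PredicateExpressions.Conjunction` appear only as strings). It rewrites every binary-operator node into a call and throws for operators missing from the table, roughly the "prohibited entirely" option.

```swift
// Hypothetical toy syntax tree (a real macro would operate on SwiftSyntax nodes).
indirect enum Node: Equatable {
    case identifier(String)
    case binary(op: String, Node, Node)
    case call(function: String, [Node])
}

enum RewriteError: Error, Equatable {
    case unsupportedOperator(String)
}

// Replaces each binary operator with a call to the implementation registered for it.
func substitute(_ node: Node, using table: [String: String]) throws -> Node {
    switch node {
    case .identifier:
        return node
    case let .binary(op, lhs, rhs):
        guard let implementation = table[op] else {
            throw RewriteError.unsupportedOperator(op)
        }
        let newLHS = try substitute(lhs, using: table)
        let newRHS = try substitute(rhs, using: table)
        return .call(function: implementation, [newLHS, newRHS])
    case let .call(function, arguments):
        let newArguments = try arguments.map { try substitute($0, using: table) }
        return .call(function: function, newArguments)
    }
}

let table = ["==": "PredicateExpressions.Equal", "&&": "PredicateExpressions.Conjunction"]
let input = Node.binary(op: "&&",
                        .binary(op: "==", .identifier("a"), .identifier("b")),
                        .identifier("c"))
let output = try substitute(input, using: table)
print(output)
```

Switching to the "prioritized" option would mean falling back to the original operator instead of throwing.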

jmschonfeld · March 14, 2023, 11:15pm

Hi all, as mentioned in this pitch, we've posted a followup pitch regarding the serialization behavior of predicates at [Pitch] Swift Predicates: Archiving.

stevapple · June 13, 2023, 8:49am

I'm glad to see this proposal go live with the Xcode 15 beta, but I noticed that the variadic generic APIs are not implemented. In fact, the variadic generic semantics are not fully described in the proposal, but it did show an example of the macro and list an entry in the detailed design.

So my question is: Is the variadic generic version of Predicate and #Predicate still planned? How soon can we use it on Darwin & how soon can we see it in swift-foundation?