[Pitch] Simplify Substring Searching with Custom Operators and Predicates

Hello Swift Community,

I would like to propose a new feature to the Swift standard library that simplifies substring searching in Swift. The goal is to make it easy to search for multiple substrings within a given text, providing a more concise and expressive way to represent complex search patterns.

Motivation

In many scenarios, developers need to search for multiple substrings within a larger string. Swift's built-in contains method is great for checking if a single substring exists within a string, but it can become cumbersome and unwieldy when searching for multiple substrings.

Consider the following example:

let text = "The quick brown fox jumps over the lazy dog."

let containsQuickOrJumps = text.contains("quick") || text.contains("jumps")
let containsFoxAndDog = text.contains("fox") && text.contains("dog")

This approach can quickly become complex and difficult to maintain when checking for a large number of substrings. As a result, it is essential to find a more efficient and expressive way to represent complex search patterns in Swift.

Proposed Solution

The proposed solution is to introduce custom infix operators and predicates to the Swift standard library, which can be used to create complex and flexible search patterns in a more readable manner.

Here's an example of how this new feature can be used to search for substrings in a given text:

let text = "The quick brown fox jumps over the lazy dog."

// Check if text contains "quick" OR "jumps"
let containsQuickOrJumps = text.contains("quick" || "jumps")
// Check if text contains "fox" AND "dog"
let containsFoxAndDog = text.contains("fox" && "dog")
// Check if text contains "fox" AND ("jumps" OR "swift")
let containsFoxAndJumpsOrSwift = text.contains("fox" && ("jumps" || "swift"))

By introducing custom infix operators and predicates for substring searching, the search patterns become more readable, expressive, and easier to understand. This new feature simplifies the process of searching for multiple substrings in Swift, making the code more maintainable and less prone to bugs.

Request for Feedback

I would love to hear your thoughts and feedback on this proposal. If you have any suggestions or concerns, please feel free to share them in this thread. I'm particularly interested in hearing about:

  • The overall utility and usefulness of this feature.
  • Potential issues or concerns related to source compatibility, ABI stability, or API resilience.
  • Alternative approaches or improvements to the proposed solution.

Thank you for your time and consideration!

Link to the full proposal: SE-NNNN
Link to the repository with Implementation.

6 Likes

Cool idea. However, I hope am not imagining this, but I remember seeing discussions about supporting regular expressions.

1 Like

Yes, that would definitely be a great feature to have. In combination with the existing logical OR (|| ) and AND (&& ) operators, the regular expression support provides a powerful way to create complex search predicates.

I opened a new pull request (Add regex string predicate with prefix =~ by Tavernari · Pull Request #5 · Tavernari/StringContainsOperators · GitHub) to add support for this feature.

To take advantage of this new feature, we can use the prefix operator =~ to create a StringPredicate that represents a regular expression pattern. Here's an example:

"The quick brown fox".contains(=~ "^The.*$") // true

Overall, this is a great addition to the StringContainsOperators library, and I'm excited to see it in action. Thanks for the comment, and keep it up!

1 Like

Interesting idea. I take it that:

text.contains("fox" && !"dog")

could mean "text contains "fox" but doesn't contain "dog".

I'm not sure about "~" though, contains(..., options: [.caseInsensitive, .diacriticInsensitive]) is more verbose yet at the same time more obvious.


could/should this be extended for other API's as well? like replacingOccurrences(of:with:) and range(of: ...), perhaps a few others?

1 Like

Thanks for your interest and feedback, @tera! Yes, that's correct. Using the "&&" operator with the "!" prefix allows you to exclude a specific substring from the search, making the search more flexible and powerful.

Regarding the use of the "~" prefix for case-insensitive and diacritic-insensitive search, I agree that it may not be immediately obvious to everyone. However, we chose this operator to keep the syntax concise and intuitive. That being said, we're open to suggestions and would love to hear your thoughts on this.

As for extending this feature to other APIs such as replacingOccurrences(of:with:) and range(of: ...), it's definitely something worth considering, and we would welcome any contributions to help make this a reality. Thank you again for your feedback, and please feel free to contribute to the project if you're interested!

Hello @tera and everyone on this thread,

I wanted to update you that I just added a new pull request at Introducing negatable operator ! for StringContainsOperators by Tavernari · Pull Request #6 · Tavernari/StringContainsOperators · GitHub to add a new negatable operator predicate to the project. This new feature will allow developers to negate a predicate using the ! operator, making it easier to create more complex and precise search criteria.

Thank you all for your input and feedback. I believe that this new feature will make the project more useful and powerful for developers everywhere.

This for me raises several questions that I would expect to be addressed within such a proposal:

  • What is the justification for making syntax that in all other contexts has a established language syntax meaning (implying that the String arguments are convertible to Bool and that the boolean expression will be reduced to a single boolean function argument) and introducing an exception to the established language syntax for this use case?
  • If you can justify that, then what is special about String contains in this regard? If this approach is warranted, why special case String contains rather than extend this design to all single-parameter functions with boolean return type?
  • That will raise the obvious potential for existing valid source breakage cases where the existing function parameter is Bool or convertible from Bool, as well as ambiguity in cases where potential function arguments are convertible to Bool. Where then are the lines around this design proposal drawn, and what are the justifications? Would any of these rules improve or reduce readability and the ability to reason about code?
2 Likes

Thank you for your thoughtful questions and concerns regarding this proposal. I appreciate the opportunity to address them and provide further clarification.

  1. The primary justification for using established syntax (|| and &&) in this context is to provide a more familiar and expressive way to represent complex search patterns. The proposed solution leverages the well-known logical operators to create a more intuitive and readable way to perform substring searches. While it's true that these operators typically imply Bool operands, this proposal aims to make an exception for this specific use case, as it enhances the readability and maintainability of the code.
  2. The String contains method was chosen as the focus of this proposal because substring searching is a common task in many applications, and the existing syntax can become cumbersome and challenging to maintain when searching for multiple substrings. While it's possible to extend this design to all single-parameter functions with boolean return types, the proposal's scope has been deliberately limited to String contains to minimize the impact on existing code and avoid potential ambiguity in cases where function arguments are convertible to Bool.
  3. In terms of potential source breakage and ambiguity, the proposed solution aims to strike a balance between enhancing readability and expressiveness while minimizing the impact on existing code. By focusing on the String contains method and utilizing custom infix operators and predicates, the proposal introduces new functionality without modifying existing language syntax or affecting ABI stability. This approach helps ensure that the proposal is source-compatible and minimizes the potential for conflicts with existing code.

To conclude, the primary goal of this proposal is to simplify substring searching in Swift by providing a more expressive and maintainable way to represent complex search patterns. While it does involve some deviations from established language syntax, these changes are carefully considered to minimize potential issues and provide clear benefits in terms of readability and code maintainability.

I agree with Sophia; it doesn't seem like a good idea to do something as basic as overloading the meaning of && and || just for the benefit of one or two APIs. It would be better to add contains(anyOf:) or contains(allOf:) if that kind of combination is common.

This doesn't work the way you think it does: you're adding overloads that need to be considered for all uses of the operator, not just uses that happen to be within the argument of contains. It can absolutely have source compatibility (and build time) impact on other code.

5 Likes

I actually recognize your idea from my misspent youth on perl6-language! Your proposal is very like a feature from Perl 6/Raku called junctions; the main difference is that you've restricted it to certain String APIs, whereas Raku applied it to all types and comparison operators. (Raku also used a distinct set of operators: infix &, |, and ^ created junctions, while their doubled equivalents performed short-circuiting boolean logic as usual.)

I was a big fan of this feature in Raku, but I'm not sure it's a good fit for Swift. In Raku's unrestricted form, it created a lot of implicit loops, sometimes making it difficult to figure out the time complexity of a piece of code. Your version would probably avoid the worst of that, but at the cost of making the feature extremely limited. Overloading the operators would also create typechecking performance concerns, as John mentioned.

Nevertheless, it seems to me that your proposal doesn't include anything that truly requires additional language or standard library support. Why not distribute it as a Swift package and see if people find it useful? That would allow you to evolve the design so it covers more use cases over time; if it became very popular, the mature design could even be adopted into the standard library at that point.

8 Likes

Hello @beccadax , @John_McCall , and fellow contributors,

I appreciate your valuable feedback and insights on this proposal. I understand the concerns you've raised about the potential performance and type checking implications of the initial idea, as well as the limitations of restricting the feature to certain String APIs.

Becca, your comparison to Raku's junctions is interesting, and while I still believe that this feature could bring value to Swift, I understand and respect your points regarding its potential limitations and challenges. I agree that distributing this as a Swift package could be a more suitable approach, as it allows the community to experiment with and evolve the design without affecting the core language.

Based on your suggestions, I have already created a Swift package called StringContainsOperators available at this repository: StringContainsOperators. I would like to invite all of you to visit the repository, try it out, and contribute any ideas or improvements you may have. Your continued feedback and expertise would be invaluable in refining and expanding this concept.

By providing this functionality as a separate package, we can explore its potential and usefulness without impacting the broader Swift ecosystem. If it proves valuable and gains traction, we can always revisit the idea of incorporating a mature version of the design into the standard library.

Once again, thank you for your time and constructive feedback. I look forward to seeing any contributions or suggestions you may have for the StringContainsOperators package.

7 Likes

I'm also not sure about introducing new operators inside of parentheses of the contains method. Could be confusing to many developers. But I do like your intention of providing more complex search capabilities within the Swift ecosystem. I have 2 suggestions when tackling such a topic:

  1. More complex String search functionality should support different kinds of fuzzy search algorithms out of the box. This would make this API be more useful than simple "and" and "or" operators which I could also implement by writing a Regex. But it's hard to write fuzzy search algorithms with regexes. So if that was provided alongside some common operators, this would be awesome.
  2. While search is a common operation, I'm not sure if "advanced String search" should be included in the Standard library. But it could be added to the swift-algorithms package, or there could be a dedicated "swift-search" package.
1 Like

This seems to me like it's in competing in the same design space as regular expressions, especially the result-builder based ones.

One thing to note is that a naive implementation of these operators would could impose a large performance penalty.

For example, if you given .contains("aaaaa_a" || "aaaaa_b" || "aaaaa_c") to a naive implementation, it would search for the aaaaa_ prefix up to 3 times. An optimized implementation might build a state machine that traverses a prefix tree, with one shared aaaaa_, and a branch to a, b and c final nodes. That kind of optimization is already built for regex, so it would be nice for this kind of feature to integrate with it.

One thing that's really hard (impossible?) to express with regex is this notion of && (with either option existing anywhere in the text, in any order) and !. It looks to me like supporting this would require a state machine that encodes every possible ordering of where those elements might be found, which would lead to a huge explosion in the state graph. Maybe they should be two totally different systesm after all? :person_shrugging:

2 Likes

If the string getting searched is short that's not an issue. If the string is long then it would make sense to scan it just once (using a non naïve implementation) even if the search terms are totally unrelated like "dog" and "fox".


I wonder though where would I want to use this API. Typically it's the other way around - I need to find a given single search string across several fields to find the relevant search results ("find all people where first name or last name or description or address contain search string"), in which case the API I'd be looking for is more like:

let filteredItems = items.filter { item in
    searchString.contained(in: [it.firstName, it.lastName, it.description, it.address])
}

And if it's an advanced user interface that allows searching a few different strings in a large text I'd expect to add those search criteria in a UI via some "Add Criterion" button, but then it's fully dynamic, and I am not constructing this complex expression statically as a Swift code. And to do it dynamically the API I'd be reaching out for would be more like:

stringExpression.combine(anotherString, operator: op) // op is or, and, etc
...
text.contains(stringExpression)

I agree with @AlexanderM: this seems to be competing with regular expressions, particularly RegexBuilder. Instead of introducing a second declarative, composable syntax for string searching, it seems more appropriate to expand the power of Swift Regex. The “or” operator already has an idiomatic solution:

text.contains {
    ChoiceOf {
        "quick"
        "jumps"
    }
}

Or just text.contains(/quick|jumps/).

Adding contains(allOf:) to the standard library would give us the “and” operator, but I’m not sure if there’s a simple solution to the nested “not”. Maybe a way to negate a Regex in RegexBuilder?

1 Like