(quoting out of order)
Yes, for example: in Expr.h
If I'm understanding the question correctly, probably overloads, but we're hoping variadic generics can help with the insanity there.
I do share some concerns about using tuples for anything serious in Swift. I feel they make sense intuitively and on first impression for captures (and labels for named captures), but it's entirely possible we'll run into a brick wall at some point. Hopefully, this will help motivate variadic generics, improvements to tuples, and any story for easily making tuple-like structs.
Could you share more? For the literals, the underlying type is always Substring (the slice type) and we're adding on whether it's optional or an aggregate. We're not trying to turn it into an Int by implicitly invoking an initializer, for instance.
"Building generic functionality" is a big topic and not strictly required as part of this initial push, but I think we'd regret passing up the opportunity to establish a good technical foundation for generic functionality. I'm not sure this is exactly the info you're asking about, but I'll use your question as an excuse for an info dump on one narrow "slice" of this topic :-)
(Code examples, protocols, declarations, etc., below are not being formally proposed. They help highlight the domain of what is possible, but are intentionally presented with semantic clarity over code prettiness.)
Protocol powered matching algorithms
The overview talks about enabling collection algorithms (CC @timv) and the bare minimum would be concrete overloads taking Regex or Pattern or something similar. But, we're likely to generalize matching via protocols. The core interface for non-capturing (i.e. doesn't produce a value) could be consume below, and a value-producing interface could be match below. Exactly how this is formulated, what the protocol hierarchy is, when types get bound, etc., is all TBD. Here's a basic sketch:
protocol CollectionConsumer {
    associatedtype Input: Collection
    associatedtype Position = Input.Index
    func consume(_: Input, startingFrom: Position) -> Position?
}
protocol CollectionMatcher: CollectionConsumer {
    associatedtype Value
    func match(_: Input, startingFrom: Position) -> (Value, resumeFrom: Position)?
}
extension CollectionMatcher {
    func consume(_ c: Input, startingFrom idx: Position) -> Position? {
        match(c, startingFrom: idx)?.resumeFrom
    }
}
extension CollectionMatcher where Value == () {
    func match(_ c: Input, startingFrom idx: Position) -> (Value, resumeFrom: Position)? {
        guard let r = consume(c, startingFrom: idx) else { return nil }
        return ((), r)
    }
}
The generalization of a look-around assertion is just a special case of a consumer (e.g. consume(...) != nil).
Regex/Pattern would conform, of course, but so could Int (nibble an integer off), Foundation's date format styles (nibble a date off), or 3rd party types (arbitrary nibbling).
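To make the "nibble an integer off" idea concrete, here is a minimal sketch of a conformer. IntMatcher is a hypothetical name and is not being proposed. The sketch binds Input to String for simplicity and repeats the protocols from above so that it stands alone:

```swift
// Protocols repeated from the sketch above so this block is self-contained.
protocol CollectionConsumer {
    associatedtype Input: Collection
    associatedtype Position = Input.Index
    func consume(_: Input, startingFrom: Position) -> Position?
}

protocol CollectionMatcher: CollectionConsumer {
    associatedtype Value
    func match(_: Input, startingFrom: Position) -> (Value, resumeFrom: Position)?
}

extension CollectionMatcher {
    // Default: consuming is matching and discarding the value.
    func consume(_ c: Input, startingFrom idx: Position) -> Position? {
        match(c, startingFrom: idx)?.resumeFrom
    }
}

// Hypothetical conformer (an assumption for illustration only):
// nibbles a run of ASCII digits off a String and produces an Int.
struct IntMatcher: CollectionMatcher {
    typealias Input = String
    typealias Position = String.Index
    typealias Value = Int

    func match(_ s: String, startingFrom start: String.Index) -> (Int, resumeFrom: String.Index)? {
        // Advance over the longest run of ASCII digits.
        var end = start
        while end < s.endIndex, s[end].isASCII, s[end].isNumber {
            end = s.index(after: end)
        }
        // Fail on an empty run or on a value that overflows Int.
        guard end > start, let value = Int(s[start..<end]) else { return nil }
        return (value, resumeFrom: end)
    }
}

let source = "42abc"
let result = IntMatcher().match(source, startingFrom: source.startIndex)
// result?.0 is the parsed integer; result?.resumeFrom points at "a".
```

Because IntMatcher only implements match, it picks up consume from the default implementation, which is the relationship between the two protocols that the sketch above is getting at.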
We are talking about being able to interleave library calls as part of matching, so we need some interface to express this. The consume/match interface above could also be the interface for composition with library calls. That means any conformers could be called (mechanism TBD) within a Pattern result builder. Effectively, conformers are sub-patterns, so long as all the types line up (more on that later).
Conforming to this protocol does require providing a binding for Input (which could be a generic parameter). This could be a problem for types that, for whatever reason, cannot provide one (many-to-many type relationships, working around HKTs, etc., are TBD).
Similarly, the generalization of a character class is a function (Element) -> Bool.
protocol CharacterClass {
    associatedtype Element // Commonly `Equatable` or `Hashable`, but doesn't need to be.
    func contains(_: Element) -> Bool
}
Character classes don't need to bind Input, just their Element. Any character class could be used as a consumer (by binding Input) through the use of subscript and index(after:), mechanism TBD.
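One possible shape for that mechanism, purely as a sketch: a generic adapter that binds Input and consumes a single element via subscript and index(after:). OneOf and Vowels are hypothetical names introduced here for illustration; the protocols are repeated so the block stands alone.

```swift
protocol CollectionConsumer {
    associatedtype Input: Collection
    associatedtype Position = Input.Index
    func consume(_: Input, startingFrom: Position) -> Position?
}

protocol CharacterClass {
    associatedtype Element
    func contains(_: Element) -> Bool
}

// Hypothetical adapter (an assumption, not a proposed API): lifts any
// character class into a consumer by binding Input at the use site.
struct OneOf<Class: CharacterClass, Input: Collection>: CollectionConsumer
where Input.Element == Class.Element {
    typealias Position = Input.Index
    let characterClass: Class

    func consume(_ c: Input, startingFrom idx: Input.Index) -> Input.Index? {
        // Consume exactly one element if it is a member of the class.
        guard idx != c.endIndex, characterClass.contains(c[idx]) else { return nil }
        return c.index(after: idx)
    }
}

// Hypothetical character class over Character.
struct Vowels: CharacterClass {
    func contains(_ ch: Character) -> Bool { "aeiouAEIOU".contains(ch) }
}

// The same class works over String and over Array<Character>.
let vowel = OneOf<Vowels, String>(characterClass: Vowels())
let word = "apple"
let next = vowel.consume(word, startingFrom: word.startIndex)

let vowelInArray = OneOf<Vowels, [Character]>(characterClass: Vowels())
let miss = vowelInArray.consume(["x", "a"], startingFrom: 0)
```

The character class itself never mentions Input; only the adapter does, which is the point of keeping the two protocols separate.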
You could even imagine regex literals applying to any collection for which there is a bijection between Element and Character.
protocol Bijection {
    associatedtype From
    associatedtype To
    func mapFrom(_: From) -> To
    func mapTo(_: To) -> From
}
protocol RegexApplicableCollection: Collection, Bijection
    where To == Element, From: StringProtocol
{
    // ...
}
And you could even imagine all kinds of extra stuff built on top of the notion of a CollectionMatcher whose Value type is round-trippable through String (e.g. pretty printers). It goes on and on (provided the many-to-many or HKT issues don't get in the way...).
An alternative generalization of matching could be:
protocol MatchProcessor {
    func run(_: inout MatchingEngine)
}
We've traded being bound to Input for an API on MatchingEngine, which is TBD. A processor could query aspects of matching state, potentially fetch the collection and current position (mechanism TBD), and interact or drive pattern matching (signal failures, push/pop from a backtracking stack, etc). This would of course mean designing a stable API/ABI for MatchingEngine, potentially exposing implementation details. But, this kind of approach works well for extending pattern matching over non-collections, such as asynchronous streams of ephemeral content.
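To give a feel for the shape of this alternative, here is a toy sketch. The MatchingEngine below is entirely a stand-in invented for illustration (its stored properties and signalFailure() are assumptions, since the real API is TBD), and ExpectLetter is a hypothetical processor:

```swift
// Toy stand-in for the engine; every member here is an assumption.
struct MatchingEngine {
    let input: [Character]
    var position: Int = 0
    var failed = false

    mutating func signalFailure() { failed = true }
}

protocol MatchProcessor {
    func run(_: inout MatchingEngine)
}

// Hypothetical processor: consumes one letter or signals failure.
struct ExpectLetter: MatchProcessor {
    func run(_ engine: inout MatchingEngine) {
        guard engine.position < engine.input.count,
              engine.input[engine.position].isLetter
        else {
            engine.signalFailure()
            return
        }
        engine.position += 1
    }
}

var engine = MatchingEngine(input: Array("a1"))
ExpectLetter().run(&engine) // consumes "a"
ExpectLetter().run(&engine) // "1" is not a letter, so this signals failure
```

Note that the processor never names a collection type; everything it knows comes from whatever the engine chooses to expose.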
Multi-stage type binding
You could imagine a subset of matching functionality that doesn't need to bind any types. For example, /..*/ could match the entirety of any non-empty collection. For such a matcher to conform to this interface, we could have:
struct MatchNonEmpty<Input: Collection>: CollectionMatcher {
    typealias Value = Int
    func consume(_ c: Input, startingFrom start: Position) -> Position? {
        start == c.endIndex ? nil : c.endIndex
    }
    func match(_ c: Input, startingFrom start: Position) -> (Value, resumeFrom: Position)? {
        let end = c.endIndex
        guard start != end else { return nil }
        return (c.distance(from: start, to: end), resumeFrom: end)
    }
}
A generic parameter Input allows this pattern to apply to any collection of any element type, because this matcher doesn't care. We pretty much have to parameterize though, as a type can conform to a protocol in only one way.
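As a quick usage sketch, the same matcher can be instantiated over collections with unrelated element types. The protocols and struct are repeated so the block stands alone; the explicit Position typealias is an addition made here to keep associated type inference unambiguous.

```swift
protocol CollectionConsumer {
    associatedtype Input: Collection
    associatedtype Position = Input.Index
    func consume(_: Input, startingFrom: Position) -> Position?
}

protocol CollectionMatcher: CollectionConsumer {
    associatedtype Value
    func match(_: Input, startingFrom: Position) -> (Value, resumeFrom: Position)?
}

struct MatchNonEmpty<Input: Collection>: CollectionMatcher {
    typealias Position = Input.Index // spelled out here; an addition for this sketch
    typealias Value = Int

    func consume(_ c: Input, startingFrom start: Position) -> Position? {
        start == c.endIndex ? nil : c.endIndex
    }

    func match(_ c: Input, startingFrom start: Position) -> (Value, resumeFrom: Position)? {
        let end = c.endIndex
        guard start != end else { return nil }
        // The value is the number of elements matched.
        return (c.distance(from: start, to: end), resumeFrom: end)
    }
}

// One generic type, bound to a different Input at each use site.
let words = ["a", "b", "c"]
let overArray = MatchNonEmpty<[String]>().match(words, startingFrom: words.startIndex)

let text = "hello"
let overString = MatchNonEmpty<String>().match(text, startingFrom: text.startIndex)

let overEmpty = MatchNonEmpty<[Int]>().match([], startingFrom: 0)
```

Each of MatchNonEmpty<[String]>, MatchNonEmpty<String>, and MatchNonEmpty<[Int]> is a distinct conformance, which is what parameterizing buys us given the one-conformance-per-type rule.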
Semantically, regexes are like matchers generic over Input: Collection where Element == Character (albeit with extra weirdness because String is weird, more on that later). They have similar semantics when applied to String as to Array<Character>, though we'd of course want to fast-path String's implementation to exploit its UTF-8 storage. Regexes have already bound the Element type to Character; that is, they exist at a post-Element-binding time.
Even after regexes are "compiled" (for some TBD notion of compilation) or even linked/loaded (TBD), they don't necessarily have to be restricted to a particular kind of collection. There is a later point in time in which the collection type gets bound, a later point in time when a regex gets associated with a particular instance of that collection type, and a later point in time when it is run on that instance. Of course, these could all be the same instant in time, as in myString.split(/\s+/), but each provides more information for compilation/specialization.
On the other hand, whenever we bind a type, we are either dropping generality or limited by what can be accomplished through parameterization. Something that only binds Element is perfectly applicable to Sequence and AsyncSequence in addition to Collection. Something that works on a moving window over asynchronously delivered chunked data might bind its collection to be the buffer. But, it would additionally need the means to interact with that window (e.g. peek/fill the buffer, eat/drop processed portions) and would need the means to restrict look-around/backtracking to the window.
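As a small illustration of the first point, here is a sketch of an operation that binds only Element and therefore works on any Sequence, including ones that are not collections. The helper leadingCount(in:) and the two classes are hypothetical names made up for this example:

```swift
protocol CharacterClass {
    associatedtype Element
    func contains(_: Element) -> Bool
}

extension Sequence {
    // Hypothetical helper: counts how many leading elements are in the class.
    // Only Element is constrained; no Input or Index type is ever bound.
    func leadingCount<C: CharacterClass>(in cls: C) -> Int where C.Element == Element {
        var count = 0
        for element in self {
            guard cls.contains(element) else { break }
            count += 1
        }
        return count
    }
}

struct Digits: CharacterClass {
    func contains(_ ch: Character) -> Bool { ch.isASCII && ch.isNumber }
}

struct Even: CharacterClass {
    func contains(_ n: Int) -> Bool { n % 2 == 0 }
}

// A Collection of Characters.
let fromString = "2024-01".leadingCount(in: Digits())

// StrideTo is a Sequence but not a Collection.
let fromStride = stride(from: 0, to: 10, by: 2).leadingCount(in: Even())
```

The same approach would not carry over to anything that needs to resume from a position or backtrack, since a bare Sequence offers no index to return to.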
If Patterns are composed using the CollectionConsumer interface above, that means that Patterns will have to deal with binding or parameterizing-away the Input type. This might end up forcing some many-to-many or HKT issues (hic sunt dracones).