Abstract
I propose a small, non-intrusive addition to StringProtocol that unifies existing initializers on String and Substring. It allows generic extensions across both types that are possible today only through unwieldy and unnecessary encoding and decoding. In short: there is no generic way to instantiate an arbitrary StringProtocol from its own SubSequence, even though every conforming type provides the corresponding initializer.
Motivating Example
Let us consider the case of implementing a simple lexing primitive over String and Substring. The following consume method matches a prefix, returns the matched prefix as an independent String, and reassigns self to the remaining characters.
Figure 1
fileprivate extension String {
    @discardableResult
    mutating func consume(while pred: (Character) -> Bool) -> String? {
        let match = prefix(while: pred)
        guard match.endIndex > startIndex else { return nil }
        defer { self = String(self.suffix(from: match.endIndex)) }
        return String(match)
    }
}
fileprivate extension Substring {
    @discardableResult
    mutating func consume(while pred: (Character) -> Bool) -> String? {
        let match = prefix(while: pred)
        guard match.endIndex > startIndex else { return nil }
        defer { self = self.suffix(from: match.endIndex) }
        return String(match)
    }
}
It seems natural to implement this once over StringProtocol instead (Figure 2). However, the assignment to self cannot be made generic. More precisely: there is no generic way to instantiate a StringProtocol from its own SubSequence. One roundabout solution is to encode the unicodeScalars of the suffix and then pass the result to a cString or decoding initializer. This adds needless codec overhead, since StringProtocol already requires a UnicodeScalarView.
Figure 2
fileprivate extension StringProtocol {
    @discardableResult
    mutating func consume(while pred: (Character) -> Bool) -> String? {
        let match = prefix(while: pred)
        guard match.endIndex > startIndex else { return nil }
        // No such initializer.
        defer { self = Self.init(content: self.suffix(from: match.endIndex).unicodeScalars) }
        return String(match)
    }
}
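The roundabout solution mentioned above can be sketched as follows. This is only an illustration, not part of the proposal: the method name consumeViaDecoding is hypothetical, UTF-32 is picked arbitrarily as the intermediate encoding, and the sketch assumes a toolchain in which SubSequence of a StringProtocol is itself a StringProtocol (so that the slice exposes unicodeScalars).

```swift
extension StringProtocol {
    /// Hypothetical helper: generic today, but only via an encode/decode round trip.
    @discardableResult
    mutating func consumeViaDecoding(while pred: (Character) -> Bool) -> String? {
        let match = prefix(while: pred)
        guard match.endIndex > startIndex else { return nil }
        let result = String(match)
        // Encode the remaining scalars as UTF-32 code units...
        let codeUnits = self[match.endIndex...].unicodeScalars.map { $0.value }
        // ...and decode them again, purely to obtain a value of type `Self`.
        self = Self(decoding: codeUnits, as: UTF32.self)
        return result
    }
}

// Usage sketch: the same method serves both String and Substring.
var text = "123abc"
let digits = text.consumeViaDecoding(while: { $0.isNumber })
print(digits ?? "nil", text)
```

The intermediate array and the decoding pass are exactly the overhead the proposed initializer would remove.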
Suggested Resolution
Note, however, that both String and Substring expose the following initializers (Figure 3 is a sketch and is not intended to accurately depict the implementation of String or Substring):
Figure 3
public struct String {
    public init(content: Substring.UnicodeScalarView)
    // ...
}
public struct Substring {
    public init(content: Substring.UnicodeScalarView)
    // ...
}
The above issue can then be remedied very easily by the addition of a single initializer to StringProtocol:
Figure 4
protocol StringProtocol {
    // ...
    init<SubStr>(content: SubStr.UnicodeScalarView) where SubStr == SubSequence, SubStr: StringProtocol
}
This addition is entirely painless and requires no additional implementation. The generic parameter is there to avoid modifying the declaration of StringProtocol, as I do not believe there is a way for StringProtocol to refine the constraints on SubSequence, which it inherits from its parent protocols.
This protocol requirement is already satisfied by both String and Substring. With this change, the sample shown in Figure 2 "just works".
In the meantime, this can be worked around in one's own code like so:
protocol AugmentedStringProtocol: StringProtocol {
    init<SubStr>(_ content: SubStr.UnicodeScalarView) where SubStr == SubSequence, SubStr: StringProtocol
}
// Initializers already exist, no need to implement anything.
extension String: AugmentedStringProtocol {}
extension Substring: AugmentedStringProtocol {}
Edit
Pursuant to further discussion, a better alternative might be to add ... where SubSequence: StringProtocol to the declaration of protocol StringProtocol, and declare the above initializer without the generic parameter.
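A minimal sketch of that alternative, written as a user-side protocol rather than a change to the standard library. It assumes a toolchain in which StringProtocol already constrains SubSequence: StringProtocol, and it relies on the unlabeled init(_:) taking a Substring.UnicodeScalarView that String and Substring are understood to provide. The protocol name SubSequenceInitializable is hypothetical.

```swift
// Hypothetical stand-in for the proposed requirement, without a generic parameter.
protocol SubSequenceInitializable: StringProtocol {
    init(_ content: SubSequence.UnicodeScalarView)
}

// Assumed to be satisfied by existing initializers; nothing to implement.
extension String: SubSequenceInitializable {}
extension Substring: SubSequenceInitializable {}

extension SubSequenceInitializable {
    /// Matches a prefix, returns it as a String, and leaves the remainder in `self`.
    @discardableResult
    mutating func consume(while pred: (Character) -> Bool) -> String? {
        let match = prefix(while: pred)
        guard match.endIndex > startIndex else { return nil }
        let result = String(match)
        // Rebuild `Self` directly from the remaining slice; no codec round trip.
        self = Self(self[match.endIndex...].unicodeScalars)
        return result
    }
}

// Usage sketch: one implementation covers both conforming types.
var line = "   indented"
let leading = line.consume(while: { $0 == " " })
print(leading?.count ?? 0, line)
```

With the requirement on StringProtocol itself, the extra protocol and the two empty conformances would no longer be needed.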