Add isPrependConcatenationMark to Unicode.Scalar.Properties
Hi everyone!
Quick note: this is my first post in the Pitches category so please let me know if there's anything I should change or do differently. Thanks!
Introduction
Swift's Unicode.Scalar.Properties struct includes many useful property checks for scalars, such as isAlphabetic and isHexDigit. I think a new isPrependConcatenationMark check would also be useful, corresponding to the Prepended_Concatenation_Mark property defined in the Unicode Standard.
Motivation
Grapheme cluster break detection relies on the grapheme break property table, which lists each grapheme break property value and its corresponding code points. Some of these values are defined in terms of individual code points, like CR, while others are defined in terms of Unicode properties, like Prepend.
Letting developers rely on Swift's Unicode.Scalar.Properties reduces both the amount of duplicate code they need to write and the possibility of errors, which are easy to make when hard-coding the code points listed in the GraphemeBreakProperty spec.
Proposed solution
I propose adding a new property to the Unicode.Scalar.Properties struct.
Detailed design
extension Unicode.Scalar.Properties {
    public var isPrependConcatenationMark: Bool { get } // Prepended_Concatenation_Mark
}
Source compatibility
This change is strictly additive. This proposal does not affect source compatibility.
Effect on ABI stability
This change is strictly additive. This proposal does not affect the ABI of existing language features.
Effect on API resilience
This change is very minor and nearly identical to many of the getters in Unicode.Scalar.Properties. This change will not affect API resilience.
Alternatives considered
One alternative is simply not adding the property to the standard library, but that would likely lead to more bugs from mistakes in hard-coding the specific code points. It would also be less convenient for developers, since it is very difficult to extend Unicode.Scalar.Properties with an equivalent getter in a SwiftPM package or in code copied into a project: there is no easy way to access the internal var _value in the Unicode.Scalar.Properties struct.
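To make that alternative concrete, here is a rough sketch of what a package would have to write today. The getter name isPrependedConcatenationMarkFallback is hypothetical, it lives on Unicode.Scalar rather than Unicode.Scalar.Properties because of the access problem described above, and the code points are copied by hand from my reading of the Prepended_Concatenation_Mark entries in PropList.txt (Unicode 15), so they should be checked against the current data file.

```swift
// Hypothetical stand-in, not standard library API.
// The code points are hand-copied from PropList.txt (as of Unicode 15, to the
// best of my knowledge) and would need updating whenever Unicode adds more.
extension Unicode.Scalar {
    var isPrependedConcatenationMarkFallback: Bool {
        switch value {
        case 0x0600...0x0605, // ARABIC NUMBER SIGN..ARABIC NUMBER MARK ABOVE
             0x06DD,          // ARABIC END OF AYAH
             0x070F,          // SYRIAC ABBREVIATION MARK
             0x0890...0x0891, // ARABIC POUND MARK ABOVE..ARABIC PIASTRE MARK ABOVE
             0x08E2,          // ARABIC DISPUTED END OF AYAH
             0x110BD,         // KAITHI NUMBER SIGN
             0x110CD:         // KAITHI NUMBER SIGN ABOVE
            return true
        default:
            return false
        }
    }
}

let arabicNumberSign: Unicode.Scalar = "\u{0600}"
let latinA: Unicode.Scalar = "A"
print(arabicNumberSign.isPrependedConcatenationMarkFallback)
print(latinA.isPrependedConcatenationMarkFallback)
```

Every copy of a table like this is another place for a typo or a stale range, which is the main reason I would rather have the check in the standard library.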
As the original proposer/implementor of Unicode.Scalar.Properties, I would kind of hope that keeping it up-to-date with new ICU additions (at least for things as straightforward as new Boolean properties) would be as simple as a PR vs. the pitch/proposal cycle, but I'm not a policy maker.
The only reason Prepended_Concatenation_Mark isn't already there is because it was added to the standard after the initial implementation in Swift, so I think adding it makes total sense.
The only thing that makes this slightly tricky is that the property will only be available on versions of Apple's OSes where libicucore.dylib is built from a recent enough version of ICU. Some of the emoji properties today have similar constraints. It looks like the following Boolean properties have been added to Unicode/ICU since Unicode.Scalar.Properties was implemented:
I don't happen to know off the top of my head what minimum versions of each OS correspond to ICU 60 and 62, though.
@Michael_Ilseman has discussed in the past the idea of embedding subsets of the Unicode data tables into the standard library to remove the odd coupling between these APIs and specific OS versions, and it would make these kinds of updates much easier, but I don't know what the status of that effort is.
With @Alejandro putting Unicode data directly into the Swift runtime/stdlib, which removes any reliance on ICU, I think we can support these much more easily.
Simultaneously, we'll want to be a superset of scalar properties listed in UTS#18 (@nnnnnnnn), so I think this could be a good opportunity to add more functionality and clarify how these get updated.
@Sammcb sorry for the great delay (I don't know why I missed the notification), would you and/or @allevato be interested in pitching an update?
Could Unicode.Scalar conform to Strideable, instead of adding an AllScalars type?
(Range<Unicode.Scalar> and ClosedRange<Unicode.Scalar> would then have conditional RandomAccessCollection conformances.)
// in `stdlib/public/core/UnicodeScalar.swift`
extension Unicode.Scalar: Strideable {
    public typealias Stride = Int
    public func advanced(by distance: Stride) -> Unicode.Scalar
    public func distance(to other: Unicode.Scalar) -> Stride
}
// in `stdlib/public/core/Stride.swift`
extension Strideable where Self == Unicode.Scalar {
    public static func _step(
        after current: (index: Int?, value: Self),
        from start: Self, by distance: Self.Stride
    ) -> (index: Int?, value: Self)
}
for u: Unicode.Scalar in "\0"..."\u{10FFFF}" where u.properties.isMath {}
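For comparison, here is a minimal sketch of how the same loop can be written without any new conformance, by walking the raw UInt32 values and skipping the ones that are not valid scalars. The name mathScalars is just for illustration. It also shows the detail a Strideable conformance would have to handle: the surrogate range U+D800...U+DFFF sits in the middle of the code point space, and the failable initializer returns nil there.

```swift
// Collect every scalar with the Math property using only existing API.
var mathScalars: [Unicode.Scalar] = []

for value in UInt32(0)...0x10FFFF {
    // Unicode.Scalar(_: UInt32) is failable: it returns nil for surrogate
    // code points (U+D800...U+DFFF), which are not Unicode scalar values.
    guard let u = Unicode.Scalar(value) else { continue }
    if u.properties.isMath {
        mathScalars.append(u)
    }
}

print(mathScalars.count)
```

The gap means advanced(by:) and distance(to:) could not be plain integer arithmetic on the scalar's value; they would need to jump over the surrogates.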
However, a Unicode.Set: SetAlgebra where Element == String type might be more useful: