Add isPrependConcatenationMark to Unicode.Scalar.Properties

Hi everyone!

Quick note: this is my first post in the Pitches category so please let me know if there's anything I should change or do differently. Thanks!

Introduction

Swift's Unicode.Scalar.Properties struct includes many useful property checks for scalars, such as isAlphabetic and isHexDigit. I think a new isPrependConcatenationMark property check would also be useful, corresponding to the Prepended_Concatenation_Mark property listed in the Unicode Standard.

Motivation

Grapheme cluster break detection relies on the grapheme break property table, which lists the grapheme break property values and their corresponding code points. Some of these values are defined in terms of individual code points, like CR, while others are defined in terms of Unicode properties, like Prepend, whose definition includes scalars with Prepended_Concatenation_Mark = Yes.

Letting developers rely on Swift's Unicode.Scalar.Properties reduces both the amount of duplicate code they need to write and the possibility of errors, which could be quite common when hard-coding the code points listed in the GraphemeBreakProperty spec.
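To make the maintenance burden concrete, here is a rough sketch of what a hand-rolled check might look like today. The name isPrependConcatenationMarkByHand is hypothetical, and the code point list is only an illustrative subset that I believe carries the property; it is an assumption that should be checked against the current Unicode data files, which is exactly the kind of upkeep this pitch is trying to avoid. Note that the helper lives on Unicode.Scalar rather than on Unicode.Scalar.Properties.

```swift
// Hypothetical helper, not part of the standard library.
// The code points below are an illustrative, possibly incomplete subset;
// verify them against the Unicode data files before relying on them.
extension Unicode.Scalar {
    /// Hand-maintained approximation of Prepended_Concatenation_Mark.
    var isPrependConcatenationMarkByHand: Bool {
        switch value {
        case 0x0600...0x0605, // ARABIC NUMBER SIGN ... ARABIC NUMBER MARK ABOVE
             0x06DD,          // ARABIC END OF AYAH
             0x070F,          // SYRIAC ABBREVIATION MARK
             0x08E2,          // ARABIC DISPUTED END OF AYAH
             0x110BD:         // KAITHI NUMBER SIGN
            return true
        default:
            return false
        }
    }
}

let numberSign: Unicode.Scalar = "\u{0600}"
let letterA: Unicode.Scalar = "a"
print(numberSign.isPrependConcatenationMarkByHand)
print(letterA.isPrependConcatenationMarkByHand)
```

Every new Unicode version that assigns the property to more scalars would require editing a table like this by hand, whereas a standard library property would pick the change up from the underlying Unicode data.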

Proposed solution

I propose adding a new property to the Unicode.Scalar.Properties struct.

Detailed design

extension Unicode.Scalar.Properties {
	public var isPrependConcatenationMark: Bool { get } // Prepended_Concatenation_Mark
}

Source compatibility

This change is strictly additive. This proposal does not affect source compatibility.

Effect on ABI stability

This change is strictly additive. This proposal does not affect the ABI of existing language features.

Effect on API resilience

This change is very minor and nearly identical to many of the getters in Unicode.Scalar.Properties. This change will not affect API resilience.

Alternatives considered

Simply not adding the property to the standard library was considered, but it would likely lead to more bugs from mistakes in hard-coding the specific code points. It would also be less convenient for developers, because it is very difficult to extend Unicode.Scalar.Properties with a getter that could be provided in a SwiftPM package or copied into existing code: there is no easy way to access the internal var _value in the Unicode.Scalar.Properties struct.

+1 for this.

Swift is one of a small number of languages with great Unicode support, and this is a small step further. As it is an additive change, let's just do it.

As the original proposer/implementor of Unicode.Scalar.Properties, I would kind of hope that keeping it up-to-date with new ICU additions (at least for things as straightforward as new Boolean properties) would be as simple as a PR rather than a full pitch/proposal cycle, but I'm not a policy maker 🙂

The only reason Prepended_Concatenation_Mark isn't already there is that it was added to the standard after the initial implementation in Swift, so I think adding it makes total sense.

The only thing that makes this slightly tricky is that the property will only be available on versions of Apple's OSes where libicucore.dylib is built from a recent enough version of ICU. Some of the emoji properties today have similar constraints. It looks like the following Boolean properties have been added to Unicode/ICU since Unicode.Scalar.Properties was implemented:

I don't happen to know off the top of my head what minimum versions of each OS correspond to ICU 60 and 62, though.

@Michael_Ilseman has discussed in the past the idea of embedding subsets of the Unicode data tables into the standard library to remove the odd coupling between these APIs and specific OS versions, and it would make these kinds of updates much easier, but I don't know what the status of that effort is.
