[Pitch] Add static properties for `Locale.NumberingSystem`

Hi everyone,

I have a simple pitch for adding a comprehensive set of static properties for Locale.NumberingSystem.


Introduction

This proposal adds static properties to Locale.NumberingSystem for all standard numbering systems defined in Unicode CLDR, making it easier to work with different numbering systems in Swift.

Motivation

Currently, to use a specific numbering system, developers need to create instances using string identifiers:

let arabic = Locale.NumberingSystem("arab")

This approach has several drawbacks:

  • Lack of Discoverability: Developers may not be aware of all available numbering systems or their corresponding identifiers.
  • Error-Prone: Manually typing string identifiers increases the risk of typos and mistakes.
  • Reduced Readability: String literals provide less context compared to well-named constants.
  • Inconsistency: Other Locale components like Locale.LanguageCode, Locale.Region, and Locale.Script already provide static properties for common identifiers, but Locale.NumberingSystem does not.

By introducing predefined static properties for each numbering system, we can improve code safety, discoverability, readability, and maintain consistency across the Locale API.

Proposed solution

Extend Locale.NumberingSystem to include static properties for each numbering system defined in the Unicode CLDR.

Example usage:

let numberingSystem = Locale.NumberingSystem.arabic

This allows developers to:

  • Use autocomplete features to discover available numbering systems.
  • Reduce typos and mistakes by avoiding manually typed strings. For example, Locale.NumberingSystem("arabic") compiles but uses an incorrect identifier.
  • Improve code clarity with descriptive property names. For example, Locale.NumberingSystem.simplifiedChinese instead of Locale.NumberingSystem("hans").
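
To make the typo risk concrete, here is a minimal sketch of how a mistyped identifier slips through today. The helper isKnownNumberingSystem is a hypothetical name used only for illustration, not part of Foundation; it relies on the existing Locale.NumberingSystem.availableNumberingSystems list.

```swift
import Foundation

// Hypothetical helper (not part of Foundation): reports whether an identifier
// appears in the list of numbering systems that Foundation knows about.
func isKnownNumberingSystem(_ identifier: String) -> Bool {
    Locale.NumberingSystem.availableNumberingSystems
        .contains(Locale.NumberingSystem(identifier))
}

// The initializer accepts any string, so a typo produces a value without complaint.
let typo = Locale.NumberingSystem("arabic")
let valid = Locale.NumberingSystem("arab")

// Only a run-time lookup reveals which of the two is a real identifier.
print(isKnownNumberingSystem(typo.identifier))
print(isKnownNumberingSystem(valid.identifier))
```

With static properties, the same mistake would surface as a compile-time error instead of a run-time lookup.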

Detailed design

Add an extension to Locale.NumberingSystem containing static properties for each numbering system. The identifiers are sourced from the Unicode CLDR's numbering systems registry.

@available(FoundationPreview 6.2, *)
@available(macOS 13, iOS 16, tvOS 16, watchOS 9, *)
extension Locale.NumberingSystem {
    @_alwaysEmitIntoClient
    public static var adlam: Locale.NumberingSystem { Locale.NumberingSystem("adlm") }

    @_alwaysEmitIntoClient
    public static var ahom: Locale.NumberingSystem { Locale.NumberingSystem("ahom") }

    @_alwaysEmitIntoClient
    public static var arabic: Locale.NumberingSystem { Locale.NumberingSystem("arab") }

    @_alwaysEmitIntoClient
    public static var arabicExtended: Locale.NumberingSystem { Locale.NumberingSystem("arabext") }

    // ... all other numbering systems
}

The full list can be viewed in the implementation pull request. Variable names are assigned based on the descriptions provided in the Unicode CLDR.

Source compatibility

These changes are additive only and are not expected to have an impact on source compatibility.

Implications on adoption

This new API will have FoundationPreview 6.2 availability.

Acknowledgments

Thanks to @alobaili for highlighting this issue in their comment on the Swift forums, which inspired this proposal.

I believe that if the API is marked @_alwaysEmitIntoClient, you can keep the initial availability. Anyone with a new enough compiler will always emit the implementation code into their client module, so it can back-deploy as far back as the API it uses (Locale.NumberingSystem.init(_: String)) exists.
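
A rough sketch of what that suggestion could look like, assuming the extension only carries the availability of Locale.NumberingSystem itself. The property name arabicIndicSketch is hypothetical and chosen so it cannot clash with whatever names the pitch settles on.

```swift
import Foundation

// Availability mirrors Locale.NumberingSystem itself rather than a newer release.
@available(macOS 13, iOS 16, tvOS 16, watchOS 9, *)
extension Locale.NumberingSystem {
    // Hypothetical name for illustration; the pitched names are still under discussion.
    // @_alwaysEmitIntoClient copies the getter body into the calling module,
    // so only the existing String initializer is needed at run time.
    @_alwaysEmitIntoClient
    public static var arabicIndicSketch: Locale.NumberingSystem {
        Locale.NumberingSystem("arab")
    }
}

if #available(macOS 13, iOS 16, tvOS 16, watchOS 9, *) {
    print(Locale.NumberingSystem.arabicIndicSketch.identifier)
}
```

Because the getter body is compiled into the client, the property works wherever the initializer it calls is available.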

Thank you @glebfann for taking this initiative! I'm plus one for this proposal.

However, regarding Arabic and extended Arabic, it is better to use the formal name defined by the Unicode CLDR. Using arabic and arabicExtended for the properties' names will be confusing because Arabic can mean the numerals 0123456789 or ٠١٢٣٤٥٦٧٨٩, depending on who you are talking to. I believe the standard addressed this very well by defining them as Arabic-Indic and Extended Arabic-Indic. Reflecting this in the new static properties' names arabicIndic and extendedArabicIndic (or arabicIndicExtended to share the arabicIndic prefix and ease discoverability) will match the standard well and guide developers better on which numbering system to use. I assume the same risk of confusion might be found in the rest of the numbering systems' names. It might be a good practice to use the description value in the XML to name the new properties.

On another note, I'm not sure if documentation should be addressed here, but it would be amazing to include small documentation for each property that will let the developer know what the numbers will look like so it can be immediately discovered in, for example, Xcode's Quick Help feature, or similar documentation UI in other code editors.

/// A numbering system that uses the numerals ٠١٢٣٤٥٦٧٨٩
public static var arabicIndic: Locale.NumberingSystem { Locale.NumberingSystem("arab") }

/// A numbering system that uses the numerals 0123456789
public static var latin: Locale.NumberingSystem { Locale.NumberingSystem("latn") }

Thanks for pointing this out! I'll remove the FoundationPreview 6.2 availability and keep the initial availability in both the proposal text and the implementation PR.

Agree with this feedback.

There are also a few other names that could use some refinement. For example:

  • Swift's string case APIs are named uppercase, etc., so armenianUppercase (and so on) would match both the standard's description and Swift terminology better than armenianUpper.
  • The standard describes "fullwide" as "Full width digits"—so fullwidth or fullWidth might better align with that description.
  • The members corresponding to "hanidays" and "hanidec" could use words derived from the actual descriptions—maybe hanDayOfMonth and hanDecimal?
  • Gannen (元年) translates roughly to first-year or origin year; not sure if japaneseYear gets to the meaning better than japaneseGannen.
  • "N'Ko" using our capitalization convention would be, I think, nKo? This one is hard.
  • sinhalaLith would better align with the standard's description than sinhala.
  • Regarding "tamldec", described as "Modern Tamil decimal," I think it's more aptly named tamilDecimal rather than modernTamil—to my understanding, it is to "taml" as "hanidec" is to "han[s|t]"; or, to use an English example, "four two" versus "forty-two."

Thank you for your feedback!

Yes, I missed this nuance about Arabic-Indic naming. I tried to keep the names short for readability, but I completely agree with your point about potential confusion. I'll update the implementation to use arabicIndic and arabicIndicExtended to better align with the Unicode standard.

I aimed to assign variable names based on the descriptions provided in the Unicode CLDR, as mentioned in the proposal:

Variable names are assigned based on the descriptions provided in the Unicode CLDR

I also considered other resources to ensure consistency and clarity. However, I will revisit the names to ensure they align closely with the standard descriptions and minimize any ambiguity.

That's a good idea. Including brief documentation for each property to illustrate what the numerals look like would greatly enhance discoverability and usability for developers.
How about pairing the digits with a description, or does that seem like duplicate information? For example:

/// Armenian uppercase numbering system using digits Ա Բ Գ Դ Ե Զ Է Ը Թ
public static var armenianUppercase: Locale.NumberingSystem { Locale.NumberingSystem("armn") }

/// Balinese numbering system using digits ᭐ ᭑ ᭒ ᭓ ᭔ ᭕ ᭖ ᭗ ᭘ ᭙
public static var balinese: Locale.NumberingSystem { Locale.NumberingSystem("bali") }

/// Mathematical double-struck numbering system using digits 𝟘 𝟙 𝟚 𝟛 𝟜 𝟝 𝟞 𝟟 𝟠 𝟡
public static var mathDoubleStruck: Locale.NumberingSystem { Locale.NumberingSystem("mathdbl") }

I think the description should include the corresponding CLDR name and description so that users can find that information without having to read the implementation.

A number of the number systems are either not positional, or are completely algorithmic, so simply listing digits wouldn't help with those or could be misleading. But with the right reference links (just like you've done with the proposal) I think developers can be trusted to find more info if they want it.
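
For developers who want to see how a particular system renders without relying on a digit list in the documentation, one option is to format a value themselves. The sketch below assumes an "en_US" base locale; the helper name format(_:numberingSystem:) is hypothetical. How the formatter treats an algorithmic system such as "roman" is deliberately not asserted here.

```swift
import Foundation

// Hypothetical helper: formats `value` with the "en_US" locale after
// overriding its numbering system with the given identifier.
func format(_ value: Int, numberingSystem identifier: String) -> String {
    var components = Locale.Components(identifier: "en_US")
    components.numberingSystem = Locale.NumberingSystem(identifier)
    let locale = Locale(components: components)
    return value.formatted(.number.locale(locale))
}

// Print the same value under a few identifiers to compare the rendering.
for identifier in ["latn", "arab", "roman"] {
    print(identifier, format(7, numberingSystem: identifier))
}
```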

Thank you so much for this detailed feedback and for diving deep into the naming conventions. I agree with these suggestions.

  • The members corresponding to "hanidays" and "hanidec" could use words derived from the actual descriptions—maybe hanDayOfMonth and hanDecimal?

Those suggestions look good. I also considered variants like hanIdeographicDecimal and haniDecimal for "hanidec".

  • "N'Ko" using our capitalization convention would be, I think, nKo? This one is hard.

Seems appropriate, though it looks a bit odd.

For the numeric systems that are possible to document like this, that would be great.

I noticed this as well. I agree with adding reference links. I still think that, for the numeric type numbering systems, which have exactly ten digits, including how they are rendered in the documentation would be beneficial. For the algorithmic numbering systems, the link would suffice.

I've fixed the issues you pointed out in the previous comments and added documentation in the following format:
First line: a description from Unicode CLDR.
Second line: the identifier.
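
As a rough illustration of that two-line format, a documented property might look something like the sketch below. The exact wording and layout in the implementation PR may differ; the property names with the DocSketch suffix are hypothetical, and the description strings are assumed to follow the CLDR descriptions.

```swift
import Foundation

@available(macOS 13, iOS 16, tvOS 16, watchOS 9, *)
extension Locale.NumberingSystem {
    // Hypothetical names and comment layout, for illustration only.

    /// Arabic-Indic digits
    /// Identifier: `arab`
    @_alwaysEmitIntoClient
    public static var arabicIndicDocSketch: Locale.NumberingSystem {
        Locale.NumberingSystem("arab")
    }

    /// Latin digits
    /// Identifier: `latn`
    @_alwaysEmitIntoClient
    public static var latinDocSketch: Locale.NumberingSystem {
        Locale.NumberingSystem("latn")
    }
}
```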

I believe adding links to each property's documentation might not be the best approach: if the links change, the documentation would become outdated for many properties. Having both the description and the identifier gives developers enough information to find all the necessary details about a specific numbering system. This also applies to adding specific digits for numbering systems.

Actually, a CLDR reference link is already present in the initializer documentation. Is that not enough?
