[FOU] Locale Components, Language, and Language Components

Apple’s Foundation team is working on improving Locale to promote clear semantics and provide straightforward ways to modify and inspect locale information. We’d like your feedback on whether you think this will improve your experience with Locale. We would also like to learn more about your Locale use cases. Thanks!


Locale components, language, language components

Introduction

Locale encapsulates rich information about localization conventions and is crucial in application development. We'd like to improve the current design to make it easier for you to use this type to its full potential.

Most system APIs, including those of Foundation, work with Locale through its string identifier. To create a locale with the same language as the current locale but a different region, you have to manipulate the identifier, like this:

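import Foundation
// Split the current identifier into a [String: String] dictionary and swap the country code.
var components = Locale.components(fromIdentifier: Locale.current.identifier)
components[NSLocale.Key.countryCode.rawValue] = "GB"
// Reassemble an identifier string and create a locale from it.
let currentLocaleIdentifierForGB = Locale.identifier(fromComponents: components)
let localeGB = Locale(identifier: currentLocaleIdentifierForGB)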

Modifying a locale like this requires knowing what a locale contains, and is error-prone. The code above also demonstrates a major constraint of the current API: it does not separate the "region of the language" from the "region of the locale". It creates the "en-GB" locale, where GB describes both the language variety and the region used for regional preferences such as measurement system and calendar.
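To make the two regions concrete, here's a minimal sketch in plain Swift that splits an ICU-style identifier such as "en_GB@rg=USzzzz" into the region of the language and the region used for preferences (the `rg` key, which this proposal uses for `Locale.Components.region`). `ParsedLocaleID` and `parseLocaleID` are hypothetical illustration names, not Foundation API, and the parser assumes the simple `language_REGION@key=value;...` shape without script subtags.

```swift
/// Hypothetical result of splitting an ICU-style identifier (not Foundation API).
struct ParsedLocaleID {
    var language: String
    var languageRegion: String?           // e.g. "GB" in "en_GB": the variety of English
    var keywords: [String: String] = [:]  // e.g. ["rg": "USzzzz"]

    /// Region used for regional preferences: the "rg" override (its first two
    /// letters) if present, otherwise the region of the language.
    var preferenceRegion: String? {
        if let rg = keywords["rg"], rg.count >= 2 {
            return String(rg.prefix(2)).uppercased()
        }
        return languageRegion
    }
}

/// Handles only "language[_REGION][@key=value;...]"; script subtags are not supported.
func parseLocaleID(_ identifier: String) -> ParsedLocaleID {
    let parts = identifier.split(separator: "@", maxSplits: 1).map(String.init)
    let base = (parts.first ?? "").split(separator: "_").map(String.init)
    var result = ParsedLocaleID(language: base.first ?? "",
                                languageRegion: base.count > 1 ? base[1] : nil)
    if parts.count > 1 {
        for pair in parts[1].split(separator: ";") {
            let kv = pair.split(separator: "=", maxSplits: 1).map(String.init)
            if kv.count == 2 { result.keywords[kv[0]] = kv[1] }
        }
    }
    return result
}

let plainGB = parseLocaleID("en_GB")
print(plainGB.languageRegion ?? "-", plainGB.preferenceRegion ?? "-")    // GB GB
let gbWithUS = parseLocaleID("en_GB@rg=USzzzz")
print(gbWithUS.languageRegion ?? "-", gbWithUS.preferenceRegion ?? "-")  // GB US
```

With only "en_GB" there is a single region doing both jobs; the `rg` keyword is what lets the two diverge.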

We'd like to address these issues while promoting clear semantics and type safety with new types. We'd also like to take this opportunity to add APIs that answer questions such as whether two locales use the same language, and to offer ways to retrieve locale identifiers conforming to various standards.

Proposed solution and example

Construct and modify Locale

We introduce Locale.Components to store the attributes of a locale identifier. You create or modify a Locale via this type. For example, the following creates the locale of a user who uses British English but prefers US formatting conventions:

var components = Locale.Components(languageCode: "en", languageRegion: "GB")
components.region = Locale.Region("US")

let en_GB_US = Locale(components: components) // Locale is "en_GB@rg=USzzzz"
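Here's a rough sketch of how such an identifier could be assembled from its parts. `SketchComponents` is a hypothetical stand-in for the proposed struct that uses plain strings instead of the strong types. It assumes the `rg` value is the region code followed by "zzzz", the "whole region" subdivision form mentioned for `Subdivision` in the detailed design.

```swift
/// Hypothetical, simplified stand-in for the proposed `Locale.Components`;
/// plain strings replace the proposed strong types.
struct SketchComponents {
    var languageCode: String
    var languageRegion: String? = nil  // region of the language, e.g. "GB"
    var region: String? = nil          // regional-preference override ("rg")
    var calendar: String? = nil        // calendar override

    /// Builds an ICU-style identifier such as "en_GB@rg=USzzzz".
    var icuIdentifier: String {
        var identifier = languageCode
        if let languageRegion = languageRegion { identifier += "_" + languageRegion }
        var keywords: [String] = []  // appended in alphabetical order of key
        if let calendar = calendar { keywords.append("calendar=" + calendar) }
        // "zzzz" means "no particular subdivision", i.e. the whole region.
        if let region = region { keywords.append("rg=" + region + "zzzz") }
        return keywords.isEmpty ? identifier : identifier + "@" + keywords.joined(separator: ";")
    }
}

var britishEnglishUSFormats = SketchComponents(languageCode: "en", languageRegion: "GB")
britishEnglishUSFormats.region = "US"
print(britishEnglishUSFormats.icuIdentifier)  // en_GB@rg=USzzzz
```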

You can also modify a locale. The following creates a locale with the same language as the current one but with a different calendar:

var currentComponents = Locale.Components(locale: .current)
currentComponents.calendar = .buddhist

let locale = Locale(components: currentComponents)
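Conceptually, a change like this touches only one keyword of the identifier and leaves the rest alone. For contrast with the string manipulation shown in the introduction, here's a hypothetical helper (not Foundation API) that sets one keyword in an ICU-style `@key=value;...` identifier while preserving the others. It writes keywords in alphabetical order so the output is deterministic; that ordering is an assumption of the sketch.

```swift
/// Hypothetical helper: returns `identifier` with one ICU keyword set, keeping the
/// base identifier and every other keyword. Keywords are written in alphabetical order.
func settingKeyword(_ key: String, to value: String, in identifier: String) -> String {
    let parts = identifier.split(separator: "@", maxSplits: 1).map(String.init)
    let base = parts.first ?? ""
    var keywords: [String: String] = [:]
    if parts.count > 1 {
        for pair in parts[1].split(separator: ";") {
            let kv = pair.split(separator: "=", maxSplits: 1).map(String.init)
            if kv.count == 2 { keywords[kv[0]] = kv[1] }
        }
    }
    keywords[key] = value  // add or replace just this one keyword
    let list = keywords.sorted { $0.key < $1.key }.map { "\($0.key)=\($0.value)" }
    return base + "@" + list.joined(separator: ";")
}

print(settingKeyword("calendar", to: "buddhist", in: "th_TH@numbers=thai"))
// th_TH@calendar=buddhist;numbers=thai
```

Every caller doing this by hand has to get the parsing, merging, and ordering right, which is the kind of bookkeeping Locale.Components is meant to absorb.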

Get information about languages

We add the following types:

  • Locale.Language: Represents the language of a locale. Use it to get information about a language, just as you do with Locale. Current APIs that handle languages via string identifiers will switch to this type.

  • Locale.Language.Components: Similar to Locale.Components, it stores attributes concerning a language identifier. You can create a Language with Language.Components.

Here's an example of what Language offers:

let lang = Locale.Language(languageCode: "zh", region: "CN")
let script = lang.script // "Hans"
let longIdentifier = lang.maximalBCP47Identifier // "zh-Hans-CN"
let shortIdentifier = lang.minimalBCP47Identifier // "zh"
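Maximal and minimal forms like these are conventionally computed with the "Add Likely Subtags" and "Remove Likely Subtags" algorithms from Unicode Technical Standard #35. The sketch below is a simplified version of that idea, assuming a tiny hand-picked table in place of the full CLDR likely-subtags data. `maximize`, `minimize`, and the table are illustrative names, and real implementations handle more cases (such as `und` and aliases).

```swift
/// Likely script and region for a partial tag (illustrative struct).
struct LikelySubtags {
    let language: String
    let script: String
    let region: String
}

/// A tiny, hand-picked excerpt of likely-subtags data; real CLDR data is far larger.
let likelySubtagsTable: [String: LikelySubtags] = [
    "zh":      LikelySubtags(language: "zh", script: "Hans", region: "CN"),
    "zh-TW":   LikelySubtags(language: "zh", script: "Hant", region: "TW"),
    "zh-Hant": LikelySubtags(language: "zh", script: "Hant", region: "TW"),
    "en":      LikelySubtags(language: "en", script: "Latn", region: "US"),
]

/// "Add likely subtags": fills in a missing script and/or region.
/// Subtags given explicitly always win over the looked-up values.
func maximize(_ language: String, script: String? = nil, region: String? = nil) -> String? {
    var lookupKeys: [String] = []
    if let s = script, let r = region { lookupKeys.append("\(language)-\(s)-\(r)") }
    if let r = region { lookupKeys.append("\(language)-\(r)") }
    if let s = script { lookupKeys.append("\(language)-\(s)") }
    lookupKeys.append(language)
    for key in lookupKeys {
        if let likely = likelySubtagsTable[key] {
            return "\(language)-\(script ?? likely.script)-\(region ?? likely.region)"
        }
    }
    return nil
}

/// "Remove likely subtags": returns the shortest tag that maximizes back to
/// the same value, trying language, then language-region, then language-script.
func minimize(_ maximal: String) -> String {
    let parts = maximal.split(separator: "-").map(String.init)
    guard parts.count == 3 else { return maximal }
    let (language, script, region) = (parts[0], parts[1], parts[2])
    if maximize(language) == maximal { return language }
    if maximize(language, region: region) == maximal { return "\(language)-\(region)" }
    if maximize(language, script: script) == maximal { return "\(language)-\(script)" }
    return maximal
}

print(maximize("zh", region: "CN") ?? "?")  // zh-Hans-CN
print(minimize("zh-Hans-CN"))                // zh
print(minimize("zh-Hant-TW"))                // zh-TW
```

Trying language-region before language-script is why Traditional Chinese in Taiwan minimizes to "zh-TW" rather than "zh-Hant".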

You can test whether two languages are equivalent in a practical sense:

let currentLang = Locale.current.language
let enUS = Locale.Language(languageCode: "en", region: "US")

let is_en_US = currentLang.isEquivalent(to: enUS) // `en`, `en-Latn`, `en-US`, and `en-Latn-US` are equivalent
let is_en = currentLang.hasCommonParent(with: enUS)
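Equivalence, as described in the comment above, amounts to "equal after filling in the missing script and region". Here's a minimal sketch of that comparison on plain strings. It assumes a small hand-picked likely-subtags table and a naive parser that treats any four-letter subtag as a script. `expand` and `isEquivalent` are hypothetical names, and the sketch does not attempt `hasCommonParent`.

```swift
/// Fully expanded language tag (illustrative).
struct ExpandedTag: Equatable {
    var language: String
    var script: String
    var region: String
}

/// Hand-picked likely-subtags excerpt, for illustration only.
let equivalenceTable: [String: ExpandedTag] = [
    "en":      ExpandedTag(language: "en", script: "Latn", region: "US"),
    "zh":      ExpandedTag(language: "zh", script: "Hans", region: "CN"),
    "zh-TW":   ExpandedTag(language: "zh", script: "Hant", region: "TW"),
    "zh-Hant": ExpandedTag(language: "zh", script: "Hant", region: "TW"),
]

/// Parses "lang[-Script][-REGION]" (a four-letter subtag is treated as the script)
/// and fills in whatever is missing from the table.
func expand(_ tag: String) -> ExpandedTag? {
    let parts = tag.split(separator: "-").map(String.init)
    guard let language = parts.first else { return nil }
    var script: String? = nil
    var region: String? = nil
    for part in parts.dropFirst() {
        if part.count == 4 { script = part } else { region = part }
    }
    var lookupKeys: [String] = []
    if let r = region { lookupKeys.append("\(language)-\(r)") }
    if let s = script { lookupKeys.append("\(language)-\(s)") }
    lookupKeys.append(language)
    for key in lookupKeys {
        if let likely = equivalenceTable[key] {
            return ExpandedTag(language: language,
                               script: script ?? likely.script,
                               region: region ?? likely.region)
        }
    }
    return nil
}

/// Equal after expansion, unlike `==` on the raw tag strings.
func isEquivalent(_ a: String, _ b: String) -> Bool {
    guard let x = expand(a), let y = expand(b) else { return a == b }
    return x == y
}

let englishForms = ["en", "en-Latn", "en-US", "en-Latn-US"]
print(englishForms.allSatisfy { isEquivalent($0, "en-US") })  // true
print(isEquivalent("en-GB", "en-US"))                          // false
```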

You can test whether a language code is a valid ISO code:

let _ = Locale.LanguageCode("es").isISOLanguage // true
let _ = Locale.LanguageCode("vulcan").isISOLanguage // false
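A strong type with a case-insensitive identifier and an ISO membership check might look roughly like the following. `LanguageCodeSketch` is a hypothetical stand-in, and its small set of codes is only a sample; a real implementation would consult the full ISO 639 list.

```swift
/// Hypothetical stand-in for the proposed `Locale.LanguageCode`.
struct LanguageCodeSketch: Hashable, ExpressibleByStringLiteral {
    let identifier: String

    init(_ identifier: String) {
        // Case-insensitive: normalize to lowercase, the usual form for language subtags.
        self.identifier = identifier.lowercased()
    }

    init(stringLiteral value: String) { self.init(value) }

    /// Tiny sample standing in for the full ISO 639 list (illustration only).
    private static let knownISOCodes: Set<String> = ["ar", "en", "es", "fr", "ja", "zh"]

    var isISOLanguage: Bool { Self.knownISOCodes.contains(identifier) }
}

print(LanguageCodeSketch("es").isISOLanguage)      // true
print(LanguageCodeSketch("vulcan").isISOLanguage)  // false
```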

Detailed design

Locale.Components

Locale.Components is a struct that stores the attributes of a Unicode locale identifier. The property you'll use most is probably languageComponents, which corresponds to a language identifier. The remaining properties extend the locale to support various applications. All the attributes are strongly typed; you'll find their definitions in the section "Strong types used in Locale.Components and Language".

While Locale and Locale.Components have similar properties, they work differently:

  • Locale lets you inspect the default property values of a locale.

  • Locale.Components allows you to create a locale without having to use a string identifier, such as "en_GB@rg=USzzzz;calendar=buddhist". It is a simple struct that acts like a dictionary, but the keys and the values are strongly typed. All the properties are set/read as-is.

extension Locale {

    /// Use `Locale.Components` to create a `Locale` with specific overrides. 
    @available(macOS 9999, macCatalyst 9999, iOS 9999, tvOS 9999, watchOS 9999, *)
    public struct Components : Hashable, Codable, Sendable {

        /// Represents the Unicode language identifier part of a locale
        public var languageComponents: Language.Components

        /// Set this to override the default calendar of the locale. To request the default calendar used by the locale, use `Locale.calendar`
        ///
        /// Corresponds to the "ca" key of the Unicode BCP 47 extension
        public var calendar: Calendar.Identifier?

        /// Set this to override the string sort order of the locale. To request the default collation used by the locale, use `Locale.collation`
        ///
        /// Corresponds to the "co" key of the Unicode BCP 47 extension
        public var collation: Locale.Collation?

        /// Set this to override the currency of the locale. To request the default currency used by the locale, use `Locale.currency`
        ///
        /// Corresponds to the "cu" key of the Unicode BCP 47 extension
        public var currency: Locale.Currency?

        /// Set this to override the numbering system of the locale. To request the default numbering system used by the locale, use `Locale.numberingSystem`
        ///
        /// Corresponds to the "nu" key of the Unicode BCP 47 extension
        public var numberingSystem: Locale.NumberingSystem?

        /// Set this to override the first day of the week. To request the default first day of the week preferred by the locale, use `Locale.firstDayOfWeek`
        /// 
        /// Corresponds to the "fw" key of the Unicode BCP 47 extension
        /// The preferred first day of the week that should be shown in a calendar view. Not necessarily the same as the first day after the weekend, and should not be determined from the weekend information
        public var firstDayOfWeek: Locale.Weekday?

        /// Set this to override the hour cycle used by the locale. To request the default hour cycle, use `Locale.hourCycle`
        ///
        /// Corresponds to the "hc" key of the Unicode BCP 47 extension
        public var hourCycle: Locale.HourCycle?

        /// Set this to override the measurement system used by the locale. To request the default measurement system, use `Locale.measurementSystem`
        ///
        /// Corresponds to the "ms" key of the Unicode BCP 47 extension
        public var measurementSystem: Locale.MeasurementSystem?

        /// Set this to override the region for region-related preferences, such as measurement system, calendar, and first day of the week. If unset, the region of the language component is used
        ///
        /// Corresponds to the "rg" key of the Unicode BCP 47 extension
        public var region: Locale.Region?

        /// Set this to override the regional subdivision of `region`
        ///
        /// Corresponds to the "sd" key of the Unicode BCP 47 extension
        public var subdivision: Locale.Subdivision?

        /// Set this to specify a time zone to associate with this locale
        ///
        /// Corresponds to the "tz" key of the Unicode BCP 47 extension
        public var timeZone: TimeZone?

        /// Set this to specify a variant used for the locale
        ///
        /// Corresponds to the "va" key of the Unicode BCP 47 extension
        public var variant: Variant?

        /// - Parameter identifier: Unicode locale identifier such as "en-u-nu-thai-ca-buddhist-kk-true"
        public init(identifier: String)

        /// Creates a `Locale.Components` with the specified `Locale`
        public init(locale: Locale)

        /// Creates a `Locale.Components` with the specified language code, script and region for the language
        public init(languageCode: Locale.LanguageCode? = nil, script: Locale.Script? = nil, languageRegion: Locale.Region? = nil)
    }
}
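The doc comments above pair each override with a key of the Unicode BCP 47 `-u-` extension. To illustrate how those keys could combine into a single tag, here's a hypothetical sketch (`ExtensionKey` and `bcp47Tag` are not Foundation API). It writes keys in alphabetical order, the order used by the canonical form in UTS #35, and lowercases values, so a region override of "USzzzz" becomes `rg-uszzzz`.

```swift
/// Hypothetical: Unicode extension keys for the `Locale.Components` properties,
/// as listed in the doc comments above.
enum ExtensionKey: String {
    case calendar = "ca", collation = "co", currency = "cu", numberingSystem = "nu"
    case firstDayOfWeek = "fw", hourCycle = "hc", measurementSystem = "ms"
    case region = "rg", subdivision = "sd", timeZone = "tz", variant = "va"
}

/// Appends a "-u-" extension to `language`, writing keys in alphabetical order
/// and values in lowercase. Returns `language` unchanged when there are no overrides.
func bcp47Tag(language: String, overrides: [ExtensionKey: String]) -> String {
    guard !overrides.isEmpty else { return language }
    let pairs: [(key: String, value: String)] = overrides.map {
        (key: $0.key.rawValue, value: $0.value.lowercased())
    }
    let subtags = pairs.sorted { $0.key < $1.key }.map { "\($0.key)-\($0.value)" }
    return language + "-u-" + subtags.joined(separator: "-")
}

let britishWithOverrides = bcp47Tag(
    language: "en-GB",
    overrides: [.numberingSystem: "thai", .calendar: "buddhist", .region: "USzzzz"])
print(britishWithOverrides)  // en-GB-u-ca-buddhist-nu-thai-rg-uszzzz
```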

Language and Language.Components

A language is composed of a language code, region code, and script code. Just as Locale can be created with Locale.Components, Language can be created with Language.Components:

  • Language provides information about a language. All its properties are read-only and are filled in with the values of the specified language. You use it to inspect a language.

  • Language.Components's properties are settable; you use it to construct or modify the pieces of the language.

Locale vs Language and Locale.region vs Language.region

A locale differs from a language in that a locale contains a language plus several individual settings. We separate Language from Locale to clarify the use of those types in system frameworks. For example, the "localization" in the current Bundle.preferredLocalizations actually refers to languages, despite its name.

Both the language and the locale contain a region, but the meaning can be different:

  • Language.region represents the regional variety of the language, such as "British English" or "Canadian English".

  • Locale.region controls the region-specific default values, such as measurement system and first day of the week.

If Locale.region is left unset, it inherits Language.region, so when the two regions are the same it's sufficient to specify only Language.region.
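The fallback rule fits in one line. Here's a minimal sketch with hypothetical names, using plain strings in place of `Locale.Region`:

```swift
/// Hypothetical model of the two regions (plain strings instead of `Locale.Region`).
struct LocaleRegionSketch {
    var languageRegion: String?  // Language.region: the variety, e.g. "GB" for British English
    var region: String?          // Locale.region override; nil means "not set"

    /// Region that drives defaults such as measurement system and first day of the week.
    var effectiveRegion: String? { region ?? languageRegion }
}

let britishEnglish = LocaleRegionSketch(languageRegion: "GB", region: nil)
let britishEnglishInUS = LocaleRegionSketch(languageRegion: "GB", region: "US")
print(britishEnglish.effectiveRegion ?? "-")      // GB
print(britishEnglishInUS.effectiveRegion ?? "-")  // US
```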

extension Locale {

    @available(macOS 9999, macCatalyst 9999, iOS 9999, tvOS 9999, watchOS 9999, *)
    public struct Language : Hashable, Codable, Sendable {
        
        /// Represents a language identifier
        @available(macOS 9999, macCatalyst 9999, iOS 9999, tvOS 9999, watchOS 9999, *)
        public struct Components : Hashable, Codable, Sendable {
    
            /// The language code of the language
            public var languageCode: Locale.LanguageCode?   
            
            /// The script of the language
            public var script: Locale.Script?
            
            /// The region of the language
            public var region: Locale.Region?
    
            /// Creates a `Components` with the specified language identifier, such as "en-US", "es-419", "zh-Hant-TW"
            public init(identifier: String)
    
            /// Creates a `Components` with the specified language code, script, and region
            public init(languageCode: Locale.LanguageCode? = nil, script: Locale.Script? = nil, region: Locale.Region? = nil)
            
            /// Creates a `Components` with the specified language
            public init(language: Locale.Language)
        }
        
        /// Creates a `Language` with specified identifier, such as "en-US", "es-419", "zh-Hant-TW"
        public init(identifier: String)
        
        /// Creates a `Language` with the specified components
        public init(components: Language.Components)
        
        /// Creates a `Language` with the specified language code, script, and region
        public init(languageCode: Locale.LanguageCode? = nil, script: Locale.Script? = nil, region: Locale.Region? = nil)
        
        /// The language code of the language. Returns nil if it cannot be determined
        public var languageCode: Locale.LanguageCode? { get }
        
        /// The script of the language. Returns nil if it cannot be determined
        public var script: Locale.Script? { get }
        
        /// The region of the language. Returns nil if it cannot be determined
        public var region: Locale.Region? { get }
        
        /// Ordering of lines within a page, e.g. top-to-bottom for English; left-to-right for Mongolian in the Mongolian script
        public var lineDirection: Locale.LanguageDirection { get }
        
        /// Ordering of characters within a line, e.g. left-to-right for English; top-to-bottom for Mongolian in the Mongolian script
        public var characterDirection: Locale.LanguageDirection { get }

        /// The macrolanguage that includes the language. For example, the macroLanguage for "cmn" (Mandarin Chinese) is "zh" (Chinese)
        /// Returns nil if there isn't a known macrolanguage
        public var macroLanguage: Language? { get }

        /// Returns the parent language of a language, e.g. `"en-US"` for `"en-US-POSIX"`
        /// Returns nil if the parent language cannot be determined
        public var parent: Language? { get }
        
        /// Returns whether `self` shares a parent with the specified language
        public func hasCommonParent(with language: Language) -> Bool

        /// Returns whether `self` and the specified `language` are equal after expanding missing components
        /// For example, `en`, `en-Latn`, `en-US`, and `en-Latn-US` are equivalent
        /// Unlike `==`, which tests for exact equality
        public func isEquivalent(to language: Language) -> Bool
        
        /// Returns the identifier in a minimal form. Script and region may be omitted, e.g. "zh-TW", "en"
        public var minimalBCP47Identifier: String { get }

        /// Returns the identifier that always includes the script: "zh-Hant-TW", "en-Latn-US"
        public var maximalBCP47Identifier: String { get }
        
        /// Returns a list of system languages, including the languages of all product localizations for the current platform
        public static var systemLanguages: [Language] { get }
    }
} 

Strong types used in Locale.Components and Language

The types for the properties of Locale.Components, Language and Language.Components are defined as follows:

@available(macOS 9999, macCatalyst 9999, iOS 9999, tvOS 9999, watchOS 9999, *)
extension Locale {

    public struct LanguageCode : Hashable, Codable, Sendable, ExpressibleByStringLiteral {
        public var identifier: String 
        
        /// Creates a `LanguageCode` type
        /// - Parameter identifier: A two-letter ISO 639-1 or three-letter ISO 639-2 code, or a language code of your choice if using a custom language, such as "en" for English. Case-insensitive.
        public init(_ identifier: String)
        
        /// `ExpressibleByStringLiteral` conformance
        public init(stringLiteral value: String)  
        
        /// Returns whether the language is a known ISO language
        public var isISOLanguage: Bool { get }

        /// Represents an unknown language
        public static let unknown: Locale.LanguageCode
        
        /// Returns a list of `LanguageCode`s: the two-letter language codes defined in ISO 639, plus three-letter codes for languages without a two-letter equivalent
        public static var isoLanguageCodes: [LanguageCode] { get }
    }
    
    public struct Script : Hashable, Codable, Sendable, ExpressibleByStringLiteral {
        public var identifier: String 
        
        /// Creates a `Script` with specified identifier
        /// - Parameter identifier: A BCP 47 script subtag such as "Arab", "Cyrl" or "Latn". Case-insensitive.
        public init(_ identifier: String)
        
        /// `ExpressibleByStringLiteral` conformance
        public init(stringLiteral value: String) 
        
        /// Returns a list of script codes defined by ISO 15924
        public static var isoScripts: [Script] { get }
         
        /// Represents an uncoded script
        public static let unknown: Locale.Script
    }
      
    public struct Region : Hashable, Codable, Sendable, ExpressibleByStringLiteral {
        public var identifier: String
        
        /// Creates a `Region` with the specified region code, a two-letter or three-digit BCP 47 region subtag such as "US" for the United States or "419" for Latin America. Case-insensitive.
        public init(_ identifier: String)

        /// `ExpressibleByStringLiteral` conformance
        public init(stringLiteral value: String) 
        
        /// Returns whether the region is a known ISO region
        public var isISORegion: Bool { get }
        
        /// Represents an unknown or invalid region
        public static let unknown: Locale.Region
           
        /// Returns all the sub-regions of the region
        public var subRegions : [Region] { get }
        
        /// Returns the region within which the region is contained, e.g. for `US`, returns `Northern America`
        public var containingRegion: Region? { get }

        /// Returns the continent of the region. Returns `nil` if the continent cannot be determined, such as when the region isn't an ISO region
        public var continent: Region? { get }
        
        /// Returns a list of regions defined by ISO
        public static var isoRegions: [Region] { get }
    }
    
    /// A subdivision of a country or region, such as a state in the United States, or a province in Canada.
    public struct Subdivision : Hashable, Codable, Sendable, ExpressibleByStringLiteral {
        public var identifier: String 
        
        /// Creates a subdivision with the given identifier
        /// - Parameter identifier: A Unicode subdivision identifier, such as "usca" for California, US. Case-insensitive. The complete list of subdivision identifiers can be found [here](https://github.com/unicode-org/cldr/blob/maint/maint-40/common/validity/subdivision.xml), under the "subdivision" type
        public init(_ identifier: String)
        
        /// `ExpressibleByStringLiteral` conformance
        public init(stringLiteral value: String) 
        
        /// Returns the subdivision representing the given region as a whole, e.g. "uszzzz" for the unspecified subdivision of the US region
        public static func subdivision(for region: Region) -> Subdivision
    }

    public struct Collation : Hashable, Codable, Sendable, ExpressibleByStringLiteral {
        public var identifier: String
        
        /// The complete list of valid collation identifiers can be found [here](https://github.com/unicode-org/cldr/blob/latest/common/bcp47/collation.xml), under the key named "co"
        public init(_ identifier: String)
        
        /// `ExpressibleByStringLiteral` conformance
        public init(stringLiteral value: String) 

        /// Intended for string searching. Appropriate only for determining whether two strings should be considered equivalent.
        /// It may ignore or modify parts of the strings for search purposes, so it should not be used to determine the relative order of two strings.
        public static let searchRules: Locale.Collation

        /// The default ordering for each language
        public static let standard: Locale.Collation
    }
    
    public struct Currency : Hashable, Codable, Sendable, ExpressibleByStringLiteral {
        public var identifier: String
        
        /// The complete list of valid currency codes can be found [here](https://github.com/unicode-org/cldr/blob/latest/common/bcp47/currency.xml), under the key with the name "cu"
        public init(_ identifier: String)
        
        /// `ExpressibleByStringLiteral` conformance
        public init(stringLiteral value: String) 
        
        /// Returns whether the currency is a known ISO currency
        public var isISOCurrency: Bool { get } 
        
        /// Represents an unknown currency, used when no currency is involved in a transaction
        public static let unknown: Locale.Currency
        
        /// Returns a list of `Locale` currency codes defined in ISO 4217
        public static var isoCurrencies: [Currency] { get }
    }
    
    public struct NumberingSystem : Hashable, Codable, Sendable, ExpressibleByStringLiteral {
        public var identifier: String
        
        /// The complete list of valid numbering systems can be found [here](https://github.com/unicode-org/cldr/blob/latest/common/bcp47/number.xml), under the key with the name "nu"
        public init(_ identifier: String)
        
        /// `ExpressibleByStringLiteral` conformance
        public init(stringLiteral value: String) 
    }
    
    public enum Weekday : Codable, Hashable, Sendable {
         case sunday
         case monday
         case tuesday
         case wednesday
         case thursday
         case friday
         case saturday
    }
    
    /// Hour cycle
    public enum HourCycle : Int, Codable, Hashable, Sendable {

        /// 12-hour clock. Hour ranges from 0 to 11
        case zeroToEleven

        /// 12-hour clock. Hour ranges from 1 to 12
        case oneToTwelve

        /// 24-hour clock. Hour ranges from 0 to 23
        case zeroToTwentyThree

        /// 24-hour clock. Hour ranges from 1 to 24
        case oneToTwentyFour
    }
    
    public struct MeasurementSystem : Codable, Hashable, Sendable, ExpressibleByStringLiteral {
        public var identifier: String
        
        /// The complete list of valid measurement systems can be found [here](https://github.com/unicode-org/cldr/blob/latest/common/bcp47/measure.xml), under the key with the name "ms"
        public init(_ identifier: String)
        
        /// `ExpressibleByStringLiteral` conformance
        public init(stringLiteral value: String) 

        /// Metric system
        public static let metric: Locale.MeasurementSystem
        
        /// US customary system of measurement: feet, pints, etc.; pints are 16oz
        public static let usSystem: Locale.MeasurementSystem
        
        /// UK system of measurement (British Imperial): feet, pints, etc.; pints are 20oz
        public static let ukSystem: Locale.MeasurementSystem
        
        /// Returns a list of measurement systems
        public static var measurementSystems: [Locale.MeasurementSystem] { get }
    }

    public struct Variant : Codable, Hashable, Sendable, ExpressibleByStringLiteral {
        public var identifier: String
        
        /// The complete list of valid variants can be found [here](https://github.com/unicode-org/cldr/blob/latest/common/bcp47/variant.xml), under the key named "va"
        public init(_ identifier: String)
        
        /// `ExpressibleByStringLiteral` conformance
        public init(stringLiteral value: String) 
        
        /// Represents the POSIX variant
        public static let posix: Variant
    }
}
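The four HourCycle cases correspond to the CLDR "h11", "h12", "h23", and "h24" values of the "hc" key. The hypothetical sketch below mirrors the enum and works through how an hour of the day (0 to 23) would be displayed under each cycle; note that midnight shows as 12 under oneToTwelve and as 24 under oneToTwentyFour.

```swift
/// Mirrors the four cases of the proposed `Locale.HourCycle` (illustration only).
enum HourCycleSketch {
    case zeroToEleven       // "h11"
    case oneToTwelve        // "h12"
    case zeroToTwentyThree  // "h23"
    case oneToTwentyFour    // "h24"

    /// Converts an hour of the day in 0...23 into the value shown under this cycle.
    func displayHour(_ hour: Int) -> Int {
        precondition((0...23).contains(hour), "hour must be in 0...23")
        switch self {
        case .zeroToEleven:      return hour % 12                        // 0...11
        case .oneToTwelve:       return hour % 12 == 0 ? 12 : hour % 12  // 1...12
        case .zeroToTwentyThree: return hour                             // 0...23
        case .oneToTwentyFour:   return hour == 0 ? 24 : hour            // 1...24
        }
    }
}

// Midnight shows as 0, 12, 0, 24; 1 p.m. shows as 1, 1, 13, 13.
for cycle in [HourCycleSketch.zeroToEleven, .oneToTwelve, .zeroToTwentyThree, .oneToTwentyFour] {
    print(cycle, cycle.displayHour(0), cycle.displayHour(13))
}
```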

Locale improvement

We extend Locale to work with Locale.Components and Locale.Language. We also add functions to retrieve identifiers following different standards.

@available(macOS 9999, macCatalyst 9999, iOS 9999, tvOS 9999, watchOS 9999, *)
extension Locale {
    
    /// Creates a `Locale` with the specified components 
    public init(components: Locale.Components)
    
    /// Identifier following the ISO/IEC 15897 standard. Similar to the BCP 47 language tag, but the character set is included, e.g. "en_AU.UTF-8"
    public var posixLocaleIdentifier: String { get }

    /// Identifier following ICU (International Components for Unicode) convention. Does not include character set, e.g. "th_TH@calendar=gregorian;numbers=thai"
    public var icuLocaleIdentifier: String { get }

    /// Similar to `icuLocaleIdentifier`, but without the key-value keyword list, e.g. "th_TH_u_ca_gregory_nu_thai"
    public var unicodeCLDRLocaleIdentifier: String { get }

    /// A valid BCP47 language tag, e.g. "th-TH-u-ca-gregory-nu-thai"
    public var unicodeBCP47LocaleIdentifier: String { get }
}
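The ICU and BCP 47 examples above carry the same information in different spellings: ICU uses long keyword names such as `calendar` and `numbers`, while BCP 47 uses the short `-u-` keys `ca` and `nu`, and the calendar value `gregorian` becomes `gregory`. The hypothetical converter below sketches that mapping for a handful of keys only; a real implementation would rely on ICU's full key and value tables.

```swift
/// Illustrative subset of ICU keyword names and their BCP 47 "-u-" keys.
let icuKeyToBCP47Key = ["calendar": "ca", "collation": "co", "currency": "cu", "numbers": "nu"]
/// Illustrative subset of value aliases; ICU's "gregorian" is "gregory" in BCP 47.
let icuValueToBCP47Value = ["gregorian": "gregory"]

/// Converts an ICU-style identifier such as "th_TH@calendar=gregorian;numbers=thai"
/// into a BCP 47 tag. Returns nil for keywords outside the illustrative tables.
func icuToBCP47(_ icu: String) -> String? {
    let parts = icu.split(separator: "@", maxSplits: 1).map(String.init)
    guard let base = parts.first else { return nil }
    var tag = base.split(separator: "_").joined(separator: "-")
    guard parts.count > 1 else { return tag }

    var pairs: [(key: String, value: String)] = []
    for keyword in parts[1].split(separator: ";") {
        let kv = keyword.split(separator: "=", maxSplits: 1).map(String.init)
        guard kv.count == 2, let key = icuKeyToBCP47Key[kv[0]] else { return nil }
        pairs.append((key: key, value: icuValueToBCP47Value[kv[1]] ?? kv[1]))
    }
    // Extension keys are written in alphabetical order for a deterministic result.
    pairs.sort { $0.key < $1.key }
    tag += "-u-" + pairs.map { "\($0.key)-\($0.value)" }.joined(separator: "-")
    return tag
}

print(icuToBCP47("th_TH@calendar=gregorian;numbers=thai") ?? "unsupported")
// th-TH-u-ca-gregory-nu-thai
```

Replacing each "-" in the result with "_" yields the CLDR-style form used in the `unicodeCLDRLocaleIdentifier` example.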

Impact on existing code

For APIs that refer to locale properties via String, we introduce counterparts that use the specific types. We encourage you to switch to the APIs that take and return strong types.

@available(macOS 9999, macCatalyst 9999, iOS 9999, tvOS 9999, watchOS 9999, *)
extension Locale {
    /// Returns the measurement system of the locale
    public var measurementSystem: MeasurementSystem { get }

    /// Returns the currency of the locale
    public var currency: Currency? { get }

    /// Returns the numbering system. If the locale has an explicitly specified numbering system in the identifier (e.g. `bn_BD@numbers=latn`) or in the associated `Locale.Components`, that numbering system is returned. Otherwise, returns the default numbering system.
    public var numberingSystem: NumberingSystem? { get }

    /// Returns all the valid numbering systems. For example, `"ar-AE"` (Arabic (United Arab Emirates)) has both the `"latn"` (Latin digits) and `"arab"` (Arabic-Indic digits) numbering systems.
    public var availableNumberingSystems: [NumberingSystem] { get }
    
    /// Returns the first day of the week
    public var firstDayOfWeek: Weekday? { get }

    /// Returns the hour cycle such as whether it uses 12-hour clock or 24-hour clock
    public var hourCycle: HourCycle? { get }

    /// Returns the default collation 
    public var collation: Collation { get }
    
    /// Returns a localized string for a specified language
    public func localizedString(forLanguageCode languageCode: LanguageCode) -> String?

    /// Returns a localized string for a specified region
    public func localizedString(forRegion region: Region) -> String?

    /// Returns a localized string for a specified script
    public func localizedString(forScript script: Script) -> String?

    /// Returns a localized string for a specified variant
    public func localizedString(forVariant variant: Variant) -> String?

    /// For example, in the "en" locale, the result for `"USD"` is `"US Dollar"`
    public func localizedString(forCurrency currency: Currency) -> String?

    /// Returns a localized string for a specified collation
    public func localizedString(forCollation collation: Collation) -> String?    
    
    /// Returns a POSIX identifier from the given string, e.g. "en_AU.UTF-8"
    public static func posixLocaleIdentifier(from string: String) -> String
    
    /// Returns an ICU identifier from the given string, e.g. "th_TH@calendar=gregorian;numbers=thai"
    public static func icuLocaleIdentifier(from string: String) -> String

    /// Returns a Unicode CLDR locale identifier from the given string, e.g. "th_TH_u_ca_gregory_nu_thai"
    public static func unicodeCLDRLocaleIdentifier(from string: String) -> String 
    
    /// Returns a BCP 47 locale identifier from the given string, e.g. "th-TH-u-ca-gregory-nu-thai"
    public static func unicodeBCP47LocaleIdentifier(from string: String) -> String 

    @available(*, deprecated, message: "Use `measurementSystem` instead")
    public var usesMetricSystem: Bool { get }
    
    @available(*, deprecated, message: "Use `collation` instead")
    public var collationIdentifier: String? { get }
    
    @available(*, deprecated, message: "Use `LanguageCode.isoLanguageCodes` instead")
    public static var isoLanguageCodes: [String] { get }
    
    @available(*, deprecated, message: "Use `Region.isoRegions` instead")
    public static var isoRegionCodes: [String] { get }
    
    @available(*, deprecated, message: "Use `Currency.isoCurrencies` instead")
    public static var isoCurrencyCodes: [String] { get }
     
    @available(*, deprecated, message: "Use `Locale.Components` to access components")
    public static func components(fromIdentifier string: String) -> [String : String]
    
    @available(*, deprecated, message: "Use `posixLocaleIdentifier`, `icuLocaleIdentifier`, `unicodeCLDRLocaleIdentifier`, or `unicodeBCP47LocaleIdentifier` instead")
    public static func canonicalIdentifier(from string: String) -> String
    
    @available(*, deprecated, message: "Use `public func localizedString(forLanguageCode languageCode: LanguageCode) -> String?`")
    public func localizedString(forLanguageCode languageCode: String) -> String?

    @available(*, deprecated, message: "Use `public func localizedString(forRegion region: Region) -> String?`")
    public func localizedString(forRegionCode regionCode: String) -> String?

    @available(*, deprecated, message: "Use `public func localizedString(forScript script: Script) -> String?`")
    public func localizedString(forScriptCode scriptCode: String) -> String?

    @available(*, deprecated, message: "Use `public func localizedString(forVariant variant: Variant) -> String?`")
    public func localizedString(forVariantCode variantCode: String) -> String?

    @available(*, deprecated, message: "Use `public func localizedString(forCurrency currency: Currency) -> String?`")
    public func localizedString(forCurrencyCode currencyCode: String) -> String?
    
    @available(*, deprecated, message: "Use `public func localizedString(forCollation collation: Collation) -> String?`")
    public func localizedString(forCollationIdentifier collationIdentifier: String) -> String?
}

@available(macOS 13, iOS 16, tvOS 16, watchOS 9, *)
extension Bundle {
    /// Similar to `Bundle.preferredLocalizations`, but returns a list of `Language` 
    public var preferredLanguages: [Locale.Language] { get }
    
    /// Similar to `Bundle.localizations`, but returns a list of `Language` 
    public var bundleLanguages: [Locale.Language] { get } 
    
    /// Similar to `Bundle.developmentLocalization`, but returns a `Language` 
    public var developmentLanguage: Locale.Language? { get }
    
    /// Similar to `preferredLocalizations(from localizationsArray: [String]) -> [String]`
    public class func preferredLanguages(from languages: [Locale.Language]) -> [Locale.Language]
    
    /// Similar to `class func preferredLocalizations(from localizationsArray: [String], forPreferences preferencesArray: [String]?) -> [String]`
    public class func preferredLanguages(from languages: [Locale.Language], forPreferences preferences: [Locale.Language]?) -> [Locale.Language]
}
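Foundation's actual algorithm for choosing among localizations isn't described here, and it handles many more cases. Purely to illustrate the kind of question `preferredLanguages(from:forPreferences:)` answers, here's a deliberately simplified, hypothetical matcher on plain strings. It tries an exact match for each preference in order, then falls back to any localization with the same language code. It ignores scripts, so it would, for example, happily match "zh-TW" to "zh-Hans".

```swift
/// Hypothetical, simplified matcher: returns the best entry of `available` for
/// `preferences` (most preferred first), or nil if nothing matches.
func bestLanguage(available: [String], preferences: [String]) -> String? {
    let separator: Character = "-"
    // Treats the first subtag as the language code, e.g. "en" in "en-GB".
    func languageCode(_ tag: String) -> Substring {
        tag.split(separator: separator).first ?? Substring(tag)
    }
    for preferred in preferences {
        // 1. Exact match, e.g. "en-GB" for "en-GB".
        if available.contains(preferred) { return preferred }
        // 2. Same language code: prefer a bare tag such as "en", else any variety.
        let sameLanguage = available.filter { languageCode($0) == languageCode(preferred) }
        if let bare = sameLanguage.first(where: { !$0.contains(separator) }) { return bare }
        if let variety = sameLanguage.first { return variety }
    }
    return nil  // the caller might fall back to, e.g., the development language
}

print(bestLanguage(available: ["en", "fr", "zh-Hans"], preferences: ["en-AU"]) ?? "none")  // en
print(bestLanguage(available: ["en-GB", "fr"], preferences: ["de", "en-AU"]) ?? "none")    // en-GB
```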

Alternatives considered and future direction

Define known language codes, region codes, etc. as extensible enum cases

We considered listing all known language codes, region codes, and script codes as extensible enum cases. Given that there are hundreds of known identifiers across these types, we don't find the benefit substantial enough to justify the technical overhead. It is almost impossible to keep them up to date while maintaining source compatibility. We also find it very difficult to come up with names that are more precise than the identifiers themselves. Giving each kind of code a type of its own is enough to provide type safety, and we believe that is sufficient as a first step in improving Locale. In the future, we can consider exposing commonly used codes as static properties of the corresponding types.

Region enhancements

We can expand Locale.Region in the future to provide more information about continents and territories. ICU already supports these via its uregion API, but for now it's prudent to gather more feedback on what is really needed.

Localized names for locale component properties

Currently, the localizedString functions return a localized name for a given Currency, Script, etc. In the future, we could also consider introducing dedicated Locale.Currency.FormatStyle and Locale.Script.FormatStyle types that return the localized names of these types in different styles, including capitalization contexts and widths.


Thanks for sharing this with the community ^^

The part I'm most excited about is being able to compare languages.

I'm not sure if this is exactly what I need but it seems to get close.

For context, I often want access to whatever logic Foundation uses to decide which language files in the bundle to use based on the user's locale. Foundation is very smart about it, handling edge cases where the user's locale doesn't match any of the app's supported ones. But none of that is accessible if you don't have the files in the bundle (AFAIK, at least when I wrote this). I would love an API to which I can provide an array of languages and have Foundation pick the most appropriate one. But maybe that can be built on top of these new functions ^^

+1 from me.

These look like solid improvements to using Locale and friends in Swift.

One point of clarification: will these improvements be just extensions on the existing types, or are you taking the opportunity to introduce a Swift-native implementation of Locale? The latter would begin to reduce Foundation's reliance on ICU, build upon the recent work in the language, and be fantastic from a server point of view.

This looks really nice!

Locale support is an area where I think Foundation has made the world a better place (really!). It's something that a lot of other systems neglected in the past, but Foundation's support has been excellent for a really long time. It made computing more accessible for a lot of people, and really raised the bar for how well computers should support non-English languages and customs. It's definitely an asset for the Swift ecosystem.

I like the way this replaces string-typing with specific structures. It looks great for advanced locale work (which is, admittedly, not something I know much about). But even for my limited use, there are 2 things that I remember being a bit awkward:

  1. Simulating different locales

    This can be really useful for testing. Xcode has region settings in its scheme editor, but AFAIK there's no way for Swift Packages to override the locale for a specific test (or portion of a test).

    For example, maybe I'd like to test that computing a particular value does not depend on the user's locale - so I'd want to run that test in the context of a Chinese locale, or Russian locale (or maybe even just a random non-English locale as opposed to anything specific).

  2. Locale changes

    Currently, change events go through NotificationCenter. I think Combine makes it a bit easier to work with that, but it's limited to Apple systems. I wonder if this is an appropriate place to introduce an AsyncSequence of locale-change events, or would it be out of scope?
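One common workaround for the first point is to pass the locale in explicitly, defaulting to `.current`. A test can then substitute any locale without touching system settings. This assumes you can change the code under test. `formatTotal(_:userLocale:)` is a hypothetical function. Only `Locale`, `NumberFormatter`, and `NSNumber` are real Foundation API.

```swift
import Foundation

/// Hypothetical function under test. The user's locale is a parameter
/// (defaulting to `.current`), so tests can substitute any locale.
func formatTotal(_ total: Double, userLocale: Locale = .current) -> (display: String, machine: String) {
    let number = NSNumber(value: total)

    // User-facing text follows the user's conventions.
    let display = NumberFormatter()
    display.locale = userLocale
    display.numberStyle = .decimal

    // Machine-readable text must not depend on the user, so it pins en_US_POSIX.
    let machine = NumberFormatter()
    machine.locale = Locale(identifier: "en_US_POSIX")
    machine.numberStyle = .decimal
    machine.usesGroupingSeparator = false

    return (display.string(from: number) ?? "", machine.string(from: number) ?? "")
}

// Run the same computation "as" users in several locales.
let testLocales = ["en_US", "zh_CN", "ru_RU", "de_DE"].map { Locale(identifier: $0) }
let machineOutputs = Set(testLocales.map { formatTotal(1234.5, userLocale: $0).machine })
print(machineOutputs.count == 1 ? "locale-independent" : "locale-dependent")
```

In an XCTest case, the final check would be an assertion such as `XCTAssertEqual(machineOutputs.count, 1)`.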

This looks nice! I hope we keep getting more of these Foundation proposals. It would be easier to notice them in the forum with a [FOU] tag or something like that in the post title.

There is actually already a Bundle API that does exactly this:

import Foundation

// Localizations shipped in Foundation's own bundle (Apple platforms)
let foundationBundle = Bundle(identifier: "com.apple.Foundation")!
let localizations = foundationBundle.localizations

// Using the current user's preferences (English, in my case)
let r1 = Bundle.preferredLocalizations(from: localizations)
print(r1) // ["en"]

// Preferring es_MX, which has no specific localization in this bundle
let r2 = Bundle.preferredLocalizations(from: localizations, forPreferences: ["es_MX"])
print(r2) // ["es_419", "es"]

I don't think it should necessarily be framed as "simulating". There are legitimate use cases where an app prefers a locale other than the system locale, for example language-learning apps that provide multiple courses. Switching between courses may switch the UI language within such an app, independently of the system locale and language settings. It would be quite beneficial if Foundation supported this in a nicer way.

Thanks! I took your suggestion.

They're extensions on the existing type. While a Swift-native implementation of Locale is definitely ideal, I'm not sure how it would reduce Foundation's reliance on ICU. Most of the localization support depends on ICU, including all of the functionality newly proposed here. I'd prefer delegating this functionality to ICU (the experts). Do you mind clarifying?

Ah, OK, that clears up my query.

My question was whether this would be a ground-up rewrite. Removing Foundation's reliance on ICU is a massive undertaking and would likely need to be done in chunks rather than in one go. A ground-up rewrite in Swift of the parts that rely on ICU would dramatically reduce things like binary sizes and should be a long-term goal for Swift. Understandable that it's not part of this pitch, though!

Very nice to see more Foundation evolution here. A few quick notes:

I foresee some amount of confusion on this. I can understand the difference between a value used as a locale region and one used as a language region, but it eludes me why they are different types. Moreover, the Locale initializer allows setting a languageRegion of type Locale.Region, a value that may or may not be distinct from the locale region, yet the locale region itself cannot be set in the initializer. That is really hard to wrap one's head around.

Given the new facilities added to Swift that allow more use of leading-dot syntax in, e.g., SwiftUI, I'd urge some reconsideration of this point, as others have. SwiftPM might be a good model to look at: the most common values are static members (I don't know that they always need to be in dedicated enums), and uncommon values can be constructed in the usual way. The ergonomic win of being able to type .en_ to get autocomplete, and to know that you've selected a valid locale when you intend to use a common one, should not be underestimated.

/// Identifier following the ISO/IEC 15897 standard. Similar to the BCP 47 language tag, but the character set is included, e.g. "en_AU.UTF-8"
public var posixLocaleIdentifier: String { get }

/// Identifier following ICU (International Components for Unicode) convention. Does not include character set, e.g. "th_TH@calendar=gregorian;numbers=thai"
public var icuLocaleIdentifier: String { get }

/// Similar to `icuLocaleIdentifier`, but without key-value type keyword list, e.g. "th_TH_u_ca_gregory_nu_thai"
public var unicodeCLDRLocaleIdentifier: String { get }

/// A valid BCP47 language tag, e.g. "th-TH-u-ca-gregory-nu-thai"
public var unicodeBCP47LocaleIdentifier: String { get }

Have you considered, along the same line of thinking, a spelling like identifier(.icu), identifier(.bcp47), etc.? Such a spelling would allow a user to see (via autocomplete, etc.) all the identifier formats supported. To avoid redundancy, the word locale can be omitted too, taking us from Locale.xxxLocaleIdentifier to Locale.localeIdentifier(.xxx) to Locale.identifier(.xxx). This would also be more easily extensible without the need to add new top-level APIs to Locale.
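The suggested spelling could look roughly like the sketch below, with one method taking an enum of formats instead of one property per format. `LocaleIdentifierFormat`, `LocaleSketch`, and the two-entry mapping tables are hypothetical. They are hard-coded for the Thai example quoted above, and POSIX is omitted for brevity. A real implementation would delegate the key and value mappings to ICU.

```swift
/// Hypothetical enum of identifier formats for a spelling like `identifier(.icu)`.
enum LocaleIdentifierFormat {
    case icu    // e.g. "th_TH@calendar=gregorian;numbers=thai"
    case bcp47  // e.g. "th-TH-u-ca-gregory-nu-thai"
    case cldr   // e.g. "th_TH_u_ca_gregory_nu_thai"
}

/// Hypothetical minimal locale model: language, region, and ICU keywords.
struct LocaleSketch {
    var language: String
    var region: String
    var keywords: [String: String]  // ICU keyword -> ICU value

    // Tiny illustrative tables (ICU long names -> BCP 47 short names);
    // anything not listed is passed through unchanged.
    private static let bcp47Keys = ["calendar": "ca", "numbers": "nu"]
    private static let bcp47Values = ["gregorian": "gregory"]

    /// One entry point for every format instead of one property per format.
    func identifier(_ format: LocaleIdentifierFormat) -> String {
        switch format {
        case .icu:
            let list = keywords.sorted { $0.key < $1.key }
                .map { "\($0.key)=\($0.value)" }
                .joined(separator: ";")
            return "\(language)_\(region)" + (list.isEmpty ? "" : "@" + list)
        case .bcp47:
            let pairs = keywords
                .map { (key: Self.bcp47Keys[$0.key] ?? $0.key,
                        value: Self.bcp47Values[$0.value] ?? $0.value) }
                .sorted { $0.key < $1.key }
                .map { "\($0.key)-\($0.value)" }
            var subtags = [language, region]
            if !pairs.isEmpty { subtags += ["u"] + pairs }
            return subtags.joined(separator: "-")
        case .cldr:
            // Same subtags as BCP 47, separated by underscores instead.
            return identifier(.bcp47).split(separator: "-").joined(separator: "_")
        }
    }
}

let thai = LocaleSketch(language: "th", region: "TH",
                        keywords: ["calendar": "gregorian", "numbers": "thai"])
print(thai.identifier(.icu))    // th_TH@calendar=gregorian;numbers=thai
print(thai.identifier(.bcp47))  // th-TH-u-ca-gregory-nu-thai
print(thai.identifier(.cldr))   // th_TH_u_ca_gregory_nu_thai
```

Supporting another format would then mean adding an enum case rather than a new top-level property on Locale.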

I didn't think most clients would need to wrap their heads around this, because I expect the locale region to be the same as the language region in most cases. That's also why it is not exposed in the initializer: I want to keep the most relevant bits in the initializer. Have you found that not to be the case? Or do you think a memberwise initializer like this one would make things clearer?

init(languageCode: Locale.LanguageCode? = nil,
     script: Locale.Script? = nil,
     languageRegion: Locale.Region? = nil,
     localeRegion: Locale.Region? = nil,
     calendar: Calendar.Identifier? = nil,
     timeZone: TimeZone? = nil,
     collation: Collation? = nil,
     firstDayOfWeek: Weekday? = nil,
     currency: Currency? = nil,
     measurementSystem: MeasurementSystem? = nil,
     hourCycle: HourCycle? = nil)
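To make the distinction concrete, this sketch uses the pitch's example of a British English speaker who prefers US conventions. It is trimmed to four of the parameters above. `ComponentsSketch` is hypothetical. Serializing a differing locale region as the Unicode `rg` (region override) keyword is an assumption about how such components might map to an ICU-style identifier.

```swift
/// Hypothetical, trimmed-down version of the memberwise initializer above.
struct ComponentsSketch {
    var languageCode: String? = nil
    var script: String? = nil
    var languageRegion: String? = nil  // which variety of the language, e.g. British English
    var localeRegion: String? = nil    // whose regional preferences to follow (units, calendar, ...)

    /// The region that drives regional preferences: the locale region if set,
    /// otherwise the language region (the common case, where they coincide).
    var effectiveRegion: String? { localeRegion ?? languageRegion }

    /// ICU-style identifier; a differing locale region is encoded as the
    /// Unicode "rg" (region override) keyword, e.g. "rg=uszzzz".
    var icuIdentifier: String {
        let base = [languageCode, script, languageRegion].compactMap { $0 }.joined(separator: "_")
        guard let regionOverride = localeRegion, regionOverride != languageRegion else { return base }
        return base + "@rg=" + regionOverride.lowercased() + "zzzz"
    }
}

// A British English speaker who prefers US conventions.
let britishInUS = ComponentsSketch(languageCode: "en", languageRegion: "GB", localeRegion: "US")
print(britishInUS.icuIdentifier)             // en_GB@rg=uszzzz
print(britishInUS.effectiveRegion ?? "none") // US

// Common case: only the language region is given, and it also drives preferences.
let british = ComponentsSketch(languageCode: "en", languageRegion: "GB")
print(british.icuIdentifier)                 // en_GB
print(british.effectiveRegion ?? "none")     // GB
```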

I do not disagree. I'm curious about the scenarios in which you need to spell out the locale identifier directly, e.g. Locale(identifier: "en_US"), instead of just using Locale.current. Can you share your use cases?

Great suggestion. Thanks!

I appreciate everyone's valuable feedback. Feel free to keep replying here if anything comes up. For the time being I'm going to wrap it up here and move on internally. Thank you all!

How do I get the country of a region (for example, "CA" -> "GB")?

Can you please clarify what you meant? "CA" as in Canada?

I want to share a story that might inspire improvements to these APIs, or better documentation for newcomers like me.

My use case is to create a custom Locale using Locale.Components that will render numbers using a specific script. I will then use it in a date formatter.

It all started when I wanted to create a locale components instance with a specific numbering system, which led me to this page in Xcode's Developer Documentation window. This was when I was first introduced to BCP 47 (I didn't know what it was at that point).

After diving deeper, I found that I could use availableNumberingSystems; printing its value showed me an array of strings. I had to spend a lot of time figuring out where to look to learn which numbering system ID renders the digits I wanted. I followed a bunch of hyperlinks on unicode.org. In order:

  1. Googling "BCP 47" led me to this Wikipedia article.
  2. The section "Extension U (Unicode Locale)" gave me a hint, so I googled '"latn" numbering system unicode extension', which led me to this unicode.org page.
  3. The last line of the Numbering Systems section led me to this page.
  4. This finally led me to what I was looking for here: supplemental/numberingSystems.xml.

Then I was finally able to see which numbering system identifier renders the digits I wanted.

I think many developers like me solving this problem for the first time will find it very confusing to start with the documentation. I imagine many will just give up on the documentation and start to experiment and apply trial and error until they reach what they are looking for.

Since the list of numbering system identifiers is predefined in the standard, I think it would be a major improvement if those numbering systems were defined as static instances like .arab and .latn. Locale.NumberingSystem could have a new initializer that accepts one of those static instances instead of a String value (e.g. Locale.NumberingSystem(identifier: .arab)). The documentation for each of these static values could also show which digits will be rendered. For example:

/// A numbering system that uses the digits: ٠١٢٣٤٥٦٧٨٩
public static var arab: Locale.NumberingSystem { get }

If this is not feasible, then I think the documentation for Locale.NumberingSystem could elaborate on what BCP 47 is, and either provide a table that shows how each value in availableNumberingSystems renders digits or link to supplemental/numberingSystems.xml as the source of truth for this information.
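Along the lines suggested above, here is a minimal sketch of numbering-system values that document and render their own digits. `NumberingSystemSketch`, its `digits` property, and `render(_:)` are hypothetical and are not `Locale.NumberingSystem`. Each system is described by the Unicode scalar of its zero digit. That works for the three systems shown because their digits are consecutive code points.

```swift
/// Hypothetical numbering-system value that documents and renders its digits.
/// Each system is described by its zero digit; the digits 1-9 of the systems
/// below follow it as consecutive Unicode code points.
struct NumberingSystemSketch: Hashable {
    let identifier: String
    let zero: Unicode.Scalar

    /// A numbering system that uses the digits: 0123456789
    static let latn = NumberingSystemSketch(identifier: "latn", zero: "0")
    /// A numbering system that uses the digits: ٠١٢٣٤٥٦٧٨٩
    static let arab = NumberingSystemSketch(identifier: "arab", zero: "\u{0660}")
    /// Thai digits, U+0E50 through U+0E59.
    static let thai = NumberingSystemSketch(identifier: "thai", zero: "\u{0E50}")

    /// The ten digits this system renders, from zero to nine.
    var digits: [Character] {
        (0..<10).map { Character(Unicode.Scalar(zero.value + UInt32($0))!) }
    }

    /// Renders a non-negative integer with this system's digits.
    func render(_ value: Int) -> String {
        precondition(value >= 0, "this sketch handles non-negative integers only")
        let table = digits
        return String(String(value).map { table[$0.wholeNumberValue!] })
    }
}

print(NumberingSystemSketch.latn.render(2024))    // 2024
print(NumberingSystemSketch.arab.render(2024))    // ٢٠٢٤
print(String(NumberingSystemSketch.arab.digits))  // ٠١٢٣٤٥٦٧٨٩
```

Separately from any new API, a numbering system can already be requested through the ICU `numbers` keyword in a locale identifier. The "th_TH@calendar=gregorian;numbers=thai" example quoted earlier in the thread does this.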

Thanks for sharing your use case. I agree that defining some commonly used numbering systems as static vars would be helpful.

I went ahead and created this issue: Consider exposing common `Locale.NumberingSystem` as static variables · Issue #1052 · swiftlang/swift-foundation · GitHub
