So, yes - legacy encodings are useful, even if we don't have any name parsing.
I find it difficult to parse the rest of your question. Anything that processes legacy text would, of course, benefit from support for legacy text encodings; but many of those don't need the web-compatible names, and in fact the proposal includes an example of when using the web-compatible name would be incorrect (XML documents).
And by the way, I'm not saying it isn't worth adding. I just think it's interesting and that we should consider to what extent we want Foundation APIs to formally reflect standards designed for the web.
I agree that the extent to which Foundation should reflect standards is still an open question. But we do continue to add new API to support particular standards. For example, the calendar recurrence rule was designed to work with RFC 5545.
The HTTP date format was likewise designed to work specifically with HTTP.
I'm not sure what should be formal, but now I think we have some good reasons for having APIs for IANA and WHATWG at least.
IANA is a standards organization that has been in existence for over three decades (it is now part of ICANN, though) and is in fact responsible for maintaining "charset" names. CF's name conversion is already based on IANA names.
WHATWG provides de facto standards that follow real-world practice (web browsers). Isn't it also important that Apple is one of the members of WHATWG?
Anyway, maybe we should leave NameType non-@frozen to allow for unforeseen standards in the future.
I think this proposal needs to more deeply consider the underlying encoding implementation of each encoding and also separate names from labels. E.g., while the WHATWG Encoding standard maps the "ascii" label to "windows-1252", there's no encoding whose name is "ascii" (or "ASCII").
And also what the stability guarantees are of that encoding implementation. E.g., much of the industry had to change the encoding WHATWG calls "gb18030" due to GB18030-2022 recently being issued. Are such changes possible in Foundation or would that warrant a new encoding implementation of sorts (and thus a new name)?
As another example: while both IANA and WHATWG define Shift_JIS, the intended meaning is quite a bit different.
Also, IANA tends to be very light on requirements of the actual encoding implementation, while WHATWG requires exact handling for any given input, including input that can be considered erroneous. Do the Foundation encoding implementations meet these requirements? If not, using WHATWG names or labels to identify them would be rather misleading.
I hope this helps you refine the proposal further.
I'm surprised and glad that the author of Encoding Standard took a look at this pitch. Thank you so much.
I think that, in general, this kind of API inevitably involves a compromise.
I mean that any standard would be converted to "Foundation's standard" to some extent through such APIs.
In fact, even the current CFStringConvertIANACharSetNameToEncoding(_:) returns the CFStringEncoding that is the closest mapping, according to its documentation.
Do you mean String.Encoding(whatwg: "ascii") should return nil?
Or should we state explicitly that the argument is a label, as in String.Encoding(whatwgLabel: "ascii")?
(In case of WebKit, for example, PAL::TextEncoding's constructor takes a string-like object and is constructed as "windows-1252" when "ascii" is passed[1], IIUC.)
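To make the two options concrete, here is a minimal sketch of how a label-based and a name-based initializer could differ. Everything in it is an assumption for illustration: `init?(whatwgLabel:)`, `init?(whatwgName:)`, and both tables are hypothetical, and the tables are a tiny excerpt rather than the full WHATWG list. Labels are matched after trimming ASCII whitespace and lowercasing ASCII letters; names are matched exactly, which is a choice made for this sketch.

```swift
import Foundation

// Hypothetical excerpt of the WHATWG label table: label (lowercased) -> name.
let whatwgLabelToName: [String: String] = [
    "ascii": "windows-1252",
    "us-ascii": "windows-1252",
    "latin1": "windows-1252",
    "windows-1252": "windows-1252",
    "utf8": "UTF-8",
    "utf-8": "UTF-8",
]

// Hypothetical mapping from WHATWG names to existing String.Encoding values.
let whatwgNameToEncoding: [String: String.Encoding] = [
    "windows-1252": .windowsCP1252,
    "UTF-8": .utf8,
]

// Trims leading/trailing ASCII whitespace and lowercases ASCII letters only.
func normalizedWHATWGLabel(_ label: String) -> String {
    func isASCIIWhitespace(_ scalar: Unicode.Scalar) -> Bool {
        switch scalar.value {
        case 0x09, 0x0A, 0x0C, 0x0D, 0x20: return true
        default: return false
        }
    }
    var slice = label.unicodeScalars[...]
    while let first = slice.first, isASCIIWhitespace(first) { slice.removeFirst() }
    while let last = slice.last, isASCIIWhitespace(last) { slice.removeLast() }
    var result = ""
    for scalar in slice {
        if (0x41...0x5A).contains(scalar.value) {
            result.unicodeScalars.append(Unicode.Scalar(UInt8(scalar.value + 0x20)))
        } else {
            result.unicodeScalars.append(scalar)
        }
    }
    return result
}

extension String.Encoding {
    /// Hypothetical: accepts any label, so "ascii" resolves to windows-1252.
    init?(whatwgLabel label: String) {
        guard let name = whatwgLabelToName[normalizedWHATWGLabel(label)],
              let encoding = whatwgNameToEncoding[name] else { return nil }
        self = encoding
    }

    /// Hypothetical: accepts only encoding names, so "ascii" is rejected.
    init?(whatwgName name: String) {
        guard let encoding = whatwgNameToEncoding[name] else { return nil }
        self = encoding
    }
}
```

With this split, `String.Encoding(whatwgLabel: " ASCII ")` yields `.windowsCP1252`, while `String.Encoding(whatwgName: "ascii")` yields `nil` because no encoding has that name.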
At least, in this pitch, we focus on currently available encodings.
Although it will be hard for Foundation to reflect every change, we can catch up with the standard for a limited set of encodings at each minor/major release.
...That may be my wishful thinking?
I agree that Shift_JIS is required to be treated specifically.
As mentioned in the pitch, String.Encoding.shiftJIS is historically derived from kCFStringEncodingDOSJapanese in CF (not from kCFStringEncodingShiftJIS), which means that .shiftJIS would be better treated as "Windows-31J" rather than plain "Shift_JIS".
However, as you know, Windows-31J can be regarded as a variant of Shift_JIS in practice.
It's a tough question because String.init(data:encoding:) is now unfortunately broken. We may be able to consider the implementation when we fix it.
Big picture?
If we want strict correspondence between names and encoding/decoding implementations, we could get inspired by @Karl's suggestion:
However, ISO-2022-JP, for example, could never conform to Unicode.Encoding (right?).
Hence legacy encodings would need other protocols, something like:
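The reason is that ISO-2022-JP is stateful, whereas `Unicode.Encoding.encode(_:)` is a static function of a single scalar. A toy sketch of the problem follows; `ToyISO2022JPEncoder` is a hypothetical type, and its JIS X 0208 table is a two-entry excerpt for illustration only, not a usable encoder.

```swift
// Toy encoder: ASCII plus a two-entry excerpt of JIS X 0208.
// The bytes emitted for a scalar depend on which character set is active.
struct ToyISO2022JPEncoder {
    enum ActiveSet { case ascii, jisX0208 }

    // ISO-2022-JP starts in ASCII; the encoder must remember later switches.
    private var active: ActiveSet = .ascii

    // Hypothetical excerpt of the JIS X 0208 table (two hiragana only).
    private static let jisX0208: [Unicode.Scalar: [UInt8]] = [
        "あ": [0x24, 0x22],
        "い": [0x24, 0x24],
    ]

    // Returns nil for scalars outside this toy repertoire.
    mutating func encode(_ scalar: Unicode.Scalar) -> [UInt8]? {
        var bytes: [UInt8] = []
        if scalar.isASCII {
            if active != .ascii {
                bytes += [0x1B, 0x28, 0x42] // ESC ( B : switch to ASCII
                active = .ascii
            }
            bytes.append(UInt8(scalar.value))
            return bytes
        }
        guard let code = Self.jisX0208[scalar] else { return nil }
        if active != .jisX0208 {
            bytes += [0x1B, 0x24, 0x42] // ESC $ B : switch to JIS X 0208
            active = .jisX0208
        }
        bytes += code
        return bytes
    }

    // The output must end in ASCII, so a final escape may be required.
    mutating func finish() -> [UInt8] {
        guard active != .ascii else { return [] }
        active = .ascii
        return [0x1B, 0x28, 0x42]
    }
}
```

Encoding the same scalar twice in a row produces different bytes (the escape sequence appears only the first time), which a stateless per-scalar `encode(_:)` cannot express.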
public protocol StringEncodingProtocol {
  var name: String? { get }
  static func encoding(from name: String) -> Self?
  func encode<S>(_ string: S) throws -> Data? where S: StringProtocol
  func decode<D>(_ data: D) throws -> String? where D: DataProtocol
}
public struct IANACharset: StringEncodingProtocol {
  ...
}
public struct WHATWGEncoding: StringEncodingProtocol {
  ...
}
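For illustration only, a conformer might look like the sketch below. It restates the protocol so that it compiles on its own. `NamedFoundationEncoding` and its two-entry table are hypothetical, and it simply delegates to the existing String.Encoding implementations, so it does not by itself provide the exact per-standard behavior discussed above.

```swift
import Foundation

// Restated from the post so the sketch is self-contained.
public protocol StringEncodingProtocol {
    var name: String? { get }
    static func encoding(from name: String) -> Self?
    func encode<S>(_ string: S) throws -> Data? where S: StringProtocol
    func decode<D>(_ data: D) throws -> String? where D: DataProtocol
}

// Hypothetical conformer backed by existing String.Encoding values.
public struct NamedFoundationEncoding: StringEncodingProtocol {
    // Hypothetical excerpt; a real table would be defined per standard.
    private static let table: [(name: String, encoding: String.Encoding)] = [
        ("UTF-8", .utf8),
        ("windows-1252", .windowsCP1252),
    ]

    public let name: String?
    let encoding: String.Encoding

    // Exact name match only; labels such as "ascii" are not accepted here.
    public static func encoding(from name: String) -> NamedFoundationEncoding? {
        guard let entry = table.first(where: { $0.name == name }) else { return nil }
        return NamedFoundationEncoding(name: entry.name, encoding: entry.encoding)
    }

    // Returns nil when the string cannot be represented without loss.
    public func encode<S>(_ string: S) throws -> Data? where S: StringProtocol {
        string.data(using: encoding, allowLossyConversion: false)
    }

    // Copies the bytes into Data because String(data:encoding:) requires Data.
    public func decode<D>(_ data: D) throws -> String? where D: DataProtocol {
        String(data: Data(data), encoding: encoding)
    }
}
```

Usage would then be along the lines of `try NamedFoundationEncoding.encoding(from: "UTF-8")?.encode("abc")`.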