Introducing base64 encoding and decoding options to support the base64url alphabet as defined in RFC 4648 and to allow the omission of padding characters.
Motivation
Foundation offers APIs to encode data in the base64 format and to decode base64-encoded data. Multiple RFCs that define cryptography for the web use the base64url encoding and strip the padding characters at the end. Examples of this are:
Since Foundation does not offer an API that supports the base64url alphabet or omits the padding characters, users create wrappers around the existing APIs using replacingOccurrences(of:with:). While this approach works, it is very inefficient: the data has to be iterated three times where a single pass would have been sufficient.
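For illustration, the kind of wrapper described above might look like the following. This is a minimal sketch: the method name `base64URLEncodedStringWorkaround` is hypothetical and not part of Foundation; only `base64EncodedString()` and `replacingOccurrences(of:with:)` are existing API.

```swift
import Foundation

extension Data {
    /// Hypothetical helper (not part of Foundation): produces unpadded
    /// base64url text by post-processing the standard base64 output.
    func base64URLEncodedStringWorkaround() -> String {
        // Every replacingOccurrences call walks the whole string again,
        // which is the extra work the pitch wants to avoid.
        return base64EncodedString()
            .replacingOccurrences(of: "+", with: "-")  // base64 -> base64url
            .replacingOccurrences(of: "/", with: "_")  // base64 -> base64url
            .replacingOccurrences(of: "=", with: "")   // strip padding
    }
}

let sample = Data([0xfb, 0xff])
print(sample.base64EncodedString())              // standard alphabet, padded
print(sample.base64URLEncodedStringWorkaround()) // base64url alphabet, unpadded
```

The wrapper is short, but it allocates intermediate strings and rescans the output for each replacement.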
Solution
We propose to add additional options to Data.Base64EncodingOptions:
extension Data.Base64EncodingOptions {
    /// Use the base64url alphabet to encode the data
    @available(FoundationPreview 6.2, *)
    public static var base64URLAlphabet: Base64EncodingOptions { get }

    /// Omit the `=` padding characters at the end of the base64 encoded result
    @available(FoundationPreview 6.2, *)
    public static var omitPaddingCharacter: Base64EncodingOptions { get }
}
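To make the single-pass argument concrete, here is a minimal sketch of an encoder that honors both options while walking the input once. It is not Foundation's implementation: `SketchEncodingOptions` and `sketchBase64Encode` are hypothetical names that only mirror the proposed option names, and the sketch works on `[UInt8]` to stay independent of Foundation.

```swift
/// Hypothetical stand-in for the proposed options (not Foundation API).
struct SketchEncodingOptions: OptionSet {
    let rawValue: UInt
    static let base64URLAlphabet = SketchEncodingOptions(rawValue: 1 << 0)
    static let omitPaddingCharacter = SketchEncodingOptions(rawValue: 1 << 1)
}

/// Encodes `bytes` in a single pass, choosing alphabet and padding up front.
func sketchBase64Encode(_ bytes: [UInt8], options: SketchEncodingOptions = []) -> String {
    let shared = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
    // The two alphabets differ only in the last two characters (RFC 4648).
    let alphabet = Array((shared + (options.contains(.base64URLAlphabet) ? "-_" : "+/")).utf8)
    let usesPadding = !options.contains(.omitPaddingCharacter)
    let padding = UInt8(ascii: "=")

    var output = [UInt8]()
    output.reserveCapacity((bytes.count + 2) / 3 * 4)

    var index = 0
    while index < bytes.count {
        // Read up to three input bytes; the last group may be short.
        let b0 = bytes[index]
        let b1: UInt8? = index + 1 < bytes.count ? bytes[index + 1] : nil
        let b2: UInt8? = index + 2 < bytes.count ? bytes[index + 2] : nil

        output.append(alphabet[Int(b0 >> 2)])
        output.append(alphabet[Int(((b0 & 0x03) << 4) | ((b1 ?? 0) >> 4))])

        if let second = b1 {
            output.append(alphabet[Int(((second & 0x0F) << 2) | ((b2 ?? 0) >> 6))])
        } else if usesPadding {
            output.append(padding)
        }

        if let third = b2 {
            output.append(alphabet[Int(third & 0x3F)])
        } else if usesPadding {
            output.append(padding)
        }

        index += 3
    }
    return String(decoding: output, as: UTF8.self)
}

print(sketchBase64Encode([0xfb, 0xff]))
print(sketchBase64Encode([0xfb, 0xff], options: [.base64URLAlphabet, .omitPaddingCharacter]))
```

Because the alphabet and the padding decision are fixed before the loop starts, no post-processing of the output is needed.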
Simultaneously, we will add the same options to Data.Base64DecodingOptions, plus an additional ignoreWhitespaceCharacters option. Please note that the code snippet below also shows the existing ignoreUnknownCharacters option, as we intend to change its documentation to better explain its tradeoffs.
extension Data.Base64DecodingOptions {
    /// Modify the decoding algorithm so that it ignores unknown non-Base-64 bytes, including line ending characters.
    ///
    /// - Warning: Using `ignoreUnknownCharacters` might allow the decoding of base64url data, even when the
    ///            `base64URLAlphabet` is not selected. It might also allow using the base64 alphabet when the
    ///            `base64URLAlphabet` is selected.
    ///            Consider using the `ignoreWhitespaceCharacters` option if possible.
    public static let ignoreUnknownCharacters = Base64DecodingOptions(rawValue: 1 << 0)

    /// Modify the decoding algorithm so that it ignores whitespace characters (CR, LF, Tab, and Space).
    ///
    /// The decoding will fail if any other invalid character is found in the encoded data.
    @available(FoundationPreview 6.2, *)
    public static var ignoreWhitespaceCharacters: Base64DecodingOptions { get }

    /// Modify the decoding algorithm so that it expects base64 encoded data that uses the base64url alphabet.
    @available(FoundationPreview 6.2, *)
    public static var base64URLAlphabet: Base64DecodingOptions { get }

    /// Modify the decoding algorithm so that it expects no padding characters at the end of the encoded data.
    ///
    /// The decoding will fail if the padding character `=` is used at the end of the encoded data.
    ///
    /// - Warning: This option is ignored if `ignoreUnknownCharacters` is used at the same time. Consider
    ///            using `ignoreWhitespaceCharacters` if possible.
    @available(FoundationPreview 6.2, *)
    public static var omitPaddingCharacter: Base64DecodingOptions { get }
}
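For comparison, decoding unpadded base64url text with today's API typically needs a wrapper along these lines. This is a sketch under assumptions: `decodeBase64URLWorkaround` is a hypothetical name, and only `Data(base64Encoded:)` and `replacingOccurrences(of:with:)` are existing Foundation API.

```swift
import Foundation

/// Hypothetical helper (not part of Foundation): maps base64url text back to
/// the standard alphabet, restores the padding, then decodes it.
func decodeBase64URLWorkaround(_ string: String) -> Data? {
    var standard = string
        .replacingOccurrences(of: "-", with: "+")
        .replacingOccurrences(of: "_", with: "/")

    // Data(base64Encoded:) expects the length to be a multiple of four,
    // so the stripped "=" characters have to be added back first.
    let remainder = standard.count % 4
    if remainder != 0 {
        standard.append(String(repeating: "=", count: 4 - remainder))
    }
    return Data(base64Encoded: standard)
}

if let decoded = decodeBase64URLWorkaround("-_8") {
    print(Array(decoded))
}
```

Note that this wrapper is more permissive than the proposed options are documented to be: it also accepts input that already uses the standard alphabet or already carries padding.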
Big +1 from me. My codebase has no fewer than four public implementations of this from various packages, and we would do well to simplify them.
My only addition (from a pitch I never managed to put together myself) would be to extend base64 encoding and decoding to DataProtocol and/or ContiguousBytes (I didn't look into which would be better). Currently swift-webpush and swift-webauthn perform these operations on [UInt8], and I imagine other bytes-like types would benefit immensely from having these methods available on them.
Finally, it would be great if we can figure out a way to make these back deployable, because although the server ecosystem doesn't care much, clients that interact with them will otherwise need to keep around yet another copy of these implementations for quite some time.
Oh absolutely. Please consider it for the future directions section, because converting to Data just to convert to base64 is one extra unnecessary copy that will otherwise stay.
From what I managed to work out on my own, it depends on how we encode the new options. If they are ints or strings under the hood, we can likely inline the new call itself, which will back deploy the functionality, but looking into it further is what stopped me from making my own pitch in this regard haha
I mentioned this elsewhere on the forum recently, but I think Foundation should soon add parallel API for Span in most places we use Data today. I can easily imagine an extension on Span<UInt8> to produce a base64 String, for example. Or an extension on the eventual mutable Span type to fill it with base64-encoded bytes.
Also, the existing option ignoreUnknownCharacters uses the plural "characters", while the proposed option is omitPaddingCharacter with the singular "character". Can we be consistent there, unless there is only one padding character?
Also, I just noticed this here -- these options are declared as static let on non-Darwin platforms, like you mentioned
public static let ignoreUnknownCharacters = Base64DecodingOptions(rawValue: 1 << 0)
But I think on Darwin they're computed properties according to the documentation
static var ignoreUnknownCharacters: NSData.Base64DecodingOptions { get }
I'm open to changing it to .urlAlphabet. However, I wonder how we would name the default alphabet if users want to be explicit. Do we want the option for users to be explicit here?
There is only one padding character. Do we want to change the decoding option name to make it more explicit that decoding will fail if the padding character is present? What would you recommend here?
I think docc auto converts static lets to static vars in the documentation, as this is the pure API view. The existing options are defined as static lets.
Ah, yes there's indeed only one padding character, which is "=". The singular "character" makes sense. Thanks.
While I'm with you that "omit" sounds a little permissive for what is a strict requirement, I'm OK with it, as it already differs from the other "ignore"-prefixed names.
I initially had the same thought as well. However, then I saw that "Base64 URL" (sometimes spelled "Base64URL," "Base64-URL," or "Base64_URL") is basically what everyone calls it. Seems weird to split that name down the middle?—feels kind of like calling Times New Roman just "New Roman" or iPhone 16e just "e."
I'd like to use this pitch as an abbreviated review and accept it given it's a fairly simple addition. I'll defer to @fabianfett for the final naming choice (that is, if he wants to change them later).