[Pitch] Adding base64url encoding and padding-omission options to base64 encoding and decoding

fabianfett · February 4, 2025, 9:24pm

Hi folks. What do you think about the pitch below? I'd love to hear your feedback. Thanks!

Adding base64url encoding and padding-omission options to base64 encoding and decoding

Proposal: SF-NNNN
Authors: Fabian Fett
Review Manager: TBD
Status: Awaiting implementation

Revision history

v1 Initial version
Fixed base64URLAlphabet capitalization.

Introduction

Introducing base64 encoding and decoding options to support the base64url alphabet as defined in RFC4648 and to allow the omission of padding characters.

Motivation

Foundation offers APIs to encode data in the base64 format and to decode base64-encoded data. Multiple RFCs that define cryptography for the web use the base64url alphabet and strip the trailing padding characters. Examples include:

Since Foundation does not offer an API that supports the base64url alphabet or omits the padding characters, users create wrappers around the existing APIs using replacingOccurrences(of:with:). While this approach works, it is inefficient: the data has to be iterated three times, where a single pass would suffice.
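A typical wrapper of the kind described above might look like the following sketch. The function name is hypothetical; the point is that every replacingOccurrences(of:with:) call walks the already-encoded string again.

```swift
import Foundation

/// Hypothetical helper illustrating the common workaround: encode with the
/// standard alphabet, then rewrite the resulting string in extra passes.
func base64URLEncodedStringViaReplacement(_ data: Data) -> String {
    data.base64EncodedString()                      // standard base64 with padding
        .replacingOccurrences(of: "+", with: "-")   // another pass: '+' -> '-'
        .replacingOccurrences(of: "/", with: "_")   // another pass: '/' -> '_'
        .replacingOccurrences(of: "=", with: "")    // another pass: drop padding
}

// 0xFB 0xFF encodes to "+/8=" in the standard alphabet.
print(base64URLEncodedStringViaReplacement(Data([0xFB, 0xFF]))) // -_8
```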

Solution

We propose to add additional options to Data.Base64EncodingOptions:

extension Data.Base64EncodingOptions {
    /// Use the base64url alphabet to encode the data
    @available(FoundationPreview 6.2, *)
    public static var base64URLAlphabet: Base64EncodingOptions { get }

    /// Omit the `=` padding characters in the end of the base64 encoded result
    @available(FoundationPreview 6.2, *)
    public static var omitPaddingCharacter: Base64EncodingOptions { get }
}
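To make the intended semantics of the two encoding options concrete, here is a single-pass sketch. It is not the Foundation implementation and does not use the proposed API; the function name and its Bool parameters are hypothetical stand-ins for .base64URLAlphabet and .omitPaddingCharacter.

```swift
import Foundation

/// Hypothetical single-pass encoder sketching what `.base64URLAlphabet`
/// and `.omitPaddingCharacter` are meant to do.
func sketchBase64Encode<Bytes: Sequence>(
    _ bytes: Bytes,
    urlAlphabet: Bool,
    omitPadding: Bool
) -> String where Bytes.Element == UInt8 {
    // The two alphabets differ only in the characters for the values 62 and 63.
    let common = Array("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789".utf8)
    let tail: String = urlAlphabet ? "-_" : "+/"
    let alphabet: [UInt8] = common + Array(tail.utf8)

    var output: [UInt8] = []
    var group: UInt32 = 0  // accumulates up to three input bytes
    var pending = 0        // number of bytes currently in `group`

    for byte in bytes {
        group = (group << 8) | UInt32(byte)
        pending += 1
        if pending == 3 {
            // Three bytes (24 bits) become four 6-bit alphabet indices.
            for shift in [18, 12, 6, 0] {
                output.append(alphabet[Int((group >> UInt32(shift)) & 0x3F)])
            }
            group = 0
            pending = 0
        }
    }

    if pending > 0 {
        // Left-align the trailing 1 or 2 bytes within a 24-bit group.
        group <<= UInt32(8 * (3 - pending))
        // 1 byte yields 2 characters; 2 bytes yield 3 characters.
        for shift in [18, 12, 6].prefix(pending + 1) {
            output.append(alphabet[Int((group >> UInt32(shift)) & 0x3F)])
        }
        if !omitPadding {
            // Pad the final group to four characters with '='.
            output.append(contentsOf: repeatElement(UInt8(ascii: "="), count: 3 - pending))
        }
    }
    return String(decoding: output, as: UTF8.self)
}

let encoded = sketchBase64Encode(Array("hello".utf8), urlAlphabet: true, omitPadding: true)
print(encoded) // aGVsbG8
```

Because the sketch is generic over Sequence where Element == UInt8, it works on [UInt8] as well as Data, and both options are applied while the output is produced rather than in follow-up passes.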

We will also add the same options to Data.Base64DecodingOptions, along with a new ignoreWhitespaceCharacters option. The snippet below also shows the existing ignoreUnknownCharacters option, because we intend to update its documentation to better explain its trade-offs.

extension Data.Base64DecodingOptions {
    /// Modify the decoding algorithm so that it ignores unknown non-Base-64 bytes, including line ending characters.
    /// 
    /// - Warning: Using `ignoreUnknownCharacters` might allow the decoding of base64url data, even when the 
    ///            `base64URLAlphabet` is not selected. It might also allow using the base64 alphabet when the
    ///            `base64URLAlphabet` is selected.
    ///            Consider using the `ignoreWhitespaceCharacters` option if possible.
    public static let ignoreUnknownCharacters = Base64DecodingOptions(rawValue: 1 << 0)

    /// Modify the decoding algorithm so that it ignores whitespace characters (CR, LF, tab, and space).
    ///
    /// The decoding will fail if any other invalid character is found in the encoded data.
    @available(FoundationPreview 6.2, *)
    public static var ignoreWhitespaceCharacters: Base64DecodingOptions { get }

    /// Modify the decoding algorithm so that it expects base64 encoded data that uses base64url alphabet.
    @available(FoundationPreview 6.2, *)
    public static var base64URLAlphabet: Base64DecodingOptions { get }

    /// Modify the decoding algorithm so that it expects no padding characters at the end of the encoded data.
    ///
    /// The decoding will fail if the padding character `=` is used at the end of the encoded data.
    /// 
    /// - Warning: This option is ignored if `ignoreUnknownCharacters` is used at the same time. Consider 
    ///            using `ignoreWhitespaceCharacters` if possible.
    @available(FoundationPreview 6.2, *)
    public static var omitPaddingCharacter: Base64DecodingOptions { get }
}
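The strict decoding behavior described in these doc comments can be sketched as a validation pass that normalizes the input and then delegates to the existing Data(base64Encoded:) initializer. This is not the Foundation implementation; the function name and Bool parameters are hypothetical, and the rule that padding must be exact when it is not omitted is a simplifying assumption of the sketch.

```swift
import Foundation

/// Hypothetical decoder sketching the strict semantics described for
/// `.base64URLAlphabet`, `.omitPaddingCharacter` and `.ignoreWhitespaceCharacters`.
func sketchBase64Decode(
    _ string: String,
    urlAlphabet: Bool,
    omitPadding: Bool,
    ignoreWhitespace: Bool
) -> Data? {
    var normalized: [UInt8] = []  // input rewritten to the standard alphabet
    var paddingCount = 0          // number of '=' seen so far

    for byte in string.utf8 {
        switch byte {
        case UInt8(ascii: "A")...UInt8(ascii: "Z"),
             UInt8(ascii: "a")...UInt8(ascii: "z"),
             UInt8(ascii: "0")...UInt8(ascii: "9"):
            guard paddingCount == 0 else { return nil } // data after '=' is invalid
            normalized.append(byte)
        case UInt8(ascii: "-"), UInt8(ascii: "_"):
            // Only valid with the URL alphabet; map back to '+' and '/'.
            guard urlAlphabet, paddingCount == 0 else { return nil }
            normalized.append(byte == UInt8(ascii: "-") ? UInt8(ascii: "+") : UInt8(ascii: "/"))
        case UInt8(ascii: "+"), UInt8(ascii: "/"):
            // Only valid with the standard alphabet.
            guard !urlAlphabet, paddingCount == 0 else { return nil }
            normalized.append(byte)
        case UInt8(ascii: "="):
            // Padding is rejected outright when it is expected to be omitted.
            guard !omitPadding else { return nil }
            paddingCount += 1
        case UInt8(ascii: "\r"), UInt8(ascii: "\n"), UInt8(ascii: "\t"), UInt8(ascii: " "):
            // CR, LF, tab and space are skipped only when whitespace is ignored.
            guard ignoreWhitespace else { return nil }
        default:
            return nil // any other byte fails the decoding
        }
    }

    let remainder = normalized.count % 4
    guard remainder != 1 else { return nil } // one leftover character never encodes a full byte
    let expectedPadding = (4 - remainder) % 4
    if !omitPadding {
        // Assumption of this sketch: without omitPadding, padding must be exact.
        guard paddingCount == expectedPadding else { return nil }
    }
    // Data(base64Encoded:) expects padded input, so (re-)append the padding.
    normalized.append(contentsOf: repeatElement(UInt8(ascii: "="), count: expectedPadding))
    return Data(base64Encoded: Data(normalized))
}
```

Unlike ignoreUnknownCharacters, this sketch never silently accepts characters from the other alphabet, which is the trade-off the Warning comments above point out.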

Impact on existing code

None. This is an additive change.

xwu · February 4, 2025, 9:34pm

Nit: by Swift naming conventions, this would be capitalized as base64URLAlphabet.

fabianfett · February 4, 2025, 9:48pm

@xwu Good call. Fixed.

dimi · February 4, 2025, 9:52pm

Big +1 from me — my codebase has no less than 4 public implementations of this from various packages, and we would do well to simplify them.

My only addition (from a pitch I never managed to put together myself) would be to extend base64 encoding and decoding to DataProtocol and/or ContiguousBytes (I didn't look into which would be better). Currently swift-webpush and swift-webauthn perform these operations on [UInt8], and I imagine other bytes-like types would benefit immensely from moving these methods over to those protocols.

Finally, it would be great if we can figure out a way to make these back deployable, because although the server ecosystem doesn't care much, clients that interact with them will otherwise need to keep around yet another copy of these implementations for quite some time.

fabianfett · February 4, 2025, 9:58pm

While I do agree that this is an issue worth tackling, I don't think that this feature request should interfere with what we propose today.

cc @Tony_Parker Is that possible with the new shared Foundation?

j-f1 · February 4, 2025, 9:58pm

Should it be spelled .urlAlphabet since we’re already in the context of base 64 encoding?

dimi · February 4, 2025, 10:02pm

Oh, absolutely; please consider it for the future directions section, because converting to Data just to encode as base64 is an extra, unnecessary copy that will otherwise remain.

From what I managed to work out on my own, it depends on how we encode the new options — if they are ints or strings under the hood, we can likely inline the new call itself which will back deploy the functionality, but looking into it further is what stopped me from making my own pitch in this regard haha

Tony_Parker · February 5, 2025, 9:08pm

I mentioned this elsewhere on the forum recently, but I think Foundation should soon add parallel API for Span in most places we use Data today. I can easily imagine an extension on Span<UInt8> to produce a base64 String, for example. Or an extension on the eventual mutable Span type to fill it with base64-encoded bytes.

itingliu · March 4, 2025, 5:12pm

I have the same question here.

Also, the existing option ignoreUnknownCharacters uses plural "characters", and the proposed is omitPaddingCharacter with singular "character". Can we be consistent there unless there is only one padding character?

Also, I just noticed this here -- these options are declared as static let on non-Darwin platforms, like you mentioned

public static let ignoreUnknownCharacters = Base64DecodingOptions(rawValue: 1 << 0)

But I think on Darwin they're computed properties according to the documentation

static var ignoreUnknownCharacters: NSData.Base64DecodingOptions { get }

What are your thoughts on this discrepancy?

fabianfett · March 4, 2025, 5:23pm

I'm open to changing it to .urlAlphabet. However, I wonder how we would name the default alphabet if users want to be explicit. Do we want to give users the option to be explicit here?

There is only one padding character. Should we rename the decoding option to make it explicit that decoding fails if the padding character is present? What would you recommend here?

itingliu:

Also, I just noticed this here -- these options are declared as static let on non-Darwin platforms, like you mentioned
public static let ignoreUnknownCharacters = Base64DecodingOptions(rawValue: 1 << 0)
But I think on Darwin they're computed properties according to the documentation
static var ignoreUnknownCharacters: NSData.Base64DecodingOptions { get }
What are your thoughts on this discrepancy?

I think DocC automatically converts static lets to static vars in the documentation, as that is the pure API view. The existing options are defined as static lets.

itingliu · March 4, 2025, 5:48pm

Ah, yes there's indeed only one padding character, which is "=". The singular "character" makes sense. Thanks.

While I agree with you that "omit" sounds a bit permissive rather than like a strict requirement, I'm OK with it, as it already uses a different name from the other "ignore"-prefixed option.

xwu · March 4, 2025, 6:21pm

I initially had the same thought. However, then I saw that "Base64 URL" (sometimes spelled "Base64URL," "Base64-URL," or "Base64_URL") is basically what everyone calls it. It seems weird to split that name down the middle; it feels kind of like calling Times New Roman just "New Roman" or the iPhone 16e just "e."

j-f1 · March 4, 2025, 6:50pm

var options: Data.Base64EncodingOptions = getOptionsFromSomewhereElse()
options.remove(.urlAlphabet)
print(data.base64EncodedString(options: options))
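The argument here rests on OptionSet semantics: the standard alphabet is simply the absence of the URL-alphabet flag, so removing it restores the default without a separate named option. Since the proposed options are not available in a shipping toolchain, the sketch below uses a hypothetical stand-in type; its name and raw values are illustrative only.

```swift
/// Hypothetical stand-in for Data.Base64EncodingOptions, used only to show
/// how "standard alphabet" is expressed as the absence of a flag.
struct SketchEncodingOptions: OptionSet {
    let rawValue: UInt

    static let base64URLAlphabet = SketchEncodingOptions(rawValue: 1 << 0)
    static let omitPaddingCharacter = SketchEncodingOptions(rawValue: 1 << 1)
}

var sketchOptions: SketchEncodingOptions = [.base64URLAlphabet, .omitPaddingCharacter]
sketchOptions.remove(.base64URLAlphabet)
// Only .omitPaddingCharacter remains, so an encoder would fall back to the standard alphabet.
print(sketchOptions.contains(.base64URLAlphabet))    // false
print(sketchOptions.contains(.omitPaddingCharacter)) // true
```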

itingliu · March 7, 2025, 5:09pm

I'd like to use this pitch as an abbreviated review and accept it given it's a fairly simple addition. I'll defer to @fabianfett for the final naming choice (that is, if he wants to change them later).

Oliver_Jones · June 1, 2025, 11:23pm

I notice this is still marked as awaiting implementation. Has there been any progress on implementing this in the StdLib?

itingliu · June 2, 2025, 10:59pm

The proposal file hasn't been updated yet, but @fabianfett merged the implementation in Reapply #1160, Add fix to allow more padding characters than necessar… · swiftlang/swift-foundation@26391c9 · GitHub

j-f1 · August 14, 2025, 11:41pm

I was just looking to convert a Data to a base64-url string. It seems like only some of the things in this proposal got included in Swift 6.2’s version of Foundation. What’s the status of the base64url support and the other features pitched here?

itingliu · August 20, 2025, 3:52pm

Sorry, I spoke too soon.

We did accept the proposal, but the implementations are still in review. We’ll need to update the availability to 6.3.