SE-0425: 128-bit Integer Types

It could be for the same reason, "no demand for it" :sweat_smile: (the Int128 type wasn't available yet to generate any demand for making it Codable).

I'm not satisfied with the "encode as a pair" default behavior (or, as we do in another place, encoding a dictionary as an array of pairs)... When you encode an Int8 you'd at least get back "something similar" (e.g. Int or Double) after encode → JSON → decode. With Int128 you'd get back something "totally different" (like an array of two ints). If there's no better solution, I'd seriously consider making Int128 a "foundational" Codable type.

While that’s true, I think [U]Int128 is “primitive enough” that anyone writing a coder ought to consider how they want to represent it. That’s different from, say, Date, where the desire to control its coding is very specific to the goals of the Foundation coders, and we think the vast majority of coders would be perfectly happy to use the default behavior.

Or to put it another way, I think that if [U]Int128 had existed when SE-0166 was written, we would have included these methods, whereas we clearly decided not to do that for the types the Foundation coders are currently type-testing. I see no reason not to get as close to that ideal version of SE-0166 as we can.

The difference, I think, is that Float80 is inherently something really niche—it’s supported only because of historical quirks of Intel floating-point support, and it’s not supported at all on non-Intel CPUs. Int128 has a wider range than you usually need, but it still has many realistic applications that would benefit from coding (for instance, Swift.Duration might well have used Int128 if it had been available), and it is supportable on all platforms.

3 Likes

:100:

Maybe not:

I do agree it's worth making Int128 Codable; it's just the proposed default method of doing so (a pair of 64-bit integers) that concerns me.

1 Like

FWIW, [U]Int128 will round-trip through Codable regardless of the underlying representation. You'll only get back "something different" if you encode a [U]Int128 into a format that doesn't natively support it, then inspect the data using another tool, or ask to decode a type other than what you encoded.

Unfortunately, we can't add requirements to support [U]Int128 to the protocols without giving those methods default implementations, which necessitates some form of default representation. Even if we made these types primitives, we'd still need to pick a default.
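For illustration, here is a rough sketch of how a pair-based default could be composed from the existing 64-bit primitives; the helper name encodeAsUInt64Pair and the high-half-first layout are assumptions for the example, not the proposal's design:

extension KeyedEncodingContainerProtocol {
    // Hypothetical helper: store a UInt128 as a nested [high, low] pair.
    mutating func encodeAsUInt64Pair(_ value: UInt128, forKey key: Key) throws {
        var pair = nestedUnkeyedContainer(forKey: key)
        try pair.encode(UInt64(truncatingIfNeeded: value >> 64)) // high 64 bits
        try pair.encode(UInt64(truncatingIfNeeded: value))       // low 64 bits
    }
}

Whatever shape is chosen, the constraint is the same: the default has to bottom out in primitives that every existing coder already understands.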

Agreed!

2 Likes

@itingliu and I chatted about this, and we consider adding support to be a bug-fix for Foundation (i.e. we will simply do this once the type lands, no Foundation API review needed).

4 Likes

We do already have prior art here in the form of Duration's Codable conformance. It encodes its stored value (a 128-bit integer representing total attoseconds) as two 64-bit integers in an array, and the array is big-endian (the high 64 bits are the first element in the array).

It seems to me, naïvely, that the Codable representation of Duration ought to simply equal the Codable representation of Int128 once the dust settles here, and that strongly implies that the order should be high-half-first. (There's a joke here about egg timers, but I can't quite crack it… :fried_egg:)
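To make the high-half-first ordering concrete, here is a small sketch of splitting and recombining an Int128 the way a Duration-style [high, low] array implies; the helper names are illustrative only:

// Split an Int128 into the (high, low) halves a [high, low] array would store.
func split(_ value: Int128) -> (high: Int64, low: UInt64) {
    (Int64(truncatingIfNeeded: value >> 64), UInt64(truncatingIfNeeded: value))
}

// Recombine the halves; the round trip is lossless.
func combine(high: Int64, low: UInt64) -> Int128 {
    (Int128(high) << 64) | Int128(low)
}

let value = Int128(Int64.min) - 1   // one below Int64's range
let (high, low) = split(value)
assert(combine(high: high, low: low) == value)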

6 Likes

IMO, the obvious fallback encoding for 128-bit integer types is as a String.

Firstly, interoperability with other language ecosystems is an important factor for encoding and decoding, and I think it is more likely that external systems and bigint libraries will support parsing from a string than constructing a value from a little-endian pair of 64-bit integers.

Efficiency is probably less of a concern; in the context of everything that an encoder and decoder does, I don't think integer parsing or serialisation is going to be significant. It feels like premature optimisation to me, although I invite the proposal authors to provide figures which substantiate their claims.

Of course, if users have a specialised use-case where these operations are significant even given the other encoding/decoding overheads, they can choose a specialised representation. Essentially, I'm flipping the "it is possible for users to achieve the same effect by converting to a string before encoding" argument on its head. I think the default, fallback solution should optimise for interoperability rather than performance.

Secondly, a string is better for human readability. I'll note that the most widely-used serialisation formats (such as JSON and XML) are human-readable. Even when a format has both binary and textual representations (such as Apple's plist format), it seems the textual representation is far more common. For instance, if I look inside ~/Library/Application Support/*, I see lots of textual plists and json files, and hardly any binary configuration files.

This is all anecdotal, of course, but it's interesting because this data isn't generally supposed to be directly changed by users in a text editor, and even interoperability isn't a significant concern (something like com.apple.wallpaper could switch to a binary format in an update, without breaking anybody). But it seems that, by a large margin, developers prefer human-readable formats regardless. I think they would appreciate a fallback encoding which also optimises for legibility.
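For what it's worth, nothing stops a Codable author from adopting the string representation by hand today. A minimal sketch, where Order and its field are hypothetical:

struct Order: Codable {
    var id: UInt128

    enum CodingKeys: String, CodingKey { case id }

    init(from decoder: any Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        let text = try container.decode(String.self, forKey: .id)
        guard let value = UInt128(text) else {
            throw DecodingError.dataCorruptedError(
                forKey: .id, in: container,
                debugDescription: "Not a valid 128-bit integer: \(text)")
        }
        id = value
    }

    func encode(to encoder: any Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        try container.encode(String(id), forKey: .id)   // decimal string
    }
}

The question is only which of these representations the default implementation should pick on the user's behalf.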

5 Likes

I'd seriously consider encoding 128 bits natively (like we do for 64 bits, 32 bits, etc.). That we introduced Codable first and the Int128 type later (and not the other way around) shouldn't affect the resulting outcome. Even if other platforms don't currently support 128-bit integers (say, in JSON), Swift could establish a precedent for others to follow!

Strictly speaking, 64-bit integers are not interoperable between platforms either (storing integers bigger than 2^53 in JSON is problematic) and we don't consider that a problem... We could do likewise for Int128 numbers.
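For anyone who hasn't run into it, the 2^53 issue looks like this: once a JSON consumer parses a large number into a Double (as JavaScript does), adjacent 64-bit values become indistinguishable.

let exact: Int64 = 9_007_199_254_740_992   // 2^53
let next: Int64  = 9_007_199_254_740_993   // 2^53 + 1
print(Double(exact) == Double(next))       // true: the distinction is already lost

Yet JSONEncoder still writes Int64 values as plain numbers, and we live with that.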

2 Likes

Is that a good assumption? The kind of code that uses 128-bit integers is quite possibly code that does some serious number crunching. While that might also mean such code avoids JSON anyway, because it's inherently very inefficient, I don't think it's wise to potentially force people away from it because of 128-bit integers specifically.

Also, the goal should be to make 128-bit integers behave like any other integers, in the end. So having dramatically different coding methods - even if just in the interim - is not great.

There is the tangential consideration of whether decode(_:forKey:) should accept string forms for all numeric types.

Than a pair of integers that have to be bit-shifted and unioned? Yes. Than a simple integer literal? Obviously not (most of the bytes might be the same, but the surrounding quotes, and the fact that it's a string type, introduce questions about the semantics and the encoder's intent).

But, I think we're in agreement that updated coders will simply support 128-bit integers natively; we're just discussing the temporary hack for existing, unaware coders?

This may be a minority opinion, but I'd push back a little on the notion that JSON is particularly human-readable. It can be, if schemas are carefully designed that way and the problem domain happens to be amenable to it, but frankly most JSON I see is about as human-readable as C++ generic types. Human-readability is about a lot more than what character set you use.

I think that's a pertinent point. JSONEncoder et al don't magically shift to a string representation (or whatever) at that 2^53 threshold, despite JavaScript's insane design flaw. And JSON is certainly much bigger than JavaScript now, in any case.

1 Like

IMO, when considering a type's encoded representation, the use of the type itself is not the most relevant consideration. If you're doing serious number crunching, you're not going to want to decode integers on-the-fly; you're going to want to perform that work in advance in a separate pipeline step, have your numbers all decoded, and then crunch them.

So the most relevant consideration is that decoding pipeline step. Set against all the other typical overheads that decoding using Decoder typically involves, is there a significant performance difference between decoding a string or a pair of ints? My intuition says that it won't be significant, but no data has been presented either way.

I'm not sure about calling it a "temporary hack", but yes. Let's say you use a package such as XMLCoder, CodableCSV, or Yams, and you want to encode a model type using a 128-bit integer.

Eventually those libraries may be updated with native support for 128-bit integers (or not), but until they are, your model data still needs to be encoded using some combination of existing primitives.

Another related thing worth considering is that if/when they are updated, they will need to support decoding both their native format and this fallback representation. I think we should probably work through what that is going to look like for them, and add a separate method to make it easier if necessary. I think they're going to have to implement methods such as UnkeyedDecodingContainer.decode(_ type: Int128.Type) to implement their native formats, but also call into the protocol's default implementation to support the fallback.

I know it can be awkward to call a protocol's default implementation when you also have a custom implementation. So we should think about that.
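One way to picture the compatibility problem from the consuming side (the model type Record, its field, and the assumption that the fallback is a [high, low] pair are all hypothetical):

struct Record: Decodable {
    var total: Int128

    enum CodingKeys: String, CodingKey { case total }

    init(from decoder: any Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        if let native = try? container.decode(Int128.self, forKey: .total) {
            total = native                            // coder handles it natively
        } else {
            // Data written by an older coder using the fallback representation.
            var pair = try container.nestedUnkeyedContainer(forKey: .total)
            let high = try pair.decode(Int64.self)    // high 64 bits first
            let low = try pair.decode(UInt64.self)
            total = (Int128(high) << 64) | Int128(low)
        }
    }
}

A library-side equivalent of that branching is what Yams, XMLCoder, and friends would end up maintaining.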

It is certainly more human-readable than a binary format! But people can always argue about whether complex concepts are presented in an easily digestible manner; if that's the metric, many of the world's most important literary works, from Plato's Republic to Das Kapital, would barely qualify as human-readable.

There is no generic concept of a native encoding for 128-bit integers (or anything else for that matter). That is something that encoders/decoders can do by implementing the requirements @beccadax suggested.

It may be that Foundation's JSONEncoder chooses the approach you've described, but you'd need to ask the Foundation maintainers. They may also allow various approaches using configuration options, as they do for date encoding.

2 Likes

Could you expand on this, please? What stops us from doing the 128-bit int Codable support by literally copy/pasting what we do for 64-bit ints and changing 64 to 128?

I understand that there will be bits and pieces here and there that will lag initially (like Foundation), but after solving the initial hurdles we would get true Codable support for 128-bit integers, as native as it is for 64-bit.

The default implementation can only be built atop the existing API (e.g.). The existing API only knows about integers up to 64 bits, doubles, floats, strings, and nil, thus the various suggestions to compose it from those.

The Codable APIs don't provide a way to encode an integer of arbitrary width, or even an intrinsic byte array (you can encode e.g. Array<UInt8>, but in JSON that's going to be e.g. [x, y, z], not the bitwise-equivalent integer n).
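To illustrate, here's roughly what the byte-array route produces with Foundation's JSONEncoder (the Wrapper type and the sample value are made up):

import Foundation

struct Wrapper: Encodable {
    var bytes: [UInt8]
}

let value = UInt128(0x0102_0304_0506_0708) << 64 | UInt128(0x090A_0B0C_0D0E_0F10)
let bytes = withUnsafeBytes(of: value.bigEndian) { Array($0) }   // big-endian bytes
let json = try! JSONEncoder().encode(Wrapper(bytes: bytes))
print(String(decoding: json, as: UTF8.self))
// {"bytes":[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16]} -- an array, not one integer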

Another idea: the default implementations of decode(_ type: Int128.Type), etc. could just throw an error. Actually, I think that is probably the best thing to do; even better than using a String.
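Concretely, the throwing default might look something like this sketch (illustrative only; the real extension would live in the standard library, and the error choice is an assumption):

extension UnkeyedDecodingContainer {
    mutating func decode(_ type: Int128.Type) throws -> Int128 {
        throw DecodingError.typeMismatch(Int128.self, .init(
            codingPath: codingPath,
            debugDescription: "This decoder has not been updated to support 128-bit integers."))
    }
}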

Additionally, we could add a warning if a conformance (such as those in Yams or XMLCoder) doesn't implement those requirements. Their code would still build and the protocol evolution would be ABI stable thanks to the presence of the default implementation, but the warning would encourage them to implement the methods regardless.

That way, we avoid having to define Swift-specific fallback encodings, and avoid this migration issue entirely:

4 Likes

Do we have a default implementation for, say, Int32? Could you please point me to the source?

And we can't just add

func encode(UInt128, forKey: KeyedEncodingContainer<K>.Key) throws

to the existing API? Ever?

I like this.

It is binary-breaking (and source-breaking) to add requirements to a protocol, because existing code will not be compliant (which is, I think, basically undefined behaviour in Swift - claiming to conform to a protocol yet not actually implementing all its methods could result in who knows what when something tries to call one of the missing methods).

Adding a new method with a default implementation is a way around that, because the default implementation allows existing code to technically still conform (even if the default implementation does nothing, or throws an exception).

Having the default implementation just throw an error is somewhat appealing:

  • It side-steps debate over how it should encode the 128-bit integer…
    • …and eliminates the need to forever-after support reading that hacky "temporary" representation.
  • Makes it much more likely that custom coders will actually support 128-bit integers properly (rather than just ignoring them and continuing to use the default encoding, which is always going to be suboptimal and is not the intent of having a default encoding).

Could the throwing ever go away, though? i.e. can the default implementation one day be removed? Because permanently having a protocol method that unexpectedly throws an error if you forget to override it - which the compiler & IDE won't prompt you to do, so it's easy to overlook - will likely be a recurring pain point for not just coder implementors but also Codable users.

4 Likes

C++ proposal by Jan Schultke:

1 Like

Would the _BitInt(128) type be layout-compatible on big-endian platforms?

The following documents suggest that larger _BitInt types are stored as a little-endian array of native-endian chunks:

1 Like

"Source breaking" is understandable (the code that's getting recompiled with new Swift version will have to add the required implementation. (Unfortunately Swift doesn't have optional protocol methods unlike Obj-C †).

Re: "Binary breaking" – could this be done in a way that doesn't actually break anything? Say, the old binary could use the old Swift runtime (that won't call the new method) and the new binary will use the new Swift runtime (that will call the new methods).

:+1:


† - Optional protocol methods make Obj-C much better suited to making API changes in a backward-compatible manner. Take URLSessionTaskDelegate, for example, which was revised to add new methods at least 5 times! Why won't we consider having optional protocol methods in Swift?

2 Likes

We can make it a warning if an Encoder/Decoder fails to implement the requirement. Since it's only a warning, packages such as Yams and XMLCoder will continue to build (source compatible), and since there is some kind of default implementation, existing copies in binary dependencies or installed on end-devices will continue to work as users update their OS (binary compatible).

But the authors of those packages will also have a strong signal ensuring they do not forget to implement the method.

It may also be possible to upgrade the warning to an error in some circumstances where we know the package is being updated (such as when compiling in Swift 6 mode).

3 Likes

A wrinkle in this discussion: adding [U]Int128 methods, in this particular instance, would actually not be source-breaking, due to the generic catch-all encode<T>(...) methods.

Swift allows generic methods to fulfill protocol requirements for concrete types, such that the following compiles:

protocol P {
    func f(_ i: Int)
    func f<T>(_ t: T)
}

struct S: P {
    func f<T>(_ t: T) {
        print(T.self)
    }
}

S().f(42) // => Int

The generic S.f<T> satisfies both protocol requirements.

It is, as such, entirely possible to implement Encoders and Decoders that only implement the single generic requirement on the various types. The reason the concrete overloads are actually specified in the first place ("unnecessarily") is to document the primitive types that Encoders and Decoders are expected to be able to support for consumers, at bare minimum.

goes against this: the protocol requirement is intended to tell someone writing a Codable type that "you can rely on any Encoder and Decoder being able to handle [U]Int128", except the default implementation would just throw, with no recourse at compile time.

This source-compatibility is also why

would be somewhat doubly-tricky. It's both a little awkward to special-case this in the compiler, and the change is not technically source-breaking.

No, whatever default implementation is chosen can't ever be safely removed, for exactly those ABI-breaking concerns. There's nothing stopping someone from writing an Encoder today, compiling it into a binary framework, and linking it to their executable; the point of ABI compatibility is that their framework continues to work into the future, and if the default implementation they were silently benefiting from went away, that would be an ABI break for consumers.

2 Likes