SE-0425: 128-bit Integer Types

beccadax · March 9, 2024, 11:16pm

The nature of a serialization library is that you never, ever want to remove support for a format you’ve accepted in the past, lest you lose the ability to read old files.

tera · March 9, 2024, 11:47pm

Right. And what @Karl and @wadetregaskis suggests is to not have anything serialised on disk (because the error would be thrown before that could happen). There is no problem of 128-bit Codable compatibility simply because we didn't have Int128 type before.

tera · March 10, 2024, 2:48am

If we do no changes to Codable, what would happen to a typical encoding/decoding implementation when you feed it Int128 type it knows nothing about? encode<T> is called and then?

wadetregaskis · March 10, 2024, 5:33am

I could be wrong, but I don't think it works that way. Note the full method signature:

mutating func encode<T>(
    _ value: T,
    forKey key: KeyedEncodingContainer<K>.Key
) throws where T : Encodable

You can only encode Encodable types, not any type. And what does it mean to be Encodable? It means the type implements this method:

func encode(to encoder: any Encoder) throws

That's what encode<T>(_:forKey:) calls. It's part of the recursive nature of codable. You keep drilling down through higher-level types until you reach the limited set of primitive types that are fundamentally supported. Ultimately, the only "native" types are the ones explicitly catered for in the APIs for KeyedEncodingContainer and friends:

String.
Integers up to 64-bit.
Double & Float.
Bool.
nil
PredicateExpressions (for some reason¹).

Thus this discussion about which of those primitives to use for the default implementation when native 128-bit integer support is absent.

But the more I think about it, the more I think an exception is the way to go, especially in light of @Karl's assertions (if accurate):

¹ I guess PredicateExpression must be special in some way, in order to require this bespoke support rather than just conforming to Encodable like all other non-primitive types. I don't know much about it, so perhaps it's unsurprising that it's not apparent to me why it's special in this regard.

ole · March 10, 2024, 10:13am

I don't think they are special. The encodePredicateExpression variants are extensions defined in Foundation on the concrete encoding/decoding container structs (e.g. KeyedEncodingContainer). (They couldn't be in the stdlib if they wanted to because they refer to types defined in Foundation.)

These methods are not requirements of the respective container protocols, though (e.g. KeyedEncodingContainerProtocol).

Karl · March 10, 2024, 3:05pm

I don't think it is. What I'm picturing is an attribute, @_warnIfNotImplemented(message), which could be attached to the protocol requirement. Typically, it is already an error not to implement a protocol requirement, unless that requirement has a default implementation. Implementation-wise, I can't see this being particularly challenging.

And conceptually, I think it would be a useful general language feature (although I'm not suggesting this proposal expands to officially introduce it; it would be an implementation detail).

I think it is reasonable that a library starts out with a protocol, and then some time later wants to evolve it in some way. Their desire not to break existing clients leads them to introduce default implementations which they would otherwise rather not do.

I think it would be great if they could tag that added requirement to say, "hey - even though you're not formally required to implement this due to the presence of the default, that default is far from ideal so I would strongly prefer that you add an explicit witness".

And at their next opportunity to introduce breaking changes (e.g. the next SemVer-major update), they might remove that default implementation entirely.

taylorswift · March 10, 2024, 10:09pm

for what it’s worth, the alternative you are alluding to has a name - BSON - and it does not support 128-bit integers. so if you want to persist 128-bit integers in a straightforward manner, JSON is not just an option; it is your only choice.

there exists a vastly superior alternative to both formats, Amazon ION, which not only supports 128-bit integers, but entirely arbitrary-precision integers. but there is no Swift implementation of Amazon ION. so we are stuck with JSON.

tera · March 10, 2024, 10:22pm

Does JSON support 128-bit integers? I thought it only supports 64 bit integers with the caveat that anything over 53-bit is non portable (as it stands).
Anyways, BSON, MessagePack, JSON could be enhanced to support 128-bit integers and the mentioned Amazon ION could be ported to Swift, or used via C interop, so I won't say we are "stuck" (well, maybe in the short term we are).

taylorswift · March 10, 2024, 10:27pm

JSON supports fully arbitrary-precision integers, all limitations on integer precision in JSON are ultimately limitations of individual decoder implementations.

ION support for Swift (even through C bindings) has been on my wishlist for a couple of years now, so “short term” could be a while.

tera · March 11, 2024, 12:26am

You are right.

I didn't try that yet. Are there problems wrapping it into some passthrough Objective-C wrapper? Should be mechanical and simple.

IIRC both BSON and MessagePack support custom data, which could be used to encode types these encoders don't (currently) know about. Once Int128 is finally publicly available those libraries will be updated to support it natively within a reasonably short time.

taylorswift · March 11, 2024, 1:58am

i don’t anticipate any blockers to doing that, it’s just a task that could be done but hasn’t been done yet. it might make an interesting GSoC project.

itaiferber · March 12, 2024, 1:53pm

Your analysis is accurate, but either some part is missing, or we may be talking about slightly different things — and I should have been clearer that my encode<T>(...) shorthand was intended to stand for the full generic encode<T: Encodable>(_: T)/decode<T: Decodable>(_: T.Type) methods and their forKey: variants. (Below here, too, where I'm being sloppy.)

To expand — all of the primitive types themselves conform to Encodable and Decodable, for a few reasons:

To allow them to be passed into generic context that have an Encodable or Decodable constraint
To enable their use within conditional conformances; e.g. Array: Encodable where Element: Encodable, so UInt8: Encodable allows [UInt8]: Encodable
To enable types which contain primitive type properties to go through compiler synthesis of Encodable and Decodable without also needing to hard-code the list of primitive types into the compiler

What I was referring to is that this all means that for any primitive type, there are two valid overloads of encode(...)/decode(...) which can get called: the one taking a concrete type (e.g. encode(_: Int)), and the one taking a generic type (encode<T>(_: T)). The crux is that the generic method is allowed to fulfill the protocol requirement for the concrete method, meaning that, e.g., your implementation of SingleValueEncodingContainer can, in fact, implement the single

func encode<T: Encodable>(_ value: T) throws

method and satisfy all of the protocol requirements without offering individual implementations of encode(_: Int), encode(_: Double), encode(_: String), etc. (Theoretically, this can be useful if you despise writing the overloads and would prefer to switch on T inside of the generic variant, and handle encoding of all of those types in one place.)

Aside

The tricky business is that you have to remember to handle all of the primitive types, because for types you don't explicitly handle, you typically call encode(to: any Encoder) throws on — and the implementations for these types all fetch a SingleValueEncodingContainer and call encode(self) right back again, which would lead to an infinite loop.

[U]Int128 falls into this story in an interesting place. Even though we have to do this for ABI stability reasons, if we added concrete encode(_: [U]Int128) method requirements to the protocols, adopted [U]Int128: Codable, and didn't offer default implementations, your Encoder and its requisite types would continue to compile just fine, because the existing generic implementation would technically fulfill the protocol requirement, even if your implementation didn't.

This is actually quite appealing, and I think would be appealing to attach to all of the Encoder/Decoder types protocol requirements, for the mess of reasons mentioned above. Currently, you get no help from the compiler in ensuring that you've satisfied all of the protocol requirements, and the only way you're going to catch that is at runtime (possibly painfully).

I do still see this as orthogonal to deciding whether the default implementation throws or picks a default, but I think it can't hurt.

And on that topic, one last note and I'll drop it: in practice, I think very few encoding formats natively support 128-bit integers, and can offer meaningful implementations for them beyond encoding [high, low]. JSON as a format offers no constraints on numbers, but in practice, >53-bit integers are iffy depending on where your data is going. (Round-tripping through Codable is safe, but sending anywhere else will depend on the service you're talking to.) I definitely see the appeal of saying "we shouldn't pick a default on behalf of Encoders", but I think in practice, it leaves most Encoder authors to make pretty much the same choice, largely without guidance. I personally think it's better to offer a guarantee to Codable consumers that yes, any Encoder can handle 128-bit integers in some form if you've got them, and for the minority of formats that do have native 128-bit integer support, it's not significantly more work to implement the requirement (even if you have to support both encoding forms).

wadetregaskis · March 12, 2024, 5:13pm

Yes, the primitive types do conform to Codable, for the reasons you mention. However, I don't see how that impacts the question of introducing new primitive codable types.

I'm just going to reiterate the process in a bit more detail, for others that are following along and might be interested.

For the primitive codable types, their Codable conformance is just to call the relevant primitive-encoding methods on Encoder & Decoder, e.g. for UInt64:

extension UInt64: Codable {
  public init(from decoder: any Decoder) throws {
    self = try decoder.singleValueContainer().decode(UInt64.self)
  }

  public func encode(to encoder: any Encoder) throws {
    var container = encoder.singleValueContainer()
    try container.encode(self)
  }
}

They don't actually know how to serialise themselves, of course, because they have no idea what the serialisation format is - that's defined by the Encoder / Decoder pair.

Note that whether the container's encode method is fully generic or relies on concrete overloads is irrelevant - either way it promises it will serialise those primitive values intrinsically.

As opposed to e.g. Optional's Codable conformance, which doesn't have a corresponding fundamental encoding method on Encoder / Decoder, so it has to define its own procedure using only the primitive types (pertinently, nil in its case) and the ability to ask the wrapped type to do the same. It cannot use a generic (for any Encodable) encode method on its whole self, because that will usually (if not always) turn right back around and call Optional's encode method again.

itaiferber:

The crux is that the generic method is allowed to fulfill the protocol requirement for the concrete method, meaning that, e.g., your implementation of SingleValueEncodingContainer can, in fact, implement the single
func encode<T: Encodable>(_ value: T) throws
method and satisfy all of the protocol requirements without offering individual implementations of encode(_: Int), encode(_: Double), encode(_: String), etc.

Right.

Looking at an example, JSONEncoder… it's quite complicated, but the real core of it is __JSONEncoder (in the new new Foundation), which spells out exactly what the primitive types are and how it represents them internally when building the JSON data model. Note how it has its core wrap overloads for only the primitive value types (per Encoder's delineation thereof) - there is no generic form at this most fundamental layer. It has a special set of methods, the wrapGenerics, for input existentials which handles all types that conform to Encodable, but notice how it manually unboxes those and redirects them to the appropriate concrete wrap methods - or, if the type in the box is not one of those primitives, it calls the type's own encode(_ encoder: Encoder) method.

As @itaiferber alluded to.

(tangentially, notice how it has special-case handling of Date, Data, and Dict, even though those types have their own fully-functional Codable implementations in terms of the primitive types, e.g. Data's, because the JSON coder believes it can do a better job of those given its specific knowledge of the JSON format)

There is simply no way to hand an arbitrary type to an Encoder / Decoder and have it magically serialised. You can either use the supported primitive types, which are intrinsically serialisable by the encoder / decoder, or you can use a type which is Encodable / Decodable and knows how to serialise / deserialise itself in terms of the primitive types.

So for [U]Int128, it's only choices are:

Not add itself as a primitive type, and instead implement its Codable conformance in terms of only the existing primitive types, e.g. strings, pairs of 64-bit integers, etc.
Add itself as a supported primitive type to the coding abstract types.
- But to do so there must be default implementations of the new methods, which either effectively do the same as for (1) or throw an exception, expecting concrete encoder & decoder implementations to override them.

Is that essentially what you were trying to point out, @itaiferber?

This part I don't understand. As you noted in your 'Aside' moments before it, if you did do it this way you'd crash anytime you try to encode or decode a [U]Int128, because it'd go into an infinite loop and inevitably exhaust the stack space. There has to be an actual concrete implementation somewhere, that actually serialises / deserialises [U]Int128 - or at least throws an exception - in order to prevent the infinite recursion.

Do we, here in this discussion thread on this forum, really have the authority to make that decision for all serialisation formats ever?

Sure, each serialiser can override any default implementation, in order to natively encode 128-bit integers, but if it ever forgets, even just one time, now it's stuck supporting that default implementation forever. That's a pretty sharp edge.

As you yourself noted, arguably [U]Int64 is invalid as a supported primitive type for JSON encoders, because it's not a supported type in JavaScript, yet we have it. And many serialisation formats work with it just fine (including JSON, as long as you don't actually touch it with JavaScript).

Logically, it only makes sense to provide a non-throwing default implementation if we provide the same for all integer types (because they're all in the same conceptual class, so what works for one should work for any). e.g. encoder not have native 64-bit integer support? No worries, just don't implement it and the default implementation will break it down into a 32-bit integer pair. Don't have 32-bit integers either? No worries, they'll break down further into a pair of 16-bit integer pairs. Etc. Down to, what, strings, I guess?

(what if your encoder doesn't natively support strings - should the default string encoding break it down into an array of UInt8?)

I think the only sensible option is to defer these decision to the encoder implementations themselves. If they want to serialise as strings instead, or pairs of other numbers, so be it. But it has to be a conscious choice; they're the owners of their serialisation formats.

(and if they choose to not support 128-bit integers, then their users will have to not use them, which maybe is the right choice; maybe the serialisation format really has no inherent "right" way to handle 128-bit integers and it's better left to the case-specific user, not the general serialiser, to decide how to work around that)

tera · March 12, 2024, 5:45pm

We could treat Int128's as we treat Int64: less than 53 bits? encode normally. greater than 53 bits? it will be "iffy" for both Int128 and Int64 – so UB or trap or crash, etc. Or would you prefer "always right" round-tripping for 128-bits but not for 64 bits?

itaiferber · March 12, 2024, 11:18pm

Yeah, I think this branch of the conversation hasn't quite been as productive as intended; this was all to point out that some of the changes here could end up being ABI-breaking but source-compatible, so that we'd both (a) need to watch out for them, and (b) need to do extra work if we did want to effectively want to alert Encoder/Decoder authors about the change. (Not that they would be useful/reasonable to do/etc.) I was initially concerned about doing the latter, but @Karl's @_warnIfNotImplemented seems like a clean way to fix this and some of the other problems that exist around the overloads in this API.

Indeed! That summarizes it well.
(For completeness: we can also decide to punt on this whole conversation, not adopt Codable at all, and let clients figure it out. Not suggesting we do this, but it is an option.)

I think it comes down to prioritizing between the conflicting requirements we have here. We intentionally didn't provide default implementations for any of the primitive types, to force Encoders and Decoders to make decisions in order to support all of those types, even if they had to make concessions to do so. The goal was to provide a set of types that Codable consumers could be guaranteed support for, so they could then use them as building blocks for their own types.

This is absolutely an argument for consistency: we should also not provide a implementation for [U]Int128 to force Encoders and Decoders to do the same for these new types.

It is also simultaneously the opposite argument for consistency: we should provide an implementation for [U]Int128 to provide that same guarantee for Codable consumers.

In general, Codable was designed with API consumers in mind: that the APIs should be useful out-of-the-box, even if it meant more work for Encoder/Decoder authors. (Which anyone who has written an Encoder or Decoder likely knows.) Hence my preference for the latter approach, even at the cost of purity, or additional work for Encoder/Decoder authors having to support multiple formats forever.

But, this is all personal opinion! I don't need to be convinced of anything, nor does my opinion hold any weight — it's up to the proposal authors to decide on what they think is appropriate, and for the appropriate language groups to accept.

itaiferber · March 12, 2024, 11:34pm

To be clear about this point specifically: any value should round-trip through Codable with no loss of precision. If you write an Encoder that can output a 64-bit value, your Decoder better be able to read that 64-bit value exactly.

The point I was trying to make was a little bit different: interop with other non-Codable destinations can already be a little tough to get right, and there are already limitations you have to keep in mind when interfacing with other APIs. Some common formats (like JSON) already have a tough time with some common values for a variety of reasons.

But 128-bit integers have little-to-no support in many common formats, so it's not like most Encoder/Decoder implementations will have anything more reasonable to reach for than "break the value down". (Yes, strings are an option, but likely to be slower.)

Put in other words: practically, JSON can support some values with >53 bits without loss of precision (so some Encoders can sort-of-hand-wave-it-away?); but it can support no values with >64 bits without loss of precision.

(I don't think that we should put limits on the encoding side, like trapping on >53-bit integers just in case; just to be realistic that many formats won't be able to handle 128-bit values in any other way.)

wadetregaskis · March 13, 2024, 1:46am

I'd rather lack of 128-bit integer support be addressed by the decoder than the encoder. If the wire format supports 128-bit integers (as e.g. JSON does) then use that. Because decoders vary across languages and individual implementations, you're never going to be able to satisfy every one of them & their limitations.

JSON supports arbitrary-precision numbers (both integer and floating point), which even Swift does not (intrinsically). So you can already get valid JSON that JSONDecoder can't handle. It could, but the mechanism for that is inherently not just language-specific but environment-specific (e.g. are you willing to depend on a 3rd party BigInt library?). So it really has to be up to the higher levels to figure out how to handle those things. Mangling the serialised format won't really help anything.

tera · March 13, 2024, 6:44am

I agree. If the hack of converting to an array of ints or to a string exists somewhere at all – it should be at the higher level, e.g. here:

JSONEncoder().nonConformingInt128EncodingStrategy
// .default .arrayFallback .stringFallback
// ditto for JSONDecoder()

BTW, with the current JSON "iffy" coders in the field, 128-bit ints represented as a pair of int64's would be also "iffy" encoded (if any of the two ints is greater than 53 bits).

markuswntr · March 13, 2024, 9:01am

+1 on the proposal, including @beccadax's Codable suggestion (an in-depth study of the proposal and implementation)

mgriebling · March 14, 2024, 1:23pm

This package requires swift-tools-version:6.0.
How would we update to this version so we can test this package?
I don't seem to find an xcode version with this tool version.