Serialization in Swift

Hello folks,

Common feedback during the review of SE-0295, and over the years since the introduction of Codable, is that Codable is not flexible or customizable enough to address all serialization needs.

The core team felt it was important to share that while Codable plays a critical role in the Swift ecosystem, the core team does not see Codable as the end-state of serialization in Swift. By design, Codable solves only a subset of the serialization needs, and Swift needs to gain additional capabilities to have a more complete serialization solution.

The core team would like to initiate this conversation with the community to gather requirements and discuss future designs and their trade-offs, ranging from improvements to Codable to additional tools and APIs. Our goal is to use the information gathered in this thread to inform future proposals that will bring a more complete serialization solution to Swift.

Thanks,
-- Tom


One issue I would like the new API to address is the lack of customization points for individual properties. Here are some cases:

  1. When I want to specify date formatting, the existing Codable API only allows me to set dateEncodingStrategy on the JSONEncoder object. In most scenarios I have encountered, the date formatting logic should be written only once (in the data type definition, I suppose), but the current Codable API requires every use site of the data type to specify the rule (which is easy to get wrong, or simply forget, when copying and pasting). Worse, we can't specify two different dateEncodingStrategy values for two different properties in a single data type (the date fields may come from different subsystems, and it should be easier to handle this on the client than to persuade the server folks to change their logic).

  2. Sometimes a JSON field may be missing or null, or it may have a JSON type that differs from what the client expects. In some cases I'd rather have a single optional type (like Optional<String>) handle all of these cases, instead of having the whole decoding process fail. In the type-mismatch subcase, I'd like some mechanism that reports this back to the caller of the decoder while the other fields are still decoded normally. When the server API gets one minor field wrong in production, an iOS engineer really doesn't want the tableView/collectionView to show no data at all ;-)

  3. A way to specify nested fields more easily. Currently, with the Codable API, we have to define separate data types to mirror the nesting structure of a JSON object. Sometimes I want to flatten the structure on the client side (e.g. for a more uniform representation, or to make some abstractions easier), but it may be difficult (or impossible) to get the server folks to do the flattening for me. So I'd like an easy way to specify a nested field (like @field("outerObj.innerObj.modifiedDate") var modifiedDate: Date?). This could save me a lot of time.

  4. Currently, with the Codable API, when I want to customize a single property, I have to rewrite ALL of the auto-generated code in the encode/decode methods. This is rather cumbersome when a data type contains dozens of properties. My point is that customizing one property should not affect the existing logic for the other properties in the same data type.
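For cases 1 and 4, a partial workaround exists today: a generic property wrapper that carries its date format in its type, so the format is written once in the type definition and two properties can use different formats without touching the coder's strategy or the other properties. This is a minimal sketch; `DateFormat`, `FormattedDate`, `ISODay`, `SlashDay`, and `Order` are hypothetical names, and property wrappers come with their own limitations (raised later in this thread).

```swift
import Foundation

// Hypothetical protocol: a type that names exactly one date format.
protocol DateFormat {
    static var formatter: DateFormatter { get }
}

// Builds a fixed-format, locale-independent formatter in UTC.
func makeFormatter(_ format: String) -> DateFormatter {
    let formatter = DateFormatter()
    formatter.locale = Locale(identifier: "en_US_POSIX")
    formatter.timeZone = TimeZone(identifier: "UTC")
    formatter.dateFormat = format
    return formatter
}

// Two formats, e.g. coming from two different subsystems (illustrative).
enum ISODay: DateFormat { static let formatter = makeFormatter("yyyy-MM-dd") }
enum SlashDay: DateFormat { static let formatter = makeFormatter("dd/MM/yyyy") }

// Encodes and decodes its Date as a string in F's format, regardless of
// the dateEncodingStrategy/dateDecodingStrategy set on the coder.
@propertyWrapper
struct FormattedDate<F: DateFormat>: Codable {
    var wrappedValue: Date

    init(wrappedValue: Date) { self.wrappedValue = wrappedValue }

    init(from decoder: Decoder) throws {
        let container = try decoder.singleValueContainer()
        let string = try container.decode(String.self)
        guard let date = F.formatter.date(from: string) else {
            throw DecodingError.dataCorruptedError(
                in: container, debugDescription: "Unexpected date string: \(string)")
        }
        wrappedValue = date
    }

    func encode(to encoder: Encoder) throws {
        var container = encoder.singleValueContainer()
        try container.encode(F.formatter.string(from: wrappedValue))
    }
}

// The format is stated once, in the type; `id` keeps its synthesized coding.
struct Order: Codable {
    var id: Int
    @FormattedDate<ISODay> var created: Date
    @FormattedDate<SlashDay> var shipped: Date
}
```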

The cases above are ALL from business code I have encountered or written over the past several years, so I really hope the new API can address these issues.
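For case 3, the current workaround is a hand-written `init(from:)` that walks each level with `nestedContainer(keyedBy:forKey:)`. The sketch below (type and key names are hypothetical, and the date is kept as a `String` to sidestep date strategies) shows the boilerplate that a `@field("outerObj.innerObj.modifiedDate")`-style annotation would replace. It also illustrates case 4: once the initializer is custom, every unrelated property has to be decoded by hand too.

```swift
import Foundation

struct Item: Decodable {
    var title: String
    var modifiedDate: String?   // flattened from outerObj.innerObj.modifiedDate

    // One key enum per nesting level.
    enum CodingKeys: String, CodingKey { case title, outerObj }
    enum OuterKeys: String, CodingKey { case innerObj }
    enum InnerKeys: String, CodingKey { case modifiedDate }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        // Unrelated properties must now be decoded manually as well.
        title = try container.decode(String.self, forKey: .title)
        let outer = try container.nestedContainer(keyedBy: OuterKeys.self, forKey: .outerObj)
        let inner = try outer.nestedContainer(keyedBy: InnerKeys.self, forKey: .innerObj)
        modifiedDate = try inner.decodeIfPresent(String.self, forKey: .modifiedDate)
    }
}
```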


This is a huge topic; could the Core Team narrow it down a bit? In what ways does the Core Team see Codable as deficient or less than complete? What solutions are on the table? What solutions aren't? Has the Core Team been tracking the feedback already given by the community over the last 4(?) years?

Edit: Perhaps most importantly, what resources are Apple / the Core Team willing to commit to implementing whatever comes out of this discussion?


For what it’s worth, I really hope we can move the next iterations of existing APIs, and any new APIs, out of the compiler.

The development, maintenance, and complexity costs of anything done as a one-off feature in the compiler are very high, which I believe is partly why it’s so hard to change Codable to do anything more nifty.

It would be unfortunate to put more highly specific infrastructure into the compiler, rather than offer more general metaprogramming mechanisms that various serialization libraries could use ❤️


Yes, Codable is in a tough place. Some improvements require compiler work, which is hard for the community to do. Other improvements are blocked by people waiting for a higher-level, idealized solution that removes Codable from the compiler. At this point, and to the topic of this thread, I don't think we should let that be a blocker any longer. Moving Codable out of the compiler should be treated as a compiler refactor, not as a prerequisite for improvements to Codable.


There are really two parts to this topic: improvements to Codable, both short and longer term, and likely a new family of protocols for low level, extremely high performance encoding (not a replacement for desperately needed improvements to Codable performance).

Codable Improvements

These are high level points, as my experience with Codable's internals is limited. I'll let others speak to the specifics of many of these points, but these pain points are felt by all at some point.

Codable's main issue is that it was implemented and shipped once and then never evolved. I won't speculate why that is (😬), but its primary failing is that it just isn't enough. So it needs more. More what? More everything!

  • More control: users need more control over encoding and decoding. This includes moving strategies into the protocols, easy per-key customization, easier customization of keys themselves, and the enabling of more powerful wrappers.
  • More performance: Codable is exceedingly slow, putting Swift at the bottom of various serialization benchmarks and seriously impacting the ability to parse any significant amount of data. We need a focus on both the absolute performance of the underlying Codable APIs, including what they require from custom coders, as well as huge updates to JSONDecoder to move away from Foundation. Some sort of lazy API would be beneficial as well.
  • More APIs: Implementing coders needs to be both easier and more open to optimization. Similar to how result builders turned out, implementing a custom coder should start simple and get only as complicated as needed by the format being used, or for performance. AnyCodable or another solution for heterogeneous collections would be nice. And please, add the TopLevel protocols!
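On the heterogeneous-collection point, a common stopgap is a recursive enum that can hold any JSON value. This is a minimal sketch that assumes JSON-like formats only; `JSONValue` is a hypothetical name (not a standard library or Foundation type), and all numbers are collapsed to `Double` for brevity.

```swift
import Foundation

// Hypothetical "AnyCodable"-style value for heterogeneous JSON.
enum JSONValue: Codable, Equatable {
    case null
    case bool(Bool)
    case number(Double)
    case string(String)
    case array([JSONValue])
    case object([String: JSONValue])

    init(from decoder: Decoder) throws {
        let container = try decoder.singleValueContainer()
        // Bool is tried before Double so that `true` is not read as a number.
        if container.decodeNil() { self = .null }
        else if let b = try? container.decode(Bool.self) { self = .bool(b) }
        else if let n = try? container.decode(Double.self) { self = .number(n) }
        else if let s = try? container.decode(String.self) { self = .string(s) }
        else if let a = try? container.decode([JSONValue].self) { self = .array(a) }
        else if let o = try? container.decode([String: JSONValue].self) { self = .object(o) }
        else {
            throw DecodingError.dataCorruptedError(
                in: container, debugDescription: "Unsupported JSON value")
        }
    }

    func encode(to encoder: Encoder) throws {
        var container = encoder.singleValueContainer()
        switch self {
        case .null: try container.encodeNil()
        case .bool(let value): try container.encode(value)
        case .number(let value): try container.encode(value)
        case .string(let value): try container.encode(value)
        case .array(let value): try container.encode(value)
        case .object(let value): try container.encode(value)
        }
    }
}
```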

Bufferable or low level serialization for Swift

It seems likely that whatever improvements can be made to Codable won't be enough for the truly high-performance, large-scale serialization necessary for server applications. Its type-based API likely just won't allow it. Instead, an alternate set of APIs could be designed to operate directly on some buffer type, allowing for both higher performance and direct streaming into memory. However, this approach would first require a native Buffer or similar type in the standard library. I imagine the requirements for encoders and decoders will be somewhat more complex than (a hopefully improved) Codable, with matching complexity on the user side.
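To make the idea a little more concrete, here is a minimal sketch of what a buffer-oriented protocol could look like, assuming a plain `[UInt8]` in place of the standard-library buffer type that does not exist yet. `BufferEncodable` and `encode(into:)` are hypothetical; the point is that each conformance writes its bytes directly and every call is statically dispatched, with no runtime type checks or intermediate containers.

```swift
// Hypothetical protocol: a type writes its own bytes straight into a buffer.
protocol BufferEncodable {
    func encode(into buffer: inout [UInt8])
}

extension UInt32: BufferEncodable {
    // Fixed-width, little-endian.
    func encode(into buffer: inout [UInt8]) {
        for shift in stride(from: 0, to: 32, by: 8) {
            buffer.append(UInt8(truncatingIfNeeded: self >> shift))
        }
    }
}

extension String: BufferEncodable {
    // UInt32 byte-length prefix followed by the UTF-8 bytes.
    func encode(into buffer: inout [UInt8]) {
        UInt32(utf8.count).encode(into: &buffer)
        buffer.append(contentsOf: utf8)
    }
}

// A user type composes its fields' encoders; each call is resolved statically.
struct Point: BufferEncodable {
    var x: UInt32
    var y: UInt32

    func encode(into buffer: inout [UInt8]) {
        x.encode(into: &buffer)
        y.encode(into: &buffer)
    }
}
```

Decoding would be the harder half of such a design (validation, partial reads), which is likely where most of the extra complexity mentioned above would land.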


Some features that would be handy:


There is an interesting pitch from the S4TF team: Automatic Requirement Satisfaction in plain Swift.


My thoughts largely line up with several of @Jon_Shier’s.

The biggest pain point of Codable for me is that it is just too damn slow. This slowness is in large part because it’s entirely unoptimisable by the compiler: there are too many jumps through unspecializable generic code. This forces Codable to be implemented entirely in the form of runtime type checking, which for the vast majority of types will fail, burning time and hitting the quite slow runtime type check machinery of Swift. There are some other costs, such as the need to manage lookaside stack data structures to provide coding paths, which forces Codable to heap-allocate a bunch as well.

Related to this is that it is borderline impossible to implement an Encoder/Decoder without simply cargo-culting from an existing one. The Codable APIs are entirely opaque and vague, there is a huge API surface that is inconsistent with itself, and the only way to know which methods will be called in what order is to simply try it and see. Many Codable implementations rely on implicit behaviours not directly codified by the protocol (e.g. that encode(to:) implementations will not stuff a reference to the Encoder somewhere and call it after the function exits). As an example of this frustration, the swift-http-structured-headers Codable implementation consumes more than twice as many lines of code as the actual serializer/deserializer, almost all of which is opaque boilerplate.

But I’d like to add an extra point: Codable is simultaneously too general and not general enough.

It has been said before that Codable’s design seems to perfectly match what you need to do JSON and not much else. This implies a certain lack of generality and makes Codable a difficult fit for other use-cases. For example, it is very hard to do a satisfying ASN.1 implementation using Codable, because ASN.1 requires more information than Codable will provide. This forces implementations to provide more context, such as giving the Encoder the ASN.1 type definition document before serializing. If you’re going to hardcode that string anyway, it may be better to simply do code generation (notwithstanding that Swift doesn’t make that easy).

But at the same time, Codable is far more complex than you actually need for JSON. It was clearly intended to be general, and it pays hefty performance and complexity costs for that generality, but fails to actually reap any reward for it. Something purpose-targeted at JSON would have produced a substantially better solution.


I'd like to echo everything Cory said here.

I'd also like to add that earlier this week I tried to make an Encoder/Decoder for a pet project and found myself hopelessly confused about how exactly to implement them. Regardless of what exactly comes next, I'd like to see additional focus on documentation for writing coders, as opposed to just consuming them.


A (partial) list from writing a CBOR library, including Codable support.

For the general CBOR library, there isn't anything in the Swift standard library or in Foundation to help with efficient binary data manipulation. I wound up making Swift-NIO a dependency for ByteBuffer, rather than creating my own implementation on top of raw data buffers or Data.

  1. In general, I would prefer that the interfaces and implementation requirements for Encoder/Decoder be optimized both for compiler generation and for consistent manual authoring by developers.
  2. There is no API usage restriction that makes keyed Codable values have a consistent order between encoding and decoding.
  3. Requiring keyed encoding to be in a particular order also enables use cases where the data can optionally be serialized as unkeyed data. For this reason, the API should also mandate that keys are never omitted from encoding because of their value, e.g. because it is the default or nil.
  4. Better support for special-casing output for defaulted values, such as omitting JSON keys which are assigned nil when that is considered the implicit value.
  5. Based on a desire for a consistent ordering of keyed Codable values between encoding and decoding, I would like to see KeyedEncodingContainerProtocol and KeyedDecodingContainerProtocol act like a forward cursor, similar to the Unkeyed variants.
  6. An improved API would remove the requirement for a Codable implementation to use class types. Note that decoding will still need some intermediate representation if the serialization format allows keys to appear in arbitrary order (like JSON).
  7. I doubt that the Single Value Encoding/Decoding Containers benefit enough from having individual decode/encode methods for many of the Swift standard types, compared to simply requiring the generic decode()/encode() method to support certain base types. I end up needing to switch defensively in the generic decode() method (as an example) anyway, since someone might pass a UInt64 through that interface.
  8. Guidelines for versioning object serialization, which may include types for semantic version identifiers and arbitrary version identifiers.
  9. Add the ability to override the encode/decode implementation for a particular type, rather than having this restricted to certain built-in types like Date and Data.
  10. The ability to alter the generated Codable implementation on a property-by-property basis. This today is sometimes done via Property Wrappers, which has both side-effects and limitations (such as an inability to eliminate default values from serialized form).
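Item 7 refers to the pattern where an encoder's generic `encode<T: Encodable>(_:)` still has to check at runtime whether `T` is one of the base types, because a caller can route a `UInt64` through the generic entry point instead of the dedicated overload. Here is a minimal sketch of that defensive dispatch; the `Primitive` enum and `primitive(_:)` function are hypothetical stand-ins for a CBOR encoder's internals, not code from the library described above. Each `as` pattern is a dynamic cast performed for every value encoded.

```swift
// Hypothetical stand-in for an encoder's internal value representation.
enum Primitive: Equatable {
    case unsigned(UInt64)
    case signed(Int64)
    case text(String)
    case bool(Bool)
}

// What a generic encode<T: Encodable>(_:) ends up doing: a hand-rolled
// "vtable" of runtime casts. nil means "not a base type: fall back to
// value.encode(to:) and let the type describe itself".
func primitive<T: Encodable>(_ value: T) -> Primitive? {
    switch value {
    case let v as UInt64: return .unsigned(v)
    case let v as UInt32: return .unsigned(UInt64(v))
    case let v as UInt8: return .unsigned(UInt64(v))
    case let v as Int: return .signed(Int64(v))
    case let v as Int64: return .signed(v)
    case let v as String: return .text(v)
    case let v as Bool: return .bool(v)
    default: return nil
    }
}
```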

Decoding-specific

  1. I would like to see a lot of duplicate code elimination by making KeyedDecodingContainerProtocol and UnkeyedDecodingContainerProtocol leverage SingleValueDecodingContainer.

  2. While other requirements may make it infeasible (including my own!), I would imagine UnkeyedDecodingContainerProtocol being a Sequence of SingleValueDecodingContainer, while KeyedDecodingContainerProtocol is a Sequence of (CodingKey, SingleValueDecodingContainer) tuples.

  3. I think there are existing issues documented for Decodable based on limitations of use of an initializer. If the emphasis is on optimization and compiler generated code, it may be appropriate to create an intermediate typed Builder here that an initializer would accept. This could allow decoders to be built on a push-style model.

For the Future/Would be Nice:

Codable today really only supports tree structures: DAGs result in duplication (potentially duplicated class instances), and cyclic graphs can lead to infinite recursion. The ability to support these structures (potentially using Identifiable to serialize references) would be nice.

This obviously creates an issue with the current system where encoding/decoding have to be done as a single operation.
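One way to approximate reference-aware serialization today, along the lines suggested above, is to flatten the graph into records that refer to each other by `Identifiable` ID and then rebuild references in a second pass. This is a minimal sketch; `Node`, `NodeRecord`, `flatten`, and `rebuild` are hypothetical names, and a real design would also need to handle heterogeneous node types. Note that decoding genuinely needs two passes here, which is exactly what the single-operation model makes awkward.

```swift
import Foundation

// A class graph that can contain shared nodes and cycles.
final class Node: Identifiable {
    let id: Int
    var name: String
    var links: [Node] = []

    init(id: Int, name: String) {
        self.id = id
        self.name = name
    }
}

// Flat, Codable form: references are stored as IDs, not nested objects.
struct NodeRecord: Codable {
    var id: Int
    var name: String
    var links: [Int]
}

// Visits each node once, so shared nodes and cycles are written exactly once.
func flatten(_ root: Node) -> [NodeRecord] {
    var seen = Set<Int>()
    var records: [NodeRecord] = []
    var stack = [root]
    while let node = stack.popLast() {
        guard seen.insert(node.id).inserted else { continue }
        records.append(NodeRecord(id: node.id, name: node.name, links: node.links.map { $0.id }))
        stack.append(contentsOf: node.links)
    }
    return records
}

// Pass 1 creates every node; pass 2 resolves IDs back into references.
func rebuild(_ records: [NodeRecord], rootID: Int) -> Node? {
    var nodes: [Int: Node] = [:]
    for record in records {
        nodes[record.id] = Node(id: record.id, name: record.name)
    }
    for record in records {
        if let node = nodes[record.id] {
            node.links = record.links.compactMap { nodes[$0] }
        }
    }
    return nodes[rootID]
}
```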


Thanks for starting this thread @tomerd! This is a really important topic and one I have spent a lot of time on. I don't have time to write up a detailed post but will do the best I can to summarize my views.

10,000 foot view

Codable appears to be an over-abstraction that tries to be too many things at once. There are at least three priorities for optimization available:

  • payload size
  • encode / decode performance
  • human readability (with different interpretations of this for different payload formats)

I think it is unlikely that one approach can be ideal for all of these priorities.

My experience

I have worked primarily in mobile apps and application-supporting libraries. The vast majority of the data I have had to encode / decode has been small to moderate sized json payloads. While we all want performance, Codable has been sufficient for our needs in this area.

The major pain point with Codable in my experience has been its imperative approach. This leads to lots of boilerplate code. Worse, when you need to round-trip a type you double the amount of boilerplate. Worst of all, it isn't hard to write an encode / decode pair that doesn't successfully round-trip.
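Short of a declarative system, one cheap guard against the round-trip problem is a generic check in tests. This is a minimal sketch assuming JSON as the format; `roundTrips` and `Profile` are hypothetical, and `Profile` deliberately contains the kind of mismatch described above, where each half looks reasonable on its own.

```swift
import Foundation

// Encodes then decodes a value and reports whether the result is equal.
func roundTrips<T: Codable & Equatable>(_ value: T) throws -> Bool {
    let data = try JSONEncoder().encode(value)
    return try JSONDecoder().decode(T.self, from: data) == value
}

// A hand-written pair with a typical slip: decode reads "display_name",
// but encode writes "name".
struct Profile: Codable, Equatable {
    var name: String

    enum CodingKeys: String, CodingKey {
        case name
        case displayName = "display_name"
    }

    init(name: String) { self.name = name }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        name = try container.decodeIfPresent(String.self, forKey: .displayName) ?? ""
    }

    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        try container.encode(name, forKey: .name)
    }
}
```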

My solution

My team works with a very large number of cloud APIs created by several different teams over several years. These APIs use a wide range of conventions for encoding data into json. Our team needed a solution that was better than manually writing and maintaining a ton of encode / decode boilerplate.

In order to solve this problem I created a substantial Sourcery template. This template includes annotations that allow a declarative approach to specifying a mapping between a model type and an encoding. The same annotations are used to generate both encode and decode logic, guaranteeing a successful round trip.

A significant benefit of this approach is that it raised the level of discussion in the team. Instead of talking about imperative code, we are able to talk about behavior using the vocabulary introduced by the template. Further, we are able to look at the declaration of a type and quickly understand, at a high level, the encoding / decoding behavior it exhibits, in terms of a shared vocabulary introduced by the annotations.

While our template meets the vast majority of our needs, we still encounter use cases that really are one-off and not worth generalizing. We support these through escape hatches that allow the user to write the necessary ad-hoc code without having to also write all the boilerplate to encode / decode the rest of the type.

Declarative coding for Swift

At a high level, I strongly believe Swift should introduce a declarative API for describing encoding / decoding behavior. I am not that familiar with Rust's Serde but from what I do know it looks like a very good point of reference.

Some important features my team has required:

  • Keys: the property name of a model often does not align with the cloud json key
  • Key auto-mapping: in some cases, the name simply needs to be uppercased or snake-cased
  • Not coded: sometimes a property should be omitted when encoding
  • Optional: we have seen cases where it was necessary to require the presence of explicit null
  • Defaults: in some cases we need to convert null / missing values to a default during decoding (especially when decoding a container)
  • Unrecognized values: in some cases (usually simple enums) it has been necessary to convert unrecognized / invalid values to nil, optionally with an assertion failure (debug only).
  • Collections: with some APIs we have had to filter out explicit null values and / or unrecognized values
  • Dates: we support a range of date encoding formats, including one that is idiosyncratic to our domain
  • Nesting: in some cases our model flattens a layer of nesting and in other cases it introduces a new layer of nesting (i.e. taking a subset of properties and replacing them with a struct and a single property)
  • Single value wrappers: for example, wrapping a string in an EmailAddress type in a way that is transparent to the encoding
  • Encoding empty collections and nil: it may be necessary to include or omit these values

This is not intended to be a comprehensive list of everything we should have in a robust solution for Swift. It is only a single case study derived from a large scale, real-world context. I hope it provides a useful list of capabilities to consider.
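For contrast, several of these behaviors (unrecognized values, defaults, and filtering nulls out of collections) can be approximated today with a hand-written `init(from:)`. This is a minimal sketch assuming JSON input; `Lenient`, `Color`, and `Widget` are hypothetical names, and the debug-only assertion mentioned in the list is omitted. It also shows the imperative cost: once one property needs this, the whole initializer is written by hand.

```swift
import Foundation

enum Color: String, Decodable { case red, green }

// Decodes T if possible; unrecognized or mistyped values become nil
// instead of failing the enclosing decode.
struct Lenient<T: Decodable>: Decodable {
    var value: T?
    init(from decoder: Decoder) throws {
        value = try? T(from: decoder)
    }
}

struct Widget: Decodable {
    var name: String
    var color: Color?
    var tags: [String]

    enum CodingKeys: String, CodingKey { case name, color, tags }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        name = try container.decode(String.self, forKey: .name)
        // Missing, null, or unrecognized -> nil.
        color = try container.decodeIfPresent(Lenient<Color>.self, forKey: .color)?.value
        // Missing or null -> empty array; null elements are filtered out.
        tags = (try container.decodeIfPresent([String?].self, forKey: .tags) ?? []).compactMap { $0 }
    }
}
```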

Note: The declarative approach is not mutually exclusive with a lower level, more performance-oriented approach. In fact, it could be implemented on top of a new low level coding library that addresses some of the performance concerns with Codable mentioned upthread.

Enums with associated values

Enums with associated values require special attention. The languages used to implement cloud APIs often do not include a corresponding feature. Despite that, domain models often do include sum types. These end up getting encoded using a wide range of strategies. One example worth mentioning that we have seen is a strategy that requires us to "infer" a default case when it is not specified in the json.

As a result, our Sourcery template supports a wide range of formats. Annotations are available on the enum type, individual cases, and even individual associated values. This is one of the most valuable parts of our template. It allows us to model our domain as it is naturally represented in Swift and provide a simple declarative mapping to the necessary encoding. We do not need to think through subtle, low level, imperative details about how to translate the Swift representation to the specified encoding format over and over for each enum in our domain.

Conclusion

A declarative solution to coding is far more convenient and pleasant to use than an imperative solution. It has the potential to provide the same kind of leap forward for coding that SwiftUI did for UI. I sincerely hope you will consider work in this direction.


Some actual examples of what you mean by declarative vs. imperative code would be helpful here. It mainly sounds like you want lots of attributes (via wrappers probably), or do you mean something else?


I was deliberately vague about this because the specific details are less important than the paradigm shift. By declarative I mean: specifying a mapping between a type and a serialization of that type without writing imperative code. You specify the what, not the how, and you only do it once for a type instead of specifying encode and decode separately.

The most obvious way to accomplish this would be a family of user-defined attributes (i.e. types conforming to magic protocols) that may be applied to a type, a property, a case, an associated value, etc. This would follow the approach used by Serde and be similar to the Sourcery annotations my team has used.
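A rough way to see the paradigm difference without new language features: a type states its mapping once, as data, and generic code derives both directions from it, so encoding and decoding cannot drift apart. This is a minimal sketch assuming key paths as the hook and a `[String: Any]` dictionary (the shape `JSONSerialization` produces) as the serialized form; `Field`, `DeclarativeCodable`, and `User` are hypothetical and far simpler than the attribute-based design described above.

```swift
// One field description drives both directions.
struct Field<Root> {
    let key: String
    let read: (Root) -> Any
    let write: (inout Root, Any) -> Bool

    init<Value>(_ key: String, _ keyPath: WritableKeyPath<Root, Value>) {
        self.key = key
        self.read = { root in root[keyPath: keyPath] }
        self.write = { root, raw in
            // Reject values of the wrong type instead of crashing.
            guard let value = raw as? Value else { return false }
            root[keyPath: keyPath] = value
            return true
        }
    }
}

protocol DeclarativeCodable {
    init()
    static var fields: [Field<Self>] { get }
}

extension DeclarativeCodable {
    // Encoding and decoding both walk the same field list.
    func encoded() -> [String: Any] {
        var result: [String: Any] = [:]
        for field in Self.fields { result[field.key] = field.read(self) }
        return result
    }

    init?(decoding dictionary: [String: Any]) {
        self.init()
        for field in Self.fields {
            guard let raw = dictionary[field.key], field.write(&self, raw) else { return nil }
        }
    }
}

struct User: DeclarativeCodable {
    var id = 0
    var email = ""

    // The whole mapping, stated once: serialized key -> property.
    static let fields: [Field<User>] = [
        Field("user_id", \User.id),
        Field("email_address", \User.email),
    ]
}
```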

You mention wrappers: I do not think this should be implemented as property wrappers. They are not powerful enough to implement all of the capabilities necessary and they are intrusive to the runtime of the type. I don’t think the serialization system should have any impact on the runtime implementation of a type other than during encoding and decoding.


I’m excited that this can of worms is finally being opened, thanks @tomerd!

TLDR

Given our past performance, I find it unlikely that we will come up with a singular serialization solution which addresses the full scope of the community's needs. As such, it makes sense to shift focus to providing a lower level toolkit that can both be used to solve the prominent use cases (like serialization for common formats), as well as enable folks to address the long tail of weird things that have made consensus on this topic difficult to achieve. This kernel of functionality is static structural reflection.

My Experience

I've used Codable extensively over the years, primarily for mobile applications but also for a nontrivial number of command-line tools and server applications. I also authored ResilientDecoding. I've run into a bunch of the pain points already discussed here (like performance concerns), but it was often worth eating the cost of Codable in exchange for the utility it provided. Here are a couple of the weirder use cases I've come across (a bit simplified for this discussion):

  • Transformations on unstructured data: At a large company I used to work for, we needed to interface with a backend API that contained a fair bit of legacy cruft as well as a number of unfortunate compromises from supporting a variety of different platforms. Using `Codable` directly with this API was a pain, and instead I wrote some code that applied some general transformations on the `NSJSONSerialization` output, then instantiated a `JSONDecoder` with the transformed object. How did I accomplish this? I copied the implementation of JSONDecoder into our project, and exposed an initializer that took the unstructured data as an `Any`. It wasn't pretty, but it made the downstream usage of `Codable` significantly more ergonomic.
  • Abusing `Codable` for non-serialization tasks: In my current role, I defined a protocol which had a `Codable` associated type and a method which took an instance of that type and performed some meaningful work. The trick here was that I needed to create an instance of the associated type in a generic way, and the logic to do that required a unique, semantic name for each of the leaf values of that type. I achieved this by creating a custom decoder which, when it reached a leaf value, passed the coding path to an initializer for that value (which had to conform to a specific protocol).
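For the first case, a workaround that stays within public API (instead of copying `JSONDecoder`) is to transform the `JSONSerialization` output, re-serialize it to `Data`, and decode that. This is a swapped-in technique, not the one described above, and it pays for an extra serialization pass. A minimal sketch; `renamingKeys`, `Legacy`, `decodeLegacy`, and the `USR_` key prefix are hypothetical.

```swift
import Foundation

// Recursively renames dictionary keys in JSONSerialization output.
func renamingKeys(_ value: Any, _ rename: (String) -> String) -> Any {
    if let dictionary = value as? [String: Any] {
        var renamed: [String: Any] = [:]
        for (key, nested) in dictionary {
            renamed[rename(key)] = renamingKeys(nested, rename)
        }
        return renamed
    }
    if let array = value as? [Any] {
        return array.map { renamingKeys($0, rename) }
    }
    return value
}

struct Legacy: Decodable {
    var userName: String
    var itemCount: Int
}

func decodeLegacy(_ data: Data) throws -> Legacy {
    let raw = try JSONSerialization.jsonObject(with: data)
    // Hypothetical cleanup: strip a legacy "USR_" key prefix.
    let cleaned = renamingKeys(raw) { key in
        key.hasPrefix("USR_") ? String(key.dropFirst(4)) : key
    }
    // JSONDecoder only accepts Data, so the transformed tree is re-serialized.
    let reencoded = try JSONSerialization.data(withJSONObject: cleaned)
    return try JSONDecoder().decode(Legacy.self, from: reencoded)
}
```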

Comments

Performance

At a high level, I think the bar for performance should be "on par with generated code", meaning that if we need to write code generators to achieve good performance for things like protobuf we will have failed (as users will be likely to eschew our system and just lean on code generators). It seems entirely possible to come up with a system where the only thing we would need to generate from protobuf is a bunch of (field, offset) pairs and have the serialization infrastructure and compiler synthesize the necessary serialization logic. This will also make it much easier to, in the future, be able to import things like protobuf definitions the same way we import C code.

Much of this has been discussed upthread, but a less obvious concern is binary size. There will be a natural tension between specializing the decoding logic per-type (increasing binary size and performance) and having more general logic (decreasing binary size and performance). The compiler already has to deal with this for generics, but the magnitude of the effect will likely be much bigger for serialization, and since the logic is synthesized, this effect is often better hidden from the developer.

Ergonomics

  • Composability: Much of the infrastructure surrounding Codable is not composable. A simple example is date format selection, which is currently implemented directly on JSON{De,En}coder. A better solution would allow that logic to be modularized and independent of serialization, as well as be applied to a subsection of a serialized response. The same observation applies to snake-case-keys and other "strategies". It would also be nice if the encoding and decoding side were semantically connected (as opposed to having DateDecodingStrategy and DateEncodingStrategy).
  • Customization: Right now we can achieve heavy-weight customization by manually implementing init(from:) or encode(to:). Property wrappers provide some lighter-weight customization but are fairly limited in what they can achieve. A big limitation is that we can't pass value arguments to the encoding/decoding infrastructure; in theory something like @Resilient(defaultValue: 5) var value: Int should be possible (You can currently write this, but there is no way to access the value 5 during decoding). Another avenue for adding customization is adding keypaths to decoding, which would enable something like protocol Resilient { static func defaultValue<T>(for keyPath: KeyPath<Self, T>) -> T }

Static Structural Reflection

At the top of this post, I mentioned "Static Structural Reflection" as the kernel of generic Codable functionality with which we can solve the big serialization use cases, as well as the long tail of interesting things the community might need. Automatic Requirement Satisfaction in plain Swift discusses this, though I share some concerns about its type-system-oriented approach (something closer to how function builders work might be preferable). Such a system would need to have the following capabilities:

  • Be able to express structural constraints on a type (for instance, all leaf properties must conform to the Decodable protocol). This could be implicit from the code doing the structural reflecting. For instance, if we ended up with a system similar to function builders, we could mandate that an overload exists for a buildExpression-style method.
  • Given a type, we should be able to access a collection of stored properties, along with their name, key path and value type; we should also be able to look this information up for a specific property (for instance, by name).
  • Use the information in the previous point to create an instance of that type from a set of values
  • As a bonus, such a system could replicate much of the functionality of Mirror with less runtime overhead and more static checks.
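For comparison, the closest existing tool is `Mirror`, which enumerates stored properties at runtime. The sketch below (the `Config` type and `storedProperties(of:)` helper are hypothetical) shows what it can do today; what it lacks relative to the list above is static typing for each property, key paths, and any way to construct an instance from a set of values.

```swift
struct Config {
    var host = "localhost"
    var port = 8080
    var verbose = false
}

// Lists (label, value) pairs for stored properties, discovered at runtime.
// Values come back as Any, and there is no way to build a Config from them.
func storedProperties<T>(of value: T) -> [(label: String, value: Any)] {
    Mirror(reflecting: value).children.compactMap { child -> (label: String, value: Any)? in
        guard let label = child.label else { return nil }
        return (label, child.value)
    }
}
```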

I may have missed it, but no one seems to have mentioned The Big One: Codable is utterly broken for reference types.

(Specifically, it fails catastrophically when reference types form an object graph that contains closed loops of references — which is pretty much always. It also fails to respect instance uniqueness.)

In a larger sense, it also has an awkward problem with data that contains circular chains of data dependencies, because Swift's init rules make it impossible to unarchive such data without sometimes (a) exposing private properties non-privately, or (b) wrapping non-optional properties in optionals.


I feel that if the goal is to address all serialization needs, then the successor of Codable ought to be something more like a separate library providing approaches of different granularity rather than a pair of protocols in the stdlib. It's already visible from this thread that some users prefer simplicity, while others prefer customizability; I also haven't noticed the topic of migration mentioned yet. Trying to provide a one-size-fits-all general solution might just be infeasible or overly complicated, so I think it would be wise to design a set of different protocols, akin to the collection protocol hierarchy, each serving on its own level.

In particular, I could imagine having something like:

  • SimpleCodable: for any types that can enjoy the current encoding/decoding behaviour without any additional customizability,
  • ReferenceCodable: a type that supplies additional logic telling the encoder/decoder how to encode/decode reference cycles (and generally anything that doesn't comply with a tree-like structure),
  • MigratingCodable: a type that has some knowledge of its own version and can locally decide how to deal with different versions,
  • EnumCodable: a type (evidently, an enum) that specifies its own logic for dealing with associated values and their labels. Having such a protocol would completely eliminate the concerns about the default format raised in SE-0295.

The possibilities are endless, really.
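As a point of reference for the MigratingCodable idea, today the version logic ends up hand-written inside `init(from:)`. This is a minimal sketch; the `version` key, the `Person` type, and the v1-to-v2 migration (splitting a single `name` field) are hypothetical.

```swift
import Foundation

// v1 payloads: {"name": "Ada Lovelace"} with no version key.
// v2 payloads: {"version": 2, "firstName": ..., "lastName": ...}.
struct Person: Codable, Equatable {
    static let currentVersion = 2
    var firstName: String
    var lastName: String

    enum CodingKeys: String, CodingKey { case version, name, firstName, lastName }

    init(firstName: String, lastName: String) {
        self.firstName = firstName
        self.lastName = lastName
    }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        let version = try container.decodeIfPresent(Int.self, forKey: .version) ?? 1
        switch version {
        case 1:
            // Migrate: split the old single field at the first space.
            let parts = try container.decode(String.self, forKey: .name)
                .split(separator: " ", maxSplits: 1).map { String($0) }
            firstName = parts.first ?? ""
            lastName = parts.count > 1 ? parts[1] : ""
        case 2:
            firstName = try container.decode(String.self, forKey: .firstName)
            lastName = try container.decode(String.self, forKey: .lastName)
        default:
            throw DecodingError.dataCorruptedError(
                forKey: .version, in: container, debugDescription: "Unsupported version \(version)")
        }
    }

    // Always writes the current version.
    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        try container.encode(Self.currentVersion, forKey: .version)
        try container.encode(firstName, forKey: .firstName)
        try container.encode(lastName, forKey: .lastName)
    }
}
```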

Now, an issue with this approach I see is that this library still would need to have some additional compiler support to generate the default behaviour, which won't make it a truly independent library, but it might be worth it.


I'm writing a custom binary Coder implementation for high-performance XPC serialization (it's very rough and actively being worked on, but I put it online if anyone is curious: GitHub - saagarjha/Cod: A binary Swift Coder implementation), so this discussion is extremely relevant to me. Because of my use case, my focus will mostly be on writing custom Coders that perform well, but in general I agree with the points raised above about the need for better key transformations, handling of versioning and decoding failures, and other ways to deal with the realities of parsing data that isn't ideal. (One thing I do want to address is the "my decoder can't access my default value" complaint: personally, I feel this is a problem waiting for const generics, not better serialization. In my mind the ideal way to express these is

@Defaultable<5>
var foo: Int

which would allow a decoder to pull this value out at decoding time since it's part of the type itself. In the meantime I've been doing the janky workaround for a lack of const generics, which is to lift values into the type system with a bit of boilerplate:

struct Five {
	let value = 5
}

@Defaultable<Five>
var foo: Int

)
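Fleshed out, the lift-values-into-types workaround might look like the sketch below. It assumes a hypothetical `DefaultValue` protocol whose value is `static`, so the decoder can read it from the type alone, plus an overload on `KeyedDecodingContainer` so that a missing key (not just an explicit `null`) falls back to the default. `DefaultValue`, `Five`, `Defaultable`, and `Settings` are illustrative names, not an existing library's API.

```swift
import Foundation

// Stand-in for a const generic: a type whose only job is to carry a value.
protocol DefaultValue {
    associatedtype Value: Decodable
    static var value: Value { get }
}

enum Five: DefaultValue {
    static let value = 5
}

@propertyWrapper
struct Defaultable<D: DefaultValue>: Decodable {
    var wrappedValue: D.Value

    init(wrappedValue: D.Value) { self.wrappedValue = wrappedValue }

    // An explicit null falls back to the type-level default.
    init(from decoder: Decoder) throws {
        let container = try decoder.singleValueContainer()
        if container.decodeNil() {
            wrappedValue = D.value
        } else {
            wrappedValue = try container.decode(D.Value.self)
        }
    }
}

// Synthesized init(from:) calls decode(_:forKey:) for each property; this
// more specific overload turns a missing key into the default as well.
extension KeyedDecodingContainer {
    func decode<D: DefaultValue>(_ type: Defaultable<D>.Type, forKey key: Key) throws -> Defaultable<D> {
        try decodeIfPresent(type, forKey: key) ?? Defaultable(wrappedValue: D.value)
    }
}

struct Settings: Decodable {
    @Defaultable<Five> var retries: Int
    var name: String
}
```

The overload trick only works because the synthesized code resolves `decode(_:forKey:)` statically against the concrete wrapper type, which is one more instance of the implicit behaviour discussed upthread.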

Anyway, on to actually implementing a coder: my opinion is that this is possible to do, but it is really lacking in documentation and gets progressively more challenging the further you get away from a JSON-type encoding. I think the design makes it technically possible to do pretty much anything, but at a certain point you're keeping around a god-object containing context for the entire encoding tree and just satisfying protocol requirements because you have to, not because it matches the design of your serializer. (Why would you continue to use Codable when you're clearly no longer on board with how it wants you to do serialization? Because it's the only way to get access to the compiler-generated reflection you need for serialization. But that's going into "we need hygienic macros" and I don't want to digress too far.) @dwaite mentions that pretty much every implementation requires implementers to hand-roll a vtable in the generic encode<T: Encodable>(_: T) function; it is really disappointing that we have to do this because the compiler cannot help us write this particular switch statement. The extra functions should either be trimmed for not pulling their weight, or they should be rethought so that it is possible to statically dispatch to them automatically.

Performance-wise, @lukasa is spot-on: Codable, as it is currently designed, cannot be performant. I can get actual numbers once I add some more optimizations to my implementation, but extrapolating from my measurements, performance is capped at a couple hundred instructions per byte, which is orders of magnitude slower than fast serialization implementations. Heap allocations are basically required everywhere by design, and there is way too much "are you an x?" checking going on, which requires walking through really slow runtime type checks.

Unkeyed containers are critical for performance (at least, they are for me, since they back the big arrays and data), but they really are not designed for that at all. A good binary encoding can encode an array efficiently both in space and in time: copying out bytes (maybe with a quick pass of variable-length encoding or an endianness fix) should be very fast. With UnkeyedEncodingContainer it just can't be: you get a bunch of possibly heterogeneous data piecemeal rather than all at once. Even if you specialize your format by detecting this common case (which is not ergonomic at all, mind you), the "here is one element → check the type of this element → store the data and its type information in some internal context so you can write it out later" cycle is pretty much the opposite of efficient. And if you happen to get strings or a structure that has variable size (one with an array member, etc.), every element is going to be variable-sized, so you have to throw away your optimizations halfway through and generate an index table of byte offsets to handle this.
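For contrast with the piecemeal `UnkeyedEncodingContainer` flow, here is what the "copy out bytes" case and the "index table of byte offsets" case could look like when an encoder is handed the whole array at once. This is a sketch under assumptions: the layout (32-bit little-endian counts and end offsets) and the function names are hypothetical and made up for illustration.

```swift
/// Fixed-size elements: one length prefix, then a single bulk copy of the
/// elements' little-endian bytes, instead of one container call per element.
func encodeBulk(_ values: [Int32]) -> [UInt8] {
    var out: [UInt8] = []
    out.reserveCapacity(4 + values.count * 4)
    withUnsafeBytes(of: UInt32(values.count).littleEndian) { out.append(contentsOf: $0) }
    let littleEndian = values.map { $0.littleEndian }
    littleEndian.withUnsafeBytes { out.append(contentsOf: $0) }
    return out
}

/// Variable-size elements: count, then a table of end offsets into the
/// payload, then the concatenated UTF-8 bytes, so a reader can jump
/// straight to element i.
func encodeWithOffsetTable(_ strings: [String]) -> [UInt8] {
    var payload: [UInt8] = []
    var ends: [UInt32] = []
    for s in strings {
        payload.append(contentsOf: s.utf8)
        ends.append(UInt32(payload.count))
    }
    var out: [UInt8] = []
    withUnsafeBytes(of: UInt32(strings.count).littleEndian) { out.append(contentsOf: $0) }
    for end in ends {
        withUnsafeBytes(of: end.littleEndian) { out.append(contentsOf: $0) }
    }
    out.append(contentsOf: payload)
    return out
}
```

Both functions need to see the whole collection before writing anything, which is exactly the information the one-element-at-a-time container interface withholds.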

Likewise, unkeyed containers have some fat that could be trimmed as well. The type system generally has an idea of the shape of data; it's not going to be missing random fields or have nulls in inconvenient places (the Decoder might, but coming from Swift everything should be well-formed). But the Encoder interface has no way of representing "this is a nice, complete type that I can hand to you together", which means that an implementation needs to keep track of this itself (and again, piecemeal as members arrive). Like, consider this struct:

struct Foo: Codable {
	let bar: Int
	let baz: Double
}

A KeyedEncodingContainer needs to have logic in its encoding methods to keep a list of all the types it encounters during encoding, and for this one it will go "ok, in my encoding functions I noticed that I was passed an Int and a Double; I've seen that combination before (but I don't actually know what it is), so I don't need to write out a record for it". The other inefficiency is that the autogenerated CodingKeys are Strings, not integers, and these can add substantial overhead to the size of the data you produce, since you need to store at least some data describing the keys (it is unclear whether keyed containers can rely on the ordering being consistent). I'm probably going to migrate to some sort of compressed prefix trie, but having access to integer keys would have been significantly nicer for performance.
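A type can opt in to integer keys by hand today: if `CodingKeys` is declared with an `Int` raw type, the synthesized `CodingKey` conformance uses the raw value as `intValue` while `stringValue` stays the case name. A minimal sketch on the `Foo` struct above (the raw values 0 and 1 are arbitrary choices for illustration):

```swift
struct Foo: Codable {
    let bar: Int
    let baz: Double

    // Opting in to integer keys: with an Int raw type, the synthesized
    // CodingKey conformance reports the raw value as `intValue`.
    enum CodingKeys: Int, CodingKey {
        case bar = 0
        case baz = 1
    }
}
```

A custom encoder can then check `key.intValue` and fall back to `key.stringValue` when it is `nil`; `JSONEncoder` keeps using the string form. The catch is that this is per-type boilerplate rather than what the compiler generates by default.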

Another strange quirk is that Codable is clearly meant to be fairly opaque to end users, who are expected to just use the APIs without looking under the hood at how it works, but if you do that you can accidentally leave significant performance on the table. For example, encoding a [[Int]] usually requires every single byte to be looked at twice: each inner array encodes its members individually (arrays go through the unkeyed container), and then the outer array combines the results and encodes the whole thing another time (again, each byte goes through the unkeyed container). You'd think the encoder would be "smart" and see "oh, this is nested, I can encode it and pass it through without looking at it again", but most coders will not specialize this, and it's not clear whether they even should (especially for types they don't own). So you can hit random performance cliffs if you aren't an expert in how the implementation actually works, and pretty much nobody is.
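To make the "every byte is looked at twice" cliff concrete, here is a deliberately naive sketch of an encoder that gives each nested array its own buffer and then copies that buffer into the parent. `CountingBuffer`, `encodeInts`, and `encodeNested` are hypothetical names, and real coders differ in detail; the point is only that building the child fully and then re-appending it costs a second pass over the same bytes.

```swift
// Hypothetical buffer that counts every byte stored into it.
final class CountingBuffer {
    private(set) var bytes: [UInt8] = []
    private(set) var bytesWritten = 0

    func append<S: Collection>(_ chunk: S) where S.Element == UInt8 {
        bytes.append(contentsOf: chunk)
        bytesWritten += chunk.count
    }
}

// Encodes each Int as 8 little-endian bytes, one element at a time.
func encodeInts(_ values: [Int], into buffer: CountingBuffer) {
    for v in values {
        withUnsafeBytes(of: Int64(v).littleEndian) { buffer.append($0) }
    }
}

// Naive nesting: a fresh child buffer per inner array, then a copy into the
// parent. Returns the output plus the total number of byte writes performed.
func encodeNested(_ arrays: [[Int]]) -> (bytes: [UInt8], totalWrites: Int) {
    let parent = CountingBuffer()
    var childWrites = 0
    for inner in arrays {
        let child = CountingBuffer()
        encodeInts(inner, into: child)
        childWrites += child.bytesWritten
        parent.append(child.bytes) // second pass over the same payload
    }
    return (parent.bytes, parent.bytesWritten + childWrites)
}
```

For `[[1, 2], [3]]` this produces 24 bytes of output but performs 48 byte writes; letting the inner arrays write straight into the parent buffer would bring that down to 24.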

Anyways, this is getting a bit long and rambly for an initial post, so I'll stop here, but hopefully this helps provide more details about what parts of the API seem to be slow or difficult to work with from my side.


I've seen this one (and similar problems) solved with property wrappers. What do folks think about that solution?


Something like https://forums.swift.org/t/add-support-for-encoding-and-decoding-nested-json-keys/?
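For context, without such a feature a nested key has to be unpacked by hand with `nestedContainer(keyedBy:forKey:)`. A minimal sketch; the `Profile` type and its JSON shape are made up for illustration.

```swift
import Foundation

// Flattens {"user": {"name": "..."}} into a single `userName` property by
// walking into the nested object manually.
struct Profile: Decodable {
    let userName: String

    private enum RootKeys: String, CodingKey { case user }
    private enum UserKeys: String, CodingKey { case name }

    init(from decoder: Decoder) throws {
        let root = try decoder.container(keyedBy: RootKeys.self)
        let user = try root.nestedContainer(keyedBy: UserKeys.self, forKey: .user)
        userName = try user.decode(String.self, forKey: .name)
    }
}
```

Every nested path needs its own key enum and its own hand-written `init(from:)`, which is the boilerplate that nested-key support would remove.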
