Hello Swift Community,
I’m happy to announce that I've been hard at work planning a potential future for serialization & deserialization APIs in Swift. It's clear from community adoption and feedback that Codable has had a lot of success in the years since it was added to Swift 4, but that it doesn’t satisfy some important needs. One of the foremost of those needs is performance more in line with programming environments that compete with Swift. As such, the main goal for this effort is to unlock higher levels of performance during both serialization and deserialization without sacrificing the ease of use that Codable provides.
This is a large project with a lot of moving parts and I'm not quite ready for an official Pitch yet. However, I want to collect the community’s initial thoughts and reactions to the direction we’re taking so far to make sure that we’re successfully addressing the right set of problems.
Here are the core tenets of the effort:
A new API is required to escape the implicit Codable performance ceiling
Even with all of its strengths, the existing API’s design has some unavoidable performance penalties. For instance, its use of existentials implies additional runtime and memory costs: existential values are boxed, unboxed, retained, and released, and calls on them go through dynamic dispatch.
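To make the existential cost concrete, here is a minimal sketch comparing an `any` parameter with a generic one. `ValueSink`, `SummingSink`, and both `fill` functions are hypothetical names invented for this illustration; they are not part of Codable or of any proposed API.

```swift
// Hypothetical protocol, used only for illustration.
protocol ValueSink {
    mutating func write(_ value: Int)
}

struct SummingSink: ValueSink {
    private(set) var total = 0
    mutating func write(_ value: Int) { total += value }
}

// Existential parameter: the concrete type is hidden behind a box, so each
// call to `write` is dispatched dynamically through the protocol witness table.
func fillExistential(_ sink: inout any ValueSink, count: Int) {
    for i in 1...count { sink.write(i) }
}

// Generic parameter: the compiler is free to specialize this function for
// `SummingSink`, which allows static dispatch and inlining of `write`.
func fillGeneric<S: ValueSink>(_ sink: inout S, count: Int) {
    for i in 1...count { sink.write(i) }
}

var boxed: any ValueSink = SummingSink()
fillExistential(&boxed, count: 100)

var concrete = SummingSink()
fillGeneric(&concrete, count: 100)
```

Both calls compute the same sum; the difference is only in how much the optimizer is allowed to see through the abstraction.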
Also, because a client can decode dictionary values in arbitrary orders, a KeyedDecodingContainer is effectively required to proactively parse the payload into some kind of intermediate representation, necessitating allocations for internal temporary dictionaries and String values. In fact, during JSONDecoder optimization work, I discovered that ALL containers need to do this because some Decodable types retain the decoder or one of its containers after init(from: Decoder) returns to perform deferred decoding, which was not an intended usage of the interface.
Dynamic casting is prevalent in both Property List and JSON encoders and decoders. They have special support for Foundation types like Data, Date, and others that are not members of the core set of types Codable supports at the standard library level. These dynamic casts are unavoidable with the existing design and have a measurable impact on performance.
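As a rough illustration of why those casts cannot be avoided, consider a generic entry point that only knows its argument is Encodable. The function `encodingPath(for:)` below is a hypothetical stand-in, not the actual JSONEncoder or PropertyListEncoder implementation; it only shows the shape of the check.

```swift
import Foundation

// Hypothetical stand-in for an encoder's generic entry point.
// Returns a label describing which path would be taken.
func encodingPath<T: Encodable>(for value: T) -> String {
    // The static type T says nothing about Date or Data, so native treatment
    // requires a runtime type check, and that check runs for every value.
    if value is Date { return "native date" }
    if value is Data { return "native data" }
    return "generic Encodable path"
}
```

Note that the checks are paid even for an `Int` or a `String`, which is the cost described above.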
Because of these and other performance penalties that are inherent to the existing APIs, the best path forward is to design a new API that avoids them as much as possible.
Rust’s Serde feature is an attractive design to imitate for performance
The design of Rust’s Serde neatly avoids many of the issues described above, which helps it achieve its impressive levels of performance.
For instance, nowhere in its design does it use dyn trait types, which are the closest parallel to Swift’s any existentials. Most notably, Serde’s deserialization design employs a Visitor pattern which allows the parser to drive the deserialization process instead of being required to service requests from the client in arbitrary orders. This design can be more easily optimized for performance. For instance, when deserializing the contents of an encoded dictionary, the deserializer vends keys and values to the client in the order they occur in the payload. This allows the deserializer to elide building an intermediate representation of the dictionary in temporary memory. These are features that we can imitate in a Swift-friendly design quite easily.
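A toy sketch of the parser-driven idea follows. It assumes a made-up payload format such as "name=Ada;age=36", and every name in it (`ToyFieldDecoder`, `nextKey`, `takeValue`, `decodePerson`) is hypothetical. The point is only that the decoder walks the payload once and never builds a dictionary.

```swift
// Toy decoder over a payload such as "name=Ada;age=36" (a made-up format).
// It scans left to right and vends fields in payload order.
struct ToyFieldDecoder {
    private var remaining: Substring
    private var pendingValue: Substring?

    init(_ payload: String) { remaining = Substring(payload) }

    // Vends the next key in payload order, or nil at the end of the payload.
    mutating func nextKey() -> Substring? {
        guard !remaining.isEmpty else { return nil }
        let field = remaining.prefix(while: { $0 != ";" })
        remaining = remaining.dropFirst(field.count).dropFirst() // skip ";"
        guard let eq = field.firstIndex(of: "=") else {
            pendingValue = ""
            return field
        }
        pendingValue = field[field.index(after: eq)...]
        return field[..<eq]
    }

    // Vends the value that belongs to the most recently returned key.
    mutating func takeValue() -> Substring {
        defer { pendingValue = nil }
        return pendingValue ?? ""
    }
}

struct Person: Equatable {
    var name: String
    var age: Int
}

enum ToyDecodingError: Error { case missingField(String) }

// The client reacts to whatever key arrives next instead of asking for keys.
func decodePerson(from payload: String) throws -> Person {
    var decoder = ToyFieldDecoder(payload)
    var decodedName: String?
    var decodedAge: Int?
    while let key = decoder.nextKey() {
        let value = decoder.takeValue()
        switch key {
        case "name": decodedName = String(value)
        case "age": decodedAge = Int(value)
        default: break // unknown keys are skipped
        }
    }
    guard let name = decodedName else { throw ToyDecodingError.missingField("name") }
    guard let age = decodedAge else { throw ToyDecodingError.missingField("age") }
    return Person(name: name, age: age)
}
```

Because the client handles keys as they arrive, the same function works whether "age" or "name" comes first in the payload.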
Serde’s visitor pattern and deep use of lifetime annotations also enable incredible opportunities for borrowing data from the payload instead of copying. While Swift’s ~Escapable and lifetime dependency features are still in their infancy, there are still some opportunities to use them for significant performance boosts, like pattern matching dictionary keys. I hope that in due time we’ll be able to leverage those features in this design to do more borrowing and less copying like Rust allows.
Deep macro reliance will limit the need for manual implementations
Rust Serde’s deserialization Visitor pattern, mentioned above, requires more verbose boilerplate code than Swift’s Decodable. For instance, consider a simple Swift Decodable type's init(from:) implementation that is a total of three lines:
struct Person: Decodable {
    let name: String
    let age: Int
    init(from decoder: any Decoder) throws {
        let container: KeyedDecodingContainer<CodingKeys> = try decoder.container(keyedBy: CodingKeys.self)
        self.name = try container.decode(String.self, forKey: .name)
        self.age = try container.decode(Int.self, forKey: .age)
    }
}
Contrast that with a minimal version for the same type that uses a hypothetical Visitor pattern, which grows to 12 lines (asymptotically 3x as many lines, since each property needs a temporary variable, a switch case, and a guard instead of a single decode call):
func visit(_ decoder: inout some StructDecoder) throws -> Person {
    var name_tmp: String?
    var age_tmp: Int?
    while let key = try decoder.nextKey() {
        switch key {
        case "name": name_tmp = try decoder.decodeValue(String.self)
        case "age": age_tmp = try decoder.decodeValue(Int.self)
        default: break
        }
    }
    guard let name = name_tmp else { /* throw an error */ }
    guard let age = age_tmp else { /* throw an error */ }
    return Person(name: name, age: age)
}
Serde provides Serialize and Deserialize derive macros to generate this (de)serialization code directly from custom type definitions. This isn’t really a new concept to those familiar with Swift's automatic synthesis of Codable types.
The advantage of Serde’s macro design is that it enables much more comprehensive customizations than Codable synthesis does, allowing it to be leveraged more often and requiring fewer custom implementations.
In Swift, when a client needs to do more than just alter the default CodingKey representations, developers are often faced with a large cliff where they’re forced to manually replicate the whole Codable implementation just to do so. This situation is somewhat ameliorated by property wrappers, but the kinds of customization they can achieve are very limited. Xcode is now capable of emitting the synthesized Codable implementation directly into source files, which eliminates the initial manual implementation work, but the emitted code must still be manually maintained as the type evolves.
In this new design I aim to leverage Swift’s macro features to meet or exceed Serde’s level of support for customization of synthesized conformances. Moving code synthesis from the compiler to a macro will enable us to use attribute-like macros as targeted customization mechanisms, which was not something we could easily accomplish with the compiler-based Codable synthesis. The compiler implementation is also frankly difficult to evolve. A macro, by its very nature, will be much easier to enhance as needed over time.
Dual approach: format-agnostic and format-specialized protocols
One of the advantages of both Serde's and Codable's design is some level of format-agnosticism. In other words, you can implement Serialize or Encodable for a type and expect to get a valid encoding regardless of what Serializer / Encoder you use.
The need for format-agnostic encoding interfaces is incontrovertible, but it does present some problems. One of these problems that Serde doesn’t solve is support for types that don’t fit neatly into its data model, which consists of strings, numbers, byte sequences, arrays, dictionaries, etc. Some encoded types can’t cleanly translate to these types at all, or they may have native and more optimal representations in a given serialization format.
One example that is particularly close to home is date values in Property Lists. Property List has native support for dates, either as a <date> tag in XML plist or as a dedicated type specifier in binary plist. This is how Foundation.Date values are expected to be encoded in property lists. By contrast, Rust Date-like types typically encode themselves in floating point or string representations, as those are the most appropriate types available in the Serde data model. It is technically possible to convince a Property List Serde serializer to encode a native date value, but it requires coupling the date type’s implementation to that of the serializer, which is antagonistic to Serde’s intention of Serialize trait types being format-agnostic.
Actually, a similar problem exists with Codable. There is no encode(_: Date) function present in the Encoder interface, which means PropertyListEncoder has to attempt to dynamically cast every some Encodable type it receives to Date in order to handle these natively. This helps keep the Encodable type format-agnostic, but it has a negative impact on performance, even if you never actually encode any Dates.
I believe that fully and formally embracing format-specialization where appropriate is the best solution to this problem. Specifically, we should encourage each serialization format that has native support for data types that aren't represented in the format-agnostic interface to produce its own protocol variant that includes explicit support for these types, e.g. JSONCodable or PropertyListCodable. These format-specialized protocols are expected to be entirely distinct from the format-agnostic one, but they should share the same basic structure and patterns. For maximum compatibility, encoders for a specific format are also expected to support types that only conform to the format-agnostic protocol.
This dual-protocol approach should simultaneously enable broad compatibility for simple types that have no knowledge of what format they’ll be encoded in, as well as high performance and ideal encoded representations when a client knows they are encoding their types into a specific serialization format.
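One way to picture the dual-protocol split is the toy sketch below. All names (`AgnosticEncoder`, `AgnosticEncodable`, `PlistEncodable`, `ToyPlistEncoder`, `plistLines`) are hypothetical placeholders, the XML-like output is simplified, and the real design may look quite different. The sketch only shows that a specialized protocol can name a concrete encoder and reach native date support without a dynamic cast, while the same encoder still accepts format-agnostic types.

```swift
import Foundation

// Format-agnostic side: only the common data model is available.
protocol AgnosticEncoder {
    mutating func encode(_ value: String)
    mutating func encode(_ value: Double)
}
protocol AgnosticEncodable {
    func encode<E: AgnosticEncoder>(to encoder: inout E)
}

// Toy plist-like encoder that also offers a native date entry point.
struct ToyPlistEncoder: AgnosticEncoder {
    private(set) var output: [String] = []
    mutating func encode(_ value: String) { output.append("<string>\(value)</string>") }
    mutating func encode(_ value: Double) { output.append("<real>\(value)</real>") }
    // Native date support, reached statically rather than via `as? Date`.
    mutating func encodeDate(_ value: Date) {
        output.append("<date>\(ISO8601DateFormatter().string(from: value))</date>")
    }
}

// Format-specialized side: the protocol names the concrete encoder.
protocol PlistEncodable {
    func encode(to encoder: inout ToyPlistEncoder)
}

struct Note: AgnosticEncodable {
    var text: String
    func encode<E: AgnosticEncoder>(to encoder: inout E) { encoder.encode(text) }
}

struct Event: PlistEncodable {
    var title: String
    var start: Date
    func encode(to encoder: inout ToyPlistEncoder) {
        encoder.encode(title)
        encoder.encodeDate(start)
    }
}

// The format's entry points accept both kinds of type.
func plistLines<T: PlistEncodable>(for value: T) -> [String] {
    var encoder = ToyPlistEncoder()
    value.encode(to: &encoder)
    return encoder.output
}
func plistLines<T: AgnosticEncodable>(for value: T) -> [String] {
    var encoder = ToyPlistEncoder()
    value.encode(to: &encoder)
    return encoder.output
}
```

In this sketch `Note` knows nothing about property lists and still encodes, while `Event` opts into the specialized protocol to get a native `<date>` entry.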
Unfortunately, each format and specialized protocol will need to provide its own main macro, which is a lot of work. However, in due time I hope to simplify this work with a package that contains the core pieces of such an implementation.
Compatibility with Codable
While I anticipate that the new APIs will largely supplant Codable, we can’t neglect its legacy by expecting everyone to move over to the new system right away. To this end, we encourage all encoders and decoders to accept not only types conforming to the new format-agnostic protocols, but also Encodable and Decodable types, if possible.
Implementing this support may seem daunting, but for self-describing serialization formats, I am developing generic Encoder, Decoder, and Container types that operate on format-specific primitive values, e.g. JSONPrimitive or PropertyListPrimitive. Thus supporting Codable becomes as simple as implementing functions that convert between these primitive values and Codable’s data model.
Non-goals
- While we should support as many different serialization formats as possible (ideally even more than the original Codable), there will inevitably be some that don't fit this particular model. We should focus the design around the formats that are most common to the Swift ecosystem.
- The end result for this design should keep all definitions in code. There are many popular serialization formats (e.g. Protobuf, Flatbuffers) that use auxiliary definition or schema files, external code generator tools, and third party libraries. Regardless, for this effort, the only tool that should be required to adopt is the Swift compiler, and the only dependencies should be the standard library and any library that defines a format-specialized protocol.
- This design does not include support for encoding and decoding cyclical object graphs. Relatedly, there's still no intention to include encoding of runtime type information in serialization formats for any purpose; all concrete types must be specified by the client doing the encoding or decoding.
Example
To help roughly illustrate our vision for these APIs, here's a small example of a macro-annotated type, along with what it would expand to. (Please note that any names and designs are placeholders, and that macro implementations are still theoretical at this stage.)
// Written code:
@JSONCodable
struct BlogPost {
    let title: String
    let subtitle: String?
    @CodingKey("date_published") @CodingFormat(.iso8601)
    let publishDate: Date
    let body: String
    @CodingDefault([])
    let tags: [String]
}

// Synthesized code:
extension BlogPost {
    enum CodingFields: Int {
        case title
        case subtitle
        case publishDate
        case body
        case tags
        case unknown
    }
}

extension BlogPost.CodingFields: DecodingField {
    static func field(for key: UTF8Span) throws -> Self {
        switch key {
        case "title": .title
        case "subtitle": .subtitle
        case "date_published": .publishDate
        case "body": .body
        case "tags": .tags
        default: .unknown
        }
    }
}

extension BlogPost.CodingFields: EncodingField {
    var key: String {
        // Switch over the field itself to produce its encoded key.
        switch self {
        case .title: "title"
        case .subtitle: "subtitle"
        case .publishDate: "date_published"
        case .body: "body"
        case .tags: "tags"
        case .unknown: fatalError("Cannot encode unknown field")
        }
    }
}

extension BlogPost: JSONEncodable {
    func encode(to encoder: inout JSONEncoder2) throws {
        try encoder.encodeStructFields() { fieldEncoder in
            try fieldEncoder.encode(field: CodingFields.title, value: self.title)
            try fieldEncoder.encode(field: CodingFields.subtitle, value: self.subtitle)
            // @CodingFormat(.iso8601): encode the date as a formatted string.
            let publishDateValue = self.publishDate.formatted(.iso8601)
            try fieldEncoder.encode(field: CodingFields.publishDate, value: publishDateValue)
            try fieldEncoder.encode(field: CodingFields.body, value: self.body)
            // `tags` is non-optional; the @CodingDefault only matters when decoding.
            try fieldEncoder.encode(field: CodingFields.tags, value: self.tags)
        }
    }
}

extension BlogPost: JSONDecodable {
    static func decode(from decoder: inout JSONDecoder2) throws -> BlogPost {
        try decoder.decodeWithStructHint(visitor: Visitor())
    }

    struct Visitor: JSONDecodingStructVisitor {
        typealias DecodedValue = BlogPost

        func visit(decoder: inout JSONDecoder2.StructDecoder) throws -> BlogPost {
            var title_tmp: String?
            var subtitle_tmp: String?
            var publishDate_tmp: Date?
            var body_tmp: String?
            var tags_tmp: [String]?
            // Fields are handled in whatever order the payload provides them.
            while let field = try decoder.nextField(CodingFields.self) {
                switch field {
                case .title: title_tmp = try decoder.decodeValue(String.self)
                case .subtitle: subtitle_tmp = try decoder.decodeValue(String.self)
                case .publishDate:
                    let formatted = try decoder.decodeValue(String.self)
                    publishDate_tmp = try Date.ISO8601FormatStyle().parse(formatted)
                case .body: body_tmp = try decoder.decodeValue(String.self)
                case .tags: tags_tmp = try decoder.decodeValue([String].self)
                case .unknown: try decoder.skipValue()
                }
            }
            guard let title = title_tmp else { throw <missing required field error> }
            let subtitle = subtitle_tmp
            guard let publishDate = publishDate_tmp else { throw <missing required field error> }
            guard let body = body_tmp else { throw <missing required field error> }
            let tags = tags_tmp ?? [] // @CodingDefault([])
            return BlogPost(title: title, subtitle: subtitle, publishDate: publishDate, body: body, tags: tags)
        }
    }
}
Conclusion
Again, I look forward to incorporating feedback from the broader Swift community to help design a highly performant serialization system that is easy to use and customize, and that meets the feature needs of more developers than ever.
The next concrete steps will include an official Pitch for the stdlib APIs and swift-foundation evolution proposals for JSON and PropertyList protocols & encoders/decoders. Proposals for macro definitions will follow later.