The future of serialization & deserialization APIs

Yep! I think this can be covered by the new macros—and with greater flexibility than what property wrappers can provide.

I expect this specific case of "present/valid value maps to .some, null and absent both map to .none" should be more naturally handled, even with the use of a non-standard decoding scheme (in this case, converting strings to doubles). For this case, you'd likely specify a non-standard visitor type for this property that has a DecodedValue type of Double?. If it visits a string or a null, it produces the appropriate value, but if it visits no value, the Double? temporary variable for this property will remain nil and get initialized in the final type as such.
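A rough sketch of what that visitor might look like — the type and method names below are pure guesses at the pitched API, not anything that exists today:

```swift
// Hypothetical sketch: a visitor whose DecodedValue is Double?, so a
// decodable string produces .some, an explicit null produces .none,
// and an absent value leaves the property's temporary as nil.
struct StringOrNullDoubleVisitor {
    typealias DecodedValue = Double?

    // "3.5" decodes to .some(3.5); a non-numeric string yields nil
    // here, though a real visitor would likely throw instead.
    func visit(string: String) -> Double? { Double(string) }

    // An explicit null maps to .none.
    func visitNull() -> Double? { nil }

    // If the value is absent, no visit method is called at all: the
    // Double? temporary for the property simply stays nil.
}
```

The point being that "absent" needs no special handling at all — it's just the temporary never being written to.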

1 Like

Re: The end result for this design should keep all definitions in code .

Specifically excluding formats such as Protobuf makes it harder to write systems that interoperate using those formats. I think forcing Swift serialization/deserialization to be a walled garden is unfortunate.

I like this approach generally. It would be good to consider decoding more formats like Protobuf, Thrift, etc. These are increasingly common responses from APIs, and it would be helpful to have a universal way of parsing these formats.

1 Like

I fully disagree with @BenjaminBriggs and @feldur. FlatBuffer, Protobuf, Thrift, Cap'n proto, etc. can't really be addressed by this library feature [1]. This wouldn't work on many levels.

Schema DSLs

Schemas aren't defined in Swift. They aren't defined in C. They aren't defined in Java, JavaScript, Haskell, Python, etc.

All of the aforementioned formats have designed their own schema language as a means of sharing that information between different ecosystems. Disparate ecosystems don't usually have a way of sharing a single library that defines serialization formats, so instead they share a .proto file, which will generate said library for any and every language you want to use.

This is an incredible thing to have. A service written in Go can use the same wire data model as the one written in JavaScript.

We aren't getting rid of external schema files anytime soon.

I mean, you don't necessarily have to use a schema file

You can, of course, "port" a schema file to Swift:

import Thrift

@ThriftCodable
struct Vec3 {
  @ThriftTag(1) var x: Int32
  @ThriftTag(2) var y: Int32
  @ThriftTag(3) var z: Int32
}

@ThriftCodable
enum Color: Int32 {
  case Red = 1
  case Blue
  case Green
}

@ThriftCodable
struct Monster {
  @ThriftTag(1) var pos: Vec3?
  @ThriftTag(2) var mana: Int16 = 150
  @ThriftTag(3) var hp: Int16 = 100
  @ThriftTag(4) var name: String?
  @available(*, deprecated)
  @ThriftTag(5) var friendly = false
  @ThriftTag(6) var inventory: [UInt8]?
  @ThriftTag(7) var color: Color = .Blue
}

It is useful for cases where only Swift systems need to use this serialization format, or where you are fine with hand-porting a small one-off definition just to avoid adopting the whole code generation tooling.

As I'll discuss later though, this does not fit well with, nor does it require, this proposal. One can implement such a package today and consume it with SwiftPM with no trouble.

There's also JSON/XML schemas

The same argument can be applied to standardized JSON schemas. You can use them in the same way to share definitions between languages. You'd then also inherit all the complexity of code generation :(

It's usually less prevalent, since you can use JSON without a schema, but you cannot(-ish) do the same with, say, Cap'n proto. Also, JSON is human-readable, and schemas are often inferred by looking at the output, which is a useful property.

This is basically what Swift OpenAPI Generator does.

So do we want to have an ability to

import "./definitions.proto"

?

Reliance on third-party tooling

But like I said, .proto isn't .swift. Do we now want to teach the Swift compiler how to parse them? Does Foundation need to provide a generic ProtobufDecoder? What about Thrift? FlatBuffers?

These are really big projects. It would be an enormous burden to integrate or port them into the compiler infrastructure. It's a non-starter.

What we can do is integrate existing third-party code-generators into the macro system.. maybe..

Pipe dreams... I mean, macros. Yes, macros!

One could imagine installing a swift-flatbuffers package and adding something like this:

import FlatBuffers

#Definition("""
    namespace MyGame;

    attribute "priority";

    enum Color : byte { Red = 1, Green, Blue }

    struct Vec3 {
      x:float;
      y:float;
      z:float;
    }

    table Monster {
      pos:Vec3;
      mana:short = 150;
      hp:short = 100;
      name:string;
      friendly:bool = false (deprecated, priority: 1);
      inventory:[ubyte];
      color:Color = Blue;
    }
    root_type Monster;
""")

let monster = Monster(
  pos: Vec3(x: 0, y: 420, z: 69),
  name: "Cthulhu",
  inventory: [],
  color: .Red
)
serialize(monster)

And that is possible. It could simplify things. There's no need to deal with flatc, no autogenerated // PLEASE DON'T EDIT files to manage. There are downsides, of course.

Ideally we would've standardized something like #embed a long time ago to allow for something much better:

import FlatBuffers

#Definition(#embed("./monster.fbs"))

(God I wish #embed/#fileLiteral/#whatever was a thing [1][2])

Interoperability with format-agnostic types

Thrift and Protobuf require explicit tags. For a good reason.

How would that play out when serializing a format-agnostic type like CGRect, for example? There would be no benefit for those formats to support the proposed (codenamed CommonCodable) stdlib mechanism.

All in all, they are vastly different beasts [2]. Even among themselves.


I don't think we can do anything to ease adoption of these serialization formats. As far as I can tell, no other language out there attempts to do so either. It is a good thing that this was stated as an explicit non-goal, and it makes sense.

What we can do is provide a common interchange data model for commonly used self-describing formats, like JSON, CBOR, plist, msgpack, YAML, TOML, Java properties, INI, XML (not sure about that one tbh), SQL(ite) rows, etc.

We can provide common format decoders (JSON, plist) built into Swift's distribution (i.e. Foundation).

We can provide a way to integrate the commonly used serialization data model for use with less prominent data formats, and let users adopt them via separate packages.

Codable did that, for the most part. We can do better!

These are worthy goals to achieve, even if we can't fit Protobuf into it. I don't think it can be done. I'd really like to be proven wrong though.


  1. Even though I agree that easing adoption of those formats would be a really nice thing to have ↩︎

  2. I forgot to mention that forcing deserialization to the user-defined swift structs would nullify one of the main reasons to use something like Cap'n proto or FlatBuffers, as they are "zero-copy" formats ↩︎

4 Likes

@Malien thank you, I think you're spot on with the intention of this proposal in terms of its goals, non-goals, and how open it actually is.

To summarize the main points:

  • .proto / schema files aren't going anywhere for formats or schemes that rely on them. They're vital for cross-system integration. Their usage is just outside the scope of this proposal.
  • Swift <-> Swift communication via .proto-less protobuf is a possibility, for a narrower use case. It would not be terrible for an independent package designed for this purpose to mimic some of the patterns we're establishing here for easier knowledge transfer.
  • Agreed that a Swift protobuf implementation supporting CommonCodable, format-agnostic types is questionable. However, I'm trying to ensure that the CommonCodable design, if possible without significant performance penalty, doesn't entirely prevent it. For a type like CGRect, it's not unreasonable to map it to a 4-field protobuf message with default field numbers based on the order of the properties in the struct. This layout is effectively frozen and will remain consistent across all releases. This would ONLY ever be useful in a Swift <-> Swift communication channel though, where the same Swift type is encoded and decoded on both ends.
  • I think there is actually some precedent for generic, in-language structs being used for Protobuf serialization, in the form of the kotlinx.serialization.protobuf package. I cannot claim to have any experience with this package or kotlinx.serialization in general, but it is something I have researched somewhat in the development of this design. This is the kind of usage I'd like to allow to work in CommonCodable without necessarily committing to doing that work, or having it live in the stdlib or Foundation, etc.

I want to emphasize that the actual products I expect to come from this work are the following:

  1. "CommonCodable" and friends — an API in the stdlib for format-agnostic serialization structure description that is compatible with the more efficient visitor model.
  2. A common (though not comprehensive for every possible format) vocabulary of macro attributes for use by @CommonCodable and @XYZCodable macros to alter serialization in common ways.
  3. At a minimum provide visitor-based JSONCodable and PropertyListCodable protocols and macros… somewhere TBD.
  4. Establish common designs and patterns in macro usage and API structure to ease knowledge transfer between format-specialized protocols and to inform the design of separate packages specializing in other formats.
  5. Provide tools to facilitate creation of format-specialized macros to minimize individual rework of common patterns, especially in respecting the common macro vocabulary.

(I recognize that putting this list in the OP would have been extremely valuable.)

I expect this particular discussion will focus mostly around (1) and (2). (4) will come in as we discuss the actual Pitch for CommonCodable. (3) will be its own separate set of pitches/evolution proposals. When and how (5) gets discussed and handled is TBD.

3 Likes

For this point in particular, you could get pretty close with a macro that generates the type from a proto definition, similar to the original macro example that generated a type from JSON. With the right flexibility, that could come from a library. At least that way users can get part of the way to proper integration before adopting a full external generator.

#proto("""
message Person {
  string name = 1;
  int32 id = 2;
  string email = 3;
}
""")

// Generates
struct Person: SomeProtocols {
  var name: String
  var id: Int32
  var email: String
}

Yep, absolutely. I won't pretend to know what benefit this design provides that existing protobuf code generation methods and libraries do not, but the scope of this particular effort should at minimum include trying not to get in the way of allowing something like the above to happen, and at best allow integration with the common macro vocabulary and CommonCodable if it makes sense.

This is sort of what I did with ASN1Codable, you could completely define ASN.1 types in Swift using property wrappers. We had a separate ASN.1 to Swift compiler but this was not necessary for Swift <-> Swift communication.

On an unrelated note, we're using Codable to implement OCP.1. The specifics aren't that interesting, except that we've found the Swift runtime cost to be a major performance bottleneck when transmitting audio metering information (at the relatively slow rate of 10 messages/second). We've had to special-case these paths to use hand-rolled encoders to keep the CPU usage and thermals under control. A replacement for Codable with less runtime overhead will be most welcome.
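To illustrate what a hand-rolled path looks like — the message layout below is made up for the example, OCP.1's actual wire format isn't shown in this thread:

```swift
// A made-up metering message: a 2-byte channel plus a 4-byte float
// level, appended big-endian straight into a byte buffer. No keyed
// containers, no string lookups, no runtime machinery.
struct MeterValue {
    var channel: UInt16
    var level: Float32
}

func encode(_ m: MeterValue, into buffer: inout [UInt8]) {
    withUnsafeBytes(of: m.channel.bigEndian) { buffer.append(contentsOf: $0) }
    withUnsafeBytes(of: m.level.bitPattern.bigEndian) { buffer.append(contentsOf: $0) }
}
```

At 10 messages/second this is trivial work; the point is that the equivalent Codable path pays for container and key machinery on every message.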

2 Likes

@lukeh Thanks so much for these concrete examples.

Are there any concrete clients that you're aware of that are eschewing use of the ASN.1 -> Swift compiler in favor of defining the type solely in Swift (presumably for Swift <-> Swift communication)? If so, are there any particular motivations for those clients using ASN.1 over any other format?

1 Like

As with most of my open source projects these days, I am the only client. I am using it with the Heimdal ASN.1 compiler to parse PKIX certificates in my application.

The initial motivation was "vendor-related", but Apple ended up developing their own implementation here (and, given the performance issues we've noticed with Codable, I can't blame them).

The biggest annoyance with Codable for me is that it is incompatible with Swift concurrency.

Hopefully this new approach will address that.

In what way is it "incompatible"? What exactly does that mean? Do you want it to happen in an async context?

see discussion here

But here is a trivial fail

Ok, I see the problem now.

If I am not mistaken, the proposed approach results in O(n) code size (where n is the size of the type, e.g. the number of variables in a struct). I wonder if a better approach is possible that would have O(1) code size while addressing the other limitations of Codable.

3 Likes

Let's take JSON as an example. At its most compact, you would at least need a table mapping string keys to field offsets in a struct, which is O(n) code size as well. (tbf, this type metadata table is already present in the binary)
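As a sketch of that per-field cost — `Person` and the keys are illustrative, not any macro's real output — the string-key-to-field mapping is inherently one case per stored property:

```swift
struct Person {
    var name = ""
    var age = 0
}

// The O(n) part: one case per stored property. Generated decoding
// code ends up emitting something shaped like this, whether it's a
// switch, a key enum, or an offset table.
func assign(_ key: String, _ value: Any, into p: inout Person) -> Bool {
    switch key {
    case "name": p.name = value as? String ?? p.name
    case "age":  p.age  = value as? Int ?? p.age
    default:     return false  // unknown key
    }
    return true
}
```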

One could use RTTI to avoid generating any additional code per field. This is in opposition to the original Codable and to this proposal, and would make it highly undesirable for server-side environments.

On embedded we don't have RTTI. [1]

On mobile [2], yeah, code size is a concern. And your typical use case would benefit more from faster startup [3], a less bloated icache, less memory usage [4], and a smaller install size than from faster parsing of an occasional server response. [5]

Throughput and code size are in opposition here. We have to pick one or the other. From what the community has expressed, I'd say we have a bigger appetite for the former.

Maybe there is a world where we can have both. And maybe the default should be to favor code size. I'd be happy to see if there are Swift serialization packages that optimize for code size at all costs. I think they are likely to bring a lot of value to the table.


I've spent way too much time trying to decipher the "I wonder if a better approach is possible" without additional context. I hate myself


  1. To be fair, the macro can generate all of the required metadata. But it isn't O(1) then. ↩︎

  2. and web via wasm ↩︎

  3. Assuming you don't parse configuration files in a critical path ↩︎

  4. Scratch that. This isn't guaranteed. You are much more likely to do fragmented heap allocations in this scenario ↩︎

  5. Speculation: this proposal may lead to bigger code size than codable, since it permits better monomorphisation and inlining opportunities. -Os should outline all of the costs of monomorphised code tbf. And also Foundation is in a separate resilience domain ↩︎

1 Like

this type metadata table is already present in the binary

This is my understanding. For example, with current Swift I could use the Mirror API to enumerate the names and values of struct fields, and I could do this via a loop of O(1) code size. It mimics serialisation (e.g. I could produce JSON that way). I'm not saying that the Mirror API in its current form is a suitable replacement for Codable, merely illustrating the point that serialisation is possible to do with a constant-size chunk of code. And it feels like serialisation done that way could be quick, provided the underlying API to get the names/values is quick.
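For illustration, here is that O(1)-code-size loop over Mirror — `naiveJSON` is my name for it, and real JSON string escaping and number formatting are deliberately elided:

```swift
// One constant-size chunk of code serializes any flat struct:
// Mirror enumerates field names and values at runtime, so no
// per-type decoding/encoding code needs to be generated.
func naiveJSON(_ value: Any) -> String {
    let fields = Mirror(reflecting: value).children.compactMap { child -> String? in
        guard let label = child.label else { return nil }
        return "\"\(label)\": \(child.value)"
    }
    return "{ " + fields.joined(separator: ", ") + " }"
}

struct Point {
    var x = 1.0
    var y = 2.0
}

print(naiveJSON(Point()))
```

Whether it "could be quick" is exactly the open question: Mirror allocates and boxes every child value, which is where the runtime cost the rest of this thread complains about comes from.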

FYI: This specific case is going to be solved by isolated conformances somewhat: SE-0470: Global-actor isolated conformances

1 Like