Inlinable Codable

dmt · June 8, 2023, 3:02pm

Motivation
Functions related to Codable types currently cannot be inlined.
For example:

Encodable's func encode(to encoder: Encoder) throws takes an existential (any Encoder).
Decodable's init(from decoder: Decoder) throws takes an existential (any Decoder).
Functions of KeyedEncodingContainer, _KeyedEncodingContainerBase and _KeyedEncodingContainerBox, KeyedDecodingContainer, _KeyedDecodingContainerBase, _KeyedDecodingContainerBox aren't marked with @inlinable.
The Codable conformances of built-in types (String, Bool, Int, etc.) aren't inlinable either.
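To illustrate the existential-versus-generic point, here is a minimal sketch. The `Sink` protocol, `CountingSink`, and both `emit…` functions are hypothetical names invented for this example; they are not part of the stdlib or of the demos in this post.

```swift
// Hypothetical protocol and types, for illustration only.
public protocol Sink {
    mutating func write(_ value: Int)
}

public struct CountingSink: Sink {
    public var total = 0
    public init() {}
    public mutating func write(_ value: Int) { total += value }
}

// Existential parameter: the concrete type is erased, so write(_:) is
// reached through the protocol witness table unless the optimizer
// manages to devirtualize the call.
public func emitExistential(_ value: Int, to sink: inout any Sink) {
    sink.write(value)
}

// Generic parameter plus @inlinable: the body is visible to clients,
// so the compiler can specialize it for CountingSink and dispatch
// write(_:) statically.
@inlinable
public func emitGeneric<S: Sink>(_ value: Int, to sink: inout S) {
    sink.write(value)
}
```

Both functions compute the same result; the difference is only in how much the optimizer can see at the call site.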

This leads to suboptimal code generation by the compiler and a potential performance bottleneck when encoding and decoding on a hot path.

Solution
The final goal here is to allow the optimizer to do its job as well as it can. For example, when calling myStruct.encode(to: myEncoder), the resulting code should be free from boxing, jumping back and forth to/from libswiftCore, calls to __swift_instantiateConcreteTypeFromMangledName, etc. We just need direct access to the fields of the struct and static dispatch to the writing methods of the encoder.
To achieve this, we need to at least:

Mark all functions with the @inlinable attribute.
Use generics in init(from decoder: Decoder) throws and func encode(to encoder: Encoder) throws.
Probably mark the synthesized implementations of init(from:) and encode(to:) with @inlinable as well.
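A rough sketch of what the steps above could look like on a cut-down, Codable-like protocol. `MiniEncoder`, `MiniEncodable`, `ListEncoder`, and `Foo` are hypothetical stand-ins written for this example, not the stdlib API and not the exact code from my demos.

```swift
// Hypothetical stand-in for Encoder, reduced to two writing methods.
public protocol MiniEncoder {
    mutating func encode(_ value: Int, forKey key: String)
    mutating func encode(_ value: String, forKey key: String)
}

// Hypothetical stand-in for Encodable: generic over the encoder
// instead of taking an existential.
public protocol MiniEncodable {
    func encode<E: MiniEncoder>(to encoder: inout E)
}

public struct Foo: MiniEncodable {
    public var id: Int
    public var name: String

    public init(id: Int, name: String) {
        self.id = id
        self.name = name
    }

    // Roughly what a synthesized, inlinable implementation might do:
    // read the fields directly and call the encoder's methods.
    @inlinable
    public func encode<E: MiniEncoder>(to encoder: inout E) {
        encoder.encode(id, forKey: "id")
        encoder.encode(name, forKey: "name")
    }
}

// A toy encoder that records "key=value" strings.
public struct ListEncoder: MiniEncoder {
    public var entries: [String] = []
    public init() {}

    @inlinable
    public mutating func encode(_ value: Int, forKey key: String) {
        entries.append("\(key)=\(value)")
    }

    @inlinable
    public mutating func encode(_ value: String, forKey key: String) {
        entries.append("\(key)=\(value)")
    }
}
```

With both sides generic and inlinable, a call like Foo(id: 7, name: "bar").encode(to: &encoder) gives the optimizer everything it needs to specialize for ListEncoder; whether it actually inlines is still up to the compiler.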

Alternatives considered
The whole Codable system could probably be reinvented with macros, which would give more precise control on the user-code side rather than the stdlib side.

Additional context
GitHub issue

Prototyping
I made two demos.
The first one uses stdlib's Encodable.
The second one uses a copy of Encodable with slightly tweaked signatures.
Both do the same thing: encode a struct with two fields using an encoder that calls a non-inlinable function to print its arguments.
As you can see, the first one emits:

call    (lazy protocol witness table accessor for type output._Encoder and conformance output._Encoder : Swift.Encoder in output)
call    (output.Foo.encode(to: Swift.Encoder) throws -> ())
call    __swift_instantiateConcreteTypeFromMangledName
call    __swift_project_boxed_opaque_existential_1
call    (lazy protocol witness table accessor for type output.Foo.(CodingKeys in _60494E8B9C642A7C4A26F3A3B6CECEB9) and conformance output.Foo.(CodingKeys in _60494E8B9C642A7C4A26F3A3B6CECEB9) : Swift.CodingKey in output)
call    ($ss7EncoderP9container7keyedBys22KeyedEncodingContainerVyqd__Gqd__m_ts9CodingKeyRd__lFTj)@PLT
call    ($ss22KeyedEncodingContainerV6encode_6forKeyySi_xtKF)@PLT
call    (type metadata accessor for output._KeyedEncodingContainer)
call    swift_getWitnessTable@PLT
call    ($ss9CodingKeyP11stringValueSSvgTj)@PLT

And there are more calls inside the stdlib. The second one emits only:

call    __swift_instantiateConcreteTypeFromMangledName
call    swift_initStackObject@PLT

And then two direct calls to the print function. I'm not sure why there are calls to __swift_instantiateConcreteTypeFromMangledName and swift_initStackObject here.

bbrk24 · June 8, 2023, 3:19pm

That’s the print statement: Compiler Explorer

dmt · June 8, 2023, 6:08pm

No, I was talking about this piece:

lea     rdi, [rip + (demangling cache variable for type metadata for output._KeyedEncodingContainerBox<output._KeyedEncodingContainer<output.Foo.CodingKeys>>)]
call    __swift_instantiateConcreteTypeFromMangledName
mov     rsi, rsp
mov     rdi, rax
call    swift_initStackObject@PLT

The result doesn't seem to be used.

dmt · June 17, 2023, 8:08pm

I made a benchmark project to compare the inlinable version with the non-inlinable one.

The results are as follows:

name                                          time           std        iterations
----------------------------------------------------------------------------------
Encode (no inline)                            1054396.000 ns ±   3.25 %       1286
Encode (inline)                                420666.500 ns ±   3.57 %       3292
Decode (no inline)                             820250.000 ns ±   3.20 %       1689
Decode (inline)                                466583.000 ns ±   3.63 %       2972
Decode json with ZippyJSONDecoder (no inline)  246625.000 ns ±   5.70 %       5568
Decode json with ZippyJSONDecoder (inline)     176833.500 ns ±   5.75 %       7798

So there are clear differences between the versions. I should also mention that forcing the compiler to actually inline everything is quite tricky and requires a lot of attention from all parties: the stdlib, the encoder/decoder vendor, and the consumer. But at least it's possible.
One of the trickiest things is that I have to keep the signatures of encode(to: Encoder) and init(from: Decoder) unchanged to avoid breaking ABI. So they are not generic, and all of the inlining magic around them is rather fragile.