Improving JSONDecoder/Encoder Performance for large apps

Table of Contents

  1. Purpose

  2. JSONDecoder/Encoder Performance Problem

  3. JSONDecoder Performance Flaws

  4. Proposed Optimizations

  5. Optimizations Results

  6. Apple Benchmark Overview

  7. Apple Benchmark Flaws

  8. My Benchmark

JSONDecoder/Encoder Performance Problem

Introduction

swift_conformsToProtocolMaybeInstantiateSuperclasses is slow because the first time it is called for a given (class/enum/struct, protocol) pair, it traverses every protocol conformance descriptor in the whole app.

EmergeTools has a great article about the poor performance of swift_conformsToProtocolMaybeInstantiateSuperclasses.

In brief, the more protocol conformances your app has, the slower swift_conformsToProtocolMaybeInstantiateSuperclasses becomes. Our app has more than 150k protocol conformances. The count can be measured with this bash one-liner:

otool -l path/to/your/binary | grep '__swift5_proto$' -A 5 | grep 'size' | awk '/size/ { hex = $2; sub("0x", "", hex); print int("0x" hex)/4 + 0 }'

We take the size of the __swift5_proto section and divide it by 4, because the section stores one 4-byte integer offset per conformance.
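The division above can be sketched in Swift. This is only an illustration of the arithmetic: the function name conformanceCount(fromHexSize:) is hypothetical, and the input is assumed to be the hexadecimal size value that otool -l prints for the section.

```swift
/// Converts the hexadecimal `size` value printed by `otool -l` for the
/// `__swift5_proto` section into a number of conformance records.
/// Each record is a 4-byte relative offset, hence the division by 4.
/// (Hypothetical helper, written for illustration only.)
func conformanceCount(fromHexSize hexSize: String) -> Int? {
    // Accept both "0x44ac0" and "44ac0".
    let digits = hexSize.lowercased().hasPrefix("0x")
        ? String(hexSize.dropFirst(2))
        : hexSize
    // Parse the remaining characters as a base-16 byte count.
    guard let byteCount = Int(digits, radix: 16) else { return nil }
    return byteCount / 4
}
```

For example, a section size of 0x44ac0 bytes (281280 in decimal) corresponds to 70320 conformance records.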

When swift_conformsToProtocol is called

In short, there are three ways to trigger this method:

  • T.self is SomeProtocol.Type
  • as? / as! / as (in a switch statement) SomeProtocol
  • Generic types with constraints on their generic parameters
    • swift_conformsToProtocol is triggered because the type metadata contains a GenericParameterVector, and the GenericParameterVector has to contain a protocol witness table for each protocol the generic parameter conforms to.
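A minimal sketch of the three situations listed above. All names here (Greeter, English, Silent, Wrapper) are hypothetical and exist only for illustration; whether the third case reaches the runtime lookup in a given build may depend on how much the optimizer specializes.

```swift
protocol Greeter { func greet() -> String }
struct English: Greeter { func greet() -> String { "hello" } }
struct Silent {}

// 1. Metatype check: `T.self is SomeProtocol.Type`.
func isGreeter<T>(_ type: T.Type) -> Bool {
    return T.self is Greeter.Type
}

// 2. Dynamic cast of a value: `as?`, `as!`, or `case let x as SomeProtocol`.
func greeting(for value: Any) -> String? {
    return (value as? Greeter)?.greet()
}

// 3. A generic type whose parameter is constrained to a protocol.
//    Its metadata carries a witness table for `G: Greeter`.
struct Wrapper<G: Greeter> {
    let wrapped: G
}

func wrappedGreeting() -> String {
    // Mentioning `Wrapper<English>` requires metadata for that instantiation.
    let wrapper = Wrapper<English>(wrapped: English())
    return wrapper.wrapped.greet()
}
```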

JSONDecoder Performance Flaws

unwrap function

The first place in JSONDecoder where swift_conformsToProtocolMaybeInstantiateSuperclasses is called is the unwrap function:

func unwrap<T: Decodable>(_ mapValue: JSONMap.Value, as type: T.Type, for codingPathNode: _CodingPathNode, _ additionalKey: (some CodingKey)? = nil) throws -> T {
    ...
    if T.self is _JSONStringDictionaryDecodableMarker.Type {
        return try self.unwrapDictionary(from: mapValue, as: type, for: codingPathNode, additionalKey)
    }
    ...
}

KeyedDecodingContainer

KeyedDecodingContainer has a generic type constraint, K: CodingKey. This is the second place where swift_conformsToProtocol gets called.

JSONDecoder swift_conformsToProtocol Performance Impact

swift_conformsToProtocol consumes at least 84% of all JSONDecoder.decode time in our app's startup scenario.

JSONEncoder Performance Flaws

wrapGeneric function

The first place in JSONEncoder where swift_conformsToProtocolMaybeInstantiateSuperclasses is called is the wrapGeneric function:

func wrapGeneric<T: Encodable>(_ value: T, for additionalKey: (some CodingKey)? = _CodingKey?.none) throws -> JSONEncoderValue? {
    ...
    else if let encodable = value as? _JSONStringDictionaryEncodableMarker {
        return try self.wrap(encodable as! [String:Encodable], for: additionalKey)
    } else if let array = value as? _JSONDirectArrayEncodable {
        ...
    }
    ...
}

KeyedEncodingContainer

KeyedEncodingContainer has a generic type constraint, K: CodingKey. This is the second place where swift_conformsToProtocol gets called.

JSONEncoder swift_conformsToProtocol Performance Impact

swift_conformsToProtocol consumes at least 84% of all JSONEncoder.encode time in our app's startup scenario.

Proposed Optimizations

First, the optimizations that break neither ABI nor API:

#1 JSONDecoder unwrap optimization

_JSONStringDictionaryDecodableMarker is used to exempt String-keyed dictionaries from key conversion. So when no key conversion is configured, we can skip this slow check:

switch options.keyDecodingStrategy {
case .useDefaultKeys:
    break
case .convertFromSnakeCase, .custom:
    if T.self is _JSONStringDictionaryDecodableMarker.Type {
        return try unwrapDictionary(...)
    }
}

return try self.with(value: mapValue, path: codingPathNode.appending(additionalKey)) {
    try type.init(from: self)
}

instead of

if T.self is _JSONStringDictionaryDecodableMarker.Type {
    return try self.unwrapDictionary(from: mapValue, as: type, for: codingPathNode, additionalKey)
}

return try self.with(value: mapValue, path: codingPathNode.appending(additionalKey)) {
    try type.init(from: self)
}

So this optimization only takes effect with the .useDefaultKeys strategy.
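The idea of gating the conformance check behind the key strategy can be tried in isolation. The sketch below does not use Foundation's internals: KeyStrategy, StringDictionaryMarker and needsDictionaryPath are hypothetical stand-ins, and the counter exists only to make visible how often the expensive check runs.

```swift
// Hypothetical stand-ins for the Foundation internals discussed above.
enum KeyStrategy { case useDefaultKeys, convertFromSnakeCase }

protocol StringDictionaryMarker {}
extension Dictionary: StringDictionaryMarker where Key == String {}

/// Returns true when the decoder would have to take the special
/// dictionary path. `conformanceChecks` counts how often the
/// runtime-backed `is` check is actually executed.
func needsDictionaryPath<T>(_ type: T.Type,
                            strategy: KeyStrategy,
                            conformanceChecks: inout Int) -> Bool {
    switch strategy {
    case .useDefaultKeys:
        // No key conversion happens, so the marker check is skipped.
        return false
    case .convertFromSnakeCase:
        conformanceChecks += 1
        return T.self is StringDictionaryMarker.Type
    }
}
```

With .useDefaultKeys the counter stays at zero no matter how many types are decoded, which is the whole point of the optimization.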

#2 JSONEncoder wrapGeneric optimization

There are two ways to optimize this function:

  • If we believe that the as? _JSONDirectArrayEncodable check does more good than harm for performance (in our app and in this benchmark it does more harm), we optimize only the _JSONStringDictionaryEncodableMarker check, the same way we did for JSONDecoder and _JSONStringDictionaryDecodableMarker.
  • If not, it is better to remove the as? _JSONDirectArrayEncodable check entirely.

Here is the optimized _JSONStringDictionaryEncodableMarker check:

switch options.keyEncodingStrategy {
case .useDefaultKeys:
    break
case .convertToSnakeCase, .custom:
    if let encodable = value as? _JSONStringDictionaryEncodableMarker {
        return try wrap(encodable as! [String: Encodable], for: additionalKey)
    }
}

So this optimization only takes effect with the .useDefaultKeys strategy.

Optimizations #1 and #2 are implemented in the FastCoders library.

#3 Possibly ABI/API breaking optimizations

Here we try to solve the performance issue caused by the generic type constraints of KeyedDecodingContainer and KeyedEncodingContainer.

The problem is not the call to the KeyedDecodingContainer or KeyedEncodingContainer initializer; it is the mere reference to the type with a concrete generic argument.

For example, take this code:

import Foundation

struct A: Codable {
    let a: Int
}

The SIL for its init(from: Decoder) throws method contains a line like:

%5 = alloc_stack [lexical] [var_decl] $KeyedDecodingContainer<A.CodingKeys>, scope 22 

And its IR is:

  %4 = call ptr @__swift_instantiateConcreteTypeFromMangledName(ptr @"demangling cache variable for type metadata for Swift.KeyedDecodingContainer<output.A.(CodingKeys in _60494E8B9C642A7C4A26F3A3B6CECEB9)>") #2, !dbg !194

Internally, __swift_instantiateConcreteTypeFromMangledName triggers swift_conformsToProtocol in this scenario.

The trigger is that we mention the type KeyedDecodingContainer with the concrete argument A.CodingKeys.

func encode(to: Encoder) throws has the same flaw.
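To make the SIL and IR above easier to relate to source code, here is a hand-written approximation of what the compiler synthesizes for struct A. It is a sketch, not the compiler's exact output; the explicit type annotations and the init(a:) initializer are added only for illustration.

```swift
import Foundation

struct A: Codable {
    let a: Int

    // Normally synthesized by the compiler; written out here by hand.
    enum CodingKeys: String, CodingKey {
        case a
    }

    // Added for illustration so the struct can still be constructed directly.
    init(a: Int) { self.a = a }

    init(from decoder: Decoder) throws {
        // Naming KeyedDecodingContainer<CodingKeys> is what requires the
        // metadata for this concrete instantiation.
        let container: KeyedDecodingContainer<CodingKeys> =
            try decoder.container(keyedBy: CodingKeys.self)
        self.a = try container.decode(Int.self, forKey: .a)
    }

    func encode(to encoder: Encoder) throws {
        // The same applies to KeyedEncodingContainer<CodingKeys>.
        var container: KeyedEncodingContainer<CodingKeys> =
            encoder.container(keyedBy: CodingKeys.self)
        try container.encode(a, forKey: .a)
    }
}
```

Every Codable model has its own CodingKeys type, so every model introduces its own KeyedDecodingContainer and KeyedEncodingContainer instantiation.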

There are two possible ways to tackle this:

  • Change the type signatures of KeyedDecodingContainer and KeyedEncodingContainer to avoid generic type constraints (not implemented in this repository)
  • Use the same CodingKey type, for example String, in all auto-generated Codable/Decodable/Encodable conformance code.

#3.1 Changing type signature

The trick is to remove the K: CodingKey generic type constraint from the type declaration and move it to an extension. Then the GenericParameterVector no longer needs to contain a protocol witness table, and swift_conformsToProtocol is not called when the generic type is mentioned or instantiated.

Before:

public struct KeyedDecodingContainer<K: CodingKey> :
  KeyedDecodingContainerProtocol
{
  public typealias Key = K

  /// The container for the concrete decoder.
  internal var _box: _KeyedDecodingContainerBase

  /// Creates a new instance with the given container.
  ///
  /// - parameter container: The container to hold.
  public init<Container: KeyedDecodingContainerProtocol>(
    _ container: Container
  ) where Container.Key == Key {
    _box = _KeyedDecodingContainerBox(container)
  }

  /// The path of coding keys taken to get to this point in decoding.
  public var codingPath: [any CodingKey] {
    return _box.codingPath
  }

  // continue to conform to KeyedDecodingContainerProtocol protocol
  ...
}

After:

public struct KeyedDecodingContainer<K>
{
  /// The container for the concrete decoder.
  internal var _box: _KeyedDecodingContainerBase

  /// Creates a new instance with the given container.
  ///
  /// - parameter container: The container to hold.
  public init<Container: KeyedDecodingContainerProtocol>(
    _ container: Container
  ) where Container.Key == K {
    _box = _KeyedDecodingContainerBox(container)
  }
}

extension KeyedDecodingContainer: KeyedDecodingContainerProtocol where K: CodingKey {
  public typealias Key = K

  /// The path of coding keys taken to get to this point in decoding.
  public var codingPath: [any CodingKey] {
    return _box.codingPath
  }

  // continue to conform to KeyedDecodingContainerProtocol protocol
  ...
}

The same trick can be applied to KeyedEncodingContainer.
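The same restructuring can be tried on a small, self-contained type. This is a sketch of the language technique only: NamedKey, UserKey, ConstrainedContainer and UnconstrainedContainer are hypothetical names, and the example says nothing about the ABI impact on the real standard library types.

```swift
protocol NamedKey { var name: String { get } }
struct UserKey: NamedKey { let name: String }

// Before: the constraint sits on the type declaration, so every
// instantiation needs a witness table for `K: NamedKey`.
struct ConstrainedContainer<K: NamedKey> {
    var keys: [K] = []
    func names() -> [String] { keys.map { $0.name } }
}

// After: the declaration is unconstrained...
struct UnconstrainedContainer<K> {
    var keys: [K] = []
}

// ...and the API that needs the protocol lives in a constrained extension.
extension UnconstrainedContainer where K: NamedKey {
    func names() -> [String] { keys.map { $0.name } }
}
```

Call sites that use NamedKey-conforming keys look the same for both variants; the difference is that UnconstrainedContainer can also be instantiated with any other type, for example UnconstrainedContainer<Int>.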

Note: although _KeyedDecodingContainerBox has a generic type constraint, it seems we do not need to rewrite it, because of the way it gets called:

public init<Container: KeyedDecodingContainerProtocol>(
    _ container: Container
) where Container.Key == Key {
    _box = _KeyedDecodingContainerBox(container)
}

In this scenario, the IR references the protocol witness table of Container's conformance to KeyedDecodingContainerProtocol:

define protected swiftcc ptr @"output.KeyedDecodingContainerV2.init<A where A == A1.Key, A1: Swift.KeyedDecodingContainerProtocol>(A1) -> output.KeyedDecodingContainerV2<A>"(ptr noalias %0, ptr %K, ptr %Container, ptr %Container.KeyedDecodingContainerProtocol) #0 !dbg !84

and there is no __swift_instantiateConcreteTypeFromMangledName call.

#3.2 Use String as CodingKey

Why this would be faster:

  • swift_conformsToProtocol is slow only the first time it is called for each (class/enum/struct, protocol) pair.
  • So if we use String as the CodingKey, swift_conformsToProtocol is always called with the same pair: String and CodingKey.
  • Only the first call is slow. All subsequent calls are much faster, because swift_conformsToProtocol caches results in a ConcurrentReadableHashMap.
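The "slow first call, fast repeat call" behaviour described in the list above can be pictured with a toy model. This is not the Swift runtime's implementation: ConformanceRecord and ConformanceCache are hypothetical, the scan is a plain linear search, and an ordinary dictionary stands in for the runtime's ConcurrentReadableHashMap.

```swift
/// One entry of a pretend conformance section (toy model only).
struct ConformanceRecord {
    let type: ObjectIdentifier
    let proto: String
}

final class ConformanceCache {
    private struct Query: Hashable {
        let type: ObjectIdentifier
        let proto: String
    }

    private let records: [ConformanceRecord]
    private var cache: [Query: Bool] = [:]
    /// Number of records visited so far; grows only on cache misses.
    private(set) var recordsScanned = 0

    init(records: [ConformanceRecord]) { self.records = records }

    func conforms(_ type: Any.Type, to proto: String) -> Bool {
        let query = Query(type: ObjectIdentifier(type), proto: proto)
        // Fast path: this (type, protocol) pair was already resolved.
        if let cached = cache[query] { return cached }
        // Slow path: walk the records; a negative answer visits all of them.
        var found = false
        for record in records {
            recordsScanned += 1
            if record.type == query.type && record.proto == proto {
                found = true
                break
            }
        }
        cache[query] = found
        return found
    }
}
```

In this model, a shared key type means one slow-path walk in total, while one key type per model means one walk per model.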

How String can conform to CodingKey

extension String: CodingKey {
    public init?(stringValue: String) {
        self = stringValue
    }
    public init?(intValue: Int) { nil }
    public var intValue: Int? { nil }
    public var stringValue: String {
        self
    }
}

How this can be implemented

We could introduce an experimental flag. When the flag is enabled, the compiler does not auto-generate an enum CodingKeys for our struct/enum and instead uses a raw String as the coding key in init(from: Decoder) throws and encode(to: Encoder) throws.
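Until such a flag exists, the same shape can be written by hand. The sketch below shows what a Codable model with String coding keys might look like; the User model is hypothetical, and declaring the String: CodingKey conformance in application code is an assumption of this example (recent compilers may warn that it is a retroactive conformance).

```swift
import Foundation

// Assumption: the application declares this conformance itself.
extension String: CodingKey {
    public init?(stringValue: String) { self = stringValue }
    public init?(intValue: Int) { return nil }
    public var intValue: Int? { nil }
    public var stringValue: String { self }
}

// Hypothetical model with a hand-written Codable implementation.
struct User: Codable {
    let id: Int
    let name: String

    init(id: Int, name: String) {
        self.id = id
        self.name = name
    }

    init(from decoder: Decoder) throws {
        // Every model shares KeyedDecodingContainer<String>.
        let container = try decoder.container(keyedBy: String.self)
        id = try container.decode(Int.self, forKey: "id")
        name = try container.decode(String.self, forKey: "name")
    }

    func encode(to encoder: Encoder) throws {
        // Every model shares KeyedEncodingContainer<String>.
        var container = encoder.container(keyedBy: String.self)
        try container.encode(id, forKey: "id")
        try container.encode(name, forKey: "name")
    }
}
```

The trade-off is visible at the call sites: forKey: accepts any string literal, so a misspelled key is no longer caught at compile time.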

Additional advantages

Each auto-generated enum CodingKeys adds 5 protocol conformance descriptors (checked on godbolt):

  • CodingKey
  • Hashable
  • Equatable
  • CustomDebugStringConvertible
  • CustomStringConvertible

Also, each CodingKeys enum adds around 1.8 KB to app size (measured on the same 10k Codable structures):

  • codable-benchmark-package-no-coding-keys, where String is used as CodingKey but CodingKeys enums are still present to match the __swift5_proto section size
    • 49 MB
  • codable-benchmark-package-no-coding-keys-measure-size, where String is used as CodingKey and there are no CodingKeys enums
    • 31.1 MB
  • So each CodingKeys enum adds around 1.8 KB to the application binary size.

So if a shared CodingKey were implemented, we could:

  • Reduce application size
  • Improve overall application performance, because a smaller __swift5_proto section makes swift_conformsToProtocol faster.
    • codable-benchmark-package-no-coding-keys has 70321 protocol conformance descriptors
    • codable-benchmark-package-no-coding-keys-measure-size has only 20321 protocol conformance descriptors
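The per-enum figures in this section follow directly from the numbers quoted above. A short check of the arithmetic, assuming 1 MB = 1000 KB and exactly 10,000 models:

```swift
let modelCount = 10_000.0
let sizeWithCodingKeysMB = 49.0      // codable-benchmark-package-no-coding-keys
let sizeWithoutCodingKeysMB = 31.1   // ...-no-coding-keys-measure-size

// (49 - 31.1) MB spread over 10k CodingKeys enums, in KB per enum.
let kbPerCodingKeysEnum =
    (sizeWithCodingKeysMB - sizeWithoutCodingKeysMB) * 1000 / modelCount

// Difference in conformance descriptors, per CodingKeys enum.
let descriptorsPerCodingKeysEnum = (70_321 - 20_321) / 10_000
```

This gives roughly 1.79 KB and exactly 5 descriptors per CodingKeys enum, matching the five conformances listed earlier.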

Optimizations results

Measurements in our app

In our app we applied only the JSONDecoder.unwrap and JSONEncoder.wrapGeneric optimizations, without using String as CodingKey.

We measured the duration of every JSONDecoder.decode and JSONEncoder.encode call and summed them.

We have 80k measurements from different devices: ~40k with the optimized JSONDecoder and JSONEncoder, and ~40k with the standard JSONDecoder and JSONEncoder plus duration logging.

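| quantile              | 0.1    | 0.25   | 0.5    | 0.75   | 0.9     |
|-----------------------|--------|--------|--------|--------|---------|
| standard JSONDecoder  | 198 ms | 282 ms | 422 ms | 667 ms | 1017 ms |
| optimized JSONDecoder | 100 ms | 133 ms | 200 ms | 322 ms | 528 ms  |
| Difference            | ↑49.5% | ↑52.8% | ↑52.6% | ↑51.7% | ↑48.1%  |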

And for JSONEncoder:

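| quantile              | 0.1   | 0.25  | 0.5    | 0.75   | 0.9    |
|-----------------------|-------|-------|--------|--------|--------|
| standard JSONEncoder  | 59 ms | 94 ms | 159 ms | 289 ms | 547 ms |
| optimized JSONEncoder | 14 ms | 30 ms | 73 ms  | 135 ms | 220 ms |
| Difference            | ↑76%  | ↑68%  | ↑54%   | ↑53.2% | ↑59.8% |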

In short, the optimized JSONDecoder is about twice as fast as the standard JSONDecoder, and the optimized JSONEncoder is at least twice as fast as the standard JSONEncoder.
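The "Difference" rows appear to be the relative reduction in time, (standard - optimized) / standard. A small sketch of that calculation, together with the equivalent speedup factor; both function names are hypothetical:

```swift
/// Relative reduction in duration, in percent.
func timeReductionPercent(standard: Double, optimized: Double) -> Double {
    return (standard - optimized) / standard * 100
}

/// How many times faster the optimized variant is.
func speedupFactor(standard: Double, optimized: Double) -> Double {
    return standard / optimized
}
```

For the decoder median, 422 ms against 200 ms is a reduction of about 52.6%, or a speedup of about 2.1x.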

My benchmark measurements

I've implemented my own benchmark for JSONDecoder/Encoder: ChrisBenua/JSONDecoderEncoderBenchmarks on GitHub, which illustrates the high overhead in the JSONDecoder/Encoder and Codable implementation.

JSONDecoder

In this benchmark I've measured performance in 4 variations:

  • standard JSONDecoder
  • standard JSONDecoder + String as CodingKey
  • optimized JSONDecoder
  • optimized JSONDecoder + String as CodingKey

| quantile                                    | 0.25            | 0.5             | 0.75             |
|---------------------------------------------|-----------------|-----------------|------------------|
| standard JSONDecoder                        | 5.81 s          | 5.826 s         | 5.86 s           |
| standard JSONDecoder + String as CodingKey  | 3.24 s (↑44%)   | 3.26 s (↑44%)   | 3.29 s (↑43.9%)  |
| optimized JSONDecoder                       | 2.64 s (↑55%)   | 2.65 s (↑55%)   | 2.66 s (↑54.6%)  |
| optimized JSONDecoder + String as CodingKey | 0.113 s (↑98%)  | 0.114 s (↑98%)  | 0.116 s (↑98%)   |

JSONEncoder

In this benchmark I've measured performance in 4 variations:

  • standard JSONEncoder
  • standard JSONEncoder + String as CodingKey
  • optimized JSONEncoder
  • optimized JSONEncoder + String as CodingKey

| quantile                                    | 0.25             | 0.5              | 0.75             |
|---------------------------------------------|------------------|------------------|------------------|
| standard JSONEncoder                        | 8.06 s           | 8.08 s           | 8.12 s           |
| standard JSONEncoder + String as CodingKey  | 5.49 s (↑32%)    | 5.52 s (↑32%)    | 5.55 s (↑32%)    |
| optimized JSONEncoder                       | 2.67 s (↑67%)    | 2.68 s (↑67%)    | 2.69 s (↑67%)    |
| optimized JSONEncoder + String as CodingKey | 0.148 s (↑98.1%) | 0.149 s (↑98.2%) | 0.151 s (↑98.1%) |

My benchmark illustrates how much the Swift runtime slows down JSONDecoder and JSONEncoder.

Apple Benchmark

The swift-foundation repository has some JSONDecoder/Encoder benchmarking logic: JSONBenchmark.swift.

Apple Benchmark Flaws

  • It decodes/encodes the same models 1 billion times without relaunching the app
    • This hides all swift_conformsToProtocol overhead, because swift_conformsToProtocol is slow only on the first iteration.
  • The benchmark binary is small and has a small __swift5_proto section
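One way to keep first-call cost visible inside a single process is to time the first invocation separately from the warm ones. This is a minimal sketch with hypothetical names (ColdWarmTiming, measureColdAndWarm); it does not replace relaunching the process, since the runtime caches stay warm for the rest of the run.

```swift
import Foundation

struct ColdWarmTiming {
    let firstCall: TimeInterval    // duration of the very first invocation
    let warmAverage: TimeInterval  // mean duration of the following ones
}

/// Runs `work` once "cold", then `warmIterations` more times, and
/// reports both timings instead of a single blended average.
func measureColdAndWarm(warmIterations: Int, _ work: () -> Void) -> ColdWarmTiming {
    precondition(warmIterations > 0, "need at least one warm iteration")

    // Monotonic wall-clock timing of a single block.
    func time(_ body: () -> Void) -> TimeInterval {
        let start = ProcessInfo.processInfo.systemUptime
        body()
        return ProcessInfo.processInfo.systemUptime - start
    }

    let first = time(work)
    let warmTotal = time {
        for _ in 0..<warmIterations { work() }
    }
    return ColdWarmTiming(firstCall: first,
                          warmAverage: warmTotal / Double(warmIterations))
}
```

A benchmark that reports only the blended average over a very large iteration count effectively reports warmAverage and drops firstCall.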

My benchmark

Structure

  • The FastCoders library contains optimized implementations of JSONDecoder/JSONEncoder
  • RegularModels contains 10k Codable models with the standard Codable implementation. These 10k Codable models can be semantically split into 2.5k groups of 4.
  • StringCodingKeyModels contains the same 10k Codable models with a manually implemented Codable conformance that uses String as CodingKey
  • codable-benchmark-package is the target that measures the duration of 2.5k decodings and encodings of RegularModels
  • codable-benchmark-package-no-coding-keys is the target that measures the duration of 2.5k decodings and encodings of StringCodingKeyModels
  • codable-benchmark-package and codable-benchmark-package-no-coding-keys use the A1_Hierarchy.json file for decoding. Its size is only 319 bytes.

Notes:

  • To make the size of __swift5_proto in codable-benchmark-package-no-coding-keys match the size of __swift5_proto in codable-benchmark-package, I've generated a CodingKeys enum in each model, but it is not used in encode(to: Encoder) or init(from: Decoder).

Building

Use ./build.sh to build and strip codable-benchmark-package and codable-benchmark-package-no-coding-keys.

Checking __swift5_proto size

To get the number of protocol conformance descriptors in a binary, use this script:

  • otool -l .build/arm64-apple-macosx/release/codable-benchmark-package | grep '__swift5_proto$' -A 5 | grep 'size' | awk '/size/ { hex = $2; sub("0x", "", hex); print int("0x" hex)/4 + 0 }' outputs 70320.
  • otool -l .build/arm64-apple-macosx/release/codable-benchmark-package-no-coding-keys | grep '__swift5_proto$' -A 5 | grep 'size' | awk '/size/ { hex = $2; sub("0x", "", hex); print int("0x" hex)/4 + 0 }' outputs 70321.
  • So with respect to swift_conformsToProtocol performance, both binaries are very similar.

Running

  • codable-benchmark-package and codable-benchmark-package-no-coding-keys have 4 modes:
    • decode - measures decoding using the standard JSONDecoder
    • decode_new - measures decoding using the optimized JSONDecoder
    • encode - measures encoding using the standard JSONEncoder
    • encode_new - measures encoding using the optimized JSONEncoder

I've used the run_bench.py script to run each binary in each mode. It measures each binary and each mode 100 times, so it takes a while to run. You can easily adjust the number of repetitions in run_bench.py.
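The result tables report quantiles over such repeated runs. How run_bench.py computes them is not shown here, so the following is only a sketch of one common definition (linear interpolation between the closest ranks), with a hypothetical function name:

```swift
/// Returns the q-quantile (0...1) of `samples` using linear interpolation
/// between the two closest ranks, or nil for empty input or invalid `q`.
func quantile(_ samples: [Double], _ q: Double) -> Double? {
    guard !samples.isEmpty, (0...1).contains(q) else { return nil }
    let sorted = samples.sorted()
    // Fractional index into the sorted samples.
    let position = q * Double(sorted.count - 1)
    let lower = Int(position.rounded(.down))
    let upper = Int(position.rounded(.up))
    let fraction = position - Double(lower)
    return sorted[lower] + (sorted[upper] - sorted[lower]) * fraction
}
```

Other quantile definitions (for example nearest-rank) give slightly different values on small samples, so the method matters when comparing against the tables.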

I've also created an issue in the swift-foundation repository.

Hello!

Thank you for these amazing contributions. I’ll definitely respond to your PR as I’m always eager to see additional performance optimizations in this space.

I think optimizations #1 and #2 will be pretty straightforward to take. I’m a little reticent about both #3.1 and #3.2 however. At this stage, I don’t think there’s any way to reconcile the ABI compatibility break caused by #3.1.

RE: #3.2, I’m also quite sympathetic to the hidden (writable!) DATA region cost incurred by the 5 protocol conformances created for each CodingKey type. I would hope that the language could be more conservative with that resource for something so common. However, A) adding a public conformance of CodingKey to String could conflict with other such conformances in existing code (despite the encouragement NOT to do so), and B) for structural types, this defeats one of the main purposes of the CodingKey to begin with, which is making it harder to write incorrect code through the use of strong typing.

As I’m sure you know, the compiler allows you to do piecemeal usage of the synthesized Codable implementation. In particular, you can write custom init(from:) and/or encode(to:) implementations that reference the synthesized CodingKeys type. If String is used as the CodingKey, then the compiler will accept any String value in custom implementations, opening the door for potential mistakes. Perhaps this approach could be used if the compiler detects that it is synthesizing the entire implementation, in which case the strong typing is irrelevant. I’d also be more amenable if there was some way to opt-in per Codable instead of per-module as your flag approach implies. Or maybe there’s some way we could convince the compiler to treat a synthesized CodingKey, String-RawRepresentable type as completely transparent, where the type essentially disappears entirely at runtime and Strings are used directly instead (hand-waving quite a bit here).

You could also look at GitHub - michaeleisel/ZippyJSON: A much faster version of JSONDecoder to see if there are any other optimizations you can take, or even if you want to fork it and apply all your enhancements directly. Modern JSONDecoder is about as fast, but wonder if there's a way to combine the newer Decoder implementation of JSONDecoder with the simd string parsing of ZippyJSONDecoder for an even faster implementation. No encoder support though.

Hello, Kevin!

Thank you for your response! I'm glad to hear that my small performance research is valuable!

Sure, #3.1 and #3.2 are quite risky changes. You're absolutely right about adding CodingKey conformance to String! Maybe we can introduce some struct like AnyCodingKey?

struct AnyCodingKey: CodingKey {
    let stringValue: String
    var intValue: Int? { nil }

    init?(stringValue: String) {
        self.stringValue = stringValue
    }

    init?(intValue: Int) {
        nil
    }
}

Thanks for pointing out that only part of the automatically generated Codable conformance can be used. In that case, no doubt, we should stick with enum-like CodingKeys. But if the whole implementation is generated by the compiler, we can use AnyCodingKey or String.

I see there are lots of obstacles to implementing the third optimization. Maybe I should write a Swift macro for this, if there is no other way?

If we were to go with this kind of approach, yeah, I’d absolutely suggest a shared String-backed CodingKey like this.

It’s a possible approach if the entire implementation is autogenerated, potentially even without a flag, but…

… ultimately I hesitate to make any large changes to the compiler synthesis side of Codable. Ultimately exciting new things in this space are coming down the pike which specifically target resolving the issues identified and tackled here (and more!) which will diminish the value of big changes in present-day Codable.

Hello, Jon!

Thanks for mentioning ZippyJSONDecoder!

I've taken a brief look at ZippyJSONDecoder, and here is what I've found:

ZippyJSONDecoder is certainly faster at parsing, but it still suffers from the same Swift runtime overhead when using casts and generic type constraints.

There is a check for protocol conformance

And it also uses KeyedDecodingContainer (because we must use it when we subclass JSONDecoder)

So with respect to Swift runtime overhead, the standard JSONDecoder and ZippyJSONDecoder are very similar.

If @michaeleisel still maintains this repo, I can slightly improve ZippyJSONDecoder performance

Are you aware that dyld has been generating a Swift type/protocol conformance cache for apps starting from iOS 16 (and equivalent versions on other Apple platforms), which greatly reduces the cost of conformsToProtocol calls?

Sure, I've read the EmergeTools article about that: Emerge Tools Blog | How iOS 16 makes your app launch faster.

But my team and I conducted an A/B test in production. 98% of our users already use iOS 16 or newer. And we still get a massive improvement in JSONDecoder/JSONEncoder speed.

Here are my thoughts on why the dyld optimization does not help here.

Let's see how _JSONStringDictionaryDecodableMarker is declared:

private protocol _JSONStringDictionaryDecodableMarker {
    static var elementType: Decodable.Type { get }
}

extension Dictionary : _JSONStringDictionaryDecodableMarker where Key == String, Value: Decodable {
    static var elementType: Decodable.Type { return Value.self }
}

It adds exactly one protocol conformance descriptor to the binary. So when we check whether [String: T] where T: Decodable conforms to _JSONStringDictionaryDecodableMarker, we look for a protocol conformance descriptor of [String: T] to _JSONStringDictionaryDecodableMarker, but the dyld cache contains the protocol conformance descriptor of [String: Decodable] conforming to _JSONStringDictionaryDecodableMarker. And this check fails: dyld/dyld/DyldAPIs.cpp at c8a445f88f9fc1713db34674e79b00e30723e79d · apple-oss-distributions/dyld · GitHub

I could be wrong, but I guess the dyld cache is not used in this case at all.

Just saw another new JSONDecoder announced: GitHub - reers/ReerJSON: A faster version of JSONDecoder based on yyjson, which seems to be even faster than ZippyJSON (which is still faster than Foundation).

This implementation suffers from the same problem as Foundation.JSONDecoder and ZippyJSONDecoder.

There is a similar check that triggers swift_conformsToProtocol: ReerJSON/Sources/ReerJSON/JSONDecoderImpl.swift at main · reers/ReerJSON · GitHub

I'll create another PR here.

Thanks for spotting a new JSONDecoder implementation!