I reimplemented Foundation's JSONEncoder and made it 33% faster using compile-time strategy dispatch

I recently explored whether Swift's type system and compiler could eliminate runtime overhead in encoding, and the results were surprising: a 33% performance improvement over Foundation's JSONEncoder using clean, maintainable code.

The Core Idea

Instead of runtime configuration with mutable properties, the encoder's type IS its configuration. The encoder itself is a generic type with no stored state, and it exposes a single pure function:

(Input) throws -> Output

How it works

The pattern leverages three key Swift features:

1. Protocol Composition via Associated Types

Strategies are composed through protocols with associated type requirements:

public protocol JSONEncodingStrategies: EncodingStrategies {
    associatedtype DateStrategy: DateEncodingStrategy
    associatedtype KeyTransform: KeyTransformStrategy
    associatedtype DataStrategy: DataEncodingStrategy
    associatedtype FloatingPointStrategy: FloatingPointEncodingStrategy
}

Each associated type has its own protocol defining the strategy interface:

public protocol DateEncodingStrategy: Sendable {
    static func encode(_ date: Date) throws -> String
}

2. Concrete Strategy Implementations

Strategies are zero-size types with static methods:

public struct ISO8601DateFormatterStrategy: DateEncodingStrategy {
    public static func encode(_ date: Date) throws -> String {
        date.formatted(.iso8601)
    }
}
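For comparison, here is a hypothetical alternative strategy fitting the same shape — the struct name is my own invention, not from the gist, and the protocol is redeclared only so the snippet compiles standalone:

```swift
import Foundation

// Redeclared here so this snippet compiles on its own;
// in the gist this protocol already exists.
public protocol DateEncodingStrategy: Sendable {
    static func encode(_ date: Date) throws -> String
}

// Hypothetical alternative strategy (my own name, not from the gist):
// encode a Date as its Unix timestamp instead of ISO 8601.
public struct SecondsSince1970Strategy: DateEncodingStrategy {
    public static func encode(_ date: Date) throws -> String {
        String(date.timeIntervalSince1970)
    }
}
```

Because the strategy is a zero-size type with a static method, swapping it in changes the encoder's type, not any runtime state.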

3. Generic Encoder with Type-Level Configuration

The encoder is generic over both sink (output format) and strategies:

public struct StaticJSONEncoder<
    Sink: JSONSink, 
    Strategies: JSONEncodingStrategies
>: Sendable {
    
    public init() {}
    
    @inlinable
    public func encode<T: Encodable>(_ value: T) throws -> Sink.Output {
        try Self.encode(value)
    }
    // ... implementation
}

Usage

Compose an encoder by specifying types:

let encoder = StaticJSONEncoder<JSONDataSink, StandardJSONEncodingStrategies>()
let data = try encoder.encode(myModel)

Different strategy combinations create different types, each fully specialized by the compiler.
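A self-contained miniature of the pattern might look like this — the types here are illustrative stand-ins, not the actual gist API, but they show how the generic parameters select the strategies so every strategy call is statically dispatched:

```swift
import Foundation

// Illustrative strategy protocols (simplified, not the gist's API).
protocol DateEncodingStrategy { static func encode(_ date: Date) throws -> String }
protocol KeyTransformStrategy { static func transform(_ key: String) -> String }

enum ISO8601Strategy: DateEncodingStrategy {
    static func encode(_ date: Date) throws -> String { date.formatted(.iso8601) }
}

enum SnakeCaseTransform: KeyTransformStrategy {
    static func transform(_ key: String) -> String {
        key.reduce(into: "") { out, char in
            if char.isUppercase {
                out.append("_")
                out.append(Character(char.lowercased()))
            } else {
                out.append(char)
            }
        }
    }
}

// The encoder's type *is* its configuration: no stored properties needed.
struct MiniEncoder<Dates: DateEncodingStrategy, Keys: KeyTransformStrategy> {
    func field(_ key: String, date: Date) throws -> String {
        let encoded = try Dates.encode(date)
        return "\"\(Keys.transform(key))\":\"\(encoded)\""
    }
}

let encoder = MiniEncoder<ISO8601Strategy, SnakeCaseTransform>()
let field = try! encoder.field("createdAt", date: Date(timeIntervalSince1970: 0))
```

Swapping `SnakeCaseTransform` for another transform produces a different encoder type, which the compiler specializes separately.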


The pattern enables aggressive compiler optimization:

  1. All associated types resolved at compile time - the compiler knows the exact implementation for every strategy method
  2. Static dispatch - no protocol witnesses or vtable lookups, just direct function calls
  3. Full inlining - @inlinable allows cross-module optimization, the entire encoding pipeline gets inlined
  4. Specialized overloads - type-specific fast paths for primitives (String, Int, Bool, etc.) eliminate dynamic casts
  5. Single-pass architecture - direct buffer writing vs Foundation's two-phase (tree construction β†’ serialization)
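Point 2 can be illustrated with a small contrast sketch of my own (not from the gist): with a generic parameter the strategy call is statically dispatched and specializable, while an existential metatype goes through a protocol witness table:

```swift
// Illustrative protocol and strategy, not the gist's API.
protocol FloatFormat { static func format(_ value: Double) -> String }

enum TruncatingFormat: FloatFormat {
    static func format(_ value: Double) -> String { String(Int(value)) }
}

// Static dispatch: F is known at every call site after specialization,
// so format() can be inlined.
func encodeStatic<F: FloatFormat>(_ value: Double, using _: F.Type) -> String {
    F.format(value)
}

// Dynamic dispatch: the concrete type is erased behind `any`,
// so each call is a witness-table lookup.
func encodeDynamic(_ value: Double, using format: any FloatFormat.Type) -> String {
    format.format(value)
}

let a = encodeStatic(3.7, using: TruncatingFormat.self)   // "3"
let b = encodeDynamic(3.7, using: TruncatingFormat.self)  // "3"
```

Both produce the same result; only the generated machine code differs.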

I wrote a simple benchmark: 1 million encodings of a small model

  • Foundation: 3.36 seconds
  • StaticJSONEncoder: 2.26 seconds
  • 33% improvement

Sure, there are some trade-offs:

  • The strategies are fixed at compile time and cannot be changed at runtime. This is rarely a bummer, though.
  • Specialization could in principle cause code bloat, but the resulting code here is quite minimal.
  • Possibly longer compile times.

Where is the code?!

The implementation with detailed documentation and benchmarks is available here (not yet a proper package): Static JSON Encoder Β· GitHub

CAUTION: the implementation in the Gist above has a bug in the encoding algorithm. I have already fixed it elsewhere, but the fix increased the complexity of the implementation, so the performance got a little worse (still faster than Foundation, though). The added complexity has to do with the given API, which is not ideal here: the underlying Encoder never signals when a container is finished. To know when to emit a closing bracket, for example, the implementation has to track the container hierarchy itself, basically with a stack. That is minuscule code, but it shows - a few CPU cycles still add up in this implementation.

Questions for the community:

  1. Have you encountered patterns where compile-time configuration significantly improved performance?
  2. Are there other Foundation APIs where this pattern could be beneficial?
  3. What are your thoughts on the compile-time vs runtime flexibility trade-off?

I wonder what percentage of the speedup is due to compile-time configuration and what is due to other differences between your implementation and Foundation's. Ideally, to know that, I'd make a branch that removes the compile-time angle while leaving everything else as is, and compare that version to Foundation's JSONEncoder.

There's nothing special about the "other" part. It's probably not bad, but there are no tricks like utilising caches or specialised encoders for string or float values. Not even unsafe code.

I made this in a couple of hours, so there could be bugs, whose fixes might worsen the performance. But for now it's the other way around: I found some improvements, and I am now roughly 50% faster than Foundation. Xcode's profiler is really fantastic! It's difficult to spot any of my implementation's code in the trace, though - it's basically a few inlined instructions, and the rest are calls into libswiftCore. :)

Also, it's difficult to make these changes; the compile-time configuration is really baked into the design.


I have to add:

Like every performance critical implementation, this one also got several improvements:

Performance Optimizations: from 9.3s β†’ 1.78s (47% faster than Foundation)
Benchmark: 1 million JSON encodings of a simple model with primitives and nested structures.

Starting Point: 9.3 seconds
Initial correct implementation, 2.8x slower than Foundation's 3.36s.

Seven Optimization Steps

  1. Buffer Optimization β†’ 7.2s (-23%)
    β€’ Changed from Data to [UInt8] buffer
    β€’ Zero-copy UTF-8 with withContiguousStorageIfAvailable
    β€’ Pre-allocated 4096 bytes capacity
  2. Fast-Path String Escaping → 5.0s (-31%)
    β€’ Check if escaping needed first
    β€’ Bulk append when no special characters (common case)
    β€’ Character-by-character only when required
  3. Comma Management Simplification (state management) → 3.31s (-34%)
    Biggest win - Always append comma, remove trailing comma on container close
    β€’ Eliminated needsComma: [Bool] array
    β€’ Branchless hot path, better CPU prediction
  4. Type Check Reordering (if then else reordering) β†’ 3.31s (maintained)
    β€’ Check primitives first (80-90% of values)
    β€’ Better branch prediction
  5. Specialized Overloads → 2.48s (-25%)
    Second biggest win - Added specialized methods for all common types
    β€’ String, Int, Bool, Double, Date, Data, URL, etc.
    β€’ Zero dynamic casts, direct compilation path
    β€’ Compiler inlines everything
  6. Remove Redundant codingPath β†’ 2.26s (-9%)
    β€’ Removed array operations from primitive overloads
    β€’ Primitives never trigger nested encoding
  7. Other β†’ 1.78s (-21%)
    β€’ Removed Date/Data/URL checks from generic paths (accidentally left in)
    β€’ Specialized overloads handle them directly
    β€’ Cleaner fallback for custom types

It’s worth pointing out that you lose a major feature of Foundation.JSONSerialization this way: the ability for the OS to ship security fixes and have them apply in every app.


OK, this is a fair consideration in general. JSON parsers are generally not free from security risks.

But: here, the risks are minimal.

I still get all security updates for potentially flaky Unicode conversions, potential invalid memory accesses in Data or Array, potential bugs in primitive-to-string conversions, etc.

That is what the implementation uses from the standard library, Foundation, and the Swift language. There's no unsafe code. There might be bugs, I agree - but please make a thorough assessment of the code and let me know where the security risks are.

I now have a better answer. I used Joakim Hassila's benchmark tools, which count allocations and other metrics.

The result is that Foundation performs more allocations and executes many more instructions. Even if Foundation used only as many allocations as the static encoder, it would still be much slower. I think the two-pass implementation, which goes through an internal intermediate enum representation, costs a lot. BUT the major part is the better optimisation opportunities for the compiler due to 100% visibility of the static functions.

Benchmark Results

Foundation JSON Encoder performance

Metric                       p0    p25   p50   p75   p90   p99   p100  Samples
Instructions (M) *           5469  5469  5469  5469  5469  5469  5469  3
Malloc (total) (K) *         2000  2000  2000  2000  2000  2000  2000  3
Memory (resident peak) (M)   12    12    12    12    12    12    12    3
Throughput (# / s) (#)       3     3     3     3     3     3     3     3
Time (total CPU) (ms) *      358   358   359   360   360   360   360   3
Time (wall clock) (ms) *     358   358   359   360   360   360   360   3

StaticJSONEncoder performance

Metric                       p0    p25   p50   p75   p90   p99   p100  Samples
Instructions (M) *           3164  3165  3165  3165  3166  3166  3166  5
Malloc (total) (K) *         1200  1200  1200  1200  1200  1200  1200  5
Memory (resident peak) (M)   12    12    12    12    12    12    12    5
Throughput (# / s) (#)       5     5     5     5     4     4     4     5
Time (total CPU) (ms) *      210   212   212   213   217   217   217   5
Time (wall clock) (ms) *     210   211   212   213   225   225   225   5

Summary

  • Speed improvement: 41% faster (212ms vs 359ms median)
  • Instruction reduction: 42% fewer (3165M vs 5469M)
  • Allocation reduction: 40% fewer (1200K vs 2000K)

One allocation costs roughly 50 ns, so the roughly 800 K extra allocations account for only about 40 ms of the ~147 ms difference.

You can do something like this:

protocol P {}

struct G<T: P>: P { init(_: T) {} }
struct H<T: P>: P { init(_: T) {} }
struct C: P {}

func wrapG(_ p: some P) -> any P {
  return G(p)
}

func wrapH(_ p: some P) -> any P {
  return H(p)
}

func makeC() -> any P {
  return C()
}

func sink(_: some P) {}

Now, you can form any arbitrary combination of applications of G<> and H<> to C dynamically, as an any P existential, and pass the result to sink(), which unwraps the existential back to your zero-sized value.

However, this has the major caveat that the resulting generic code is unspecialized, so it passes witness tables around. It is still more efficient than just erasing everything though, because only the outermost value is wrapped in an existential, and this value is zero sized.
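A usage sketch of the wrapping above, relying on Swift 5.7's implicitly opened existentials — the declarations are repeated so it runs standalone, and `sink` here returns the unwrapped type's name purely for demonstration:

```swift
// Repeats the declarations above so this snippet is self-contained.
protocol P {}
struct G<T: P>: P { init(_: T) {} }
struct H<T: P>: P { init(_: T) {} }
struct C: P {}

func wrapG(_ p: some P) -> any P { G(p) }
func wrapH(_ p: some P) -> any P { H(p) }

// Modified for demonstration: reports the concrete type it received.
func sink(_ value: some P) -> String { String(describing: type(of: value)) }

// Build a composition chosen at runtime: here G(H(C)), but any
// sequence of wrapG/wrapH calls works.
var value: any P = C()
value = wrapH(value)   // the existential is implicitly opened
value = wrapG(value)
let name = sink(value)
```

Each `wrapG`/`wrapH` call opens the existential, wraps the zero-sized concrete value, and re-erases it, so only the outermost value ever lives behind `any P`.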


This is a nice idea for applying a configuration change dynamically. I would not use it for the JSON encoder itself, however, for example to configure the "Sink" for pretty printing, because that might really decrease the performance a lot. The primary idea is to give the compiler a clear picture of the "system" at compile time and let it unleash its power. I figured the compiler is highly "sensitive" to this, which basically means: if you give it the right input, it can optimise the hell out of it - which I really appreciate (thanks and kudos to the compiler builders) ;)

This idea can be used at a higher level, though, where the whole JSON encoder could be swapped out. The only function that is dynamically dispatched then is the top-level encode<T: Encodable>(_ value: T). The overhead will be minor, while internally the encoder remains completely statically dispatched: no mutable state, static properties and functions, and everything known at compile time.
