I reimplemented Foundation's JSONEncoder and made it 33% faster using compile-time strategy dispatch

I recently explored whether Swift's type system and compiler could eliminate runtime overhead in encoding, and the results were surprising: a 33% performance improvement over Foundation's JSONEncoder using clean, maintainable code.

The Core Idea

Instead of runtime configuration with mutable properties, the encoder's type IS its configuration. The encoder itself is a generic type with no stored state, and it exposes a single pure function:

(Input) throws -> Output

How it works

The pattern leverages three key Swift features:

1. Protocol Composition via Associated Types

Strategies are composed through protocols with associated type requirements:

public protocol JSONEncodingStrategies: EncodingStrategies {
    associatedtype DateStrategy: DateEncodingStrategy
    associatedtype KeyTransform: KeyTransformStrategy
    associatedtype DataStrategy: DataEncodingStrategy
    associatedtype FloatingPointStrategy: FloatingPointEncodingStrategy
}

Each associated type has its own protocol defining the strategy interface:

public protocol DateEncodingStrategy: Sendable {
    static func encode(_ date: Date) throws -> String
}

2. Concrete Strategy Implementations

Strategies are zero-size types with static methods:

public struct ISO8601DateFormatterStrategy: DateEncodingStrategy {
    public static func encode(_ date: Date) throws -> String {
        date.formatted(.iso8601)
    }
}
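For comparison, here is a hypothetical alternative strategy fitting the same shape — the struct name is my own invention, not from the gist, and the protocol is redeclared only so the snippet compiles standalone:

```swift
import Foundation

// Redeclared here so this snippet compiles on its own;
// in the gist this protocol already exists.
public protocol DateEncodingStrategy: Sendable {
    static func encode(_ date: Date) throws -> String
}

// Hypothetical alternative strategy (my own name, not from the gist):
// encode a Date as its Unix timestamp instead of ISO 8601.
public struct SecondsSince1970Strategy: DateEncodingStrategy {
    public static func encode(_ date: Date) throws -> String {
        String(date.timeIntervalSince1970)
    }
}
```

Because the strategy is a zero-size type with a static method, swapping it in changes the encoder's type, not any runtime state.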

3. Generic Encoder with Type-Level Configuration

The encoder is generic over both sink (output format) and strategies:

public struct StaticJSONEncoder<
    Sink: JSONSink, 
    Strategies: JSONEncodingStrategies
>: Sendable {
    
    public init() {}
    
    @inlinable
    public func encode<T: Encodable>(_ value: T) throws -> Sink.Output {
        try Self.encode(value)
    }
    // ... implementation
}

Usage

Compose an encoder by specifying types:

let encoder = StaticJSONEncoder<JSONDataSink, StandardJSONEncodingStrategies>()
let data = try encoder.encode(myModel)

Different strategy combinations create different types, each fully specialized by the compiler.
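A self-contained miniature of the pattern might look like this — the types here are illustrative stand-ins, not the actual gist API, but they show how the generic parameters select the strategies so every strategy call is statically dispatched:

```swift
import Foundation

// Illustrative strategy protocols (simplified, not the gist's API).
protocol DateEncodingStrategy { static func encode(_ date: Date) throws -> String }
protocol KeyTransformStrategy { static func transform(_ key: String) -> String }

enum ISO8601Strategy: DateEncodingStrategy {
    static func encode(_ date: Date) throws -> String { date.formatted(.iso8601) }
}

enum SnakeCaseTransform: KeyTransformStrategy {
    static func transform(_ key: String) -> String {
        key.reduce(into: "") { out, char in
            if char.isUppercase {
                out.append("_")
                out.append(Character(char.lowercased()))
            } else {
                out.append(char)
            }
        }
    }
}

// The encoder's type *is* its configuration: no stored properties needed.
struct MiniEncoder<Dates: DateEncodingStrategy, Keys: KeyTransformStrategy> {
    func field(_ key: String, date: Date) throws -> String {
        let encoded = try Dates.encode(date)
        return "\"\(Keys.transform(key))\":\"\(encoded)\""
    }
}

let encoder = MiniEncoder<ISO8601Strategy, SnakeCaseTransform>()
let field = try! encoder.field("createdAt", date: Date(timeIntervalSince1970: 0))
```

Swapping `SnakeCaseTransform` for another transform produces a different encoder type, which the compiler specializes separately.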


The pattern enables aggressive compiler optimization:

  1. All associated types resolved at compile time - the compiler knows the exact implementation for every strategy method
  2. Static dispatch - no protocol witnesses or vtable lookups, just direct function calls
  3. Full inlining - @inlinable allows cross-module optimization, the entire encoding pipeline gets inlined
  4. Specialized overloads - type-specific fast paths for primitives (String, Int, Bool, etc.) eliminate dynamic casts
  5. Single-pass architecture - direct buffer writing vs Foundation's two-phase (tree construction β†’ serialization)
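Point 2 can be illustrated with a small contrast sketch of my own (not from the gist): with a generic parameter the strategy call is statically dispatched and specializable, while an existential metatype goes through a protocol witness table:

```swift
// Illustrative protocol and strategy, not the gist's API.
protocol FloatFormat { static func format(_ value: Double) -> String }

enum TruncatingFormat: FloatFormat {
    static func format(_ value: Double) -> String { String(Int(value)) }
}

// Static dispatch: F is known at every call site after specialization,
// so format() can be inlined.
func encodeStatic<F: FloatFormat>(_ value: Double, using _: F.Type) -> String {
    F.format(value)
}

// Dynamic dispatch: the concrete type is erased behind `any`,
// so each call is a witness-table lookup.
func encodeDynamic(_ value: Double, using format: any FloatFormat.Type) -> String {
    format.format(value)
}

let a = encodeStatic(3.7, using: TruncatingFormat.self)   // "3"
let b = encodeDynamic(3.7, using: TruncatingFormat.self)  // "3"
```

Both produce the same result; only the generated machine code differs.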

I wrote a simple benchmark: 1 million encodings of a small model

  • Foundation: 3.36 seconds
  • StaticJSONEncoder: 2.26 seconds
  • 33% improvement

Sure, there are some trade-offs:

  • The strategies are fixed at compile time and cannot be changed at runtime. This is rarely a bummer, though.
  • Specialization could in principle cause code bloat, but the resulting code here is quite minimal.
  • Possibly longer compile times.

Where is the code?!

The implementation with detailed documentation and benchmarks is available here (not yet a proper package): Static JSON Encoder Β· GitHub

CAUTION: the implementation in the Gist above has a bug in the encoding algorithm. I have already fixed it elsewhere, but the fix increased the complexity of the implementation, so the performance got a little worse (still faster than Foundation, though). The added complexity has to do with the given API, which is not ideal here: the underlying Encoder never signals when a container is finished. To know when to emit a closing bracket, for example, the implementation has to track the container hierarchy itself, basically with a stack. That is minuscule code, but it shows - a few CPU cycles still add up in this implementation.

Questions for the community:

  1. Have you encountered patterns where compile-time configuration significantly improved performance?
  2. Are there other Foundation APIs where this pattern could be beneficial?
  3. What are your thoughts on the compile-time vs runtime flexibility trade-off?

I wonder what percentage of the speedup is due to compile-time configuration and what is due to other differences between your implementation and Foundation's. Ideally, to know that, I'd make a branch that removes the compile-time angle while leaving everything else as is, and compare that version to Foundation's JSONEncoder.

There's nothing special about the "other" part. It's probably not bad, but there are no tricks like utilising caches or specialised encoders for string or float values. Not even unsafe code.

I made this in a couple of hours, so there could be bugs, whose fixes might worsen the performance. But for now it's the other way around: I found some improvements, and I am now roughly 50% faster than Foundation. Xcode's profiler is really fantastic! It's difficult to spot any of my implementation's code in the trace, though - it's basically a few inlined instructions, and the rest are calls into libswiftCore. :)

Also, it's difficult to make these changes; the compile-time configuration is really baked into the design.


I have to add:

Like every performance critical implementation, this one also got several improvements:

Performance Optimizations: from 9.3s β†’ 1.78s (47% faster than Foundation)
Benchmark: 1 million JSON encodings of a simple model with primitives and nested structures.

Starting Point: 9.3 seconds
Initial correct implementation, 2.8x slower than Foundation's 3.36s.

Seven Optimization Steps

  1. Buffer Optimization β†’ 7.2s (-23%)
    β€’ Changed from Data to [UInt8] buffer
    β€’ Zero-copy UTF-8 with withContiguousStorageIfAvailable
    β€’ Pre-allocated 4096 bytes capacity
  2. Fast-Path String Escaping → 5.0s (-31%)
    β€’ Check if escaping needed first
    β€’ Bulk append when no special characters (common case)
    β€’ Character-by-character only when required
  3. Comma Management Simplification (state management) → 3.31s (-34%)
    Biggest win - Always append comma, remove trailing comma on container close
    β€’ Eliminated needsComma: [Bool] array
    β€’ Branchless hot path, better CPU prediction
  4. Type Check Reordering (if then else reordering) β†’ 3.31s (maintained)
    β€’ Check primitives first (80-90% of values)
    β€’ Better branch prediction
  5. Specialized Overloads → 2.48s (-25%)
    Second biggest win - Added specialized methods for all common types
    β€’ String, Int, Bool, Double, Date, Data, URL, etc.
    β€’ Zero dynamic casts, direct compilation path
    β€’ Compiler inlines everything
  6. Remove Redundant codingPath β†’ 2.26s (-9%)
    β€’ Removed array operations from primitive overloads
    β€’ Primitives never trigger nested encoding
  7. Other β†’ 1.78s (-21%)
    β€’ Removed Date/Data/URL checks from generic paths (accidentally left in)
    β€’ Specialized overloads handle them directly
    β€’ Cleaner fallback for custom types

It’s worth pointing out that you lose a major feature of Foundation.JSONSerialization this way: the ability for the OS to ship security fixes and have them apply in every app.


OK, this is a fair consideration in general. JSON parsers are generally not free from security risks.

But: here, the risks are minimal.

I still get all security updates for potentially flaky Unicode conversions, potential invalid memory accesses in Data or Array, potential bugs in primitive-to-string conversions, etc.

That is what the implementation uses from the standard library, Foundation, and the Swift language. There's no unsafe code. There might be bugs, I agree - but please make a thorough assessment of the code and let me know where the security risks are.

I now have a better answer. I used Joakim Hassila's benchmark tools, which count allocations and other metrics.

The result is that Foundation performs more allocations and executes many more instructions. Even if Foundation used only as many allocations as the static encoder, it would still be much slower. I think the two-pass implementation, which goes through an internal intermediate enum representation, costs a lot. BUT the major part is the better optimisation opportunities for the compiler due to 100% visibility of the static functions.

Benchmark Results

Foundation JSON Encoder performance

Metric                       p0    p25   p50   p75   p90   p99   p100  Samples
Instructions (M) *           5469  5469  5469  5469  5469  5469  5469  3
Malloc (total) (K) *         2000  2000  2000  2000  2000  2000  2000  3
Memory (resident peak) (M)   12    12    12    12    12    12    12    3
Throughput (# / s) (#)       3     3     3     3     3     3     3     3
Time (total CPU) (ms) *      358   358   359   360   360   360   360   3
Time (wall clock) (ms) *     358   358   359   360   360   360   360   3

StaticJSONEncoder performance

Metric                       p0    p25   p50   p75   p90   p99   p100  Samples
Instructions (M) *           3164  3165  3165  3165  3166  3166  3166  5
Malloc (total) (K) *         1200  1200  1200  1200  1200  1200  1200  5
Memory (resident peak) (M)   12    12    12    12    12    12    12    5
Throughput (# / s) (#)       5     5     5     5     4     4     4     5
Time (total CPU) (ms) *      210   212   212   213   217   217   217   5
Time (wall clock) (ms) *     210   211   212   213   225   225   225   5

Summary

  • Speed improvement: 41% faster (212ms vs 359ms median)
  • Instruction reduction: 42% fewer (3165M vs 5469M)
  • Allocation reduction: 40% fewer (1200K vs 2000K)

One allocation costs roughly 50 ns, so the roughly 800 K extra allocations account for only about 40 ms of the ~147 ms difference.

You can do something like this:

protocol P {}

struct G<T: P>: P { init(_: T) {} }
struct H<T: P>: P { init(_: T) {} }
struct C: P {}

func wrapG(_ p: some P) -> any P {
  return G(p)
}

func wrapH(_ p: some P) -> any P {
  return H(p)
}

func makeC() -> any P {
  return C()
}

func sink(_: some P) {}

Now, you can form any arbitrary combination of applications of G<> and H<> to C dynamically, as an any P existential, and pass the result to sink(), which unwraps the existential back to your zero-sized value.

However, this has the major caveat that the resulting generic code is unspecialized, so it passes witness tables around. It is still more efficient than just erasing everything though, because only the outermost value is wrapped in an existential, and this value is zero sized.
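A usage sketch of the wrapping above, relying on Swift 5.7's implicitly opened existentials — the declarations are repeated so it runs standalone, and `sink` here returns the unwrapped type's name purely for demonstration:

```swift
// Repeats the declarations above so this snippet is self-contained.
protocol P {}
struct G<T: P>: P { init(_: T) {} }
struct H<T: P>: P { init(_: T) {} }
struct C: P {}

func wrapG(_ p: some P) -> any P { G(p) }
func wrapH(_ p: some P) -> any P { H(p) }

// Modified for demonstration: reports the concrete type it received.
func sink(_ value: some P) -> String { String(describing: type(of: value)) }

// Build a composition chosen at runtime: here G(H(C)), but any
// sequence of wrapG/wrapH calls works.
var value: any P = C()
value = wrapH(value)   // the existential is implicitly opened
value = wrapG(value)
let name = sink(value)
```

Each `wrapG`/`wrapH` call opens the existential, wraps the zero-sized concrete value, and re-erases it, so only the outermost value ever lives behind `any P`.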


This is a nice idea for applying a configuration change dynamically. I would not use it for the JSON encoder itself, however, for example to configure the "Sink" for pretty printing, because that might really decrease the performance a lot. The primary idea is to give the compiler a clear picture of the "system" at compile time and let it unleash its power. I figured the compiler is highly "sensitive" to this, which basically means: if you give it the right input, it can optimise the hell out of it - which I really appreciate (thanks and kudos to the compiler builders) ;)

This idea can be used at a higher level, though, where the whole JSON encoder could be swapped out. The only function that is dynamically dispatched then is the top-level encode<T: Encodable>(_ value: T). The overhead will be minor, while internally the encoder remains completely statically dispatched: no mutable state, static properties and functions, and everything known at compile time.
