This is a proposal for a canonical and multi-platform compression library for the Swift ecosystem.
Motivation
In the Swift ecosystem we’re currently missing a central place for compression: a place that’s agnostic of algorithm and type and that can run on any platform where Swift runs. There are several libraries that do provide compression, such as:
- adam-fowler/compress-nio - zlib compression for NIO ByteBuffers
- apple/swift-nio-extras - zlib, also for ByteBuffers
- vapor-community/Zip - zipping via zlib
and there are many more, but none of them cover the matrix of platforms, types and algorithms that the ecosystem needs. This proposal aims to fill that gap with a library that the majority of Swift users can reach for, independently of their platform or stack.
Goals
- Single, cross-platform compression package
- Canonical, ecosystem-wide package for vending C-based compression libraries
- Provide modern Swift APIs for compression
- Offer pluggable algorithms for extensibility
- Minimal dependencies: users only pay for the algorithms they need
- Safe APIs and strict memory checking under the hood
High Level Approach
Package Structure
The package would be split into several targets:
- CompressionCore: the core module. Defines the top-level APIs and the protocols that allow algorithms to be used with said APIs. Provides Span and [UInt8] compression APIs.
- CZlib, CZstd, CBrotli, CLZ4: C shims that vendor the respective libraries.
- CompressionZlib, CompressionZstd, CompressionBrotli, CompressionLZ4: targets that pull in their respective C shim and provide APIs conforming to the protocols in CompressionCore. Single targets allow conditional compilation: the user won’t pay for what they don’t use.
- Compression: umbrella target that pulls in all available algorithms.
- CompressionNIO: adds ByteBuffer support. Allows for conditional compilation of NIO.
- CompressionFoundation: adds Data support. Allows for conditional compilation of Foundation.
This allows for a lot of customisation as users can choose whether to prioritise simplicity, binary size and/or platform support.
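As a sketch of what that layout could mean in practice, here is an illustrative manifest. Package name, paths and the subset of targets shown are all assumptions, not a final design:

```swift
// swift-tools-version:6.0
// Package.swift — illustrative sketch of the proposed target split.
import PackageDescription

let package = Package(
    name: "swift-compression",
    products: [
        // Umbrella product for users who want everything.
        .library(name: "Compression", targets: ["Compression"]),
        // Per-algorithm products for users who want to pay for one algorithm only.
        .library(name: "CompressionZlib", targets: ["CompressionZlib"]),
        .library(name: "CompressionZstd", targets: ["CompressionZstd"]),
    ],
    targets: [
        .target(name: "CompressionCore"),
        // C shims vendoring the upstream libraries.
        .target(name: "CZlib"),
        .target(name: "CZstd"),
        // One Swift target per shim, conforming to the CompressionCore protocols.
        .target(name: "CompressionZlib", dependencies: ["CompressionCore", "CZlib"]),
        .target(name: "CompressionZstd", dependencies: ["CompressionCore", "CZstd"]),
        // Umbrella target pulling in every available algorithm.
        .target(name: "Compression", dependencies: ["CompressionZlib", "CompressionZstd"]),
    ]
)
```

A consumer that only needs zstd would depend solely on the CompressionZstd product, so the zlib shim is never even built.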
API Ideas
Following are some rough sketches for the API; they’re not final by any means.
Core
The core could look something like this:
public protocol CompressionAlgorithm {
    associatedtype Configuration: CompressionConfiguration
    associatedtype Compressor: CompressionCore.Compressor where Compressor.Configuration == Configuration
    associatedtype Decompressor: CompressionCore.Decompressor where Decompressor.Configuration == Configuration
}

public protocol CompressionConfiguration {
    // Allows `configuration: Algorithm.Configuration = .default` parameters.
    static var `default`: Self { get }
}

public protocol Compressor: Sendable {
    associatedtype Configuration: CompressionConfiguration

    var configuration: Configuration { get }

    func compress(_ input: some CompressibleInput) throws(CompressionError) -> [UInt8]
}
And, for each algorithm we want to implement, we could provide conformances for these in the respective target:
import CompressionCore
import CZlib

public enum Deflate: CompressionAlgorithm {}

extension Deflate {
    public struct Configuration: CompressionConfiguration {
        public let level: Level

        public static var `default`: Configuration { .init(level: .speed) }
    }
}

extension Deflate {
    public struct Compressor: CompressionCore.Compressor {
        public let configuration: Configuration

        public func compress(_ input: some CompressibleInput) throws(CompressionError) -> [UInt8] {
            try input.withSpan { span in
                try compress(span) // here we'd be calling CZlib.deflate(...)
            }
        }
    }
}
The associated Decompressor would be similar.
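As a rough sketch (mirroring the Compressor protocol above; the exact names are assumptions), it could look like:

```swift
// Sketch only: the decompression mirror of Compressor. Decompression can
// fail on malformed input, hence the typed throw.
public protocol Decompressor: Sendable {
    associatedtype Configuration: CompressionConfiguration

    var configuration: Configuration { get }

    func decompress(_ input: some CompressibleInput) throws(CompressionError) -> [UInt8]
}

// And, in the zlib target, a Deflate conformance to the core Decompressor
// protocol would wrap the inflate side of the C shim:
extension Deflate {
    public struct Decompressor {
        public let configuration: Configuration

        public func decompress(_ input: some CompressibleInput) throws(CompressionError) -> [UInt8] {
            try input.withSpan { span in
                try decompress(span) // here we'd be calling CZlib.inflate(...)
            }
        }
    }
}
```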
This would allow usage as follows:
let compressor = Deflate.Compressor(configuration: .default)
let input: [UInt8] = [0x42]
try compressor.compress(input)
Type Extensibility
The compression functions would accept a CompressibleInput that could look something like this:
public protocol CompressibleInput {
    func withSpan<R>(_ body: (Span<UInt8>) throws -> R) rethrows -> R
}
then, in the CompressionFoundation target, this would allow for:
import CompressionCore

#if canImport(FoundationEssentials)
import FoundationEssentials
#else
import Foundation
#endif

extension Data: CompressionCore.CompressibleInput {
    public func withSpan<R>(_ body: (Span<UInt8>) throws -> R) rethrows -> R {
        try body(self.span)
    }
}
So that Data, or really any other type that has Span support, can be compressed or decompressed.
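For instance, the [UInt8] support in CompressionCore could be a conformance of the same shape. A sketch, assuming the standard library’s `span` property on Array:

```swift
// Sketch: conforming [UInt8] in CompressionCore itself, so plain byte
// arrays work with the compression APIs out of the box.
extension [UInt8]: CompressibleInput {
    public func withSpan<R>(_ body: (Span<UInt8>) throws -> R) rethrows -> R {
        try body(self.span) // `span` on Array comes from the standard library
    }
}
```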
Streaming
As for streaming, I think there’s more than one approach we can take.
- One possibility is to make the one-shot Compressor a convenience that internally compresses data via a stream and then buffers the output. This is convenient but it might leave some performance on the table.
- Another approach could be to create completely separate Streaming{De}Compressors to attach to a CompressionAlgorithm. This would allow for more fine-grained customisation, though it does imply some code duplication.
I’m personally leaning towards the latter, since we’re creating a package for hiding complexity; however, I’m open to suggestions.
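As a hypothetical sketch of that second approach (protocol and method names are assumptions), a streaming compressor could expose an update/finish pair:

```swift
// Sketch of a separate streaming type: feed input chunk by chunk, then
// drain whatever the underlying algorithm has buffered at the end.
public protocol StreamingCompressor: Sendable {
    associatedtype Configuration: CompressionConfiguration

    init(configuration: Configuration)

    // Consume one chunk of input, returning any output that is ready.
    mutating func compress(_ input: some CompressibleInput) throws(CompressionError) -> [UInt8]

    // Signal end of input and flush the remaining compressed bytes.
    mutating func finish() throws(CompressionError) -> [UInt8]
}
```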
In either case, the core could provide something like
public struct CompressionAsyncSequence<
    BackingSequence: AsyncSequence,
    Algorithm: CompressionAlgorithm
>: AsyncSequence where BackingSequence.Element: CompressibleInput {
    let backingSequence: BackingSequence
    let compressor: Algorithm.StreamingCompressor // Or Algorithm.Compressor

    // ...
}

extension AsyncSequence where Element: CompressibleInput {
    public func compressed<Algorithm: CompressionAlgorithm>(
        using algorithm: Algorithm.Type,
        configuration: Algorithm.Configuration = .default
    ) -> CompressionAsyncSequence<Self, Algorithm> {
        .init(backingSequence: self, configuration: configuration)
    }
}
This would allow us to stream-compress like this:
let stream: some AsyncSequence<[UInt8], Error>
let compressedStream = stream.compressed(using: Deflate.self, configuration: .default)
for try await chunk in compressedStream {
// do something with the chunk
}
Streaming would definitely be designed with back-pressure in mind.
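Since AsyncSequence is pull-based, much of that falls out of the iterator design. A hypothetical sketch (the compress/finish methods on the streaming compressor are assumptions):

```swift
// Sketch: a pull-based iterator for CompressionAsyncSequence. A new chunk
// is only pulled from the backing sequence when the consumer asks for more
// output, so back-pressure propagates upstream for free.
extension CompressionAsyncSequence {
    public struct AsyncIterator: AsyncIteratorProtocol {
        var backingIterator: BackingSequence.AsyncIterator
        var compressor: Algorithm.StreamingCompressor
        var finished = false

        public mutating func next() async throws -> [UInt8]? {
            if let chunk = try await backingIterator.next() {
                // Feed the chunk through; upstream is consumed no faster
                // than downstream demands.
                return try compressor.compress(chunk)
            }
            guard !finished else { return nil }
            finished = true
            return try compressor.finish() // drain any buffered output
        }
    }
}
```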
Algorithms
Initially, the library would ship with support for
- gzip, zlib and deflate - HTTP legacy compatibility, ubiquitous zlib support
- zstd - modern, performance-focused general purpose compression
- brotli - HTTP content encoding
- lz4 - exceptionally high performance use cases
This is not intended to be an exhaustive list - the extensible design should make it straightforward to add further algorithms over time - but these cover the majority of real-world use cases in the Swift ecosystem today.
Prior Art
The Rust ecosystem provides various crates for compression, each with a different algorithm, and there’s no unified package for them. flate2 is one of the major players there, and it provides different backends based on needs; its pure-Rust rewrite of zlib, zlib-rs, even outperforms C. It also provides both buffered and streaming APIs.
Java has built-in support for compression in java.util.zip, providing zlib functionality via De/Inflater and via stream wrappers such as DeflaterOutputStream, all in the standard library. Java’s approach is stream-oriented, and interestingly the one-shot API is actually the less natural one there.
There’s also Apache Commons Compress which is closer to what is being proposed here, providing APIs for a battery of different algorithms and archives.
Future Directions
While these things are not part of the initial proposal, they would definitely be interesting to keep in mind for future effort.
Underlying Swift Implementation
As the library matures it could be interesting to explore rewriting the C compression algorithms in pure Swift. The vendored C libraries are a good starting point, but in the long term we might want to seek the advantages a Swift codebase brings, like better debuggability, type safety and no C interop overhead. Of course this isn’t a step to be taken lightly: the C compression libraries are security battle-tested, so any replacement would have to undergo a proper security review.
C Compatible APIs
Afterwards, if and when we reach or exceed C performance, we can expand and try vending a C-compatible API with an underlying Swift implementation. This is essentially what zlib-rs has done. Obviously this would be a big undertaking and is therefore not part of the current proposal.