hi all, as swift-json nears its third minor release, i just wanted to highlight an important API change coming in version 0.3:
obsoleting json dictionary abstractions
the early versions of swift-json modeled objects (things written in curly braces {}) as dictionaries of String keys and JSON values:
case array([Self])
case object([String: Self])
this is the most obvious representation for a JSON value, it’s convenient for 99% of use cases, and it dovetails nicely with the JSON.array(_:) enumeration case.
motivation
as i’ve used swift-json in more projects, and gotten more feedback from others, it’s becoming apparent this representation has some major downsides:
- it cannot handle duplicate keys in JSON input. JSON values with duplicate keys are actually valid JSON values, though they’re not very well-supported.

  right now, swift-json uses the value associated with the last occurrence of the key, or something that satisfies String.==(_:_:) with it.

  that last part is the source of a subtle class of bugs in highly dynamic JSON APIs, because JSON is defined in UTF-8, but String.==(_:_:) compares grapheme clusters by canonical equivalence. so if an object vends separate keys for "\u{E9}" ('é') and "\u{65}\u{301}" (also 'é', perhaps because the JSON is being used to bootstrap a unicode table), one of the values will be dropped.

- it doesn’t preserve key-value ordering. usually we don’t care about key-value ordering when decoding, but problems crop up when we combine it with serialization. regenerating the same JSON can yield different text even though it encodes the same data, which can cause VCS spam if the datafile is version-controlled.

  applications that regularly read and write JSON persistence data are often affected by this issue.

- it is not efficient. many JSON API clients can be optimized to use no hash table lookups at all. some even implement fast-paths that attempt to decode key-value pairs at constant offsets if the source of the JSON emits it deterministically.

  this happens quite often in fintech applications that have to parse “firehose” JSON. i’m sure there are many other kinds of applications that are (or could be) doing things with JSON that simply aren’t currently feasible with the overhead of Decodable, or even [String: JSON].

  being a denser data structure, [(key:String, value:JSON)] also experiences slightly less heap fragmentation than [String: JSON].
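to make the first two problems concrete, here is a minimal sketch that uses only the standard library. it is not swift-json code: Int stands in for JSON values, and the variable names are just for illustration.

```swift
// two spellings of 'é': precomposed (U+00E9) and decomposed (U+0065 U+0301)
let precomposed:String = "\u{E9}"
let decomposed:String = "\u{65}\u{301}"

// `String.==` treats canonically equivalent strings as equal,
// even though their UTF-8 encodings differ
let equal:Bool = precomposed == decomposed                              // true
let sameBytes:Bool = Array(precomposed.utf8) == Array(decomposed.utf8)  // false

// a dictionary can only hold one of the two keys, so the
// second assignment overwrites the value stored by the first
var dictionary:[String: Int] = [:]
dictionary[precomposed] = 1
dictionary[decomposed] = 2
let survivors:Int = dictionary.count                                    // 1

// an array of pairs keeps both entries, in the order they were written
let pairs:[(key:String, value:Int)] =
[
    (key: precomposed, value: 1),
    (key: decomposed, value: 2),
]
let preserved:Int = pairs.count                                         // 2
```

the array also iterates in the same order every time, whereas Dictionary makes no promise about iteration order across runs, which is where the serialization churn comes from.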
proposed solution
starting with version 0.3, we’re going to change the payload of JSON.object(_:) from [String: JSON] to [(key:String, value:JSON)]. this matches the labeled (key:value:) tuples that Dictionary itself vends as its Element type.
case object([(key:String, value:Self)])
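as a rough sketch of what matching against the new payload could look like, here is a cut-down stand-in for the JSON enum. the number and string cases and the keys(of:) helper are made up for illustration, and are not the library’s actual declarations.

```swift
// cut-down stand-in for the library's `JSON` enum (illustration only)
enum JSON
{
    case number(Int)
    case string(String)
    case array([JSON])
    case object([(key:String, value:JSON)])
}

// hypothetical helper: list an object's keys in the order they were written
func keys(of json:JSON) -> [String]
{
    switch json
    {
    case .object(let items):
        // `items` is a plain array, so duplicates and ordering survive
        return items.map { $0.key }
    default:
        return []
    }
}

let json:JSON = .object(
[
    (key: "b", value: JSON.number(1)),
    (key: "a", value: JSON.number(2)),
    (key: "b", value: JSON.number(3)),
])
let order:[String] = keys(of: json)     // ["b", "a", "b"]
```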
note that for performance reasons, JSON is @frozen, but swift-json makes no binary stability guarantees (yet).
we’re going to deprecate [String: JSON]’s callAsFunction(as:) typecasting overload, but you can still use it for now, and its behavior is still the same.
@available(*, deprecated, message:
"""
handle duplicate keys explicitly with
`callAsFunction(as:uniquingKeysWith:)`
""")
func callAsFunction(as _:[String: Self].Type) -> [String: Self]?
in its place we’ll get an overload that returns [(key:String, value:JSON)]?, and one that returns [String: JSON]?, but takes an explicit merging closure.
func callAsFunction(as _:[(key:String, value:Self)].Type)
    -> [(key:String, value:Self)]?

func callAsFunction(as _:[String: Self].Type,
    uniquingKeysWith combine:(Self, Self) throws -> Self) rethrows
    -> [String: Self]?
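the two replacement overloads could behave roughly like the free functions below. this is only a sketch under my own assumptions: JSON is the same cut-down stand-in enum, and items(of:) and dictionary(of:uniquingKeysWith:) are hypothetical names, not the library’s implementation.

```swift
// cut-down stand-in for the library's `JSON` enum (illustration only)
enum JSON
{
    case number(Int)
    case string(String)
    case array([JSON])
    case object([(key:String, value:JSON)])
}

// hypothetical helper mirroring the ordered overload:
// returns the pairs exactly as written, or nil for non-objects
func items(of json:JSON) -> [(key:String, value:JSON)]?
{
    guard case .object(let pairs) = json else { return nil }
    return pairs
}

// hypothetical helper mirroring the `uniquingKeysWith` overload:
// the caller decides what happens when a key repeats
func dictionary(of json:JSON,
    uniquingKeysWith combine:(JSON, JSON) throws -> JSON) rethrows
    -> [String: JSON]?
{
    guard case .object(let pairs) = json else { return nil }
    var merged:[String: JSON] = [:]
    for (key, value) in pairs
    {
        if let existing:JSON = merged[key]
        {
            merged[key] = try combine(existing, value)
        }
        else
        {
            merged[key] = value
        }
    }
    return merged
}

let json:JSON = .object(
[
    (key: "a", value: JSON.number(1)),
    (key: "a", value: JSON.number(2)),
])
// keep the last occurrence, like the deprecated overload does today
let last:[String: JSON]? = dictionary(of: json) { $1 }
// keep the first occurrence instead
let first:[String: JSON]? = dictionary(of: json) { (old, _) in old }
```

the point of the closure is that dropping a value becomes a visible decision at the call site instead of something that happens silently.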
impact on library users
anyone who is case-switching on .object(_) will experience source-breakage. swift-json is still experimental, so we are only going to be bumping the minor release number.
people using swift-json through its Decoder/Decodable interface won’t see any changes, so this mostly affects users with high-performance, high-throughput requirements who are writing their own decoding logic. however, in the long term, these changes should benefit this use case via reduced parser overhead.
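for the hand-written decoding case, a single linear pass over the pairs is usually all that is needed. the sketch below assumes the same cut-down stand-in JSON enum; the Trade type and decodeTrade(from:) are hypothetical and only there to show the shape of the loop.

```swift
// cut-down stand-in for the library's `JSON` enum (illustration only)
enum JSON
{
    case number(Int)
    case string(String)
    case array([JSON])
    case object([(key:String, value:JSON)])
}

// hypothetical model type
struct Trade
{
    let symbol:String
    let price:Int
}

// hypothetical decoder: one pass over the pairs, no hash table involved.
// if a key repeats, the last occurrence wins in this sketch.
func decodeTrade(from json:JSON) -> Trade?
{
    guard case .object(let items) = json else { return nil }
    var symbol:String? = nil
    var price:Int? = nil
    for (key, value) in items
    {
        switch (key, value)
        {
        case ("symbol", .string(let string)):
            symbol = string
        case ("price", .number(let number)):
            price = number
        default:
            // unknown key, or a known key with an unexpected value type
            continue
        }
    }
    if let symbol:String = symbol, let price:Int = price
    {
        return Trade(symbol: symbol, price: price)
    }
    return nil
}

let trade:Trade? = decodeTrade(from: JSON.object(
[
    (key: "symbol", value: JSON.string("XYZ")),
    (key: "price", value: JSON.number(42)),
]))
```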
alternatives considered
to partially address the unicode aspects of problem #1, we could switch the key representation to [UInt8]. but we would probably lose more than we gain from this, and it wouldn’t do anything about problems #2 or #3.
to solve problem #2, we could escalate the payload to a data structure like OrderedDictionary from swift-collections. however, the overhead imposed by OrderedDictionary would be unacceptable to performance-sensitive users, and it wouldn’t do anything about problem #1 either. also, people might not want to depend on swift-collections just to use swift-json.