I was benchmarking JSONSerialization (both reading and writing) on my iMac against Python 3 and Python's "json" module. The JSON file I'm reading (and then writing) is about 8MB. I'm timing how long it takes to parse a string into either a [String : Any] (Swift) or a dictionary (Python).
On output, I'm timing how long it takes to produce a string, so I'm not measuring anything to do with file I/O, just the in-memory computations (serializing and deserializing).
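For anyone wanting to reproduce this, here is a minimal sketch of that kind of in-memory timing harness. It assumes the whole document has already been loaded into a String. The helper names (timed, benchmark, NotAnObject) are hypothetical, and Date-based wall-clock timing is only adequate for measurements in the tenths-of-a-second range.

```swift
import Foundation

// Hypothetical helper: runs `body` once and returns its result plus elapsed wall-clock seconds.
func timed<T>(_ body: () throws -> T) rethrows -> (value: T, seconds: TimeInterval) {
    let start = Date()
    let value = try body()
    return (value, Date().timeIntervalSince(start))
}

// Hypothetical error: the top-level JSON value was not an object.
struct NotAnObject: Error {}

// Times only the in-memory steps: String -> [String: Any], then back to String
// with and without .sortedKeys. No file I/O is involved.
func benchmark(_ text: String) throws
    -> (read: TimeInterval, write: TimeInterval, writeSorted: TimeInterval, sortedOutput: String) {
    let data = Data(text.utf8)
    let (parsed, readTime) = try timed { try JSONSerialization.jsonObject(with: data) }
    guard let dict = parsed as? [String: Any] else { throw NotAnObject() }

    let (_, writeTime) = try timed {
        try String(decoding: JSONSerialization.data(withJSONObject: dict), as: UTF8.self)
    }
    let (sorted, sortedTime) = try timed {
        try String(decoding: JSONSerialization.data(withJSONObject: dict, options: [.sortedKeys]),
                   as: UTF8.self)
    }
    return (readTime, writeTime, sortedTime, sorted)
}

// Usage, with the file contents already in `text`:
// let r = try benchmark(text)
// print("read \(r.read)s  write \(r.write)s  sorted write \(r.writeSorted)s")
```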
I found that on the whole, the implementation speeds are pretty much the same! Good for you, Swift!
... except when it comes to the .sortedKeys option. In Python, there's basically no speed hit when you request sorted keys on output. In Swift, the time to produce the String goes from 0.08 seconds to about 0.52 seconds.
So that's more than a 5x speed hit for wanting your keys sorted!
Seems a bit excessive?
I don't suppose anybody knows why this happens, or whether there's anything I can do about it? I'm seriously toying with spinning up C++ RapidJSON to write a streaming output system that knows what order the keys should be written in. (I need sorted keys because successive versions of my output need to be easily diffable, and without sorted keys, they aren't.)
FYI, JSONSerialization, on Apple platforms, is an Obj-C API bridged into Swift. So internally it's all Obj-C box types like NSDictionary, NSString, and NSNumber, and bridging back and forth to Swift will have some cost. I wouldn't be surprised if sortedKeys has a cost either in the internal collections (NSDictionarys aren't sortable) or in the output step. I suggest testing JSONDecoder and JSONEncoder, especially the versions on newer Apple platforms that don't use JSONSerialization internally, or the swift-foundation version, which is a completely native rewrite that is even faster. For decoding I also suggest looking at ZippyJSON, which uses simdjson under the hood to be relatively much faster (though Codable sets a rather low upper limit for absolute performance). For encoding, I don't know of many better alternatives. And there is no native Swift solution for lazy or streaming JSON decoding or encoding, which is very unfortunate.
It's generally somewhat sad that Swift's only as fast as Python here.
I am not making any use of Codable here. I specifically need to decode my JSON data into [String : Any]. That being the case, can I still make use of JSONDecoder and JSONEncoder? I didn’t think I could but would be happy to be wrong.
I think I saw the code for the swift-foundation version, but was unable to compile it. (I found two files in what appeared to be the open-source repository on GitHub for this, and was studying their implementation. Curiously, I tried using RapidJSON to decode, bridging into Swift, and following what I thought was roughly the same pattern as the open-source implementation, but it's 2x slower than just using JSONSerialization…)
So the question is, without having to compile it myself, is there a way to access the swift-foundation version?
If not, can you point me at the swift-foundation version and tell me what I need to do to compile that source and try it myself?
I did not bother to try ZippyJSON because it is a drop-in replacement for JSONEncoder/JSONDecoder, and, as I said above, I'm not using Codable.
Well, I do have what should be swift-foundation available on my Linux system, where I have a Swift environment.
Good news: with swift-foundation, adding .sortedKeys only makes serialization a bit more than 2x slower.
Bad news: swift-foundation is, on the whole, 30x slower at serializing the data, compared with using the native Apple libraries (I’ve adjusted for the difference in machine speeds).
My toolchain says that Swift is version 5.8, so I assume the swift-foundation build I have dates from then as well. I initially ran this experiment with an older version (Swift 5.6), but there's actually no change in the numbers.
Would be curious to see what kind of numbers I get if I could compile the open source Foundation version on an Apple platform, but I think that’s beyond my skill set this second...
Unless you specifically want an Any, you can create some sort of general JSON type and decode into a [String: JSON]. Or really any type that has the value representation you want.
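A minimal sketch of such a type follows. The name JSON and its cases are illustrative, not taken from any particular library; numbers are decoded as Double (so integers beyond 2^53 lose precision), and the chain of try? attempts favors brevity over speed. Because objects decode into [String: JSON], keys that aren't known in advance are kept rather than rejected.

```swift
import Foundation

// Illustrative "any JSON value" type; not a library type.
enum JSON: Codable, Equatable {
    case null
    case bool(Bool)
    case number(Double)
    case string(String)
    case array([JSON])
    case object([String: JSON])

    // Tries each JSON shape in turn until one decodes.
    init(from decoder: Decoder) throws {
        let container = try decoder.singleValueContainer()
        if container.decodeNil() {
            self = .null
        } else if let b = try? container.decode(Bool.self) {
            self = .bool(b)
        } else if let n = try? container.decode(Double.self) {
            self = .number(n)
        } else if let s = try? container.decode(String.self) {
            self = .string(s)
        } else if let a = try? container.decode([JSON].self) {
            self = .array(a)
        } else {
            // Arbitrary keys: whatever the object contains ends up in the dictionary.
            self = .object(try container.decode([String: JSON].self))
        }
    }

    func encode(to encoder: Encoder) throws {
        var container = encoder.singleValueContainer()
        switch self {
        case .null: try container.encodeNil()
        case .bool(let b): try container.encode(b)
        case .number(let n): try container.encode(n)
        case .string(let s): try container.encode(s)
        case .array(let a): try container.encode(a)
        case .object(let o): try container.encode(o)
        }
    }
}
```

Re-encoding a decoded value with a JSONEncoder whose outputFormatting is .sortedKeys writes every object's keys in sorted order.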
Do you mean swift-corelibs-foundation that comes with Swift for Linux? That's not what I'm referring to. What I mean is the new swift-foundation being written as a unified Foundation implementation for all platforms, including Apple's. It should be faster than swift-corelibs-foundation as well.
I would guess my Linux version (and the code I was looking at this morning) is indeed swift-corelibs-foundation.
I was not aware of the newer swift-foundation you mention. Unfortunately, it doesn't seem to have JSONSerialization, just JSONEncoder.
The data that I am dealing with is highly heterogeneous; it's basically a random collection of various keys/fields as a nested sequence of dictionaries. It doesn't map onto specific types, so I really have no way of using JSONDecoder or JSONEncoder with it; I can't do any better than decode it, get back a [String : Any], and start playing 20 questions with it...
But thanks for taking the time to look through this. Maybe I can file a radar and get someone to check why .sortedKeys causes such a slowdown. It's hard to believe that simply grabbing the list of keys, sorting it, and using that as the traversal order would be so slow...
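The traversal described above can be sketched by hand: sort each dictionary's keys, then write the values in that order. This is a minimal, untuned sketch that assumes the dictionary holds native Swift values (String, Bool, Int, Double, NSNull, nested arrays and dictionaries); the function and error names are hypothetical. Values coming back from JSONSerialization on Apple platforms are NSNumber-boxed, and an NSNumber holding 0 or 1 also casts to Bool, so those would need additional handling.

```swift
import Foundation

// Hypothetical error for values this sketch does not know how to write.
struct UnsupportedValue: Error { let value: Any }

// Appends `s` as a JSON string literal, escaping quotes, backslashes and control characters.
func appendJSONString(_ s: String, to out: inout String) {
    out.append("\"")
    for scalar in s.unicodeScalars {
        switch scalar {
        case "\"": out.append("\\\"")
        case "\\": out.append("\\\\")
        case "\n": out.append("\\n")
        case "\r": out.append("\\r")
        case "\t": out.append("\\t")
        case _ where scalar.value < 0x20:
            let hex = String(scalar.value, radix: 16)
            out.append("\\u" + String(repeating: "0", count: 4 - hex.count) + hex)
        default:
            out.unicodeScalars.append(scalar)
        }
    }
    out.append("\"")
}

// Writes `value` compactly, visiting every dictionary's keys in sorted order.
// Assumes native Swift values, not NSNumber-boxed ones (see lead-in).
func appendSortedJSON(_ value: Any, to out: inout String) throws {
    switch value {
    case let dict as [String: Any]:
        out.append("{")
        for (i, key) in dict.keys.sorted().enumerated() {
            if i > 0 { out.append(",") }
            appendJSONString(key, to: &out)
            out.append(":")
            try appendSortedJSON(dict[key]!, to: &out)
        }
        out.append("}")
    case let array as [Any]:
        out.append("[")
        for (i, element) in array.enumerated() {
            if i > 0 { out.append(",") }
            try appendSortedJSON(element, to: &out)
        }
        out.append("]")
    case let s as String:
        appendJSONString(s, to: &out)
    case let b as Bool:
        out.append(b ? "true" : "false")
    case let i as Int:
        out.append(String(i))
    case let d as Double where d.isFinite:
        out.append(String(d))
    case is NSNull:
        out.append("null")
    default:
        // Includes NaN and infinity, which JSON cannot represent.
        throw UnsupportedValue(value: value)
    }
}
```

Sorting happens once per object, over that object's own keys only, so the extra work is proportional to the keys being written.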
Given you're decoding JSON you can only be dealing with a limited set of types. And if your data is even more limited then a more limited representative value may be possible as well. Whether it will be a performance gain I don't know, but there are various JSON enums out there that could do what you want. You can also look into the JSON coders from swift-foundation, they're more flexible internally and so may be more tunable to your use case.
Thanks, but again I’m hampered by the fact that I’m not working with Codable. The data being read needs to support a set of keys I cannot possibly know in advance. There is no way I can see of simply reading the data back as [String : Any].
I want a replacement for JSONSerialization, not JSONEncoder/JSONDecoder.
The set of types is limited, but the set of keys is unknown at read time. That would seem to preclude any use of Codable, unless I’m completely missing something.
(It’s not that I don’t really know the keys. I know what some of them will be, but if there are extra ones I don’t know about, they have to be carried along as “blind” data, which basically turns back to needing to support [String : Any], for at least some (if not all) of the data read.)
This is to let you parse versions that are “newer” than your oldest build; the oldest build reads what it understands, brings the rest along as blind data, and reoutputs the blind data later on, along with what it understood. It’s a proven strategy for being future compatible for file formats.
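A minimal sketch of that blind-data pattern on top of JSONSerialization, assuming a hypothetical Record type whose only understood key is "name": the known field is pulled out, everything else stays in a [String : Any], and it is merged back in on output, with .sortedKeys keeping the result diffable.

```swift
import Foundation

// Hypothetical record type: "name" is the only key this build understands.
// Everything else is kept verbatim in `blind` and written back out unchanged.
struct Record {
    var name: String
    var blind: [String: Any]

    init?(json: [String: Any]) {
        var rest = json
        guard let name = rest.removeValue(forKey: "name") as? String else { return nil }
        self.name = name
        self.blind = rest
    }

    // Merges the understood fields back over the blind data.
    func jsonObject() -> [String: Any] {
        var out = blind
        out["name"] = name
        return out
    }
}

// Reads a document, carries unknown keys along, and re-serializes with sorted keys.
func roundTrip(_ text: String) throws -> String? {
    let object = try JSONSerialization.jsonObject(with: Data(text.utf8))
    guard let dict = object as? [String: Any], let record = Record(json: dict) else { return nil }
    let data = try JSONSerialization.data(withJSONObject: record.jsonObject(), options: [.sortedKeys])
    return String(decoding: data, as: UTF8.self)
}
```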
Hey @davidbaraff - it's worth looking closer at swift-extras-json. It includes the Codable pieces, yes, but also a JSONParser that will load the content into a tree of JSONValue instances. The JSONValue instances are enums with associated data that tell you specifically what was found, and aside from being verbose, they worked well for a few of my needs in grabbing data from JSON files.
Thank you. I missed that part, which is exactly what I was hoping for. Don’t care about JSON5.
Do you expect it to run faster than the native built-in JSONSerialization class for Apple platforms?
I will try it out and see for myself though.
Just curious: given that I’ve read in a JSONValue instance (which is, as you said, nominally a tree, and the exact analogue of my wanting it as [String : Any]), can I turn that back into JSON? Because if I can do that (with sorted keys), at speed, that would be awesome.
I did a few quick benchmarks on it myself while I was playing with/exploring the Benchmark package (on the Swift Package Index), and it was definitely faster than NSJSONSerialization.
I'd imagine it would be very easy to turn a tree of JSONValue back into String form, but I haven't done it myself.
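A sketch of such a writer, using a hypothetical tree type that loosely mirrors the description in this thread (numbers kept as their original text). It is not swift-extras-json's actual JSONValue API; adapting it would presumably mean switching over that library's cases instead. Keys are sorted per object, and number text is written back verbatim, so it never goes through a Double round-trip.

```swift
// Hypothetical tree type, loosely modeled on a parser that keeps numbers as
// their original text; this is NOT swift-extras-json's actual JSONValue API.
enum Value {
    case null
    case bool(Bool)
    case number(String)          // raw numeric text, written back unchanged
    case string(String)
    case array([Value])
    case object([String: Value])
}

extension Value {
    // Serializes compactly, visiting each object's keys in sorted order.
    func jsonString() -> String {
        var out = ""
        write(to: &out)
        return out
    }

    private func write(to out: inout String) {
        switch self {
        case .null: out += "null"
        case .bool(let b): out += b ? "true" : "false"
        case .number(let text): out += text
        case .string(let s): Value.writeString(s, to: &out)
        case .array(let items):
            out += "["
            for (i, item) in items.enumerated() {
                if i > 0 { out += "," }
                item.write(to: &out)
            }
            out += "]"
        case .object(let members):
            out += "{"
            for (i, key) in members.keys.sorted().enumerated() {
                if i > 0 { out += "," }
                Value.writeString(key, to: &out)
                out += ":"
                members[key]!.write(to: &out)
            }
            out += "}"
        }
    }

    // Escapes quotes and backslashes; other control characters become \u00XX.
    private static func writeString(_ s: String, to out: inout String) {
        out += "\""
        for scalar in s.unicodeScalars {
            switch scalar {
            case "\"": out += "\\\""
            case "\\": out += "\\\\"
            case _ where scalar.value < 0x20:
                let hex = String(scalar.value, radix: 16)
                out += "\\u" + String(repeating: "0", count: 4 - hex.count) + hex
            default: out.unicodeScalars.append(scalar)
            }
        }
        out += "\""
    }
}
```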
Well, I think JSONSerialization has improved (on Apple platforms):
JSONSerialization read: 0.06638205051422119 seconds
Extra JSON Parser: 0.08066296577453613 seconds
Results are repeatable. This is for an 8MB JSON file. I have to give the edge here to JSONSerialization because, for one thing, ExtraJSON leaves numbers as strings, while JSONSerialization has already turned them into their actual values, and that conversion has to be done at some point.
But I would guess this is likely to be about 2x faster than what I'm getting on Linux, though I'd have to benchmark that to know.
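As a rough illustration of the deferred number conversion mentioned above, here is a minimal sketch that turns raw number text into an Int when it fits and a Double otherwise. The names JSONNumber and convertNumber are hypothetical and not part of swift-extras-json, and the helper does not validate JSON number syntax.

```swift
// Illustrative result type for a converted JSON number (not a library type).
enum JSONNumber: Equatable {
    case int(Int)
    case double(Double)
}

// Converts raw number text, as a parser that defers conversion might hand it back.
// Note: Double(_:) also accepts non-JSON spellings such as "nan", so this is not a validator.
func convertNumber(_ text: String) -> JSONNumber? {
    if let i = Int(text) { return .int(i) }        // e.g. "42", "-7"
    if let d = Double(text) { return .double(d) }  // e.g. "2.50", "1e3", integers too large for Int
    return nil
}
```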