
All of these components seem small. Besides, when statically linked, a bigger library size is not an issue at all (unused parts are stripped out).

I think this is doable; a sketch of the approach is outlined here.

we’ve had MemoryLayout.offset(of:) for a while now, so that’s not the issue. we don’t have a way of getting all the stored property keypaths of a type though. and we would be bypassing any setters or observers on the properties, which is bad. or maybe this can be expected behavior. i don’t think i’ve ever used an observer on a Codable type, usually i just convert them to the model type as soon as possible…
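To make the point concrete, here is a minimal sketch of what `MemoryLayout.offset(of:)` gives us today (`Model` is a made-up type, not from any library mentioned here):

```swift
struct Model {
    var id: Int
    var name: String
}

// MemoryLayout.offset(of:) returns the byte offset of a directly-stored
// property. A hypothetical reflection-based coder could use this to write
// straight into memory — bypassing setters and observers, as noted above.
// Computed properties (and properties with observers on classes) return nil.
let idOffset = MemoryLayout<Model>.offset(of: \Model.id)       // 0
let nameOffset = MemoryLayout<Model>.offset(of: \Model.name)   // 8 on 64-bit
```

The missing piece remains enumerating all stored-property key paths of an arbitrary type, which has no official API.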

I wrestled with this choice when working on swift-http-structured-headers. Ultimately I ended up providing both layers in the same package, but distinctly separating them at the module layer. You could pay only for the non-Codable bits and have to write some glue code yourself, or you could pay for Codable and you'd get the extra code size and performance costs associated with that.
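As a sketch of that layering, a hypothetical `Package.swift` (all names invented, not swift-http-structured-headers' actual manifest) might separate the modules like this:

```swift
// swift-tools-version:5.5
import PackageDescription

// Hypothetical layering: a core module with the raw parser/serializer, and
// a separate module adding the Codable glue on top. Adopters who depend
// only on "JSONCore" never pay the Codable code-size or performance cost.
let package = Package(
    name: "ExampleJSON",
    products: [
        .library(name: "JSONCore", targets: ["JSONCore"]),
        .library(name: "JSONCodable", targets: ["JSONCodable"]),
    ],
    targets: [
        .target(name: "JSONCore"),
        .target(name: "JSONCodable", dependencies: ["JSONCore"]),
    ]
)
```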

I think that layering is really useful, and it may be a useful solution for this library as well.

1 Like

I'm really interested in seeing alternatives to JSONSerialization and JSONEncoder/JSONDecoder popping up. I agree with the general principle that the decoder is more complex than the encoder as well.

For my part, I think I'd like to see this pivoting into a Swift package and getting a test suite. Right now the code is not standalone so it's very hard, as an outside observer, to feel confident about how well this code works: I need to go look at grammar first. It also puts a lot of requirements on me, the adopter, to pull it down and play with it: I can't simply grab it with the package manager and start tooling around.

That goes double because (and I hope you agree) the coding style in this project is idiosyncratic. This makes it substantially harder for me to validate the correctness of the code by reading it, which makes a test suite much more valuable for building my confidence in the code. A test suite would also make it much easier for me to fix my own bugs if I were to adopt the code, because it can guide me through the codebase.

These are just some early thoughts, but I'm glad you published the project, and I look forward to seeing how it grows!

12 Likes

Hey, I want to quickly mention that I have already put a fair amount of effort into IkigaJSON which seems to align with your goals. I'm happy to propose it. I'll get back to this topic later to add my two cents on what I think a JSON library should be like.

5 Likes

Does anyone have experience with GitHub - simdjson/simdjson: Parsing gigabytes of JSON per second? There are bindings for Swift, and I wonder whether a port could be made while still keeping the performance.

2 Likes

You can definitely keep the performance (for a large part) if you don't use Codable.

1 Like

So IkigaJSON was designed in a way that fits my vision/purpose, though I feel its documentation and public APIs could use some work.

First of all, I think a JSONDecoder should strive to focus on Codable, and not parsing as well. The parser is a separate goal, and from the perspective of a Decoder, an implementation detail. A decoder should strive to support the APIs presented by JSONDecoder (the decoding strategies) as a minimum.
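For reference, this is the strategy surface of Foundation's JSONDecoder that a replacement decoder would need to support at a minimum (the `User` type and JSON payload are invented for illustration):

```swift
import Foundation

// Foundation's JSONDecoder exposes per-decoder strategies; an alternative
// decoder should mirror at least these knobs to be a drop-in replacement.
let decoder = JSONDecoder()
decoder.keyDecodingStrategy = .convertFromSnakeCase
decoder.dateDecodingStrategy = .iso8601
decoder.dataDecodingStrategy = .base64

struct User: Decodable { let userName: String }
let data = Data(#"{"user_name":"ikiga"}"#.utf8)
let user = try decoder.decode(User.self, from: data)
// user.userName == "ikiga" (snake_case key converted by the strategy)
```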

Secondly, I think the encoding process should focus on just the encoding, again. If that requires a JSONSerializer, like a JSONDecoder might require a JSONParser, so be it. But that's an implementation detail. Please note that IkigaJSON currently does not implement a JSONSerializer as a backing feature of JSONEncoder. The JSONEncoder directly encodes the data to a JSON String/ByteBuffer.

Thirdly, I think that a parser should provide insight into a JSON buffer, but not copy the data out of there into a type like Dictionary or even JSONObject. IkigaJSON achieves this through a "JSONDescription", a buffer that contains a list of all tokens inside the original JSON buffer. This achieves pretty good performance now, and it also enables the library to support the following points.
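A rough sketch of such a token map, to illustrate the idea (this is a hypothetical layout, not Ikiga's actual "JSONDescription" format): the parser emits (kind, offset, length) records instead of materialized values.

```swift
// Each token records what a value is and where it lives in the source
// buffer — no Dictionary, String, or JSONObject is ever built up front.
enum JSONTokenKind: UInt8 { case object, array, string, number, bool, null }

struct JSONToken {
    let kind: JSONTokenKind
    let offset: Int   // where the value starts in the source buffer
    let length: Int   // how many bytes it spans
}

// For {"coin":"BTC"} a parser might record (offsets include the quotes):
let description: [JSONToken] = [
    JSONToken(kind: .object, offset: 0, length: 14),
    JSONToken(kind: .string, offset: 1, length: 6),   // "coin"
    JSONToken(kind: .string, offset: 8, length: 5),   // "BTC"
]
```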

Fourth, I think that JSONObject or JSONArray, if you want JSONValues in Swift at all, should not copy all of the values. Like BSON (by both myself and the official MongoDB BSON library), it can be more efficient to copy just the data you need. So we don't copy all of the keys and values before you might or might not need them. A nested JSONObject can (and probably should) be a slice of its parent.
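A minimal sketch of the "slice of its parent" idea, with hypothetical types that are not IkigaJSON's actual API: instead of copying values out, a nested object is just a byte range into the original buffer.

```swift
// A nested value shares its parent's storage; only the range differs.
struct JSONSlice {
    let buffer: [UInt8]     // the shared original JSON bytes
    let range: Range<Int>   // the bytes spanned by this value

    var bytes: ArraySlice<UInt8> { buffer[range] }
}

let raw = Array(#"{"user":{"name":"BTC"}}"#.utf8)
// A parser would record that the nested object occupies bytes 8..<22:
let nested = JSONSlice(buffer: raw, range: 8..<22)
// String(decoding: nested.bytes, as: UTF8.self) == #"{"name":"BTC"}"#
```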

Fifth, I think that there are a lot of uses for a token map like Ikiga's that I don't support yet, that would greatly benefit performant lookups or mutations in JSON.

Finally, Ikiga now emits errors with line/column info for editors, and it's extremely cheap. I'd personally love this to be a thing in any library resulting from or adopted out of this discussion. JSON is such a common format that I believe emitting line/column info is a very nice feature. Ikiga could probably be optimised so that line/column info doesn't impact parsing performance (until an error occurs), but I went for the easier route - for now.
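One way to keep the happy path free, as a sketch (this is not Ikiga's implementation): track only byte offsets while parsing, and reconstruct line/column from the offset when an error is actually thrown.

```swift
// Computing line/column lazily from a byte offset, only on error — so the
// successful parse pays nothing for diagnostics.
func lineAndColumn(of offset: Int, in bytes: [UInt8]) -> (line: Int, column: Int) {
    var line = 1, column = 1
    for byte in bytes.prefix(offset) {
        if byte == UInt8(ascii: "\n") { line += 1; column = 1 }
        else { column += 1 }
    }
    return (line, column)
}

let doc = Array("{\n  \"a\": tru\n}".utf8)
// the malformed literal "tru" starts at byte offset 9 → line 2, column 8
let location = lineAndColumn(of: 9, in: doc)
```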

The one thing I've not made up my mind on yet is whether or not a JSON library like this should lean on NIO for ByteBuffer. I currently do depend on NIO for ByteBuffer.

7 Likes

Just for completeness, there is also swift-extras-json, that exposes an explicit JSONParser. It has comparable performance to @Joannis_Orlandos' IkigaJSON and was the basis for the swift-corelibs-foundation improvements that landed in Swift 5.5. Most notably swift-extras-json depends on the swift stdlib only.

4 Likes

right, i figured people would ask why it’s a submodule and not a proper module.

the reason is that the parser is generic over the type being parsed (this is how grammar is designed). so it doesn’t care what the backing storage is as long as it provides some Collection interface of Element type Character.

grammar also isn’t tied to Character, it is perfectly capable of (and powerful at) parsing binary formats like ByteBuffer, [UInt8], ArraySlice<UInt8> etc. it can even parse the output of other, lower-level grammar parsers. json is defined at the grapheme cluster level, though.

anyway that’s just some background. as is well known, vending generics across module boundaries is very bad for performance. so if json is a submodule, the compiler can specialize for String, Substring, etc and you get great performance. but if it’s a module, it has to parse a generic Collection, which is slow.
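the shape of the problem, simplified into a toy generic function (not grammar's actual API): inside one module the compiler can specialize this for String, Substring, or [Character]; across a module boundary it must go through unspecialized, witness-table-based generic code instead.

```swift
// A stand-in for a generic parser entry point: generic over any Collection
// of Characters, like grammar's parsers are over their input.
func count<Source: Collection>(of target: Character, in input: Source) -> Int
    where Source.Element == Character
{
    input.reduce(0) { $1 == target ? $0 + 1 : $0 }
}

// Two concrete call sites the compiler can specialize within the module:
let a = count(of: ":", in: #"{"coin":"BTC"}"#)         // String
let b = count(of: ":", in: Array(#"{"coin":"BTC"}"#))  // [Character]
```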

@inlinable does not solve this problem because it would only allow the entry point to be specialized. all the child parsers would still be abstracted. @_specialize is not official API, and at any rate it’s hard to anticipate all the possible string-like wrappers people could be using.

i can’t really think of a good solution for this other than making both submodules usableFromInline in their entirety, which would be silly. interested in knowing if you have any suggestions!

2 Likes

how are you getting a string out of a ByteBuffer in the first place? are you doing your own grapheme cluster breaking?

also, what have you observed with respect to the performance impact of indirection on small JSON objects? for example, both strings in

{"coin":"BTC"}

are small, and could be stored inline with no heap allocation.

i love this! grammar is also able to emit “stack traces” and nice parsing diagnostics. i had to un-expose this though, only because it depended on terminal colors. (not a blocker, just an explanation.)

it avoids impacting parsing performance by only reconstructing the error chain through try/catch propagation. but that makes the submodule heavily reliant on high-level SIL optimizations, since we don’t care about the trace when parsing with an optional try?

how are you getting a string out of a ByteBuffer in the first place? are you doing your own grapheme cluster breaking?

I don't do any of that myself. As it stands, I 'scan' for the boundaries of this String and let Swift parse that buffer once the String itself is required. This approach also saves me from a lot of upfront performance cost if the data isn't actually used.
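A tiny sketch of that deferral, with a hypothetical type: record only where the string's contents live, and build the Swift String (paying for UTF-8 decoding and grapheme breaking) only on access.

```swift
// The scanner finds the string's boundaries; the String itself is
// constructed lazily, so unused values cost almost nothing.
struct LazyString {
    let bytes: ArraySlice<UInt8>   // raw contents between the quotes
    var value: String { String(decoding: bytes, as: UTF8.self) }
}

let raw = Array(#"{"coin":"BTC"}"#.utf8)
let coin = LazyString(bytes: raw[9..<12])  // the three bytes of BTC
// coin.value == "BTC", computed only if actually requested
```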

also, what have you observed with respect to the performance impact of indirection on small JSON objects? for example, both strings in

This is exactly where Foundation JSONDecoder on Linux is currently faster than IkigaJSON. I have no solution for this at the moment.

1 Like

got it, you’re parsing the JSON at the binary level and then upgrading to UTF-8 inside a string literal context, right? that’s an interesting direction, i don’t remember if the JSON spec requires a particular string encoding

maybe a bit-flag could solve this? store 7-byte strings inline and use the flag to indicate an index range?
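one possible shape for that, as a sketch (hypothetical layout, not any library’s actual format): a 64-bit token word either holds up to 7 string bytes inline, or — with the top bit set — an offset/length pair into the source buffer.

```swift
struct StringToken {
    let word: UInt64

    // Inline form — low byte: length; bytes 1...7: the characters.
    // In this simplified layout a 7th byte must be ASCII so its high bit
    // can't collide with the flag bit.
    static func inline(_ bytes: [UInt8]) -> StringToken? {
        guard bytes.count <= 7, bytes.count < 7 || bytes[6] < 0x80 else { return nil }
        var word = UInt64(bytes.count)
        for (i, b) in bytes.enumerated() { word |= UInt64(b) << (8 * (i + 1)) }
        return StringToken(word: word)
    }

    // Out-of-line form — flag bit set: 31 bits of offset, 32 bits of length.
    static func outOfLine(offset: UInt32, length: UInt32) -> StringToken {
        StringToken(word: 1 << 63 | UInt64(offset & 0x7FFF_FFFF) << 32 | UInt64(length))
    }

    var isInline: Bool { word >> 63 == 0 }
}

let btc = StringToken.inline(Array("BTC".utf8))!   // no heap allocation needed
let long = StringToken.outOfLine(offset: 128, length: 24)
// btc.isInline == true; long.isInline == false
```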

I think types like ByteBuffer should be broken into distinct packages as much as possible. It’s ridiculous to depend on massive modules like NIOCore just for that, and weak cohesion can easily lead to dependency hell.

1 Like

JSON RFC has this to say:

2 Likes

Just one thing: I find the MPL slightly off-putting.

MPL is a file-level copyleft license, and while it can be combined with other licenses as part of a larger work, my understanding is that any modifications to MPL-licensed files must also be made available to the public.

So if you fix a bug or add some custom behaviour to better integrate with your application or private libraries, you may be compelled to make those public. Extensions in other source files are apparently not considered "modifications", but the extent of the changes you can make without touching the original MPL-ed file(s) may be limited.

See Q9 and Q10 of the MPL FAQ.

It's not necessarily a deal-breaker if the library was incredibly unique and worth the hassle, but if I had a choice between an MIT/Apache-2.0 library and MPL-2.0 library of approximately equal quality, I'd pick the former. The MPL library would need to do something truly exceptional to make me consider it.

As an example, if async-http-client was under MPL2, and I used my WebURL fork in an AppStore app, I would have to make that fork public, even if I didn't want to, because I can't make changes of that level without modifying AHC's original MPL2 files. Personally, I like the freedom to make any changes I think my applications could benefit from, without the obligation to share that with the world, and I consider it important that others have that same level of freedom when using any code that I've written. My philosophy is that sharing should be encouraged, but it should also be a choice.

I remember that we had this discussion before about SwiftJPEG, which was originally licensed as GPLv3. MPL is better (looser) than GPLv3, but it is still copyleft. It's up to you whether you want to actually compel modifications to be open-sourced as a part of the license, but it is worth pointing out.

2 Likes

i don’t understand the details of the various licenses very well, so i just went with MPL because it sounds like a “middle-of-the-road” license between the GPL and Apache extremes. i understand swift itself is Apache, but Apple also has tons of lawyers to help them make that decision, and i don’t.

that said, i agree forcing everyone to make their forks public is unreasonable. i don’t know how anyone could enforce that anyway, if a fork is private. i don’t expect people to publish every modification; if they were maintaining a fork instead of submitting it as a PR (way less effort), it was probably because it was very specific to their application and not something that would make sense in isolation.

1 Like

update: swift-json is now a normal SPM module, and you can test it out here!

1 Like

You assume, incorrectly, that they are extremes. Apple chose Apache for the same reason many do: it provides a useful array of legal rights. You could always read the license yourself, or use GitHub’s excellent help for choosing a license. Note that the latter advises choosing the same license the ecosystem normally uses: in Swift’s case, Apache.

Apache basically allows complete freedom of use, with the exception that you can’t mischaracterize the original work as your own or mischaracterize your own work as the original.