JSON

not sure if this fits into a “pitch” or a “proposal”, or something in-between, but here goes nothing:

motivation

JSON is one of the most common formats for exchanging structured data over a network, but right now we’re limited to using the JSONDecoder vended by Foundation, either directly via a Foundation import, or indirectly through Vapor. JSONDecoder also struggles with decimal values, as it parses them all as floating point, which can lead to data corruption.

proposed library

json is a git submodule that implements JSON parsing in pure Swift. It’s also integrated with Decodable!

sample program:

@main enum Main 
{
    struct Decimal:Codable  
    {
        let units:Int 
        let places:Int 
    }
    struct Number:Codable 
    {
        let success:Bool 
        let value:Decimal
    }
    static 
    func main() throws
    {
        let string:String = 
        """
        {"success":true,"value":0.1}
        """
        let decoder:JSON.Decoder = 
            try Grammar.parse(string, as: JSON.Decoder.self)
        let number:Number = try .init(from: decoder)
    }
}

json depends on grammar, so you need to have both submodules added for the example to compile.

interested in knowing what you all think, and where to go from here!

6 Likes

Consider adding a JSONSerialization equivalent to your library, or do you think it should be a separate library?

Also consider addressing JSON coder/serializer limitations (e.g. these but there are probably others).

2 Likes

there’s no Encoder in the library because the JSON i was sending was much simpler than the JSON i was receiving, so i could get away with using string interpolations.

for what it’s worth, the JSON.Value type is CustomStringConvertible, so if you construct the JSON instance, you can print it back to a string. so what’s missing is the Encoder implementation; the serializer already exists.

i read the other thread, and i’m not sure what you’re proposing exactly. the json submodule decodes nil if the key is not present:

struct Number:Codable 
{
    let success:Bool 
    let value:Decimal?
}
static 
func main() throws
{
    let string:String = 
    """
    {"success":true}
    """
    let decoder:JSON.Decoder = 
        try Grammar.parse(string, as: JSON.Decoder.self)
    let number:Number = try .init(from: decoder)
    print(number)
}
// Number(success: true, value: nil)

it only does this for KeyedDecodingContainerProtocol. it will not generate infinite array elements for UnkeyedDecodingContainer, and it expects an explicit null if using SingleValueDecodingContainer. i admit i don’t really understand the subtleties of this.
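for reference, this matches what the standard Codable machinery does; here is a small runnable illustration of the three container behaviors using Foundation’s JSONDecoder (not the pitched library):

```swift
import Foundation

struct Number: Codable {
    let success: Bool
    let value: Double?
}

let decoder = JSONDecoder()

// keyed container: a missing key decodes the Optional as nil
let missing = try decoder.decode(Number.self,
    from: Data(#"{"success":true}"#.utf8))
print(missing.value as Any)  // nil

// single-value context: an explicit null also decodes as nil
let explicit = try decoder.decode(Number.self,
    from: Data(#"{"success":true,"value":null}"#.utf8))
print(explicit.value as Any)  // nil

// unkeyed container: arrays decode element-by-element;
// the decoder never pads them with synthetic nils
let list = try decoder.decode([Int?].self, from: Data(#"[1,null]"#.utf8))
print(list)  // [Optional(1), nil]
```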

Got you. If you want this library to be a replacement for the JSON support found in Foundation, it needs to have both encoding and decoding (and I would advocate for having both JSONSerialization-style encode / decode as well in a general-purpose JSON library).

The examples there are self-explanatory (or so I thought):


Decoder: dontOverrideFieldsThatHaveDefaultValues:

struct Foo {
    var x: Int
    var y: Int?
    var z: Int! = 123
    var w: Int? = 456
}

JSON: { "x": 1 }

Result with dontOverrideFieldsThatHaveDefaultValues strategy OFF:

{ x: 1, y: nil, z: nil, w: nil }

Result with dontOverrideFieldsThatHaveDefaultValues strategy ON:

{ x: 1, y: nil, z: Optional(123), w: Optional(456) }

Decoder: requireFieldsThatDontHaveDefaultValues:

struct Foo {
    var x: Int?
    var y: Int? = nil
    var z: Int? = 123
}

JSON: {}

Result with requireFieldsThatDontHaveDefaultValues strategy OFF (dontOverrideFieldsThatHaveDefaultValues OFF):

{ x: nil, y: nil, z: nil }

Result with requireFieldsThatDontHaveDefaultValues strategy OFF (dontOverrideFieldsThatHaveDefaultValues ON):

{ x: nil, y: nil, z: Optional(123) }

Result with requireFieldsThatDontHaveDefaultValues strategy ON:

// runtime error, x & y fields are missing

The above two might not be possible with JSONDecoder API as we currently have it. OTOH, if you are creating a new library you are more flexible, e.g. instead of having:

JSONDecoder().decode(T.self, from: data)

you may have:

MyJSONDecoder().decode(T(), from: data)

i.e. instead of passing a type you may pass an instance which is already filled out with default values, and whose fields are later:
- overridden by values from json if there is "field: value" in JSON (even if value in JSON is nil)
- not overridden if there is no corresponding "field: value" in JSON
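For illustration, those two rules can already be expressed today with a hand-written init(from:) that starts from a default-initialized instance. A hypothetical sketch (Foo and its defaults are made up):

```swift
import Foundation

struct Foo: Decodable {
    // the declared defaults are assigned before the init body runs
    var x: Int? = nil
    var z: Int? = 123

    enum CodingKeys: String, CodingKey { case x, z }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        // override only when the key is present, even if its value is null;
        // absent keys keep the declared default
        if container.contains(.x) { x = try container.decode(Int?.self, forKey: .x) }
        if container.contains(.z) { z = try container.decode(Int?.self, forKey: .z) }
    }
}

let foo = try JSONDecoder().decode(Foo.self, from: Data("{}".utf8))
print(foo.z as Any)  // Optional(123): key absent, default preserved
```

The pain point is that the compiler will not synthesize this for you, which is exactly where a decoder-level strategy would help.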


Encoder: emitExplicitNil:

Example:

struct Foo {
    var x: Int
    var y: Int?
}

Value { x: 1, y: nil }

Result with emitExplicitNil strategy OFF: { "x": 1 }
Result with emitExplicitNil strategy ON: { "x": 1, "y": null }

Edit:

Alternatively you may still have API of the form:

JSONDecoder().decode(T.self, from: data)

just add a requirement on T that it is "Inittable" (has init()) - that way you can construct an instance internally, and that instance will come pre-filled with default values

1 Like

i agree! Encodable support is a must before downstream users can drop the Foundation dependency.

at the same time, i feel that we as a community have come around to the idea that “lots of small libraries are better than a few giant libraries”. the json module, as implemented right now, contains:

  • JSON parser
  • JSON serializer
  • JSON decoder

is there value in distributing the parser together with the serializer, separately from the Encoder/Decoder layer? (this would really be a small library, as the parser and serializer only take up ~600 loc)

okay i think i understand what you mean. the problem is decoders don’t have access to a lot of information about the type being decoded. they provide API that compiler-synthesized code calls into to do the actual instance construction. i don’t know if that can be implemented at the library level.

All of these components seem small. Besides, when statically linked, a bigger library size is not an issue at all (unused parts are stripped off).

I think this is doable; a sketch of the approach is outlined here.

we’ve had MemoryLayout.offset(of:) for a while now, so that’s not the issue. we don’t have a way of getting all the stored property keypaths of a type though. and we would be bypassing any setters or observers on the properties, which is bad. or maybe this can be expected behavior. i don’t think i’ve ever used an observer on a Codable type, usually i just convert them to the model type as soon as possible…

I wrestled with this choice when working on swift-http-structured-headers. Ultimately I ended up providing both layers in the same package, but distinctly separating them at the module layer. You could pay only for the non-Codable bits and have to write some glue code yourself, or you could pay for Codable and you'd get the extra code size and performance costs associated with that.

I think that layering is really useful, and it may be a useful solution for this library as well.

1 Like

I'm really interested in seeing alternatives to JSONSerialization and JSONEncoder/JSONDecoder popping up. I agree with the general principle that the decoder is more complex than the encoder as well.

For my part, I think I'd like to see this pivoting into a Swift package and getting a test suite. Right now the code is not standalone so it's very hard, as an outside observer, to feel confident about how well this code works: I need to go look at grammar first. It also puts a lot of requirements on me, the adopter, to pull it down and play with it: I can't simply grab it with the package manager and start tooling around.

That goes double because (and I hope you agree) the coding style in this project is idiosyncratic. This makes it substantially harder for me to validate the correctness of the code via code read. This makes a test suite much more valuable to help build my confidence in the code. It also makes it much easier for me to fix my own bugs if I were to adopt the code, because I can be guided through the code by the unit test suite.

These are just some early thoughts, but I'm glad you published the project, and I look forward to seeing how it grows!

12 Likes

Hey, I want to quickly mention that I have already put a fair amount of effort into IkigaJSON which seems to align with your goals. I'm happy to propose it. I'll get back to this topic later to add my two cents on what I think a JSON library should be like.

5 Likes

Does anyone have experience with simdjson (GitHub - simdjson/simdjson: Parsing gigabytes of JSON per second)? There are bindings for Swift and I wonder if a port could be made while still keeping the performance.

2 Likes

You can definitely keep the performance (for a large part) if you don't use Codable.

1 Like

So IkigaJSON was designed in a way that fits my vision/purpose, though I feel its documentation and public APIs could use some work.

First of all, I think a JSONDecoder should strive to focus on Codable, and not parsing as well. The parser is a separate goal, and from the perspective of a Decoder, an implementation detail. A decoder should strive to support the APIs presented by JSONDecoder (the decoding strategies) as a minimum.

Secondly, I think the encoding process should focus on just the encoding, again. If that requires a JSONSerializer, like a JSONDecoder might require a JSONParser, so be it. But that's an implementation detail. Please note that IkigaJSON currently does not implement a JSONSerializer as a backing feature of JSONEncoder. The JSONEncoder directly encodes the data to a JSON String/ByteBuffer.

Thirdly, I think that a parser should provide insight into a JSON buffer, but not copy the data out of there into a type like Dictionary or even JSONObject. IkigaJSON achieves this through a "JSONDescription", a buffer that contains a list of all tokens inside the original JSON buffer. This achieves pretty good performance now, and it also enables the JSON library to support the following points.
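A toy version of that idea (this is not Ikiga's actual JSONDescription layout, just an illustrative sketch) might look like:

```swift
// hypothetical sketch: each token records where a value lives in the
// original buffer, so nothing is copied out until it is asked for
struct Token {
    enum Kind { case objectStart, objectEnd, string, number }
    let kind: Kind
    let range: Range<Int>  // byte offsets into the source buffer
}

let json = Array(#"{"coin":"BTC"}"#.utf8)

// a parser would emit a flat token list like this while scanning:
let description: [Token] = [
    Token(kind: .objectStart, range: 0..<1),
    Token(kind: .string, range: 2..<6),    // coin (quotes excluded)
    Token(kind: .string, range: 9..<12),   // BTC
    Token(kind: .objectEnd, range: 13..<14),
]

// the String is only materialized when the caller actually needs it
func stringValue(_ token: Token, in buffer: [UInt8]) -> String {
    String(decoding: buffer[token.range], as: UTF8.self)
}

print(stringValue(description[2], in: json))  // BTC
```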

Fourth, I think that JSONObject or JSONArray, if you want JSONValues in Swift at all, should not copy all of the values. Like BSON (by both myself, and the official MongoDB BSON library), it can be more efficient to copy just the data you need. So we don't copy all of the keys and values before you might or might not need those. A nested JSONObject can (and probably should) be a slice of its parent.

Fifth, I think that there are a lot of uses for a token map like Ikiga's that I don't support yet, that would greatly benefit performant lookups or mutations in JSON.

Finally, Ikiga now emits errors with line/column info for editors, and it's extremely cheap. I'd personally love this to be a thing in any library resulting or being adopted from this discussion. JSON is such a common format, that I believe that emitting line/column info is a very nice feature. Ikiga could probably be optimised so that line/column info doesn't impact parsing performance (until an error occurs). But I went for the easier route - for now.

The one thing I've not made up my mind on yet is whether or not a JSON library like this should lean on NIO for ByteBuffer. I currently do depend on NIO for ByteBuffer.

7 Likes

Just for completeness, there is also swift-extras-json, that exposes an explicit JSONParser. It has comparable performance to @Joannis_Orlandos' IkigaJSON and was the basis for the swift-corelibs-foundation improvements that landed in Swift 5.5. Most notably swift-extras-json depends on the swift stdlib only.

4 Likes

right, i figured people would ask why it’s a submodule and not a proper module.

the reason is that the parser is generic over the type being parsed (this is how grammar is designed). so it doesn’t care what the backing storage is as long as it provides some Collection interface of Element type Character.

grammar also isn’t tied to Character, it is perfectly capable of (and powerful at) parsing binary formats like ByteBuffer, [UInt8], ArraySlice<UInt8> etc. it can even parse the output of other, lower-level grammar parsers. json is defined at the grapheme cluster level, though.

anyway that’s just some background. as is well known, vending generics across module boundaries is very bad for performance. so if json is a submodule, the compiler can specialize for String, Substring, etc and you get great performance. but if it’s a module, it has to parse a generic Collection, which is slow.
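to make the tradeoff concrete, here's a toy rule in the same generic shape (this is not grammar's actual API): within the defining module the compiler can emit specialized versions for each concrete input type, but across a module boundary callers go through the unspecialized generic entry point.

```swift
// toy parser rule, generic over any collection of Characters,
// in the style of a grammar parser (not the library's actual API)
func parseDigits<Input: Collection>(_ input: Input) -> Int?
    where Input.Element == Character
{
    guard !input.isEmpty else { return nil }
    var value = 0
    for character in input {
        guard let digit = character.wholeNumberValue, digit < 10 else { return nil }
        value = value * 10 + digit
    }
    return value
}

// the same code runs over different storage types; in-module,
// both calls can be specialized away by the optimizer
print(parseDigits("123") as Any)        // Optional(123)
print(parseDigits(Array("45")) as Any)  // Optional(45)
```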

@inlinable does not solve this problem because it would only allow the entry point to be specialized. all the child parsers would still be abstracted. @_specialize is not official API, and at any rate it’s hard to anticipate all the possible string-like wrappers people could be using.

i can’t really think of a good solution for this other than making both submodules @usableFromInline in their entirety, which would be silly. interested in knowing if you have any suggestions!

2 Likes

how are you getting a string out of a ByteBuffer in the first place? are you doing your own grapheme cluster breaking?

also, what have you observed with respect to the performance impact of indirection on small JSON objects? for example, both strings in

{"coin":"BTC"}

are small, and could be stored inline with no heap allocation.

i love this! grammar is also able to emit “stack traces” and nice parsing diagnostics. i had to un-expose this though, only because it depended on terminal colors. (not a blocker, just an explanation.)

it avoids impacting parsing performance by only reconstructing the error chain through try/catch propagation. but that makes the submodule heavily reliant on high-level SIL optimizations, since we don’t care about the trace when parsing an optional try?

how are you getting a string out of a ByteBuffer in the first place? are you doing your own grapheme cluster breaking?

I don't do any of that myself. As it stands, I 'scan' for the boundaries of this String and let Swift parse that buffer once the String itself is required. This approach also saves me from a lot of upfront performance cost if the data isn't actually used.

also, what have you observed with respect to the performance impact of indirection on small JSON objects? for example, both strings in

This is exactly where Foundation JSONDecoder on Linux is currently faster than IkigaJSON. I have no solution for this at the moment.

1 Like

got it, you’re parsing the JSON at the binary level and then upgrading to UTF-8 inside a string literal context, right? that’s an interesting direction, i don’t remember if the JSON spec requires a particular string encoding

maybe a bit-flag could solve this? store 7-byte strings inline and use the flag to indicate an index range?

I think types like ByteBuffer should be broken into distinct packages as much as possible. It’s ridiculous to depend on massive modules like NIOCore just for that, and weak cohesion can easily lead to dependency hell.

1 Like