Parsing Decimal values from JSON


(Evtim Papushev) #1

Hello :slight_smile:

I am trying to find a way to parse a number as Decimal without losing the number's precision.

It seems that the JSON decoder parses it as Double then converts it to Decimal which introduces errors in the parsing. That behavior is in fact incorrect.

Does anyone know if there is a way to obtain the raw data for this specific field so I can write the conversion code?

Thanks,
Evtim


(Rimantas Liubertas) #2

It seems that the JSON decoder parses it as Double then converts it to
Decimal which introduces errors in the parsing. That behavior is in fact
incorrect.

Why do you say that? The JS in JSON stands for JavaScript, and JavaScript has
no idea about either Decimal or Integer numbers.

Best regards,
Rimantas


(Itai Ferber) #3

Hi Evtim,

Just want to give some context for this.
This is due to the fact that `JSONEncoder` and `JSONDecoder` are currently based on `JSONSerialization`: when you go to decode some JSON data, the data is deserialized using `JSONSerialization`, and then decoded into your types by `JSONDecoder`. At the `JSONSerialization` level, however, there is no way to know whether a given numeric value is meant to be interpreted as a `Double` or as a `Decimal`.

There are subtle differences to decoding as either, so there is no behavior that could satisfy all use cases. `JSONSerialization` has to make a decision, so if the number could fit losslessly in a `Double`, it will prefer that to a `Decimal`. This allows guaranteed precise round-tripping of all `Double` values at the cost of different behavior when decoding a `Decimal`.
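To check whether a particular toolchain is affected, a small experiment along these lines may help. It is only a sketch: the `Quote` type and the sample value 3.133 are made up for illustration, and the result depends on the Foundation version in use, so no particular output is claimed.

```swift
import Foundation

// Hypothetical model with a single Decimal field.
struct Quote: Decodable {
    let price: Decimal
}

// The JSON carries the value as a bare number, not as a string.
let json = Data(#"{"price": 3.133}"#.utf8)
let quote = try JSONDecoder().decode(Quote.self, from: json)

// Reference value parsed directly from base-10 text.
let expected = Decimal(string: "3.133")!

// If the decoder went through Double, the two values may differ.
print("decoded: ", quote.price)
print("expected:", expected)
print("equal:   ", quote.price == expected)
```

If the last line prints `false`, the decoder is taking the Double path described above for that value.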

In practice, this might not really matter in the end based on how you use the number (e.g. the loss in precision can be so minute as to be insignificant) — what is your use case here? And can you give some numeric values for which this is problematic for you?

As others have mentioned, one way to guarantee decoding a numeric string in a specific way is to actually encode it and decode it as a `String`, then convert into a `Decimal` where you need it, e.g.

import Foundation

struct Foo : Codable {
     var number: Decimal

     public init(number: Decimal) {
         self.number = number
     }

     private enum CodingKeys : String, CodingKey {
         case number
     }

     public init(from decoder: Decoder) throws {
         let container = try decoder.container(keyedBy: CodingKeys.self)
         let stringValue = try container.decode(String.self, forKey: .number)
         guard let decimal = Decimal(string: stringValue) else {
             throw DecodingError.dataCorruptedError(forKey: .number, in: container, debugDescription: "Invalid numeric value.")
         }

         self.number = decimal
     }

     public func encode(to encoder: Encoder) throws {
         var container = encoder.container(keyedBy: CodingKeys.self)
         try container.encode(self.number.description, forKey: .number)
     }
}

let foo = Foo(number: Decimal(string: "2.71828182845904523536028747135266249775")!)
print(foo) // => Foo(number: 2.71828182845904523536028747135266249775)

let encoder = JSONEncoder()
let data = try encoder.encode(foo)
print(String(data: data, encoding: .utf8)!) // => {"number":"2.71828182845904523536028747135266249775"}

let decoder = JSONDecoder()
let decoded = try decoder.decode(Foo.self, from: data)
print(decoded) // => Foo(number: 2.71828182845904523536028747135266249775)

print(decoded.number == foo.number) // => true

— Itai

On 28 Oct 2017, at 11:23, Evtim Papushev via swift-users wrote:

Hello :slight_smile:

I am trying to find a way to parse a number as Decimal without losing the number's precision.

It seems that the JSON decoder parses it as Double then converts it to Decimal which introduces errors in the parsing. That behavior is in fact incorrect.

Does anyone know if there is a way to obtain the raw data for this specific field so I can write the conversion code?

Thanks,
Evtim

_______________________________________________
swift-users mailing list
swift-users@swift.org
https://lists.swift.org/mailman/listinfo/swift-users


(Kevin Lundberg) #4

Swift shouldn't be forced to adhere to the limitations of JavaScript.
Just because JS doesn't know about decimals doesn't mean Swift can't do
better.

Evtim, I don't believe JSONDecoder provides the access you are looking
for. Have you filed a bug on bugs.swift.org?

On 10/30/2017 6:02 AM, Rimantas Liubertas via swift-users wrote:

    It seems that the JSON decoder parses it as Double then converts
    it to Decimal which introduces errors in the parsing. That
    behavior is in fact incorrect.

Why do you say that? The JS in JSON stands for JavaScript, and JavaScript
has no idea about either Decimal or Integer numbers.

Best regards,
Rimantas


(Jon Shier) #5

The appropriate solution here would be for Swift to have its own native JSON parser that allows direct decoding into generic types without the intermediary of JSONSerialization. For whatever reason there seems to be resistance to this from the Swift team, but until we have that ability, these types of issues will keep coming up, and the performance overhead of JSONSerialization with JSONDecoder on top of it will continue to leave Swift without a very performant JSON solution.
  That said, I appreciate the support given Codable on this list.

Jon


(Itai Ferber) #6

I can’t speak to any resistance you’ve seen from the Swift team (nor to any performance issues you’ve encountered with `JSONSerialization`), but just keep in mind that
a) `JSONSerialization` is maintained by the Foundation team, and
b) maintaining a separate JSON parsing implementation just for Swift is a great way to introduce new, separate, likely incompatible bugs

That being said, we’ve considered it, and continue to consider it — there is just a cost-benefit analysis that goes into prioritizing developer time.

On 31 Oct 2017, at 10:18, Jon Shier wrote:

The appropriate solution here would be for Swift to have its own native JSON parser that allows direct decoding into generic types without the intermediary of JSONSerialization. For whatever reason there seems to be resistance to this from the Swift team, but until we have that ability, these types of issues will keep coming up, and the performance overhead of JSONSerialization with JSONDecoder on top of it will continue to leave Swift without a very performant JSON solution.
  That said, I appreciate the support given Codable on this list.

Jon


(Rimantas Liubertas) #7

Swift shouldn't be forced to adhere to the limitations of JavaScript. Just
because JS doesn't know about decimals doesn't mean Swift can't do better.

JSON does not know about decimals either. http://json.org/. If you need
some custom data type you can always pass it as a string and then handle as
needed.

Best regards,
Rimantas


(Simon Kågedal Reimer) #8

I think that the real problem here is that the Decimal initializer that takes a Double does not do a very good job of converting the Double to a Decimal. This does not only affect JSON decoding, but also simple statements such as:

let myDecimal: Decimal = 3.133

I would expect that to give me the exact Decimal of 3.133, but since this will pass through a Double (through the ExpressibleByFloatLiteral protocol), I will get an inexact value of 3.132999999999999488.
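As a rough illustration, here are two ways to build the exact value without passing through a Double. This is a sketch under the assumption that Foundation's `Decimal(string:)` and `Decimal(sign:exponent:significand:)` initializers are available. The variable names are made up, and the printed form of the literal-based value is not claimed here because it depends on the toolchain.

```swift
import Foundation

// Float literal: at the time of this thread this goes through Double.
let viaLiteral: Decimal = 3.133

// Parsed from base-10 text, so no binary floating point is involved.
let viaString = Decimal(string: "3.133")!

// Built from an integer significand and a base-10 exponent: 3133 * 10^-3.
let viaParts = Decimal(sign: .plus, exponent: -3, significand: 3133)

print("literal:", viaLiteral)
print("string: ", viaString)
print("parts:  ", viaParts)

// The two exact constructions agree with each other.
print(viaString == viaParts)
```

Either exact form can replace a float literal where the precise value matters.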

The problem of converting a binary floating point number to a decimal floating point representation might not be easy, but it seems to me that there are better ways of doing it, and that such an algorithm can be found when we convert Double to String.

Consider these two functions that create a Decimal from a Double.

func decimalThroughString(from double: Double) -> Decimal {
    return Decimal(string: "\(double)")!
}

func decimalDirectly(from double: Double) -> Decimal {
    return Decimal(double)
}

For all numbers I’ve been able to find – I did a little round trip test with random decimal numbers – the decimalThroughString method works perfectly while the decimalDirectly often gives unwanted results. I’m wondering if there’s any situation I’m not seeing where the behavior of the Decimal(_: Double) initializer is preferable.


Here’s my random test:

func randomDecimal() -> Decimal {
    let double = Double(arc4random()) / Double(UInt32.max) * 100
    let formatter = NumberFormatter()
    formatter.decimalSeparator = "."
    formatter.maximumFractionDigits = Int(arc4random() % 20)
    let string = formatter.string(from: double as NSNumber)!
    return Decimal(string: string)!
}

var totalCorrectDirectly = 0, totalCorrectThroughString = 0
for _ in 1...1000 {
    let decimal = randomDecimal()
    let double = (decimal as NSNumber).doubleValue
    let correctDirectly = decimalDirectly(from: double) == decimal
    let correctThroughString = decimalThroughString(from: double) == decimal
    totalCorrectDirectly += correctDirectly ? 1 : 0
    totalCorrectThroughString += correctThroughString ? 1 : 0
}
totalCorrectDirectly // ~378
totalCorrectThroughString // 1000

(I also wrote a blog post about this problem.)


(Evtim Papushev) #9

Hi,

@Rimantas_Liubertas, @itaiferber , thanks for the suggestions, but that won’t do as I have to process data from an external service that I do not have control over.

I found that parsing values as Doubles leads to unwanted representation errors in the parsed data. That is why I started this thread, sorry for not being more explicit from the beginning.
Unfortunately, I won’t be able to provide exact numbers, as I left the project a few months ago. But those rounding errors happened quite often; the data was cryptocurrency prices and volumes. Because of the high prices (especially Bitcoin’s), transactions quite often have volumes as small as 0.001. The protocol requires precision to at least 8 digits after the decimal point. While I could ignore rounding in prices, since I only took the min/max, volumes were aggregated and I saw noticeable deviations.
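The aggregation effect can be reproduced with a small experiment like the one below. It is a sketch with made-up figures (1000 trades of 0.001 each), not data from the project described above. The Double total is printed rather than asserted, since the point is only that it may drift away from 1.0 while the Decimal total cannot.

```swift
import Foundation

let tradeCount = 1000
let decimalVolume = Decimal(string: "0.001")!  // exact base-10 value

var doubleTotal = 0.0
var decimalTotal = Decimal(0)

// Aggregate the same volume repeatedly in both representations.
for _ in 0..<tradeCount {
    doubleTotal += 0.001          // 0.001 has no exact binary representation
    decimalTotal += decimalVolume // exact decimal addition
}

print("Double total: ", doubleTotal, "exactly 1?", doubleTotal == 1.0)
print("Decimal total:", decimalTotal, "exactly 1?", decimalTotal == Decimal(1))
```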

Also, as someone already noted (strangely, I don’t see their response in the thread), JSON does not specify IEEE precision. It only specifies the automaton that should parse values. It is up to the parser to handle types, precision and overflow correctly. And since JSON numbers are base 10, not base 2, parsing to Decimal seems more natural. Going through Double will inevitably introduce errors not expected by the user.

Also, as skagedal noted, one of the initializers of Decimal has an incorrect implementation. Here’s the code from the Swift repo:

extension Decimal : ExpressibleByFloatLiteral {
    public init(floatLiteral value: Double) {
        self.init(value)
    }
} 

I think this should be corrected at the language level as one would expect that

let myDecimal: Decimal = 3.133

initializes the Decimal with this exact value, not with a rounded number of 3.132999999999999488.

Thanks,
Evtim


(Xiaodi Wu) #10

With regard to your initial question about JSON: you’re correct that, when you are working with currency, you really must use decimal types. However, the JSON standard does not guarantee any particular way of parsing numbers or any precision. Moreover, there is no guarantee that the value you see in a JSON file wasn’t serialized from a Double to begin with.

So if you have no control over the data from the external service, this is not a solvable problem here because it’s not a problem with how Swift parses JSON. It is entirely acceptable to the JSON standard to conclude that, in general, any floating-point value in a JSON file is more likely to be a serialized Double than a serialized decimal value. If you did have control over the data, serializing as a string is the most reliable way to ensure the right precision when you parse JSON.


As to @skagedal’s point, I’ll refer you to an earlier thread about conversion from Double to Decimal:

As I wrote there, the behavior you see is understandably surprising but actually what the semantics of ExpressibleByFloatLiteral demand. Technically, 3.133 is a literal that represents a binary floating-point value, rounded to some precision. There is no literal in Swift for a decimal value. To get there from here is nontrivial, but it’s not the primary issue with JSON that was the original question.


(Daniel Duan) #11

There’s a desire from the Swift team to solve this problem: https://bugs.swift.org/browse/SR-3317. I suggest that whoever is passionate about it start working from there.


(Evtim Papushev) #12

@xwu, actually, the JSON standard does not define what the underlying data types should be. What it guarantees is that both the number and the exponent are base 10, not base 2.

The best implementation would be one that goes beyond assumptions such as “double would be the best option here” and goes with what the developer expects. The Codable protocol provides a way for developers to do exactly that - set expectations on the way JSON (and any other serialized data) should be decoded.

Not having guarantees on how a service is implemented does not mean I can make the assumption it is done the wrong way.
The service could be using long double / Float80. Needless to say base-10 string to Float80 conversion introduces smaller errors than the same string converted to Double / Float64. But using Float80 on my side hits absolutely the same problem because the parser goes through Double :)

Anyways, I already have the whole picture (the current one). Just wanted to give some feedback.

Best regards,
Evtim


(Xiaodi Wu) #13

We’re not in disagreement here. JSON only requires that the value be serialized in base 10. This means you’ll have implementation-defined differences in how numbers are serialized to and deserialized from JSON.

My understanding of the reasoning is that, in fact, most developers will expect Double (aka Float64) to roundtrip flawlessly, as it’s the numeric type in JavaScript and the default floating-point type in Swift.

Right, illustrating that what JSONEncoder/Decoder does when reading JSON is quite independent of how Swift deals with float literals, which support Float80 with no loss of precision.


(Simon Kågedal Reimer) #14

Thank you for the reply, @xwu.

I think the situations with literals and JSON decoding are quite similar. In both situations, I have a rational number written in base 10 and I am explicitly asking to get it as a Decimal, using ways to express that provided to me by the tool.

It would have been much better, compared to the current situation, if Decimal didn’t conform to ExpressibleByFloatLiteral, as it really doesn’t. You seem to be (sort of?) agreeing with this in the other thread. Similarly, it would have been better if JSONDecoder didn’t provide direct support for decoding Decimal, since I think it is in the same way fooling the user that it is doing something that it isn’t.

The best solution in both cases would, imho, be that there was actual direct support for decimal floating point literals – i.e., an ExpressibleByDecimalFloatingPoint protocol and a JSON parser that sent the parsed token directly to the Decimal-from-string initializer. I understand of course that this entails some non-trivial work (thank you for the bug link, @duan!) and that it is not of highest priority.

What I’m wondering about – and let’s just discuss the JSONDecoder case here, since that is what the thread is about – is what to me seems to be a simple improvement over the current situation, even if still not ideal: to change the conversion to use something equivalent to Decimal(string: "\(double)"). You touch on that in the other thread, mentioning the problem that you “lose all information as to significant digits in the original literal (i.e., “0.1” vs. “0.100”)” – which is true, but is surely also the case with Decimal(double).
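Until something like that exists in the decoder itself, the same conversion can be approximated on the user side. The sketch below is one possible shape, not an existing Foundation API: `DoubleRoundTrippedDecimal` and `Trade` are hypothetical names. It decodes the field as a Double and then goes through the Double's string description instead of `Decimal(_: Double)`.

```swift
import Foundation

// Hypothetical wrapper type, not part of Foundation.
struct DoubleRoundTrippedDecimal: Decodable {
    let value: Decimal

    init(from decoder: Decoder) throws {
        let container = try decoder.singleValueContainer()
        let double = try container.decode(Double.self)
        // "\(double)" is the shortest text that round-trips to the same Double.
        guard let decimal = Decimal(string: "\(double)") else {
            throw DecodingError.dataCorruptedError(
                in: container,
                debugDescription: "Cannot convert \(double) to Decimal.")
        }
        value = decimal
    }
}

// Hypothetical payload type using the wrapper for one field.
struct Trade: Decodable {
    let volume: DoubleRoundTrippedDecimal
}

let json = Data(#"{"volume": 0.001}"#.utf8)
let trade = try JSONDecoder().decode(Trade.self, from: json)
print(trade.volume.value == Decimal(string: "0.001")!)
```

This only helps while the JSON number has no more significant digits than a Double can carry. Longer values are already rounded before the wrapper sees them.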

I’d be curious to see a piece of JSON that, when the user has specified decoding as Decimal, would give a more expected result using the current implementation than using that kind of conversion.

Regards, Simon


(Xiaodi Wu) #15

That seems reasonable and justifiable to me. Since, in order to write the JSON, a value of some type T had to be converted to a string, it seems reasonable to read the JSON by converting the string back to the value of type T by using String as the transport type rather than Double, even if it’s a “number.”

[Edit: It seems that what Itai is saying, though, is that the current design of the JSON parsing facilities has no way of supporting this.]


(Tellow Krinkle) #16

Since the conversion from a Base-10 string (stored in the JSON) to a Decimal (which also stores as base-10) should be lossless for any <38 digit number, why would internally storing values as Decimals and converting to Doubles on demand be any less precise than converting to Doubles immediately from the string? When would the conversion Base 10 String → Decimal → Double result in a number different from Base 10 String → Double?


(Chris Anderson) #17

I hesitated to bring up an old thread, but I have been wrestling with this bug myself and it still seems to be an open question. I wanted to know if any progress had been made, either on the Swift front or in better approaches to dealing with this bug from the user's perspective.

I've seen https://bugs.swift.org/browse/SR-7054 and others, so it feels like an issue under discussion and consideration, but without a solution as of yet.

In the meantime, is the workaround from "Parsing Decimal values from JSON" still the best way to deal with it from the user's perspective? While that works, it is obviously less than ideal, as it forces you to override init and thus manually decode every value in the class/struct.
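One way to avoid hand-writing init for the whole type, assuming Swift 5.1 or later, is to move the string conversion into a property wrapper so the compiler can still synthesize Codable for everything else. This is a sketch only: `StringBackedDecimal` and `Order` are hypothetical names, and it applies only when the value is transported as a JSON string.

```swift
import Foundation

// Hypothetical property wrapper: stores a Decimal, transports it as a String.
@propertyWrapper
struct StringBackedDecimal: Codable {
    var wrappedValue: Decimal

    init(wrappedValue: Decimal) {
        self.wrappedValue = wrappedValue
    }

    init(from decoder: Decoder) throws {
        let container = try decoder.singleValueContainer()
        let text = try container.decode(String.self)
        guard let decimal = Decimal(string: text) else {
            throw DecodingError.dataCorruptedError(
                in: container,
                debugDescription: "Invalid numeric value: \(text)")
        }
        wrappedValue = decimal
    }

    func encode(to encoder: Encoder) throws {
        var container = encoder.singleValueContainer()
        try container.encode(wrappedValue.description)
    }
}

// Codable conformance is synthesized; only the wrapped field is customized.
struct Order: Codable {
    var id: Int
    @StringBackedDecimal var amount: Decimal
}

let data = Data(#"{"id": 7, "amount": "2.71828182845904523536028747135266249775"}"#.utf8)
let order = try JSONDecoder().decode(Order.self, from: data)
print(order.id, order.amount)
```

The other properties of `Order` keep their synthesized decoding, so only the Decimal fields need the annotation.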