SE-0489: Improve EncodingError and DecodingError's printed descriptions

It doesn't look like CustomDebugStringConvertible has anything in its documentation about this. If folks really believe that a CustomDebugStringConvertible that has multiline output is something that should be forbidden or strongly discouraged, then separately from this proposal, can we please state that explicitly in the documentation for that protocol? Is there a reason we can't or shouldn't do that?

2 Likes

I agree that the documentation for both protocols should be made more specific about the expectations and how they are used. I was surprised to learn that CustomStringConvertible implementations sometimes call into CustomDebugStringConvertible.
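
For reference, here is a minimal sketch of two places where this already happens in the standard library: `String(describing:)` falls back to `debugDescription` for a type that only conforms to `CustomDebugStringConvertible`, and `Array`'s `description` renders its elements with their debug representations. `Token` is a made-up type used only for illustration.

```swift
struct Token: CustomDebugStringConvertible {
    var debugDescription: String { "Token(debug)" }
}

// Token has no CustomStringConvertible conformance, so
// String(describing:) falls back to debugDescription.
let described = String(describing: Token())
print(described)

// Array's description renders each element with its debug
// representation, which is why the strings come out quoted.
let names = ["a", "b"]
print(names.description)
```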

1 Like

I'm super happy that this UX is being addressed, but I don't think the proposed error messages quite hit the mark.

  1. It uses Swift-specific jargon that would be confusing to new users
  2. It introduces a novel format for describing paths, instead of embracing existing standards
  3. It wastes space printing both the stringValue and intValue, when the overwhelming majority of cases would only have one or the other

On terminology

"Keyed coding container" and "unkeyed coding container" are overly generic terms to expose to Codable users. They make sense at an API level that's trying to be format-agnostic, for an audience of library authors implementing Encoders. However, they're confusing terms for users of Codable libraries. Think of a new dev just making their first web request to sling some JSON around. They would be familiar with what an "array" or "object" (perhaps "dictionary") is, but "unkeyed coding container" is niche Swift jargon.

As a point of comparison, YAML parsers also give confusing messages, like this example:

did not find expected ',' or '}' while parsing a flow sequence at line 1 column 4

What the heck is a "flow sequence"? Apparently that's their term for an array, and "mapping" is their term for a dictionary.

I propose that we add some extension points for Decoders to customize error messages, to give richer format-specific messages. For example, the JSONDecoder could define something like:

import Foundation

// Hypothetical extension point; these properties are not existing API.
extension JSONDecoder {
    static let keyedContainerName = "object"
    static let unkeyedContainerName = "array"
}
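
One way such an extension point could look is a protocol with default implementations, so decoders keep today's generic wording unless they opt in with better names. This is only a sketch under stated assumptions: `ContainerNaming`, `GenericFormat`, `JSONFormat`, and `missingKeyMessage` are hypothetical names, not existing or proposed API.

```swift
// Hypothetical protocol a format could adopt to name its containers.
protocol ContainerNaming {
    static var keyedContainerName: String { get }
    static var unkeyedContainerName: String { get }
}

// Defaults keep the generic, format-agnostic wording.
extension ContainerNaming {
    static var keyedContainerName: String { "keyed decoding container" }
    static var unkeyedContainerName: String { "unkeyed decoding container" }
}

// A format with no opinion just takes the defaults.
enum GenericFormat: ContainerNaming {}

// A JSON-flavored format overrides them with familiar terms.
enum JSONFormat: ContainerNaming {
    static let keyedContainerName = "object"
    static let unkeyedContainerName = "array"
}

// Builds the "key not found" message using the format's own vocabulary.
func missingKeyMessage<F: ContainerNaming>(key: String, in format: F.Type) -> String {
    "Key '\(key)' not found in \(F.keyedContainerName)."
}

print(missingKeyMessage(key: "population", in: GenericFormat.self))
print(missingKeyMessage(key: "population", in: JSONFormat.self))
```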

Coding path description

There's no need for Swift to introduce its own format for describing coding paths. This would only be a 15th competing standard, which wouldn't be compatible with the whole world of pre-existing tooling for dealing with serialized data.

Instead, we should ask the Decoder to format the coding path for us, allowing it to use the established format for that kind of data. For example:

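import Foundation

// Hypothetical hook; not existing API.
extension JSONDecoder {
    func describeCodingPath(codingPath: [any CodingKey]) -> String {
        // Generate a `jq` query, like `.[0].home.country`
        fatalError("format-specific implementation goes here")
    }
}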

Other examples:

  • YAMLDecoder might describe coding paths in the yq query format
  • XMLDecoder might produce XPath
  • ProtobufDecoder might produce strings in the "field path" format

Users seeing these messages can just copy the path, plop it right into jq, and start examining their data from there.
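
As a rough illustration of the JSON case, the sketch below turns a coding path into a `jq`-style query. `PathKey` and `jqPath` are hypothetical names introduced here. The sketch also assumes keys are plain identifiers: a key containing characters such as `-` or spaces would need the quoted `.["..."]` form, which is not handled.

```swift
// Hypothetical key type, just enough to build example paths.
struct PathKey: CodingKey {
    var stringValue: String
    var intValue: Int?

    init(stringValue: String) {
        self.stringValue = stringValue
        self.intValue = nil
    }

    init(intValue: Int) {
        self.stringValue = String(intValue)
        self.intValue = intValue
    }
}

// Renders a coding path as a jq-style query.
func jqPath(_ codingPath: [any CodingKey]) -> String {
    // The empty path is jq's identity filter.
    if codingPath.isEmpty { return "." }

    var result = ""
    for key in codingPath {
        if let index = key.intValue {
            result += "[\(index)]"           // array index
        } else {
            result += ".\(key.stringValue)"  // object member
        }
    }
    // A path that starts with an index still needs the leading dot.
    return result.first == "[" ? "." + result : result
}

let path: [any CodingKey] = [
    PathKey(intValue: 0),
    PathKey(stringValue: "home"),
    PathKey(stringValue: "country"),
]
print(jqPath(path))
```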

Coding key formatting

The CodingKey protocol technically models coding keys as a product type (similar to a struct), but it's effectively a sum type (similar to an enum). It guarantees two initializers, one which only sets the string value, and another which only sets the int value. It guarantees a getter for both, but no setters. Going through the CodingKey protocol alone, it's impossible to construct a coding key like CodingKey(stringValue: "a", intValue: 1).

Concrete conformers to CodingKey can add API for setting both (e.g. an init(stringValue: String, intValue: Int), or setters for the properties), but this is highly unusual.

Thus, there's no point printing both values if one of the two is almost surely nil. In the unlikely situation that both are non-nil, sure, print them both, but otherwise we can condense it down:

- CodingKeys(stringValue: "population", intValue: nil)
+ "population"
- CodingKeys(stringValue: nil, intValue: 3)
+ 3
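
A minimal sketch of that condensing rule follows. `AnyKey` and `condensedDescription(of:)` are hypothetical names. Note that `stringValue` is non-optional in the actual CodingKey protocol, so the sketch assumes a string value that merely restates the index is redundant. Real index keys may use a different string value, in which case both would be printed.

```swift
// Hypothetical key type, including the unusual "both values" initializer.
struct AnyKey: CodingKey {
    var stringValue: String
    var intValue: Int?

    init(stringValue: String) {
        self.stringValue = stringValue
        self.intValue = nil
    }

    init(intValue: Int) {
        self.stringValue = String(intValue)
        self.intValue = intValue
    }

    init(stringValue: String, intValue: Int) {
        self.stringValue = stringValue
        self.intValue = intValue
    }
}

// Prints only the meaningful half of a key, and both halves when they differ.
func condensedDescription(of key: any CodingKey) -> String {
    guard let index = key.intValue else {
        return "\"\(key.stringValue)\""      // string-only key
    }
    if key.stringValue == String(index) {
        return String(index)                 // string just restates the index
    }
    return "\"\(key.stringValue)\" (\(index))"  // both carry information
}

print(condensedDescription(of: AnyKey(stringValue: "population")))
print(condensedDescription(of: AnyKey(intValue: 3)))
print(condensedDescription(of: AnyKey(stringValue: "a", intValue: 1)))
```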

Proposed message example

Here's an example message format I propose, incorporating the three ideas above:

- Key 'population' not found in keyed decoding container.
+ Key 'population' not found in object
- Debug description: No value associated with key CodingKeys(stringValue: "population", intValue: nil) ("population").
+ Debug description: No value associated with key "population".
- Path: [0]/home/country
+ Path: .[0].home.country

5 Likes

@lorentey goes into great detail about the true purpose of these protocols here:

We should update the documentation with a version of that.

4 Likes

The steering group previously considered the matter as part of that prior review. As I reported out in the decision notes, the conclusion of the group was:

3 Likes

+1

Several thoughtful comments have been posted here about code that this change might break. But I'm OK with breakage in this case.

My general sense is that no object, protocol, or method with Debug in the name should be considered part of the Swift ABI contract. These entities are metadata and metabehaviors designed for the use of developers during development. If someone chooses to rely on their specific behavior or format in production code, they do so at their own risk.

I'm not sure these items should even be subject to the normal Swift Evolution process. Just fix them!

2 Likes

I tried! But I was asked to go through SE, and I do think it’s valuable to talk through it. I agree that debug things should not be relied upon in theory. But as noted above, the docs are pretty terse, and even with the best docs in the world, no one reads every single docs page, so it’s good to think through the implications of a change like this.

Thanks for citing this

I think this is an important point. Nothing should parse, convert or manipulate the debug description. Our logging systems aren’t doing that. They aren’t even calling the debug description API directly. However, as I said above, swift-log relies heavily on description, which often calls debugDescription. Furthermore, errors such as the decoding and encoding errors here are often logged. If this proposal is accepted as is, it will most likely break a few logging backends. To make matters worse, the only workaround that I see for those backends is to parse the description and sanitize it, which is too costly to do for every single log.

In case folks missed this note in the proposal:

Note 1: this proposal is not intended to specify an exact output format. The above is provided as an example, and is not a guarantee of current or future behavior. You are still free to inspect the contents of thrown errors directly if you need to detect specific problems.

I’m glad folks have raised the newline issue, and it’s probably worth digging into and figuring out a pragmatic solution. I’m not very familiar with the inner workings of logging frameworks, but I do wonder whether they ought to be sanitizing their “inputs” (the messages being logged) if things like newlines are likely to wreak havoc?

I agree: we should improve the printing of the keys. But it is out of scope for this proposal. That description is constructed by the Foundation encoders/decoders, and SE-0489 is intentionally scoped to just change the stdlib. See Future Directions for ways we might address this to make the output even better, but it’s going to require a Foundation change (which I believe has its own SE-like proposal process).

Most logging backends do not sanitize messages because of the performance implications. In high-performance server use cases there might be hundreds of logs generated per second across all the cores. If the logging backends parsed every single message and potentially replaced newlines, they would bring the entire system to a halt. That's why most logging backends that are capable of handling a high volume of log messages just append the message's utf8 bytes to some buffer that gets flushed at a regular interval.
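
To make the failure mode concrete, here is a small sketch of that append-only pattern. `BufferedLogSink` is a hypothetical type, not swift-log API, and it assumes a line-oriented consumer that treats each newline as a record boundary.

```swift
// Hypothetical append-only sink: no parsing, no sanitizing, just bytes.
struct BufferedLogSink {
    private(set) var buffer: [UInt8] = []

    mutating func log(_ message: String) {
        buffer.append(contentsOf: message.utf8)
        buffer.append(UInt8(ascii: "\n"))  // record terminator
    }

    // What a line-oriented reader would count as separate records.
    var recordsSeenByLineReader: Int {
        buffer.filter { $0 == UInt8(ascii: "\n") }.count
    }
}

var sink = BufferedLogSink()
sink.log("request finished")
// A multiline description is appended verbatim...
sink.log("Key 'population' not found.\nPath: [0]/home/country")
// ...so a line-oriented reader sees more records than log calls.
print(sink.recordsSeenByLineReader)
```

Two calls to `log` produce three newline-terminated records here, which is the kind of breakage a line-oriented backend would hit.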

There were some discussions about whether swift-log should introduce its own protocol for types to provide a logging description. However, it was decided to leverage Swift's CustomStringConvertible for two reasons:

  1. The protocol already exists, provides a string representation, many types conform to it, and it is standard practice not to use newlines in description implementations
  2. If swift-log provided a custom protocol, it would force essentially every package to add a dependency on swift-log to provide this conformance.

Overall, I am very sympathetic to solving the concrete problem at hand and improving the printed descriptions, but I would encourage us to pick a description and debugDescription that don't include newlines.

2 Likes

SE-0489 has been accepted; please see the announcement for more information.

John McCall
Language Steering Group

3 Likes