URL fails to decode when it is a generic argument and `GenericArgument(from: decoder)` is used

George · May 6, 2020, 11:11pm

In the following code, GoodURLContainer decodes fine, but BadURLContainer throws an error (I believe from here). I'm guessing this is because the generic machinery of Decodable takes the wrong codepath somewhere and ends up trying to decode the URL via KeyedDecodingContainer. Any thoughts on whether or not this is a bug?

import Foundation

struct GoodWrapper<T: Decodable>: Decodable {
  init(from decoder: Decoder) throws {
    _ = try decoder.singleValueContainer().decode(T.self)
  }
}

struct BadWrapper<T: Decodable>: Decodable {
  init(from decoder: Decoder) throws {
    _ = try T(from: decoder)
  }
}

struct GoodURLContainer: Decodable {
  let url: GoodWrapper<URL>
}

struct BadURLContainer: Decodable {
  let url: BadWrapper<URL>
}

let data = """
  {
    "url": "https://en.wikipedia.org/wiki/Diceware"
  }
  """.data(using: .utf8)!
let decoder = JSONDecoder()
try? decoder.decode(GoodURLContainer.self, from: data)
try? decoder.decode(BadURLContainer.self, from: data)

ahti · May 7, 2020, 10:03am

I'm not quite sure why URL en/decodes that way in the general case, but aiui you should always go through singleValueContainer to give the en-/decoder a chance to apply any special case handling it wants to do.

itaiferber · May 7, 2020, 1:39pm

@ahti is exactly right:

GoodWrapper calls into the decoder to decode T, which gives the decoder the opportunity to inspect T and apply specific behavior for it. In this case, JSONDecoder has special handling for URL.
BadWrapper calls into T directly, never giving the Decoder the opportunity to "see" T, and so always falls back to T's own implementation.

As @ahti mentions, you should always give the Decoder the opportunity to apply its logic to T — otherwise, you're likely to get inconsistent results throughout an archive when decoding.
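To make the difference concrete, here is a minimal sketch of both approaches side by side. The names ValueWrapper, DirectWrapper, ViaDecoder, and ViaInitializer are made up for this illustration and are not from the original post; unlike the wrappers above, they keep the decoded value so it can be inspected.

```swift
import Foundation

// Hypothetical wrapper: asks the decoder to decode T, so the decoder
// can apply whatever special handling it has for T (such as URL).
struct ValueWrapper<T: Decodable>: Decodable {
    let value: T

    init(from decoder: Decoder) throws {
        value = try decoder.singleValueContainer().decode(T.self)
    }
}

// Hypothetical wrapper with the same shape as BadWrapper: calls T's
// initializer directly, so the decoder never sees the type T.
struct DirectWrapper<T: Decodable>: Decodable {
    let value: T

    init(from decoder: Decoder) throws {
        value = try T(from: decoder)
    }
}

struct ViaDecoder: Decodable { let url: ValueWrapper<URL> }
struct ViaInitializer: Decodable { let url: DirectWrapper<URL> }

let json = Data(#"{"url": "https://en.wikipedia.org/wiki/Diceware"}"#.utf8)
let jsonDecoder = JSONDecoder()

// Goes through the decoder: the JSON string is turned into a URL.
let viaDecoder = try? jsonDecoder.decode(ViaDecoder.self, from: json)
print(viaDecoder?.url.value.absoluteString ?? "decoding failed")

// Bypasses the decoder: URL's own init(from:) runs and throws here.
do {
    _ = try jsonDecoder.decode(ViaInitializer.self, from: json)
    print("decoded via direct initializer")
} catch {
    print("direct initializer failed:", error)
}
```

The only difference between the two wrappers is who gets asked to produce T: the decoder, or T itself.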

As for why URL en/decodes that way in the general case:

When creating a URL, you can create one out of an absolute URL string, or with a path relative to a base URL:

import Foundation

let url1 = URL(string: "https://example.com/sample/path?query=yes")!
let url2 = URL(string: "sample/path?query=yes", relativeTo: URL(string: "https://example.com"))!

Both of these URLs have the same absolute representation:

print(url1.absoluteString) // https://example.com/sample/path?query=yes
print(url1.absoluteString == url2.absoluteString) // true
print(url1.absoluteURL == url2.absoluteURL) // true

However, these URLs are not equal to one another:

print(url1.baseURL, url1.relativeString, url1.relativePath, separator: ", ") // nil, https://example.com/sample/path?query=yes, /sample/path
print(url2.baseURL, url2.relativeString, url2.relativePath, separator: ", ") // Optional(https://example.com), sample/path?query=yes, sample/path
print(url1 == url2) // false

The two URLs' base and relative portions are not equal, so the URLs are considered not equal. This stands out a little bit more when you look at URLs whose baseURL already has a path component:

let url1 = URL(string: "https://example.com/sample/path?query=yes")!
let url2 = URL(string: "sample/path?query=yes", relativeTo: URL(string: "https://example.com"))!
let url3 = URL(string: "path?query=yes", relativeTo: URL(string: "https://example.com/sample/"))!

print(url1.baseURL, url1.relativeString, url1.relativePath, separator: ", ") // nil, https://example.com/sample/path?query=yes, /sample/path
print(url2.baseURL, url2.relativeString, url2.relativePath, separator: ", ") // Optional(https://example.com), sample/path?query=yes, sample/path
print(url3.baseURL, url3.relativeString, url3.relativePath, separator: ", ") // Optional(https://example.com/sample/), path?query=yes, path

Here, both the relativeStrings (whole path relative to the root base URL) and the relativePaths (path given relative to the full base URL) are different, and these URLs are not equal.

Thus, in the general case, URL encodes conservatively: it encodes its baseURL and relativeString separately so that round-tripping a URL through its own Codable implementation yields a decoded value equal to the original. However, this encoding format isn't terribly useful, especially when interfacing with third-party APIs that expect URLs as absolute strings. This is especially true for most JSON endpoints, so JSONEncoder/JSONDecoder special-case URLs to produce absolute strings; although not all URLs round-trip equally through JSON encoding, they are more generally useful this way. (PropertyListEncoder, for instance, does not have this preference, and delegates to URL to do its encoding.)
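A rough sketch of that difference, round-tripping the same relative URL through both encoders. The Link struct is a hypothetical container introduced only for this example, and the comments describe the behavior explained above rather than guaranteed output on every platform.

```swift
import Foundation

// Hypothetical container type, used only for this illustration.
struct Link: Codable { let url: URL }

let relative = URL(string: "sample/path?query=yes",
                   relativeTo: URL(string: "https://example.com"))!
let original = Link(url: relative)

// JSON: the URL is written as an absolute string, so the split between
// base and relative parts is not preserved after decoding.
let jsonData = try JSONEncoder().encode(original)
let fromJSON = try JSONDecoder().decode(Link.self, from: jsonData)
print(fromJSON.url.baseURL == nil)
print(fromJSON.url.absoluteString == relative.absoluteString)

// Property list: the encoder delegates to URL's own implementation,
// which stores the base URL and the relative string separately.
let plistData = try PropertyListEncoder().encode(original)
let fromPlist = try PropertyListDecoder().decode(Link.self, from: plistData)
print(fromPlist.url.baseURL as Any, fromPlist.url.relativeString)
```

The JSON round trip keeps the absolute representation but drops the base/relative split, while the property list round trip keeps both parts.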