Move Combine's TopLevelEncoder and TopLevelDecoder protocols into the standard library

Combine introduced two Codable-related protocols to represent top-level encoders and decoders. TopLevelEncoder abstracts over JSONEncoder and PropertyListEncoder for use in the encode(encoder:) operator. TopLevelDecoder is the decoding counterpart, used by the decode(type:decoder:) operator.

I think these protocols would be useful in the standard library for two main reasons:

  • Vendors of custom encoding/decoding packages wanting to participate in this environment would need to import Combine to do so, which may be undesirable.

  • Packages that want to provide functionality that is agnostic to the specific encoders/decoders being used would also have to import Combine.

Requiring Combine limits where these protocols can be used, especially since data manipulation packages are generally cross-platform and Combine is not available on non-Apple platforms.

25 Likes

It might be possible for packages to do something like this:

#if canImport(Combine)
import Combine
#else
protocol TopLevelEncoder {...}
protocol TopLevelDecoder {...}
#endif

extension MyEncoder: TopLevelEncoder {}
extension MyDecoder: TopLevelDecoder {}

to at least expose some form of top-level encoding/decoding to code that can't use Combine, but that won't scale if multiple packages want to use or enable this sort of functionality, since it would have to be repeated in every package that ships its own encoder/decoder.
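
For reference, here is a fuller sketch of that fallback. The protocol bodies are written to mirror the shape of Combine's declarations (one associated type and one generic method each); the `roundTrip` helper is a hypothetical name used only to show encoder-agnostic code. On platforms with Combine, Foundation's JSON coders already conform, so the conformances are only declared in the fallback branch.

```swift
import Foundation

#if canImport(Combine)
import Combine  // JSONEncoder/JSONDecoder already conform to the Combine protocols here.
#else
// Fallback declarations with the same shape as Combine's protocols.
protocol TopLevelEncoder {
    associatedtype Output
    func encode<T: Encodable>(_ value: T) throws -> Output
}

protocol TopLevelDecoder {
    associatedtype Input
    func decode<T: Decodable>(_ type: T.Type, from: Input) throws -> T
}

// Foundation's coders already have matching methods, so the extensions are empty.
extension JSONEncoder: TopLevelEncoder {}
extension JSONDecoder: TopLevelDecoder {}
#endif

// Hypothetical helper: works with any encoder/decoder pair whose
// output and input types line up, without naming a concrete coder.
func roundTrip<Value: Codable, E: TopLevelEncoder, D: TopLevelDecoder>(
    _ value: Value, encoder: E, decoder: D
) throws -> Value where E.Output == D.Input {
    let encoded = try encoder.encode(value)
    return try decoder.decode(Value.self, from: encoded)
}

let copy = try roundTrip([1, 2, 3], encoder: JSONEncoder(), decoder: JSONDecoder())
```

Every package that wants this has to carry its own copy of the `#else` branch, which is the scalability problem described above.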

Are there any ABI/module stability considerations with this? I presume that now that Combine is public it will still need to provide these protocols to ensure compatibility, but if this can be done with minimal invasiveness we should definitely try.

I've always been somewhat surprised that there doesn't appear to be a protocol that JSONEncoder (or a property list encoder) conforms to in the standard lib. So this definitely gets a +1 from me.

8 Likes

Yeah, it's trivial for packages to supply their own version of the protocols. Beyond the annoyance of repetition, though, protocols that look the same are still distinct types to the compiler, so a user of two independent packages would still need to add conformances to bridge them.

I think Combine will still need to provide these types, both for ABI stability and because users could have already explicitly written Combine.TopLevelEncoder in existing code. However, I believe Combine could redefine TopLevelEncoder as a typealias for the Swift standard library's protocol, which would allow them to interoperate.

I'm totally not an expert in this, so I'd love to understand how much of a bother these new standard library protocols would cause for the existing Combine types.

1 Like

Thanks to work by @Xi_Ge, we now have (unofficial, compiler-internal, no-user-serviceable-parts-inside, offer-void-in-Nebraska) support for moving declarations from one library "up" into another library that the original imports. Since every Swift library imports the standard library, I think this would make it possible to move these protocols into the standard library without breaking ABI.

A typealias would preserve source stability, but not ABI stability.

11 Likes

Is the source code for the Combine framework publicly available?

No.

Nope, I meant publicly available to use, aka not in a beta OS/Xcode

I think I replied to this in a thread a while ago, but for what it's worth: we (Combine authors) did intend for TopLevelEncoder and TopLevelDecoder to be standard library types if possible. I support moving them down, especially now that we have the compiler feature to let us do it.

14 Likes

This would be great. Alamofire 5 will ship with a DataDecoder protocol to support our responseDecodable feature with arbitrary decoders, so it would be nice to have an official replacement.

Sounds very positive! Will this change require a Swift Evolution proposal? I'm happy to draft one up if so.

1 Like

Yes, I believe so, since it would be a change to the Swift standard library.

1 Like

In light of this discussion, I would like to bring up an issue I have with adopting TopLevelDecoder and TopLevelEncoder in CodableCSV (a CSV encoder/decoder library). It was also mentioned by @ole in a blog post.

Problem: Conforming to TopLevelDecoder forces the adopting decoder to be specialized for a single type of input.

Foundation's JSONDecoder and PropertyListDecoder only expose a single decode<T>(T.Type, from: Self.Input) function, where Input is Data. CSVDecoder, however, accepts a CSV from a data blob, a string, or a URL (pointing to a file). The string input function is arguably just a convenience, but decode(T.Type, from: URL) behaves differently and deserves to exist. Its main benefit is that it doesn't load the whole CSV into memory; it decodes as needed (there are huge CSV files out there).

Interestingly CSVDecoder also offers a rich set of configuration values including the typical decoding strategies (dateDecodingStrategy, nonConformingFloat, etc.) and some CSV/TSV specific configuration (such as field/row delimiters, header and trim strategies, etc.).

It is fairly common to create a CSVDecoder, set the configuration values, and then reuse it throughout your application.

let decoder = CSVDecoder()
decoder.encoding = .utf8
decoder.delimiters.row = "\t"
decoder.trimStrategy = .whitespaces
decoder.nilStrategy = .empty

// 1. Use the global decoder to decode data from the internet.
let resultA = try decoder.decode([Student].self, from: data)

// 2. Use the global decoder to decode a file in the file system.
let resultB = try decoder.decode([Student].self, from: fileURL)

If I were to adopt TopLevelDecoder on CSVDecoder I have two choices:

  1. Restrict the Self.Input to Data (as JSONDecoder and PropertyListDecoder do).
    extension CSVDecoder: TopLevelDecoder {
        public typealias Input = Data
    }
    
  2. Make CSVDecoder generic over Input and create three extensions adopting TopLevelDecoder.
    class CSVDecoder<Input> { ... }
    
    extension CSVDecoder: TopLevelDecoder where Input == Data {
        func decode<T: Decodable>(_ type: T.Type, from: Input) throws -> T { ... }
    }
    
    // Same thing for Input == URL and Input == String
    

The problem with #1 is that it can only be used with Data blobs when a TopLevelDecoder is requested (such as with Combine operators).
The problem with #2 is twofold: the decoder cannot be reused for different inputs and the initializer accepts any Input when only three are supported.

Question: Am I missing another way to adopt TopLevelDecoder?

Funnily enough, @Philippe_Hausler and @itaiferber seem to be discussing top-level encoders/decoders in another thread too.

3 Likes

You could define Input to be an enum type with cases (and their associated values) for each of the actual input types you listed. One downside of that is that call sites need to wrap the input value in the wrapper type.
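
A minimal sketch of the enum approach, under some assumptions: `MultiSourceDecoder` is a hypothetical stand-in (not CodableCSV's API) that wraps JSONDecoder for brevity, and `TopLevelDecoder` is declared locally with the same shape as Combine's protocol so the snippet stands alone.

```swift
import Foundation

// Local stand-in mirroring the shape of Combine's TopLevelDecoder.
protocol TopLevelDecoder {
    associatedtype Input
    func decode<T: Decodable>(_ type: T.Type, from: Input) throws -> T
}

// Hypothetical decoder that accepts several kinds of input through one enum.
struct MultiSourceDecoder: TopLevelDecoder {
    enum Input {
        case data(Data)
        case string(String)
        case file(URL)
    }

    var base = JSONDecoder()

    func decode<T: Decodable>(_ type: T.Type, from input: Input) throws -> T {
        switch input {
        case .data(let data):
            return try base.decode(type, from: data)
        case .string(let text):
            return try base.decode(type, from: Data(text.utf8))
        case .file(let url):
            // This sketch reads the whole file; a real decoder could stream here.
            return try base.decode(type, from: Data(contentsOf: url))
        }
    }
}

let decoder = MultiSourceDecoder()
// Call sites have to wrap the value in the enum case.
let fromString = try decoder.decode([Int].self, from: .string("[1, 2, 3]"))
let fromData = try decoder.decode([Int].self, from: .data(Data("[4, 5]".utf8)))
```

The `.string(...)` and `.data(...)` wrappers at the call sites are the downside mentioned above.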

1 Like

True. This might be the best option of them all so far (fewer drawbacks than the other two). Thank you for bringing it forward, @pyrtsa. However, it still seems subpar that the user has to wrap the input in an enum before passing it on, especially since they don't have to do that when using JSONDecoder or PropertyListDecoder.

Personally, I’d design it so that it conforms to TopLevelDecoder with the Data input, to mirror the Foundation decoders. Leave the file URL input as a separate method, with an explicit argument label to denote that the input will in fact be processed by streaming.
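
Roughly, that design could look like the sketch below. `RecordDecoder` and the `streamingFrom:` label are hypothetical names chosen for illustration, the decoder wraps JSONDecoder only to keep the example short, and `TopLevelDecoder` is declared locally with the same shape as Combine's protocol.

```swift
import Foundation

// Local stand-in mirroring the shape of Combine's TopLevelDecoder.
protocol TopLevelDecoder {
    associatedtype Input
    func decode<T: Decodable>(_ type: T.Type, from: Input) throws -> T
}

// Hypothetical decoder: the protocol conformance uses Data, like Foundation's decoders.
struct RecordDecoder: TopLevelDecoder {
    typealias Input = Data

    var base = JSONDecoder()

    func decode<T: Decodable>(_ type: T.Type, from data: Data) throws -> T {
        try base.decode(type, from: data)
    }

    // Separate entry point with an explicit label. A real implementation would
    // stream from the file; this sketch simply reads it all and forwards.
    func decode<T: Decodable>(_ type: T.Type, streamingFrom fileURL: URL) throws -> T {
        try decode(type, from: Data(contentsOf: fileURL))
    }
}

// Generic code only sees the Data-based requirement.
func decodeNames<D: TopLevelDecoder>(from input: D.Input, using decoder: D) throws -> [String] {
    try decoder.decode([String].self, from: input)
}

let names = try decodeNames(from: Data(#"["Ada", "Grace"]"#.utf8), using: RecordDecoder())
```

The trade-off is that the file-based method is invisible to anything written against TopLevelDecoder, such as Combine operators.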

3 Likes

I hope if TopLevelEncoder and TopLevelDecoder are ever added to the standard library, it doesn't break on non-Apple platforms when using open-source libraries like OpenCombine that define their own version of TopLevelEncoder and TopLevelDecoder.

I think it'll be the same as it was with adding the Result type to the standard library. IIRC, it was solved by always preferring types from modules other than the standard library in case of ambiguity.

3 Likes

Before these get @frozen, I’d like to make a request: add the following requirement to both TopLevelEncoder and TopLevelDecoder:

var userInfo: [CodingUserInfoKey : Any] { get set }

Decoder already requires it to be accessible, and all of the implementations I’ve seen already satisfy it. It’s more or less essential, and it would make writing generic functions much easier.
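
To illustrate what the requirement would enable, here is a sketch that models it as a separate refinement protocol. `UserInfoTopLevelDecoder`, `TaggedJSONDecoder`, and `userInfoValue(for:in:)` are hypothetical names, and `TopLevelDecoder` is declared locally with the same shape as Combine's protocol. The stub decoder does not forward its `userInfo` to the underlying decoder; it only shows the generic access.

```swift
import Foundation

// Local stand-in mirroring the shape of Combine's TopLevelDecoder.
protocol TopLevelDecoder {
    associatedtype Input
    func decode<T: Decodable>(_ type: T.Type, from: Input) throws -> T
}

// Hypothetical refinement carrying the requested requirement.
protocol UserInfoTopLevelDecoder: TopLevelDecoder {
    var userInfo: [CodingUserInfoKey: Any] { get set }
}

// Hypothetical stub decoder; a real one would pass userInfo on to its Decoder.
struct TaggedJSONDecoder: UserInfoTopLevelDecoder {
    var userInfo: [CodingUserInfoKey: Any] = [:]

    func decode<T: Decodable>(_ type: T.Type, from data: Data) throws -> T {
        try JSONDecoder().decode(type, from: data)
    }
}

// Generic access to userInfo through the protocol, with no reflection needed.
func userInfoValue<D: UserInfoTopLevelDecoder>(for key: CodingUserInfoKey, in decoder: D) -> Any? {
    decoder.userInfo[key]
}

let versionKey = CodingUserInfoKey(rawValue: "schemaVersion")!
var taggedDecoder = TaggedJSONDecoder()
taggedDecoder.userInfo[versionKey] = 2
let version = userInfoValue(for: versionKey, in: taggedDecoder) as? Int
```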

2 Likes

I just had to do this instead, and I never want to do it again:

Mirror(reflecting: decoder) // TopLevelDecoder
    .children
    .first { $0.label == "userInfo" }?
    .value as? [CodingUserInfoKey: Any]
3 Likes