Move Combine's TopLevelEncoder and TopLevelDecoder protocols into the standard library

jasdev · May 7, 2020, 8:28pm

If the next step is writing a proposal to move the protocols into the standard library proper, I’d be glad to help draft it (I’ll probably have questions, hah). It’d be my first shot at an SE pitch.

jasdev · May 7, 2020, 8:43pm

Er, sorry, my bad: I just saw Daniel’s earlier mention of drafting. I can take a crack at the implementation.

danielctull · May 7, 2020, 10:16pm

Just to say that @jasdev and I are working on a proposal for this.

danielctull · May 7, 2020, 10:17pm

Could you give an example of why you want this so we can work it into the proposal?

Saklad5 · May 7, 2020, 10:29pm

Like many developers, I want to decode NSManagedObject subclasses asynchronously. Since I need to do it often, I decided to write a Combine operator that works identically to the existing decode(type:decoder:) operator, except that it operates on the private queue of an NSManagedObjectContext if the given decoder's userInfo contains one under the key context.

While I could just require the context to be provided at the call site, I think it would be more reasonable to require the decoder to have userInfo. After all, Decoder already requires it, and that’s useless if there is never an opportunity to add anything.
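A minimal sketch of the userInfo argument using Foundation's JSONDecoder as it exists today: a Decodable type reads a value from Decoder.userInfo, and the only place a caller can supply that value is the top-level decoder. The `context` key and the `Note` type are hypothetical, and a plain String stands in for an NSManagedObjectContext so the example runs without Core Data.

```swift
import Foundation

// Hypothetical key, standing in for the "context" key described above.
let contextKey = CodingUserInfoKey(rawValue: "context")!

// Hypothetical model type; `context` comes from userInfo, not from the payload.
struct Note: Decodable {
    let text: String
    let context: String?

    private enum CodingKeys: String, CodingKey {
        case text
    }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        text = try container.decode(String.self, forKey: .text)
        // Every Decoder exposes userInfo to the types being decoded...
        context = decoder.userInfo[contextKey] as? String
    }
}

// ...but only the top-level decoder gives the caller a place to fill it in.
let topLevel = JSONDecoder()
topLevel.userInfo[contextKey] = "main-context"
let note = try! topLevel.decode(Note.self, from: Data(#"{"text":"hello"}"#.utf8))
```

A function generic over a TopLevelDecoder protocol without a userInfo requirement could not make that assignment.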

danielctull · May 8, 2020, 6:05pm

@jasdev and I have drafted up a proposal for adding these protocols.

We have explicitly left out the userInfo requirement discussed above because we found TopLevelEncoder/TopLevelDecoder implementations on GitHub that didn't have a userInfo property, and we want to allow as many libraries as possible to conform their types. We're open to further advice on this, however.

@jasdev has added the protocols to his fork of swift where you can compare the changes.

TopLevelEncoder and TopLevelDecoder Protocols

Proposal: SE-NNNN
Authors: Daniel Tull, Jasdev Singh
Review Manager:
Status:
Implementation: WIP branch

Introduction

This proposal introduces the TopLevelEncoder and TopLevelDecoder protocols, currently located in Combine, to the Standard Library. These protocols are useful for representation-agnostic encoding and decoding of data.

Swift Evolution pitch thread: Move Combine’s TopLevelEncoder and TopLevelDecoder protocols into the standard library

Motivation

The following can apply to both decoding and encoding, but to prevent repetition we will focus on decoding.

A function may want to be able to decode data, but not know the implementation details of the specific encoding used. Consider an open-source networking package with a type to wrap the details of a network request.

import Foundation

struct Resource<Value> {
    let request: URLRequest
    let transform: (Data, URLResponse) throws -> Value
}

The package may want to include an initializer that decodes a Decodable type using a decoder while staying agnostic to the format of the data, so it also defines its own protocol. The package also declares conformances for Foundation's decoders, JSONDecoder and PropertyListDecoder.

protocol Decoder {
    func decode<T>(_ type: T.Type, from: Data) throws -> T where T: Decodable
}

extension JSONDecoder: Decoder {}
extension PropertyListDecoder: Decoder {}

extension Resource where Value: Decodable {
    init<D: Decoder>(request: URLRequest, value: Value.Type, decoder: D) {
        self.init(request: request) { data, _ in
            try decoder.decode(Value.self, from: data)
        }
    }
}

This is fine if the caller wishes to use it for JSON or property list formatted data. However, they may have YAML-formatted data and so choose a package that provides a YAMLDecoder.

class YAMLDecoder {
    init() {}
    func decode<T>(_ type: T.Type, from: Data) throws -> T where T: Decodable {
        // Implementation of YAML decoder elided.
        fatalError("YAML decoding is not implemented in this example.")
    }
}

For YAMLDecoder to conform to Decoder, the YAML package would have to import the networking package, which isn’t great because users of the YAML package may not want that dependency.

Otherwise, users of both packages have to declare the conformance themselves, conforming a type they don’t own to a protocol they don’t control, and future changes to either library might break those existing conformances.

And lastly, the Combine team has relayed their intent for these two protocols to live in Swift proper:

[The Combine authors] did intend for TopLevelEncoder and TopLevelDecoder to be Standard Library types, if possible. I support moving them down, especially now that we have the compiler feature to let us do it.

Proposed solution

Introduce new TopLevelDecoder and TopLevelEncoder protocols:

/// A type that defines methods for decoding.
public protocol TopLevelDecoder {

    /// The type this decoder accepts.
    associatedtype Input

    /// Decodes an instance of the indicated type.
    func decode<T>(_ type: T.Type, from: Self.Input) throws -> T where T: Decodable
}

/// A type that defines methods for encoding.
public protocol TopLevelEncoder {

    /// The type this encoder produces.
    associatedtype Output

    /// Encodes an instance of the indicated type.
    func encode<T>(_ value: T) throws -> Self.Output where T : Encodable
}

These protocols can be adopted by packages that define new decoders or encoders, and used as generic constraints by packages that want the functionality described above.
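To make the motivation concrete, here is a sketch of the Resource example rewritten against the proposed protocol. It assumes a local copy of TopLevelDecoder (internal, so the file compiles on its own without Combine) and declares JSONDecoder's conformance itself, standing in for what Foundation would provide. Base64JSONDecoder is a hypothetical third-party decoder: it conforms without importing the networking package, which is the point of moving the protocol down.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLRequest/URLResponse live here on Linux
#endif

// Local copy of the proposed protocol, so this sketch compiles on its own.
protocol TopLevelDecoder {
    associatedtype Input
    func decode<T>(_ type: T.Type, from: Self.Input) throws -> T where T: Decodable
}

// Assumed to ship with Foundation under this proposal; declared locally here.
extension JSONDecoder: TopLevelDecoder {
    typealias Input = Data
}

// The networking package no longer needs its own Decoder protocol.
struct Resource<Value> {
    let request: URLRequest
    let transform: (Data, URLResponse) throws -> Value
}

extension Resource where Value: Decodable {
    init<D: TopLevelDecoder>(request: URLRequest, decoder: D) where D.Input == Data {
        self.init(request: request) { data, _ in
            try decoder.decode(Value.self, from: data)
        }
    }
}

// Hypothetical third-party decoder: base64-wrapped JSON. It conforms to the
// shared protocol without knowing anything about the networking package.
struct Base64JSONDecoder: TopLevelDecoder {
    typealias Input = Data
    struct InvalidBase64: Error {}

    func decode<T: Decodable>(_ type: T.Type, from input: Data) throws -> T {
        guard let json = Data(base64Encoded: input) else { throw InvalidBase64() }
        return try JSONDecoder().decode(type, from: json)
    }
}

// Usage with a hypothetical User type.
struct User: Decodable, Equatable {
    let name: String
}

let request = URLRequest(url: URL(string: "https://example.com/user")!)
let response = URLResponse(url: request.url!, mimeType: "application/json",
                           expectedContentLength: -1, textEncodingName: nil)
let payload = Data(#"{"name":"Ada"}"#.utf8)

let viaJSON = Resource<User>(request: request, decoder: JSONDecoder())
let viaBase64 = Resource<User>(request: request, decoder: Base64JSONDecoder())
```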

Detailed design

The JSONDecoder and PropertyListDecoder types in Foundation should be made to conform to the TopLevelDecoder protocol.

Likewise, the JSONEncoder and PropertyListEncoder types in Foundation should be made to conform to the TopLevelEncoder protocol.
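A sketch of those conformances, assuming local copies of the two protocols in place of the standard-library versions. Because Foundation's existing encode/decode methods already match the requirements, each conformance only has to name its associated type. roundTrip and Point are hypothetical helpers that show generic code accepting either format.

```swift
import Foundation

// Local copies of the proposed protocols.
protocol TopLevelDecoder {
    associatedtype Input
    func decode<T>(_ type: T.Type, from: Self.Input) throws -> T where T: Decodable
}

protocol TopLevelEncoder {
    associatedtype Output
    func encode<T>(_ value: T) throws -> Self.Output where T: Encodable
}

// The conformances described above; the existing methods are the witnesses.
extension JSONDecoder: TopLevelDecoder { typealias Input = Data }
extension PropertyListDecoder: TopLevelDecoder { typealias Input = Data }
extension JSONEncoder: TopLevelEncoder { typealias Output = Data }
extension PropertyListEncoder: TopLevelEncoder { typealias Output = Data }

// Generic code can accept any encoder/decoder pair whose types line up.
func roundTrip<E: TopLevelEncoder, D: TopLevelDecoder, T: Codable>(
    _ value: T, encoder: E, decoder: D
) throws -> T where E.Output == D.Input {
    try decoder.decode(T.self, from: encoder.encode(value))
}

struct Point: Codable, Equatable {
    var x: Int
    var y: Int
}
```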

Source compatibility

This is a purely additive change. We can lean on the same shadowing work from SE-0235 that allowed Result to be added to the Standard Library, so that libraries already declaring their own types with these names continue to compile.
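A single-file illustration of that shadowing, using Result itself (the local enum and its cases are hypothetical). A declaration in the current module shadows the standard-library type of the same name, while the standard-library type remains reachable as Swift.Result. A library that already declares its own TopLevelDecoder would presumably be in the same position once the stdlib gains one.

```swift
// A module-level declaration named Result, like those many libraries
// shipped before Result joined the standard library.
enum Result<Value> {
    case value(Value)
    case error(String)
}

// Inside this module the local declaration wins...
let local: Result<Int> = .value(42)

// ...while the standard-library type stays reachable by qualifying it.
let standard: Swift.Result<Int, Error> = .success(42)
```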

Effect on ABI stability

This is a purely additive change.

Effect on API resilience

This has no impact on API resilience which is not already captured by other language features.

Alternatives considered

None.

ktoso · May 9, 2020, 4:48am

Hi @danielctull, thanks for kicking off proposal work here.

I've been looking into this for a while, and while I haven't yet had the time to hash out all the specifics, allow me to add some context on what will need to be done here, as there are a few more concerns than it seems at first:

ABI implications of "moving" a type

Since the symbols already exist in an Apple framework (Combine), and we'd want Combine to be able to use those "new" (moved) TopLevelEn/Decoder types as well, we have to take some care here, and the change is not really purely additive. First, we'll need to utilize a recent compiler feature[1] that @Xi_Ge developed, and mark the "new" types using @_originallyDefinedIn.

At the same time, we'll need to coordinate with Combine to have it re-export the stdlib's version of those types/symbols, since "old apps" compiled against a version of Combine that defined those types itself will still be looking for them there. Instead, we'd want those apps to find the types that are now part of the stdlib (where the @_originallyDefinedIn annotation ensures that the symbols match what the applications are looking for).

Re-considering `userInfo`:

If I remember my prior digging into this correctly, I quickly arrived at the conclusion that including userInfo is necessary for many real scenarios, so we should perhaps revisit this (I'll give it a look next week on my end).

I hope to get time soon to look into the specifics of coordinating this dance between libraries and teams.

[1] TBDGen/IRGen: generate $ld$hide$os symbols for decls marked with @_originallyDefinedIn #28691

Saklad5 · May 9, 2020, 6:35pm

I think the reason that userInfo was left out is clear: Combine doesn’t need it.

So long as Decoder and Encoder require userInfo, TopLevelDecoder and TopLevelEncoder should too. On a related note, did you find any implementations that didn’t initialize empty userInfo dictionaries? I don’t see how it would be possible to conform to the existing protocols without doing so.

danielctull · May 9, 2020, 10:55pm

You’re correct: the implementations that didn’t have a userInfo property still included it in their initialisers, so I imagine it wouldn’t be hard for them to adopt this requirement.

kumowoon1025 · May 14, 2020, 6:14am

I'm confused: what does "top level" refer to in TopLevelEncoder and TopLevelDecoder? SE-0167 mentions:

It should be noted here that JSONEncoder and JSONDecoder do not themselves conform to Encoder and Decoder ... This is because JSONEncoder and JSONDecoder must present a different top-level API than they would at intermediate levels.

But I don't really understand why (there are internal __JSONEncoder/__JSONDecoder types that conform to Encoder and Decoder).

Does it mean it only encodes/decodes complete JSON, starting from the root (top-level) object?

ktoso · May 14, 2020, 7:11am

So the status quo, and the intended difference, is between:

1. "The thing which has func container<Key>(keyedBy:) and friends defined on it", in other words what's used when implementing Codable conformances. Those are Decoder/Encoder.
2. "The thing that people can only call encode()/decode() on", at the "outer"/"top level" layer in applications etc., where they want to use the coding infrastructure to encode/decode a type. This is not the same as 1. because, one can argue, it does not necessarily make sense to expose container and the other functions on that API. These are the types and the use case discussed here: TopLevelEncoder/Decoder.
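The two layers in this summary can be seen in one small example, using Foundation's JSONDecoder and JSONEncoder as the top-level types. Temperature and readTemperature are hypothetical: the custom init(from:) works at the Decoder layer with keyed containers, while the caller only ever makes one decode call on the whole payload.

```swift
import Foundation

struct Temperature: Codable, Equatable {
    let celsius: Double

    private enum CodingKeys: String, CodingKey {
        case celsius = "c"
    }

    init(celsius: Double) {
        self.celsius = celsius
    }

    // Layer 1 (Decoder): what a Codable conformance sees. It asks for
    // containers and pulls values out by key. encode(to:) is synthesized.
    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        celsius = try container.decode(Double.self, forKey: .celsius)
    }
}

// Layer 2 (top level): what application code sees. One call on the whole
// payload, with no containers in sight.
func readTemperature(from data: Data) throws -> Temperature {
    try JSONDecoder().decode(Temperature.self, from: data)
}
```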

I somewhat agree that the naming here isn't the most intuitive.

So the goal was to avoid exposing the "used by coder implementations" functions to people who only need to invoke "encode/decode my type". The former use case has types in the stdlib (Encoder/Decoder); the latter does not (today), which led Combine (and others) to create such types, because abstracting over the various coders a library accepts is genuinely needed.

itaiferber · May 14, 2020, 1:24pm

In a way, yes. The "top level" means looking at the data from the very root of the structure. Encoder and Decoder are protocols that make the most sense when you're looking to decode values inside the tree, but that level of specificity doesn't make for great API when you're not the one doing the format-specific decoding. Most API consumers just care about taking their JSON Data and getting values out of the whole data blob, not about extracting data with containers.

For a little more background, see my post in Are custom Encoders/Decoders supported? - #7 by itaiferber (and the linked post inside)

@ktoso's summary is a good way to put it, too.

beccadax · May 14, 2020, 11:30pm

If you imagine the data structure being coded as a tree drawn with its root at the top and its leaves at the bottom, then Encoder and Decoder are used at every level of the tree so that each node can add its children to the tree. TopLevelEncoder and TopLevelDecoder, on the other hand, are used to encode or decode the "top level"—i.e. the root—of that tree.
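A sketch of that tree picture, assuming a hypothetical recursive Node type decoded with JSONDecoder. The top-level decode call happens once, at the root; each node's init(from:) then receives its own Decoder, and that Decoder's codingPath grows with the node's depth in the tree.

```swift
import Foundation

// Hypothetical recursive tree type.
struct Node: Decodable {
    let name: String
    let children: [Node]
    let depth: Int  // number of coding-path keys leading from the root to this node

    private enum CodingKeys: String, CodingKey {
        case name, children
    }

    // Called once per node, each time with the Decoder for that position.
    init(from decoder: Decoder) throws {
        depth = decoder.codingPath.count
        let container = try decoder.container(keyedBy: CodingKeys.self)
        name = try container.decode(String.self, forKey: .name)
        children = try container.decodeIfPresent([Node].self, forKey: .children) ?? []
    }
}

let json = Data(#"{"name":"root","children":[{"name":"a"},{"name":"b","children":[{"name":"c"}]}]}"#.utf8)

// The top-level decoder is used exactly once, for the root of the tree.
let root = try! JSONDecoder().decode(Node.self, from: json)
```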

I'm not sure I love the name, but on the other hand, I'm having trouble coming up with a better one.

salutis · May 17, 2020, 12:27am

Didn’t you just come up with one? If everyone calls it “the root”, perhaps Root would be a better name than TopLevel.

xwu · May 17, 2020, 1:37am

I could get on board with RootEncoder and RootDecoder.

mattpolzin · May 17, 2020, 4:52pm

I’m not sure I find any naming I’ve seen or thought of so far to be more intuitive than the others. In all cases I would jump to the code docs and read the description and that’s how I would go from wondering what it meant to forever having the answer.

That said, I’ll throw my gut feeling into the shed painting party:
DocumentEncoder/DocumentDecoder

I consider Document to be a relatively commonly used word for “the whole thing” in this context.

Saklad5 · May 17, 2020, 8:41pm

I think the main issue with the current names is that they incorrectly imply protocol inheritance where none exists. Given a protocol named Decoder and a protocol named TopLevelDecoder, it seems obvious that TopLevelDecoder refines Decoder.

The ideal solution is probably renaming Decoder and Encoder, but that’s obviously a breaking change. If that’s off the table (and I don’t think it should be forever), then we need to name these new protocols without using Decoder or Encoder as a suffix.

This will continue to be an issue for types like JSONDecoder and PropertyListEncoder unless we change the existing protocol names, of course.
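A quick check of the point about implied inheritance, using Foundation's types: JSONDecoder does not conform to Swift's Decoder, and the Decoder that a Codable type actually receives is a different object. The Probe type is hypothetical.

```swift
import Foundation

// The "Decoder" suffix suggests a refinement, but Foundation's top-level
// decoder does not conform to Swift's Decoder protocol.
let topLevel: Any = JSONDecoder()
let topLevelIsDecoder = topLevel is Decoder  // false

// Hypothetical probe: records whether the Decoder handed to init(from:)
// is the JSONDecoder instance type itself.
struct Probe: Decodable {
    let receivedTopLevelDecoder: Bool

    init(from decoder: Decoder) throws {
        receivedTopLevelDecoder = decoder is JSONDecoder
    }
}

let probe = try! JSONDecoder().decode(Probe.self, from: Data("{}".utf8))
```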

Saklad5 · May 17, 2020, 8:49pm

Putting aside potential name changes, it might be worth adding a couple of protocol compositions to be consistent with Codable:

typealias Coder = Decoder & Encoder

typealias TopLevelCoder = TopLevelDecoder & TopLevelEncoder
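A sketch of how the suggested composition could be used, assuming local copies of the two protocols. Because Foundation splits JSON coding across two classes, a type conforming to both has to pair an encoder and a decoder; JSONCoder and Store are hypothetical and exist only to show the composition as a generic constraint.

```swift
import Foundation

// Local copies of the proposed protocols.
protocol TopLevelDecoder {
    associatedtype Input
    func decode<T>(_ type: T.Type, from: Self.Input) throws -> T where T: Decodable
}

protocol TopLevelEncoder {
    associatedtype Output
    func encode<T>(_ value: T) throws -> Self.Output where T: Encodable
}

// The suggested composition.
typealias TopLevelCoder = TopLevelDecoder & TopLevelEncoder

// Hypothetical single type conforming to both by pairing Foundation's coders.
struct JSONCoder: TopLevelCoder {
    typealias Input = Data
    typealias Output = Data

    var encoder = JSONEncoder()
    var decoder = JSONDecoder()

    func encode<T: Encodable>(_ value: T) throws -> Data {
        try encoder.encode(value)
    }

    func decode<T: Decodable>(_ type: T.Type, from data: Data) throws -> T {
        try decoder.decode(type, from: data)
    }
}

// Hypothetical store that wants one value configuring both directions.
struct Store<C: TopLevelCoder> where C.Input == C.Output {
    let coder: C
    private var storage: [String: C.Output] = [:]

    init(coder: C) {
        self.coder = coder
    }

    mutating func save<T: Encodable>(_ value: T, forKey key: String) throws {
        storage[key] = try coder.encode(value)
    }

    func load<T: Decodable>(_ type: T.Type, forKey key: String) throws -> T? {
        guard let data = storage[key] else { return nil }
        return try coder.decode(type, from: data)
    }
}
```

Store illustrates the narrow case where the composition helps: code that wants one value configuring both directions. With separate encoder and decoder types, as in Foundation, two generic parameters work just as well.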

xwu · May 17, 2020, 8:59pm

Saklad5:

Putting aside potential name changes, it might be worth adding a couple protocol compositions to be consistent with Codable:
typealias Coder = Decoder & Encoder

typealias TopLevelCoder = TopLevelDecoder & TopLevelEncoder

For what reasons might these be worth adding?

danielctull · May 17, 2020, 9:10pm

In looking into custom top-level encoders/decoders on GitHub, I don’t recall finding any pair that was implemented by a single type. Given this, I’m having trouble seeing the value in the suggested TopLevelCoder typealias. I’m happy to be convinced otherwise, though.