Oh, I haven't heard of HOCON before. Swift support for that would be nice!
I was at Typesafe when we were standardizing around "Typesafe config", which is what brought us HOCON, and I maintained and worked with it for quite a bit — it is very nice, and a Swift parser for it would be very welcome.
It is not really a general-purpose serialization format; it is focused on configuration and on overlaying multiple configuration files and values onto one another.
It would be great if someone wanted to make a parser for it in Swift.
But nowadays, for configuration languages, I myself am more excited about languages with more powerful validation mechanisms, like Dhall or similar ones. That would be very exciting as well.
That is somewhat outside the topic of serialization itself, though.
Hey, I think this is a great initiative and wanted to give my perspective.
There are a few things that I feel are either impossible or very hacky with Codable. The examples here are specific to JSON, but this applies to other formats as well.
Example 1: Define an unknown fields property
When implementing a client library, we usually define structs that represent the responses we get from the server, e.g.
{
"id": "1",
"name": "Tobias"
}
struct Person: Codable {
let id: String
let name: String
}
When the server starts sending a new version of the object schema, it might add a new field.
{
"id": "1",
"name": "Tobias",
"age": 35
}
All good: we can still deserialize this object into Person, because JSONDecoder will just ignore the unknown field.
Now we might have a method,
func api.put(person: Person) { ... }
that takes a Person struct, converts it into JSON, and sends it to the server, which saves it as-is. Now, if we take the object we got from the server and just put it back, we would remove the age field. Of course, this assumes that there is no other validation and that null is a valid value for the age field. The details don't matter so much; you get the point. It would be great if we could preserve the unknown field. Something like:
struct Person: Codable {
let id: String
let name: String
let unknownFields: [String: AnyCodable] // would be flattened into the same container as `Person`
}
At the moment there is no good representation of AnyCodable. There are some implementations, but I feel like these are more of a hack (with lots of casting) and make a lot of assumptions about how the different Encoder/Decoder implementations work.
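For illustration, here is a minimal sketch of what such a type-safe representation could look like (the enum below is my own illustrative naming, essentially the shape of JSONValue from swift-extras-json, not an existing standard-library API):
enum JSONValue: Codable, Equatable {
    case null
    case bool(Bool)
    case number(Double) // note: a Double loses full precision, see Example 3
    case string(String)
    case array([JSONValue])
    case object([String: JSONValue])

    init(from decoder: Decoder) throws {
        let container = try decoder.singleValueContainer()
        if container.decodeNil() {
            self = .null
        } else if let value = try? container.decode(Bool.self) {
            self = .bool(value)
        } else if let value = try? container.decode(Double.self) {
            self = .number(value)
        } else if let value = try? container.decode(String.self) {
            self = .string(value)
        } else if let value = try? container.decode([JSONValue].self) {
            self = .array(value)
        } else {
            self = .object(try container.decode([String: JSONValue].self))
        }
    }

    func encode(to encoder: Encoder) throws {
        var container = encoder.singleValueContainer()
        switch self {
        case .null: try container.encodeNil()
        case .bool(let value): try container.encode(value)
        case .number(let value): try container.encode(value)
        case .string(let value): try container.encode(value)
        case .array(let value): try container.encode(value)
        case .object(let value): try container.encode(value)
        }
    }
}
With something like this, unknownFields could be modelled as [String: JSONValue], but capturing the unknown keys still requires a hand-written init(from:) that walks container.allKeys via a string-based CodingKey, which is exactly the boilerplate I'd like to avoid.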
This brings me to the second example:
Example 2: Ability to interact with the serialisation directly
In web APIs there is the notion of PATCH requests, which only send the changes you want to make to an object; see RFC 7386 JSON Merge Patch. With the current state of JSONEncoder/JSONDecoder from Foundation there is no good way to implement this.
let person = api.get() // Person(id: "1", name: "Tobias", age: 35)
var modifiedPerson = person
modifiedPerson.age = nil
api.patch(JSONMergePatch.makePatch(from: person, to: modifiedPerson))
/*
PATCH /person/1 HTTP/1.1
Host: example.org
Content-Type: application/merge-patch+json
{
"age": null
}
*/
Today, in order to implement this makePatch(from:to:) function, you would need to first encode both Person structs into JSON Data. Then use JSONSerialization to parse this back into an Any JSON object. Then cast it to [String: Any] and recursively compare the dictionary entries, casting the Any values into the NSObject types used by JSONSerialization in order to compare them. Finally, encode the [String: Any] JSON object of the patch document back into Data and send it to the server. Here, it would be great if the standard library shipped with a pure Swift implementation akin to XJSONEncoder/XJSONDecoder from swift-extras-json. That library allows converting any Codable to and from a type-safe JSONValue, which can then be used, for example, to compute (or apply) a JSON Merge Patch document in a nice, performant, and type-safe manner.
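As a rough illustration of what that enables (building on a type-safe JSONValue like the one sketched in Example 1; makeMergePatch is a hypothetical helper, not an existing API), computing an RFC 7386 merge-patch document becomes a small recursive diff:
// Computes an RFC 7386 merge-patch document describing the changes
// needed to turn `original` into `modified`. Sketch only.
func makeMergePatch(from original: JSONValue, to modified: JSONValue) -> JSONValue {
    guard case .object(let old) = original, case .object(let new) = modified else {
        // Per RFC 7386, anything other than two objects is a wholesale replacement.
        return modified
    }
    var patch: [String: JSONValue] = [:]
    // Keys that disappeared are patched to null (i.e. removed).
    for key in old.keys where new[key] == nil {
        patch[key] = .null
    }
    // Added or changed keys are included, recursing into nested objects.
    for (key, newValue) in new {
        if let oldValue = old[key] {
            if oldValue != newValue {
                patch[key] = makeMergePatch(from: oldValue, to: newValue)
            }
        } else {
            patch[key] = newValue
        }
    }
    return .object(patch)
}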
Example 3: Full precision numbers in JSON
This example is actually obsolete; after posting I realised I was using the wrong type. This actually works correctly with Foundation's JSONEncoder/JSONDecoder when using Decimal.
Wrong example with `NSDecimalNumber`
This is a bit special for JSON, and maybe I am missing something: in JSON, numbers are encoded textually, e.g.
{
"amount": 99.99
"currency": "EUR"
}
Here 99.99 is not a Double or Float; in the binary representation it is just a String. But when we want to convert this into a Swift struct, I don't think there is a way to do it. For example, this doesn't work:
import Foundation
var json = Data("""
{
"amount": 99.99,
"currency": "EUR"
}
""".utf8)
struct Money: Decodable {
enum CodingKeys: String, CodingKey {
case amount
case currency
}
let amount: NSDecimalNumber
let currency: String
init(from decoder: Decoder) throws {
let container = try decoder.container(keyedBy: CodingKeys.self)
self.amount = NSDecimalNumber(string: try container.decode(String.self, forKey: .amount))
self.currency = try container.decode(String.self, forKey: .currency)
}
}
let decoder = JSONDecoder()
let money = try decoder.decode(Money.self, from: json)
/*
▿ DecodingError
▿ typeMismatch : 2 elements
- .0 : Swift.String
▿ .1 : Context
▿ codingPath : 1 element
- 0 : CodingKeys(stringValue: "amount", intValue: nil)
- debugDescription : "Expected to decode String but found a number instead."
- underlyingError : nil
*/
Again, JSONValue from swift-extras-json models numbers correctly as .number(String). But even here, there is no way to get to the "raw" representation of the value through the Decoder interface.
For reference, in the Java world with Lombok and Jackson this works just fine and retains the full precision of the BigDecimal when encoding/decoding JSON:
@Value
class Person {
final BigDecimal amount;
final String currency;
}
This is how it works
import Foundation
var json = Data("""
{
"amount": 99.99,
"currency": "EUR"
}
""".utf8)
struct Money: Codable {
let amount: Decimal
let currency: String
}
let decoder = JSONDecoder()
let money = try decoder.decode(Money.self, from: json)
print(money)
// Money(amount: 99.99, currency: "EUR")
let encoder = JSONEncoder()
encoder.outputFormatting = [ .prettyPrinted ]
let encodedMoney = try encoder.encode(money)
print(String(decoding: encodedMoney, as: UTF8.self))
/*
{
"amount" : 99.99,
"currency" : "EUR"
}
*/
Summary
I feel what all these examples have in common is that the rather "rigid" interface of Codable makes it very difficult for framework developers to build solutions that handle special cases nicely without having to reinvent the entire serialisation/marshalling infrastructure. This has been mentioned multiple times in the thread: it would be nice if we could move the magic out into library code.
I already abuse Codable for unrelated MP purposes as much as it lets me, so yes, to me this would be ideal!
I'd like to see a way to serialize JSON partially. Codable is nice but slow.
A thing I don't seem to see in the simpler serialization libraries I have used is error checking or correction. Many programs act as if the serialized file is always perfect and crash horribly if something is wrong, and the libraries don't particularly help with this, simply throwing a file-format exception and giving up. Worse are the times when the data looks great but secretly isn't, as when bit rot happens, or when some foolish person edits a file by hand and mistypes, or tries out-of-range things.
Some of this gets into a related idea that I've had (I assume someone must have thought about this long before me?), which is limited-range types, such as Integer(10...92) or Color([.gray, .red, .green]), giving the compiler more of a clue as to whether a value is correct. The existence of these would be useful for such a serialization algorithm.
So, to summarize, I want serialization which can tell if the file is borked, fix it if possible, explain the problem to the user, keep going when possible, and know, with fine understanding, what acceptable values are.
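To make the limited-range idea a bit more concrete with today's tools, here is a rough sketch (all the names are hypothetical) of a property wrapper that at least rejects out-of-range values at decode time; it is not the compile-time ranged type described above, but it is in the same spirit:
import Foundation

// Describes the allowed range for a bounded integer property (hypothetical).
protocol IntegerBounds {
    static var allowedRange: ClosedRange<Int> { get }
}

enum Percentage: IntegerBounds {
    static let allowedRange = 0...100
}

// Property wrapper that validates the range when decoding, so a "borked"
// file is reported as a DecodingError instead of silently accepted.
@propertyWrapper
struct Bounded<Bounds: IntegerBounds>: Codable {
    var wrappedValue: Int

    init(wrappedValue: Int) {
        precondition(Bounds.allowedRange.contains(wrappedValue), "value out of range")
        self.wrappedValue = wrappedValue
    }

    init(from decoder: Decoder) throws {
        let container = try decoder.singleValueContainer()
        let value = try container.decode(Int.self)
        guard Bounds.allowedRange.contains(value) else {
            throw DecodingError.dataCorruptedError(
                in: container,
                debugDescription: "\(value) is outside the allowed range \(Bounds.allowedRange)")
        }
        self.wrappedValue = value
    }

    func encode(to encoder: Encoder) throws {
        var container = encoder.singleValueContainer()
        try container.encode(wrappedValue)
    }
}

struct Settings: Codable {
    @Bounded<Percentage> var volume: Int
}

// let settings = try JSONDecoder().decode(Settings.self, from: Data(#"{"volume": 250}"#.utf8))
// throws a DecodingError: "250 is outside the allowed range 0...100"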
- easy way to encode/decode to/from dictionary (or array) representation (in addition to Data). Also consider adding encoding/decoding directly to/from string.
- opt-ins for not including default values during JSON encoding, and a symmetrical opt-in for allowing absent values for variables that have default values during JSON decoding. example implementation
- optionally emitting explicit nils in the JSON encoder. link
- a way to decode JSON from data where the Data contains something else after the JSON (which can be another JSON document, a separator, or something totally different):
{"foo":"bar"} {"baz" : "qux"} ...
This API would return how many bytes were consumed, so I can continue parsing the Data from that offset, e.g. by calling the decoder another time.
- allowsJSON5 for JSONEncoder (it's already supported for JSONDecoder).
- easier customization, e.g.:
// straw man syntax:
struct Foo: Codable {
var id: Int // normal case
@json(excluded) var bar: Int = 42 // not in json
@json(excluded, default: 42) var bar: Int // or this if easier
@json(renamed: "hello") var baz: Int // renamed
}
I think with additional runtime support it should be possible to decode weak references without a second pass, and to avoid two-pass initialisation in other cases, in particular when subscribing to events from child objects.
A weak reference is a pointer to the side table (not true for ObjC, but let's ignore that for now). Normally the side table is created after the object itself, but if we allow side tables to be created first, then we can initialise the weak reference as a pointer to a side table that is not yet linked to an object. And when the complete object is constructed, the un-archiver can link it with the side table.
If we serialised the following object graph starting from a:
class A {
weak var b: B?
}
class B {
weak var a: A?
}
let a = A()
let b = B()
a.b = b
b.a = a
De-serialization will look like this:
- We encounter the id for object A.
- We mark the id as being constructed.
- We enter A's decoding initialiser.
- Inside A's decoding initialiser, we encounter the weak reference to b.
- We allocate a side table for b.
- Since the object id for b is not marked as being constructed, we enter B's initialiser.
- Inside B's initialiser, we encounter the weak reference to a.
- The object id encoding the instance of a is marked as being constructed, but we don't have a side table for it yet. We create a side table instance and associate it with the object id. The side table is not linked to an instance yet.
- We return the unlinked side table pointer for a.
- We complete initialisation of b and store a strong reference inside the decoder.
- We return a pointer to b's side table to A's initialiser.
- We complete initialisation of a and link the side table with the object instance.
- The decoder is destroyed, releasing the only strong reference to b, because our example is small, but stupid.
This introduces a new state of weak references: a reference to an incompletely initialised object. Reading a strong reference from a weak reference in such a state would return nil. But after the object finishes initialisation, the strong reference becomes accessible.
This would also allow creating weak references from partially initialised objects in normal initialisers:
class Parent: Base {
let child: Child
init() {
child = Child(foo: 42) { [weak self] in // weak self is ok, strong self is still an error
// Will be nil, until Parent is fully initialised
self?.childDidChange()
}
super.init()
}
}
Well, I could start by commenting that your system is effectively 2-pass. It takes two steps to initialize each object, and each object is in a different internal state in each step.
However, this doesn't actually solve the problem:
- Forcing client code to rearchitect itself to use weak references instead of strong references is still problematic, because there's nothing obvious in the language that guides developers towards that.
- Design considerations aside, it still doesn't work in any sort of general way. A class A that has a reference to a B? is broken if a consistency requirement between that reference and A's other properties is violated.
For example, A's initializer may need to set up properties that depend on properties of a B instance that aren't available when the B instance is in this incomplete state.
You can argue against this semi-convincingly with very simple examples involving just 2 classes, but in a real-world object graph there are likely to be lurking dependencies that aren't easily predictable or solvable.
At the very least, there are real-world object graphs where reference cycles are not an error, and are extremely hard to design around solely for the purpose of archiving.
That's why I think:
- Designing a solution based on non-optionals is a necessary discipline.
- There needs to be some kind of mechanism where partially-constructed objects (like the ones you just described) are integrated into new language rules about what it's safe to do when in initializers.
After all, initializers already kind of contain 2 behaviors internally — what happens before all the properties are set, and what happens after they're all set — but that's a semantic thing with no real syntax support.
For unarchiving to work, I still think we need 3 kinds of behaviors, with something like either a pre-init or a post-init that allow references to be fixed up. @itaiferber and I discussed this endlessly a few years ago, and never converged on an answer. (To be clear: AFAICR @itaiferber is pretty strongly in the "you can do it with optionals" camp, and has a very deep understanding of the subject, but still I remain unconvinced by that line of argument.)
I think this, specifically, is the crux of any system we'd design to solve weak referencing/circular dependencies. Fundamentally, when you have a dependency chain like this, it cannot be resolved in a single pass because of reentrancy: if B decodes A, and A requires decoding B and accessing its properties, there's nothing A can do but return, wait for B to finish initializing, and then do what it needs to.
NSCoding and Obj-C implement a single-pass mechanism which is effectively the unsafe version of what @Nickolas_Pohilets is proposing: when B decodes A, and A decodes B, NSKeyedUnarchiver (or similar) can happily give A a reference to B because Obj-C separates allocation from initialization; so A gets a reference to the allocated-but-uninitialized B to hold on to. This solves the weak referencing problem, but is horribly unsafe, because if A tries to do anything with B, well, it's working with a partially-initialized object. And worse, A has no way of knowing what state B is in. In effect, in an -initWithCoder:, in the general case, it's not safe to do anything but assign values to properties. (Even -awakeAfterUsingCoder: doesn't quite help, because that's called on the object returned from -initWithCoder: immediately upon return, not after all of decoding is done, so it's not quite a two-phase mechanism.)
@Nickolas_Pohilets's suggestion brings this idea closer to what Swift might allow, but it still requires a major departure from the inseparable allocation-is-initialization model that Swift has right now, by adding an allocated-but-explicitly-uninitialized object type. I'd presume that Swift wouldn't allow you to do anything with such an object except hold on to it, but it's not clear how the compiler would know at what point the initialization happens in order to allow you to do something with the object. (I can imagine a scheme where the object can be assumed to be fully initialized after init has completed, but that's not entirely true.)
To sum up my current thinking:
- When you attempt to decode an object that is your parent in a circular dependency chain, you can either get an object back, or not
- If you get an object back, then it must be not-fully-initialized, in which case there's nothing you can do with it
- If you don't get an object back (because handing you a reference would be unsafe), then there's nothing you can do with it
- In either case, there's nothing you can do with the object safely until a second pass takes place after all objects are initialized
- During a second pass, you would either need to:
- Set up references / perform additional validation with the object references you had (if you got an unsafe reference), or
- Assign a value to the nil property you had if you got back nil, then perform (1)
- Model (1) requires significant changes to the language, but model (2) is possible to implement today (though I fully agree — it's more verbose)
- In either case, you need a second pass
I think there's absolutely room for improvement here, but I think there's a balance to strike between safety and verbosity, and I'm not sure where that is. And to be clear: I'd be delighted to see an improved model, or have my mind changed about the possibilities here!
Either way, we're lacking the main component of the whole concept: a structured way to get a second pass to fix up references (because right now, you're on your own).
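For what it's worth, here is a minimal sketch of what model (2) looks like in practice today (the types and the fix-up function are made up purely for illustration): the back-reference is optional, excluded from the coding keys, and restored in an explicit second pass once decoding has finished.
import Foundation

final class Parent: Codable {
    var name: String
    var children: [Child] = []

    // `children` round-trips normally; the back-references inside them do not.
    enum CodingKeys: String, CodingKey { case name, children }

    init(name: String) { self.name = name }
}

final class Child: Codable {
    var name: String
    // Model (2): the back-reference is optional, not encoded at all,
    // and wired up in the second pass below.
    weak var parent: Parent? = nil

    enum CodingKeys: String, CodingKey { case name }

    init(name: String) { self.name = name }
}

// The explicit second pass: only after JSONDecoder has finished can the
// back-references be assigned safely.
func decodeTree(from data: Data) throws -> Parent {
    let root = try JSONDecoder().decode(Parent.self, from: data)
    for child in root.children {
        child.parent = root
    }
    return root
}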
I don't think any system supporting loops in the graph can work in effectively one pass. My point was more about an opportunity to offload the second pass to the runtime/decoder and remove it from user code.
I agree that strong reference cycles are a valid use case, but IMO still a pretty exotic one. For me this falls into the "hard things should be possible" category, not the "simple things should be easy" category.
It is not possible to have a reference cycle without any optionals involved in some shape or form. Unrelated to serialisation, you cannot instantiate such an object graph programmatically in the first place. You probably can craft an archive that represents such a graph, but an attempt to de-serialize it should be a runtime error.
class A {
var b: B
init(b: B) { self.b = b }
}
class B {
var a: A
init(a: A) { self.a = a }
}
let a = A(b: B(a: ???))
My suggestion is to allow weak references to partially constructed objects (and even objects which are not yet allocated). Attempting to read such a reference would produce nil until the object is fully constructed. The definition of "fully constructed" is debatable: it could be the point where all properties are initialised, or when the outermost initialiser finishes.
That's true only in regard to creating such a reference cycle purely via initializers. (That is, after all, the exact problem we're trying to solve for unarchiving.)
It's trivial to create a reference cycle without optionals, if there's no limitation to initializers:
protocol P { }
class A {
var p: P
init(p: P) { self.p = p }
}
class B: P {
var a: A
init(a: A) { self.a = a }
}
class C: P { }
let a = A(p: C())
let b = B(a: a)
a.p = b
Although it's unlikely to occur in this simple form, this is not in any way an unusual construct. When an object graph is built up over (run-)time, it's easy to get structures like this.
It seems to me that there's nothing bad in this graph structure as such, aside from the impossibility of unarchiving it (currently).
Is this final assignment to a.p not isomorphic to a second pass?
It's not isomorphic to a second initialization pass. There's no reason why that assignment has to be made immediately after a or b is created. It might happen for unrelated reasons much later, and it may happen conditionally.
Also, in general, a client of a class something like A might not be aware that a class something like B has a back-reference to A. The mutual references are obvious here, but they're not necessarily obvious in real code.
I wanted to add a description of a use case that must be incredibly common, but I found maddeningly difficult to address with 🐟able.
I have a data structure with many repetitions of a single value scattered throughout, and the value can be quite large (the contents of a source file). That's not a problem for my data structure, since variable-sized data in Swift is always CoW'd: copies are cheap. Obviously when I encode my data structure I don't want to end up with multiple copies in the archive, so they need to be deduplicated and serialized once. This deduplication pattern is commonly needed by people who serialize networks of class instances, which makes it not a little shocking to me that it's not already supported.
The changes I ended up with are here. They are quite awful… but they were the best I could do. I'm happy to answer any questions.
Particular obstacles:
- Deduplication requires communicating information gathered during the encoding of subparts of the data structure up to the process of encoding the part of the data structure that knows about the deduplication, but the only vehicle for that communication is the userInfo dictionary.
- Attaching mutable state to a userInfo dictionary is incredibly awkward, and can only be done to the top-level encoder, which means you have to know about and accommodate this pattern at the top level of your whole data structure, and therefore your data structure can't be composed into other 🐟able data structures.
- The encoders/decoders actually used by the encoding/decoding methods of a type are not the same as the top-level encoders/decoders (e.g. JSONEncoder), which do not even conform to Encoder/Decoder.
This problem is very similar to the one @Loooop is solving, but it applies even if there are no classes in play.
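To illustrate the userInfo workaround described in the first two bullets above (the table type, the key, and SourceText below are made-up scaffolding, not the actual changes linked earlier): the mutable state has to travel as a reference type stuffed into userInfo on the top-level encoder.
import Foundation

// Reference type so the state is shared and mutable even though userInfo
// itself is a value-typed dictionary set once on the top-level encoder.
final class DeduplicationTable {
    private var identifiers: [String: Int] = [:]
    private(set) var payloads: [String] = []

    // Returns a stable small id for a payload, storing it only once.
    func identifier(for payload: String) -> Int {
        if let existing = identifiers[payload] { return existing }
        let id = payloads.count
        payloads.append(payload)
        identifiers[payload] = id
        return id
    }
}

extension CodingUserInfoKey {
    static let deduplicationTable = CodingUserInfoKey(rawValue: "deduplicationTable")!
}

struct SourceText: Encodable {
    var contents: String

    func encode(to encoder: Encoder) throws {
        var container = encoder.singleValueContainer()
        if let table = encoder.userInfo[.deduplicationTable] as? DeduplicationTable {
            // Encode only an id; the table's payloads must be encoded separately
            // at the top level, which is exactly the composition problem above.
            try container.encode(table.identifier(for: contents))
        } else {
            try container.encode(contents)
        }
    }
}

// Usage: the top level has to know about the pattern.
// let table = DeduplicationTable()
// let encoder = JSONEncoder()
// encoder.userInfo[.deduplicationTable] = table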
I like your implementation; however, I think deduplication is a property of the coder rather than the codable. The definition of equality will vary between coders: for certain coders Hashable should be more than sufficient, whereas for others Identifiable where ID : Codable will be needed.
Regarding cycles, it may be possible, but it would require mapping coding keys to key paths (or the other way around, I'm not sure).
FWIW I think any replacement/improvement should be based on keypaths, with the generated conformances built on a deeper initialisation-by-keypath mechanism. 99%+ of Codable conformances are compiler-generated from properties, so it makes sense to start there. It would also tighten up compile-time type checking, as keypaths come with type information.
A bit more detail
Just expanding a bit on initialisation-by-keypath: the lines marked below don't seem safe, as we're using the property name twice, meaning the system can't make certain guarantees. It would take low-level functionality to bypass these rules while allowing all objects to be initialised in this, as yet unspecified, new way.
struct Bob: Codable {
var aString: String
var anOptionalString: String?
init(from decoder: Decoder) throws {
let container = try decoder.container(keyedBy: CodingKeys.self)
aString = try container.decode(forKeyPath: \.aString) // ‼️
anOptionalString = try container.decodeIfPresent(forKeyPath: \.anOptionalString) // ‼️
}
func encode(to encoder: Encoder) throws {
var container = encoder.container(keyedBy: CodingKeys.self)
try container.encode(self, forKeyPath: \.aString)
try container.encodeIfPresent(self, forKeyPath: \.anOptionalString)
}
}
[quote="jjrscott, post:138, topic:46641"]
I like your implementation however I think deduplication is a property of the coder rather than the codable.
[/quote]
Not in my use-cases. I never want to deduplicate an Int; do you?
The definition of equality will vary between coders:
In my world equality is an intrinsic property of a type; it doesn’t, and shouldn’t, change based on context.
Apologies for the confusion, when I said "the definition of equality will vary between coders" I really meant that the basis on which deduplication is performed will vary between coders. This is especially true if we have to take into account cycles.
Having argued it's a property of the coder, I'm just now thinking about reimplementing your solution using CodableWithConfiguration via the excellent The Mysterious CodableWithConfiguration Protocol • Andy Ibanez. It seems like one might be able to pass the necessary information around without relying on userInfo or some other coder-specific nefarious means.
I had abandoned development of GraphCodable for a long time (programming is just a hobby for my free time), but I picked it back up in the past few days and cleaned up the library, solving some problems that I had left hanging (e.g. the reference type replacement system and others). It is now roughly equivalent to NSCoding / NS(Keyed)(Un)Archiver.
Answering: well, if I don't have access to the inner reference type that implements COW in value types, there's nothing I can do with GraphCodable.
But if I have access to it, everything is simple.
In the example, MyArray is analogous to Array, but I have access to the inner reference Box:
import Foundation
import GraphCodable
//**************************************************************************
// Skeleton of an array where the internal reference type
// needed for the COW conforms to GCodable
final class Box<T> {
var value : T
init( _ value:T ) {
self.value = value
}
}
struct MyArray<Element> {
private var box : Box<ContiguousArray<Element>> // just for simplicity
init() {
box = Box( ContiguousArray<Element>() )
}
private mutating func updateCOW() {
if !isKnownUniquelyReferenced( &box ) {
box = Box( box.value )
}
}
}
extension MyArray : RandomAccessCollection, RangeReplaceableCollection {
var startIndex: Int { box.value.startIndex }
var endIndex: Int { box.value.endIndex }
func index(before i: Int) -> Int { box.value.index(before: i) }
func index(after i: Int) -> Int { box.value.index(after: i) }
subscript(i: Int) -> Element {
get {
box.value[i]
}
set {
updateCOW()
box.value[i] = newValue
}
}
mutating func replaceSubrange<C>(_ subrange: Range<Int>, with newElements: C) where C : Collection, C.Element == Element {
updateCOW()
box.value.replaceSubrange(subrange, with: newElements)
}
}
extension MyArray : CustomStringConvertible {
var description: String {
box.value.description
}
}
extension MyArray : Equatable where Element:Equatable {
static func == (lhs: MyArray, rhs: MyArray) -> Bool {
return lhs.box === rhs.box || lhs.box.value == rhs.box.value
}
}
// Adding GCodable support (unkeyed encoding/decoding):
extension Box : GCodable where T:GCodable {
convenience init(from decoder: GDecoder) throws {
self.init( try decoder.decode() )
}
func encode(to encoder: GEncoder) throws {
try encoder.encode( value )
}
}
extension MyArray: GCodable where Element:GCodable {
init(from decoder: GDecoder) throws {
box = try decoder.decode()
}
func encode(to encoder: GEncoder) throws {
try encoder.encode( box )
}
}
//**************************************************************************
do {
let a = Array( [1.5, 2.5] )
let b = Array( [a,a] )
let inRoot = Array( [b,b] )
print("••• Array = \(inRoot) ")
print("GraphCodable Encoded data structure:")
print( try GraphEncoder( fullBinaryEncode: false ).dump( inRoot ) )
}
do {
let a = MyArray( [1.5, 2.5] )
let b = MyArray( [a,a] )
let inRoot = MyArray( [b,b] )
print("••• MYArray = \(inRoot) ")
print("GraphCodable Encoded data structure:")
print( try GraphEncoder( fullBinaryEncode: false ).dump( inRoot ) )
}
GraphEncoder( fullBinaryEncode: false ).dump( inRoot ) generates a human-readable string of the archive.
The Swift Array output shows that the data [1.5, 2.5] was duplicated during encoding (this also happens with Swift.Codable):
== GRAPH =========================================================
- VAL
- VAL
- VAL
- 1.5
- 2.5
.
- VAL
- 1.5
- 2.5
.
.
- VAL
- VAL
- 1.5
- 2.5
.
- VAL
- 1.5
- 2.5
.
.
.
==================================================================
In the case of MyArray this doesn't happen (only one [1.5, 2.5]) because GraphCodable "sees" the inner Box reference and doesn't duplicate it:
== GRAPH =========================================================
- VAL
- REF1000 MyGraphCodableApp.Box<Swift.ContiguousArray<MyGraphCodableApp.MyArray<MyGraphCodableApp.MyArray<Swift.Double>>>>
- VAL
- VAL
- REF1001 MyGraphCodableApp.Box<Swift.ContiguousArray<MyGraphCodableApp.MyArray<Swift.Double>>>
- VAL
- VAL
- REF1002 MyGraphCodableApp.Box<Swift.ContiguousArray<Swift.Double>>
- VAL
- 1.5
- 2.5
.
.
.
- VAL
- PTR1002
.
.
.
.
- VAL
- PTR1001
.
.
.
.
==================================================================
Edit: A slightly better version of the MyArray code.
Perhaps. That is certainly a reasonable generalization of the problem.
I tried going down that road and failed; the problem was that there doesn't seem to be any way to set up configuration at a high level of the call stack (where the deduplication dictionary is initialized and then subsequently serialized) that is used at a lower level (where the duplicates are actually discovered). Please report back if I've overlooked something.