Codable != Archivable

itaiferber · January 30, 2019, 5:54pm

Jeroen537:

Further background: I am coming from Python where you have the pickle/unpickle function pair which can store and retrieve any object, simple or complex, to file without any extra coding needed.

This seems such a useful feature to me that I am wondering why Swift does not have it.

Is there anything in the language that resists it? I would think that such a function even might be written in Swift itself, at least for instances conforming to Codable, provided that there would be a way to persist ObjectIdentifier instances (to be able to identify object identity at restore time). But there may be issues that I don't know about that might need more introspection than is available.

<snip>

In summary: Why is there not a native storage format for Swift object graphs, and functions to save and restore to that format, without any protocol requirement? Is it naive to think that it would be possible? Is it against the spirit of Swift? Has it been discussed, and then rejected? Or is such a thing perhaps on the roadmap?

These are all great questions! To briefly answer the questions before diving into the details:

What you're looking to do, for various reasons detailed below, is best described and achieved via NSCoding
This is not currently possible to do in Swift, but I don't think we would want to make it implicitly possible to encode objects in this way, for the same reasons it was not done for Objective-C, where it is possible via the runtime
In this application, Python specifically benefits from being a dynamically-typed language, and pickle benefits from working under largely the same constraints that NSCoding does

To add some detail, in somewhat reverse order:

Python's pickle benefits from several things:
1. Save for primitive types, almost everything in Python is an object. You cannot define structs in the same way that you can in Swift, and all objects are present in the runtime
2. Python objects are easily introspectable at runtime, and Python has no real access restrictions like Swift does: nothing really prevents you from mucking about with objects without their knowledge. Importantly, this allows you to construct objects from a list of properties without necessarily needing to call their constructors
3. Because Python doesn't have the same kind of access restrictions that Swift does, its class naming scheme is relatively simple, and not subject to significantly surprising behavior (detailed here, but I'll comment further on this)
As opposed to this, Swift has the following to consider:
1. Swift is a strongly-typed, compiled language. This means that there are strong restrictions on how you can treat objects, and any archived format would need to take care to consider the types of objects being assigned to properties on deserialization. If an archive contains a dictionary where I expect there to be an integer, Swift cannot simply assign the dictionary to an integer property — the size, layout, and semantics of the types are completely different.
  
  Python, being so dynamic, has no problem assigning a dictionary where an integer would normally go, so unpickling is relatively straightforward, and this problem can be largely glossed over.
2. Swift has access restrictions where Python is much more free-spirited about things. While I believe the runtime has the power to ignore some of these access restrictions in assigning properties at runtime, this is not exposed to users because it goes against the spirit of the language: Swift is simply stricter about these things, and has many rules in place to maintain order. Unlike Python objects, Swift objects are not typically "bags of stuff" which you can instantiate and place things into — to construct a Swift object, you must call an initializer and treat the object much more carefully to maintain type safety
3. As part of these access restrictions, Swift allows whole types to be private or fileprivate, which has a marked effect on the actual class name — as detailed in the other post, moving a private or fileprivate class around can change its actual class name! pickle also glosses over this issue — if you rename a class or a module, there isn't much support without subclassing pickle.Unpickler and manually renaming the type yourself. If this is a private type owned by another library you don't know about, there isn't much you can do
4. Swift, of course, also has structs to deal with, which have their own potential host of issues. Because structs are not objects, they don't participate in the runtime in the same way. I believe that private and fileprivate structs don't necessarily even have to have their names embedded in binaries or in the runtime depending on how they are used, making them inaccessible. It would be a real shame if this implicit encodability were possible for only classes, since many classes end up containing structs in properties, and so those classes could not participate (unfortunately, this is one of the drawbacks that NSCoding has as well, and requires manually encoding struct properties via various other strategies)
Objective-C is much closer to Python in its runtime capabilities, and could support arbitrary encoding of all objects, but doesn't do this in favor of formalizing the concept with NSCoding. There is a whole host of things that various objects prefer to do that are best represented by an opt-in API. For instance, weak properties, as you mention, usually require special attention — most objects don't need to encode or decode their delegates if they have them (since delegates tend to be runtime-state only, and don't even necessarily represent an encodable object)

So, to sum up: doing this implicitly would both go against the spirit of Swift, and isn't necessarily technically feasible without changes to the language. NSCoding is much more tailored to what you're trying to do, since its goals are exactly the use-case you describe. It does have the drawback of requiring NSObject inheritance in practice, but given the discussion in this thread and others, we are also interested in making object graphs easier to represent with Codable as well.

On one more note: pickle (and marshal) also doesn't really deal with security in any meaningful way:

Warning: The pickle module is not secure against erroneous or maliciously constructed data. Never unpickle data received from an untrusted or unauthenticated source.

This is another drawback to doing something like this totally implicitly: you have no way of knowing what the object graph you meant to encode or decode might look like, so any pickled object graph can contain anything at decode time, which is Bad™ if they are maliciously altered. This is the reason for NSSecureCoding (to help describe what you expect your object graph to look like so you can prevent arbitrary code execution at runtime), and one of the reasons Codable is not tailored toward including class information in archives.

QuinceyMorris · January 31, 2019, 1:41am

@itaiferber already answered most of your questions, so I just want to add comments on a couple of points:

In fact, pure Swift classes do not inherit from NSObject (or any Obj-C class). For technical reasons, pure Swift classes can be a bit more efficient and flexible since they don't have to support the Obj-C runtime at all. However, in many cases, there's no real downside to inheriting from NSObject (or conforming to NSCoding, which is much the same thing).
Archiving object graphs is definitely something we hope that the Swift language will add support for, natively, at some point in the future. It's been discussed repeatedly, but a language-level addition is a pretty hard problem, so we aren't there yet.

In terms of what to do in the meantime, if you use NSCoding you should be able to write code that builds a dictionary of unique IDs for your objects, which would allow you to avoid archiving actual references between objects themselves. With a bit of care and planning, this shouldn't be extremely onerous for most scenarios.

In many scenarios, it is neither practical nor desirable to archive an object graph "literally" (object by object). Usually, you will want to "cull" data from the graph, archive that reduced quantity of information, then rebuild the graph on unarchiving. That's one good reason why the lack of Swift-native archiving isn't currently a show-stopper (for most scenarios).
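The "unique IDs instead of references" idea can be sketched roughly as follows. This sketch uses Codable rather than NSCoding (a substitution, not what the post describes), and the names Node, Edge, EdgeRecord, GraphArchive, flatten, and rebuild are all hypothetical. Each distinct object is stored once, other records refer to it by index, and the references are rebuilt after decoding.

```swift
import Foundation

// Live object graph: several edges may reference the same Node instance.
final class Node {
    let name: String
    init(name: String) { self.name = name }
}

struct Edge {
    let from: Node
    let to: Node
}

// Hypothetical archive format: nodes are stored once, edges refer to them by index.
struct EdgeRecord: Codable {
    let from: Int
    let to: Int
}

struct GraphArchive: Codable {
    var nodeNames: [String]
    var edges: [EdgeRecord]
}

// Assign each distinct object an index the first time it is seen.
func flatten(_ edges: [Edge]) -> GraphArchive {
    var indexByObject: [ObjectIdentifier: Int] = [:]
    var names: [String] = []
    func index(of node: Node) -> Int {
        let key = ObjectIdentifier(node)
        if let existing = indexByObject[key] { return existing }
        names.append(node.name)
        indexByObject[key] = names.count - 1
        return names.count - 1
    }
    let records = edges.map { EdgeRecord(from: index(of: $0.from), to: index(of: $0.to)) }
    return GraphArchive(nodeNames: names, edges: records)
}

// Rebuild: create one Node per stored entry, then reconnect the edges through that table.
func rebuild(_ archive: GraphArchive) -> [Edge] {
    let nodes = archive.nodeNames.map { Node(name: $0) }
    // A real implementation should validate indices instead of trapping on bad input.
    return archive.edges.map { Edge(from: nodes[$0.from], to: nodes[$0.to]) }
}

let hub = Node(name: "hub")
let edges = [Edge(from: hub, to: Node(name: "a")), Edge(from: hub, to: Node(name: "b"))]
let data = try JSONEncoder().encode(flatten(edges))
let restored = rebuild(try JSONDecoder().decode(GraphArchive.self, from: data))
print(restored[0].from === restored[1].from)  // true: the hub is shared again
```

The archive only ever contains plain values, so whatever is left out of GraphArchive is culled automatically.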

Rod_Brown · January 31, 2019, 2:44am

My apologies, you’re correct that it doesn’t inherit from NSObject, though I think your comments about the runtime depend on the platform and other factors, as noted in the code below.

I did prefix my words with “I believe” because I wasn’t sure: I remembered reading about SwiftObject and confused its NSObjectProtocol conformance with inheritance from the NSObject class. Admittedly, that was also years ago.

For reference, the real stuff is here:


apple/swift/blob/main/stdlib/public/runtime/SwiftObject.h

//===--- SwiftObject.h - Native Swift Object root class ---------*- C++ -*-===//
//
// This source file is part of the Swift.org open source project
//
// Copyright (c) 2014 - 2017 Apple Inc. and the Swift project authors
// Licensed under Apache License v2.0 with Runtime Library Exception
//
// See https://swift.org/LICENSE.txt for license information
// See https://swift.org/CONTRIBUTORS.txt for the list of Swift project authors
//
//===----------------------------------------------------------------------===//
//
// This implements the Objective-C root class that provides basic `id`-
// compatibility and `NSObject` protocol conformance for pure Swift classes.
//
//===----------------------------------------------------------------------===//

#ifndef SWIFT_RUNTIME_SWIFTOBJECT_H
#define SWIFT_RUNTIME_SWIFTOBJECT_H

(Excerpt; the rest of the file is truncated.)

Nevertheless, the point stands: switching the base class is under a developer's control, has no real downside, and isn't something to avoid for any specific reason.

Edit: there is also another thread about the Obj-C base class. I assume this means that it at least participates in the Obj-C runtime on all Darwin-based platforms?

QuinceyMorris · January 31, 2019, 3:35am

Yes, my comment about the runtime wasn't correct. Rather, pure Swift classes don't have to have the exact semantics of NSObject behavior, except as far as they need to satisfy NSObjectProtocol.

In particular, this means (AFAIK) they can have different implementations of things like allocation, initialization, de-initialization, reference counting, object metadata layout, and so on.

Jeroen537 · January 31, 2019, 7:45am

Thanks @itaiferber for your extensive and illuminating reply, and thanks @QuinceyMorris for your further comments!

I am happy to hear that the feature is somewhere on the horizon. For now I will likely go with NSCoding, as both of you suggest.

I was preparing a reply to the points raised by @itaiferber. Now that I know the issue is already being debated, I am not certain it is still relevant, but maybe some points are of use, or at least of some interest.

If there is a public discussion of the topic, I would appreciate a pointer.

Following are my further thoughts. Feel free to skip.

@itaiferber: you have succeeded in pointing out that the issue is not a simple one, in many respects.

But I would like to explore some ideas to address the problems you mention. If the thing could be done in the right way, it would in my view provide a very useful addition to the language.

Please allow me to briefly go over your arguments, subject to my still limited understanding. This is not meant as a technical post, let alone solution, but as an exploration of ideas.

Swift is a strongly-typed, compiled language. This means that there are strong restrictions on how you can treat objects, and any archived format would need to take care to consider the types of objects being assigned to properties on deserialization. If an archive contains a dictionary where I expect there to be an integer, Swift cannot simply assign the dictionary to an integer property — the size, layout, and semantics of the types are completely different.

Python, being so dynamic, has no problem assigning a dictionary where an integer would normally go, so unpickling is relatively straightforward, and this problem can be largely glossed over.

I can see that Python, knowing no type restrictions, has an easier job here than Swift would have reassembling the stored "object" (spoken loosely here, I also mean struct, enum, ...). This certainly would hold if you were restoring on the Swift level, i.e., in a Swift program as I first suggested (and which now, to me, seems increasingly unlikely to be possible).

At a minimum, you would need to know the type of object to be restored. I am not an expert on Swift's type system (it still surprises me from time to time), but could you not use type(of:) to determine the type of an object, and find a way to store that information with the data?

Certainly there should be restrictions on what could be restored, depending on the restoring program. In my thinking I was not aiming at storing all the type definitions themselves with the data, just enough information to restore the data into a program that knows all the types involved. Of course, the problem of preventing data from being restored against an incorrect or outdated set of type definitions in the receiving application must be addressed, and this might be a hairy problem.

Perhaps some hash value of the type definitions could be stored with the data and checked upon restoring. Such a scheme would obviously have to be based on some internal representation of the type definition rather than on the exact program text; whether this can be done reliably, I do not know.
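A hedged sketch of what such a fingerprint could look like, assuming it is built from Mirror's view of a value: the type name plus each stored property's label and type. The functions fnv1a64 and schemaFingerprint are hypothetical. Swift's own Hasher is randomly seeded per process, so hashValue cannot be stored and compared across runs; the sketch uses a small FNV-1a hash instead.

```swift
// FNV-1a (64-bit): gives the same result on every run, unlike Swift's per-process-seeded Hasher.
func fnv1a64(_ text: String) -> UInt64 {
    var hash: UInt64 = 0xcbf2_9ce4_8422_2325      // FNV offset basis
    for byte in text.utf8 {
        hash ^= UInt64(byte)
        hash = hash &* 0x0000_0100_0000_01b3       // FNV prime, wrapping multiply
    }
    return hash
}

// Hypothetical fingerprint: type name plus "label:type" for each stored property.
// It does not recurse into nested types, and for class or existential properties
// it records the runtime type of the current value, not the declared type.
func schemaFingerprint<T>(of value: T) -> UInt64 {
    var description = String(describing: T.self)
    for child in Mirror(reflecting: value).children {
        description += "|\(child.label ?? "_"):\(type(of: child.value))"
    }
    return fnv1a64(description)
}

struct PointV1 { var x: Double; var y: Double }
struct PointV2 { var x: Double; var y: Double; var z: Double }

let a = schemaFingerprint(of: PointV1(x: 1, y: 2))
let b = schemaFingerprint(of: PointV1(x: 5, y: 6))        // same layout, different values
let c = schemaFingerprint(of: PointV2(x: 1, y: 2, z: 3))  // an added property
print(a == b, a == c)
```

This only approximates "the type definition": it sees one level of one value, which is exactly where the reliability question comes in.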

Private types you don't know about are another difficult problem, as you mention. This would only work if the framework that owns the type allowed its object graphs, if any, to be stored. How to interoperate with such a facility would require further study; some convention, possibly a protocol, would have to be established.

Having said that, for a large number of cases this problem would not arise.

Swift has access restrictions where Python is much more free-spirited about things. While I believe the runtime has the power to ignore some of these access restrictions in assigning properties at runtime, this is not exposed to users because it goes against the spirit of the language: Swift is simply stricter about these things, and has many rules in place to maintain order. Unlike Python objects, Swift objects are not typically "bags of stuff" which you can instantiate and place things into — to construct a Swift object, you must call an initializer and treat the object much more carefully to maintain type safety

I understand that access restrictions are an essential part of Swift, and I would not suggest that they be bypassed.

The fact that Swift objects cannot come to life without calling an initializer could be a show-stopper for writing a (de)serializer in Swift itself. How deep this requirement reaches into the runtime I have no idea. What I thought might be done is to make a "dump" of the [data of the] object at a somewhat lower level and, upon restoring, reassemble the object without going through an initializer. Whether this is a realistic route depends on the internal representation and the runtime, and I do not know enough about those to guess at its feasibility.

That said, if a storage scheme were developed at this lower level, not accessible to Swift users, it seems to me that access restrictions would not be a real obstacle: private data could simply be stored and retrieved along with the rest. I assume that at some level the runtime can access everything.
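Part of this can be observed from ordinary Swift code. Mirror reads every stored property, private ones included, but as far as I know there is no public counterpart for writing a property back or for creating an instance without running an initializer. A small illustration with a hypothetical Account type:

```swift
struct Account {
    private var balance = 42
    var owner = "Ada"
}

// Mirror can read every stored property, including the private one...
let mirror = Mirror(reflecting: Account())
for child in mirror.children {
    print(child.label ?? "_", child.value)
}
// ...but it is read-only: there is no Mirror API to set `balance`
// or to produce an Account without calling one of its initializers.
```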

As part of these access restrictions, Swift allows whole types to be private or fileprivate, which has a marked effect on the actual class name — as detailed in the other post, moving a private or fileprivate class around can change its actual class name! pickle also glosses over this issue — if you rename a class or a module, there isn't much support without subclassing pickle.Unpickler and manually renaming the type yourself. If this is a private type owned by another library you don't know about, there isn't much you can do

This is a technical point on which I have little to contribute. It sounds like a complication to me, but not necessarily an insurmountable one. It does suggest, however, that save/restore would have to be performed at a rather low level. Much will depend on the actual implementation and internal representation, and it could be very complex.

Swift, of course, also has structs to deal with, which have their own potential host of issues. Because structs are not objects, they don't participate in the runtime in the same way. I believe that private and fileprivate structs don't necessarily even have to have their names embedded in binaries or in the runtime depending on how they are used, making them inaccessible. It would be a real shame if this implicit encodability were possible for only classes, since many classes end up containing structs in properties, and so those classes could not participate (unfortunately, this is one of the drawbacks that NSCoding has as well, and requires manually encoding struct properties via various other strategies)

I had not given much thought to structs, and I can see that there may not be enough introspection in the runtime to retrieve their names, for example. Yet that information is present in the source code and could theoretically be used. (Added: I find that type(of:) also works for structs and enums. Couldn't that be useful?)
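A quick check of the aside, with hypothetical types: type(of:) does return the dynamic type of a class, struct, or enum value, and String(describing:) turns it into a name. The hard part is the reverse direction. As far as I know there is no public, documented standard-library API that turns a stored name back into a type, so an archive format would need its own table of known types.

```swift
final class Node {}
struct Line { var length = 1.0 }
enum Shape { case circle, square }

let values: [Any] = [Node(), Line(), Shape.circle]

// type(of:) gives the dynamic type of each value, whether class, struct, or enum.
let names = values.map { String(describing: type(of: $0)) }
print(names)  // ["Node", "Line", "Shape"]
```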

To your summing up:

The issue of security (absent in pickle, as you rightly mention) should be a real concern. But some system of digitally signing the stored data might go a long way toward addressing it. At the very least, the originating application could check its own signature when retrieving stored data. Such a simple scheme would already allow many useful usage scenarios. Where interoperability matters, certificates might be attached, encryption applied, etc.

Thanks for reading through this long post, and I would be interested in any comments, positive and critical!

beccadax · January 31, 2019, 6:16pm

You can call type(of:) to get a type instance, but you then need to somehow encode and decode that type instance—and if you succeed in doing that, you create a potential security problem where someone could replace the encoded representation of your LogWorkout type with the encoded representation of your ChangePassword type and cause your code to do something you didn't intend for it to do.
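One common mitigation, in the spirit of NSSecureCoding's list of allowed classes, is to store a type *name* rather than anything executable and have the decoding side state up front which names it accepts. A hedged sketch; the Command protocol, Envelope, archive, and unarchive are hypothetical, and LogWorkout/ChangePassword are borrowed from the example above.

```swift
import Foundation

// Hypothetical protocol for archivable commands.
protocol Command: Codable {
    static var archiveName: String { get }
    func run() -> String
}

extension Command {
    // Decode a payload as this concrete conforming type.
    static func decoded(from payload: Data) throws -> any Command {
        try JSONDecoder().decode(Self.self, from: payload)
    }
}

struct LogWorkout: Command {
    static let archiveName = "LogWorkout"
    var minutes: Int
    func run() -> String { "logged \(minutes) minutes" }
}

struct ChangePassword: Command {
    static let archiveName = "ChangePassword"
    var newPassword: String
    func run() -> String { "password changed" }
}

// The archive records a type name next to the payload.
struct Envelope: Codable {
    var typeName: String
    var payload: Data
}

enum ArchiveError: Error {
    case typeNotAllowed(String)
}

func archive<C: Command>(_ command: C) throws -> Data {
    let envelope = Envelope(typeName: C.archiveName, payload: try JSONEncoder().encode(command))
    return try JSONEncoder().encode(envelope)
}

// The caller lists the types it expects; any other name is rejected before decoding.
func unarchive(_ data: Data, allowing allowed: [any Command.Type]) throws -> any Command {
    let envelope = try JSONDecoder().decode(Envelope.self, from: data)
    guard let match = allowed.first(where: { $0.archiveName == envelope.typeName }) else {
        throw ArchiveError.typeNotAllowed(envelope.typeName)
    }
    return try match.decoded(from: envelope.payload)
}

let good = try unarchive(archive(LogWorkout(minutes: 30)), allowing: [LogWorkout.self])
print(good.run())

// An archive naming a type the caller did not expect is refused.
let forged = try archive(ChangePassword(newPassword: "hunter2"))
do {
    _ = try unarchive(forged, allowing: [LogWorkout.self])
    print("accepted")
} catch ArchiveError.typeNotAllowed(let name) {
    print("rejected \(name)")
} catch {
    print("other error: \(error)")
}
```

This limits which types can appear; it does nothing against a tampered LogWorkout payload, which is a separate integrity problem.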

Jeroen537 · January 31, 2019, 8:09pm

I see what you mean. The type instance you get from type(of:) carries all the information from the type. Storing that creates a security hole, as you point out.
But you can digitally sign your stored object. Encryption might be added to prevent others from even seeing the contents of your data.
Which is a good idea anyway, I think. It prevents tampering, and also limits use of the encoded data to trusted partners. If I understand you well, this would prevent the attack.

As an aside: in some cases it would not be necessary to store the full type information, only a reference to it. This works when the saving and loading programs agree on the meaning of the reference; a simple but useful case is a program that stores the object it is working on so that it can continue with it at a later date. Think of a graphical editor working with a complex drawing.
Evidently, digital signatures and encryption are important for security in this case too.

QuinceyMorris · January 31, 2019, 8:55pm

If we're talking about any kind of general solution (even general within some set of limited scenarios), there's a big pile of gotchas to negotiate.

IIUC, you're talking about a kind of "checkpoint/restore" functionality, where objects are strictly represented by their internal states. This isn't true for many common scenarios (e.g. an object with a URL property that references a file). In addition, there is the security question that's already been raised, then a resiliency issue for the data representation, then a versioning issue (for types that change over time), a size issue (much of the data in the object graph is redundant), and a performance issue (NSCoding is already too slow, for example).

Putting all of that to one side, there's still the fact that Codable was (informally) conceived as an improvement of NSCoding, but still adhering to NSCoding's design philosophy: that every data item archived or unarchived can be handled by custom code, using normal language features, with perhaps the addition of convenience methods that eliminate boilerplate whenever possible.

That (almost?) necessarily drives Codable down the path of using Swift's initializer mechanisms, and that immediately runs up against some hard problems.

Jeroen537 · January 31, 2019, 9:43pm

IIUC, you're talking about a kind of "checkpoint/restore" functionality, where objects are strictly represented by their internal states.

Absolutely. This is where my question came from, and it seems hard enough (but useful in itself).

This isn't true for many common scenarios (e.g. an object with a URL property that references a file).

True. In this case, I would consider the URL property in scope for checkpoint/restore, but the [content of the] external file not.

In addition, there is the security question that's already been raised, then a resiliency issue for the data representation, then a versioning issue (for types that change over time), a size issue (much of the data in the object graph is redundant), and a performance issue (NSCoding is already too slow, for example).

All these issues I recognize and agree may be hard to solve.

NSCoding's design philosophy: that every data item archived or unarchived can be handled by custom code, using normal language features

Not sure I understand. Does "custom code" refer to the protocol implementations the user needs to provide for encoding/decoding custom types? In other words, that the encoding process should be transparent to the user and under full user control (as opposed to opaque, if handled by some global function)? If so, that is a difficult question. On the one hand, I could very well live with an opaque process for the purpose of checkpoint/restore. On the other hand, I still might want to control which parts of the graph need not be stored. I can see that there are some difficult design issues here, and possibly language issues as well. Is this what you mean?

QuinceyMorris · January 31, 2019, 11:50pm

Yes. In Encodable, for example, you invoke encode(_:forKey:) for every value you want your conforming type to persist. In NSCoding, you invoke encodeObject:forKey: for every value. In both cases, there's a long list of "convenience" methods for special cases, including "primitive" methods for scalar types. In Swift, Codable conformance can be synthesized for you, but that's just the compiler doing what you would write manually.
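To make this concrete, here is a hand-written conformance for a hypothetical Line type. It is roughly what the compiler synthesizes when you simply declare `struct Line: Codable`: one CodingKeys case and one encode/decode call per stored property.

```swift
import Foundation

struct Line {
    var color: String
    var width: Double
}

// Written out by hand: every persisted value gets its own line of code.
extension Line: Codable {
    enum CodingKeys: String, CodingKey {
        case color
        case width
    }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        color = try container.decode(String.self, forKey: .color)
        width = try container.decode(Double.self, forKey: .width)
    }

    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        try container.encode(color, forKey: .color)
        try container.encode(width, forKey: .width)
    }
}

let data = try JSONEncoder().encode(Line(color: "red", width: 2))
let copy = try JSONDecoder().decode(Line.self, from: data)
print(copy.color, copy.width)  // red 2.0
```

Because init(from:) is declared in an extension, the struct keeps its memberwise initializer.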

Nothing is not touched by its own line of code.

Jeroen537 · February 1, 2019, 12:39pm

Nothing is not touched by its own line of code.

Admirable. I did not realize that to be part of the Swift philosophy, but I like it.

Yet... I have this problem. Let me elaborate a bit more, because I think my earlier reply could be misunderstood:

IIUC, you're talking about a kind of "checkpoint/restore" functionality, where objects are strictly represented by their internal states.

Absolutely. This is where my question came from, and it seems hard enough (but useful in itself).

With this I do not mean: save the program state, to be resumed at a later moment. Rather, I mean: save one of my objects, to be restored when needed.

A simple use case may clarify this.

Suppose I want to create a graphic editor. More precisely, a program that lets a user create a wireframe model of, say, a car, or a lamp. The model will be built from points called nodes, connected by straight line segments. Both nodes and line segments have properties of their own, such as color, shape, etc.
There will be a top construct to hold a model, called WireFrame. This will likely be a class. There are also constructs Line and Node used by that class, as well as other supporting custom objects. Line might be a struct, but Node will be a class. Several Lines could be joined to the same Node, which I model by making it a shared property of those Lines. This is where the reference-type character of Node comes into play.
I will also have actions for the user to join lines, and to disconnect lines that are currently joined, for the purpose of editing flexibility. One rationale for joining is that whole subparts of the model could be moved, recolored, etc. with one user action, another to easily deform a model by dragging a single node.
Finally, I want WireFrame to be hierarchical, in the sense that it can contain subordinate WireFrames.

A typical WireFrame instance will thus contain a hierarchy of Lines, Nodes, and WireFrames, as well as supporting class or struct instances. In this scenario, it will be a directed acyclic graph, but not necessarily a tree.
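A minimal sketch of this model, assuming hypothetical properties (coordinates, colors, a children array); only the type names come from the post. Declaring the types Codable lets the compiler synthesize encoding for the whole hierarchy.

```swift
import Foundation

// Hypothetical properties; only the type names come from the post.
final class Node: Codable {
    var x: Double
    var y: Double
    var color: String
    init(x: Double, y: Double, color: String) {
        self.x = x
        self.y = y
        self.color = color
    }
}

// A struct whose endpoints are references, so several Lines can share one Node.
struct Line: Codable {
    var start: Node
    var end: Node
    var color: String
}

// The top-level model, which can nest subordinate WireFrames.
final class WireFrame: Codable {
    var lines: [Line] = []
    var children: [WireFrame] = []
    init() {}
}

let joint = Node(x: 0, y: 0, color: "red")
let root = WireFrame()
root.lines = [
    Line(start: joint, end: Node(x: 1, y: 0, color: "red"), color: "black"),
    Line(start: joint, end: Node(x: 0, y: 1, color: "red"), color: "black"),
]
root.children = [WireFrame()]

// Synthesized Codable handles the nesting with no extra code.
let data = try JSONEncoder().encode(root)
let copy = try JSONDecoder().decode(WireFrame.self, from: data)
print(copy.lines.count, copy.children.count)  // 2 1
```

What this round trip does not keep is the sharing of `joint` between the two lines, which is the "aside from references" caveat discussed further on.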

In a typical usage scenario, the user starts creating a wire model. During this, the program goes from state to state. For example, the user could have chosen a color from the palette to recolor any Line upon selection. This color would be part of the state. Likely, there is also undo information. There is also the model being constructed.

Now what I want to allow the user is to save the WireFrame under construction, not the full program state. At a later moment the program (or, by extension, some other program knowing about WireFrame) can be restarted and retrieve the WireFrame exactly as it was left. The color setting and the undo information would be gone.

Such a facility would not only allow the user to improve on the model at a later moment, but also to build a library of WireFrames to integrate when creating a new model.

I think the usefulness of this facility for the user need not be argued.

Now what about me as a programmer? I have a very simple wish: let me save this WireFrame to disk, so that I later can access and use it. I do not need any control over how this is done. I have currently no need for a custom storage format for interchange with other platforms or to archive for eternity, and I do not wish to spend the time to develop something I don't need. And I do not want to go through the hassle of making all my custom types conform to NSCoding (or Codable, for that matter, in case we do have a tree) if I have no wish to serialize. Nor to any other protocol, if that would demand work from me that exceeds the benefit. I would rather spend my time improving the program for my user.

In other words, I am asking Swift to provide its own storage format to reflect the state of objects held by the runtime, and to allow programmers to store to that format and to retrieve from it.

I do realize the issue of evolution, both of my app and of the Swift language and runtime, but I can take care of the first, if Swift can take care of the second.

In summary: while I heartily applaud the principle of giving the programmer full control of what happens, in my view this need not apply in cases where nobody will ever need to see what intermediate objects look like internally, as long as I have control over what is saved by choosing my objects wisely.

That is the background of my question. I hope this is now clear, and also that there is room in the Swift philosophy to make it possible at some time.

QuinceyMorris · February 4, 2019, 7:15am

There are multiple possible answers to what you wrote, depending on which discussion you want to have.

The most relevant, though, is this I think:

Aside from references, Codable can already do what you described. It's not (as it happens) quite the automated, built-in archiving you had in mind. The automation comes from Codable synthesis, rather than runtime magic, but the effect is typically the same for scenarios like your WireFrame.

That is … aside from references. Swift doesn't have any native mechanism for representing references concretely in an archive. (It can represent parent-child and sibling relationship "positionally" in JSON, etc, but that's not a situation where the reference is a distinct, concrete data item you can see.)
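A minimal demonstration of the problem, using hypothetical Node and Line types: synthesized Codable writes a shared Node once per reference, so the decoded graph contains independent copies.

```swift
import Foundation

final class Node: Codable {
    var name: String
    init(name: String) { self.name = name }
}

struct Line: Codable {
    var start: Node
    var end: Node
}

let hinge = Node(name: "hinge")
let lines = [Line(start: hinge, end: Node(name: "a")),
             Line(start: hinge, end: Node(name: "b"))]
print(lines[0].start === lines[1].start)      // true: one object shared by two lines

let data = try JSONEncoder().encode(lines)
let decoded = try JSONDecoder().decode([Line].self, from: data)
print(decoded[0].start === decoded[1].start)  // false: the archive held two copies

// Renaming the formerly shared node now affects only one line.
decoded[0].start.name = "moved"
print(decoded[1].start.name)                  // hinge
```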

The corollary of this is that Swift doesn't have any mechanism for representing references uniquely. It doesn't matter whether you imagine this defect repaired via Codable or via the runtime, it's still unsolvable right now because of order-of-initialization constraints.

For all practical purposes (I'm claiming), there's no reasonable subset of cases for which an ad-hoc solution would be worth the effort. In any practical object graph, there are going to be "off-graph" references: references to objects that don't really belong to the graph and shouldn't be archived. For example, on iOS you can easily end up with an object graph that has an indirect reference to UIApplication.shared, but you can't archive that.

Even in the case of the file URL I mentioned earlier, the issue is not so much that the file isn't archived because it's separate from the object with the URL, but that the object may also have state information about the file or file contents that isn't valid if the object is archived separately from the file.

Any attempt to archive all the attributes of an object, in order to be able to unarchive it later, is doomed to failure because you "never" want to archive all the attributes. In the rare cases where you do, the object graph is likely simple enough that handling the references manually is easy to code.
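With Codable, choosing what to archive is usually done by listing keys. A hedged sketch with a hypothetical Document type: properties left out of CodingKeys, such as a weak delegate or a derived cache, are skipped on encode and get their default values on decode. This requires every omitted property to have a default value.

```swift
import Foundation

protocol DocumentDelegate: AnyObject {}

final class Document: Codable {
    var title: String
    var nodeCount: Int
    weak var delegate: (any DocumentDelegate)? = nil  // off-graph: runtime wiring only
    var renderCache: [String: Double] = [:]           // derived state, can be recomputed

    // Only these keys are archived; the other properties keep their defaults on decode.
    enum CodingKeys: String, CodingKey {
        case title
        case nodeCount
    }

    init(title: String, nodeCount: Int) {
        self.title = title
        self.nodeCount = nodeCount
    }
}

final class Editor: DocumentDelegate {}

let editor = Editor()
let doc = Document(title: "Lamp", nodeCount: 12)
doc.delegate = editor
doc.renderCache["thumbnail"] = 0.5

let data = try JSONEncoder().encode(doc)
let restored = try JSONDecoder().decode(Document.self, from: data)
print(restored.title, restored.nodeCount, restored.delegate == nil, restored.renderCache.isEmpty)
// Lamp 12 true true
```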

For all practical purposes (I'm claiming), custom code is always required.

That's a lot of sweeping generalizations in a row, but I think that's the essence of the reason why the kind of solution you're advocating hasn't been given much attention so far.

Jeroen537 · February 13, 2019, 4:35pm

There are multiple possible answers to what you wrote, depending on which discussion you want to have.

Most of all, a discussion that would lead to the simplest possible way to achieve my stated goal. If possible, through a native Swift feature, if not, through user code, and then as generic as possible.

Like others, I have found that Codable does most of what I am looking for, except for references (and weak vars). Unfortunately, these are an essential feature of my use case.

I would like to address two different points here: native support, and, short of that, ways to leverage Codable to solve the issue of referential identity.

Native support:

With this I mean support in the Swift language itself to persist and restore objects (class instances, struct instances, enums, etc.). Basically this would say to Swift:

"Please take this object and package it for me so that I can store it, to be retrieved at a later moment so that I can continue working with it. The format in which you do this is immaterial to me. I do not want to use it in any other way except to have it reloaded when asked. I do not want to have to prepare my data in any way for this to happen. I do realize that not everything can be packaged in this way, and I am not asking for a snapshot of the program. Just package the state of the object for me. I do not expect you to package anything except data that I created myself. In other words, if my object contains a property that is external to my own code (ex.: a UIView) you may refuse to handle it and I won't complain. I am also not asking you to package objects that contains closures, etc.; just pure data. However, the object I ask you to save might contain a complex graph of various data carrying objects of my own making and I do ask that you respect that."

I have given some thought to how this might be implemented, and I think it is not possible in Swift itself the way Python does it with the pickle module, which is written in Python. Python has more introspection and fewer access restrictions than Swift, as @itaiferber has pointed out.

If Swift cannot emulate the Python approach in user space, what I am ideally looking for is the addition of a pair of package/unpackage functions to the Swift language. Such functions would have to be implemented by the language developers, not by me as a user. Having said that, I would be surprised if there were fundamental obstacles to implementing them, given that all the information needed must already be present in the runtime. And personally I do not think it is against the spirit of Swift, although of course I am not the one to make that call.

Anyway, since the native solution is not now available, I would next like to address what I think can be done in Swift at the moment.

Leveraging Codable

I have developed a simple prototype that leverages Codable to restore object identity in many cases. It rests on a few ideas:

Write a custom encode(to:) implementation for class types that stores a unique identifier with every class instance. I use the ObjectIdentifier.debugDescription string for this. Although this is not documented, for the time being I assume that its value is unique within a single run of the program. (I would use ObjectIdentifier itself, except that it is not Codable.)
At the point of decoding, first retrieve the instance's stored identifier and check it against a dictionary of [String : AnyObject]. If the identifier is present as a key, the associated object will be used. This object is an AnyObject and will have to be downcast, but that is not a problem since the target type is known at that point in the code. If the identifier is not present as a key, decode the value, use the result, and update the dictionary.
For this to work, the dictionary mentioned must be read/write accessible at every level in the decoding process. I put this in the decoder.userInfo property (wrapped in a class instance).
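Here is a minimal sketch of the three steps above, assuming JSON as the storage format. `Item`, `Document`, `ObjectRegistry` and the `decodeShared(_:forKey:registry:)` helper are hypothetical names, not the poster's actual code. A class initializer cannot hand back an already existing instance, so the lookup has to happen at the decode site, in the containing type's `init(from:)`. That is where the per-use boilerplate comes from. Also, `ObjectIdentifier` values are only guaranteed to be distinct among objects that are alive at the same time. That condition holds during a single encoding pass.

```swift
import Foundation

// Wrapped in a class so every nested decoder sees and mutates the same dictionary.
// @unchecked Sendable only so it can be stored in userInfo; it is not thread-safe.
final class ObjectRegistry: @unchecked Sendable {
    static let key = CodingUserInfoKey(rawValue: "objectRegistry")!
    var objects: [String: AnyObject] = [:]
}

// Used to peek at the stored identifier before decoding the full object.
enum IdentityKey: String, CodingKey { case id }

final class Item: Codable {
    var title: String
    init(title: String) { self.title = title }

    private enum CodingKeys: String, CodingKey { case id, title }

    // Step 1: store the instance's identifier next to its data.
    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        try container.encode(ObjectIdentifier(self).debugDescription, forKey: .id)
        try container.encode(title, forKey: .title)
    }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        title = try container.decode(String.self, forKey: .title)
    }
}

extension KeyedDecodingContainer {
    // Step 2: reuse an already decoded instance when its identifier is known.
    func decodeShared<T: Decodable & AnyObject>(
        _ type: T.Type, forKey key: Key, registry: ObjectRegistry
    ) throws -> T {
        let identity = try nestedContainer(keyedBy: IdentityKey.self, forKey: key)
        let id = try identity.decode(String.self, forKey: .id)
        if let existing = registry.objects[id] {
            return existing as! T  // the target type is known here
        }
        let value = try decode(T.self, forKey: key)
        registry.objects[id] = value
        return value
    }
}

struct Document: Codable {
    var first: Item
    var second: Item

    init(first: Item, second: Item) {
        self.first = first
        self.second = second
    }

    private enum CodingKeys: String, CodingKey { case first, second }

    // Step 3: the registry travels in userInfo so every decoding level can reach it.
    // (encode(to:) is still synthesized from CodingKeys.)
    init(from decoder: Decoder) throws {
        let registry = decoder.userInfo[ObjectRegistry.key] as! ObjectRegistry
        let container = try decoder.container(keyedBy: CodingKeys.self)
        first = try container.decodeShared(Item.self, forKey: .first, registry: registry)
        second = try container.decodeShared(Item.self, forKey: .second, registry: registry)
    }
}

let shared = Item(title: "Kind of Blue")
let data = try! JSONEncoder().encode(Document(first: shared, second: shared))

let registry = ObjectRegistry()
let decoder = JSONDecoder()
decoder.userInfo[ObjectRegistry.key] = registry
let restored = try! decoder.decode(Document.self, from: data)
print(restored.first === restored.second)  // true
```

Encoding still writes the full object at every reference. With this scheme, a strong reference cycle would make encode(to:) recurse forever, so it handles shared objects but not cycles.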

Discussion:

This code works OK for me, but has a few problems in my opinion.

In the first place, it requires me to write quite a bit of boilerplate code: for each and every class type within scope (for encoding), and for every decoding of such a class instance in the code (for decoding). This leads back to my wish for native support. Short of implementing native support, such code could perhaps be synthesized on the basis of some configuration property to be set on the decoder instance, in case object references are to be restored.

In principle, I do not like to go through Codable here. In my view, Codable is best used as a protocol to encode/decode values to a medium for external access, under strong user control to accommodate differences in the way other parties might encode/decode to the same medium.

Very well. But for my purpose I do not have that need, and I wish it were not necessary to go through all the motions. Also, I suspect that encoding/decoding will usually be much slower than storage in a native format.

And it has other limitations that restrict the kinds of objects I can store. I found that having weak vars inside my object interferes with encoding/decoding as I want it. It seems that encoding does handle the weak vars, but decoding does not restore them. I am not sure I understand why this is so. Anyway, it currently prevents me from storing objects containing weak reference cycles. Is there any way to force decoding of weak vars?
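One likely explanation is that the decoder creates a fresh object and assigns it to the `weak var`. This is an assumption, since the original code isn't shown. Nothing else holds that object strongly, so it is deallocated right away and the property reads `nil`. A common workaround is to not archive the back-reference at all and have the owning object rebuild it in its `init(from:)`. Here is a sketch with hypothetical `Parent`/`Child` types:

```swift
import Foundation

final class Child: Codable {
    var name: String
    // Back-reference to the owner; weak to avoid a retain cycle.
    weak var parent: Parent? = nil

    init(name: String) { self.name = name }

    // Only `name` is archived; `parent` is restored by the owner below.
    private enum CodingKeys: String, CodingKey { case name }
}

final class Parent: Codable {
    var name: String
    var children: [Child]

    init(name: String, children: [Child]) {
        self.name = name
        self.children = children
        for child in children { child.parent = self }
    }

    private enum CodingKeys: String, CodingKey { case name, children }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        name = try container.decode(String.self, forKey: .name)
        children = try container.decode([Child].self, forKey: .children)
        // Re-establish the weak back-references now that `self` exists
        // and holds the children strongly.
        for child in children { child.parent = self }
    }
}

let family = Parent(name: "root", children: [Child(name: "a"), Child(name: "b")])
let data = try! JSONEncoder().encode(family)
let restored = try! JSONDecoder().decode(Parent.self, from: data)
print(restored.children.allSatisfy { $0.parent === restored })  // true
```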

Complications like these are another reason to wish for a native solution.

The [String : AnyObject] dictionary used to determine class instance identity can also be used to make decoding more efficient. Decoding uses it to never decode a class instance more than once. This is done by checking its identifier against the dictionary before deciding whether to decode, as explained above. If the key is present, the existing instance is used directly.

Encoding might use similar techniques for efficiency, but that seems more difficult to add. Currently it creates a full object tree, duplicating shared objects.

I have not tested extensively. So far it works for me, but my test case is small and probably does not exercise all possible scenarios. Are there any obvious pitfalls? Of course, all types involved must be Codable, but in this scenario that is not a bad thing, since the compiler forces me to address all types within the object, making sure they are indeed suitable for storage.
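Here is a sketch of how encoding could avoid the duplication. It makes a few assumptions. The full payload is written only at the first reference, and later references store just the identifier. Decoding visits the references in the same order that encoding wrote them. `SeenObjects`, `DecodedObjects` and the `encodeShared`/`decodeShared` helpers are hypothetical.

```swift
import Foundation

// Identifiers already written during this encoding pass (not thread-safe).
final class SeenObjects: @unchecked Sendable {
    static let key = CodingUserInfoKey(rawValue: "seenObjects")!
    var ids: Set<String> = []
}

// Instances already restored during this decoding pass (not thread-safe).
final class DecodedObjects: @unchecked Sendable {
    static let key = CodingUserInfoKey(rawValue: "decodedObjects")!
    var objects: [String: AnyObject] = [:]
}

// Every stored reference has an id, plus a payload at its first occurrence.
enum ReferenceKey: String, CodingKey { case id, payload }

extension KeyedEncodingContainer {
    mutating func encodeShared<T: Encodable & AnyObject>(
        _ object: T, forKey key: Key, seen: SeenObjects
    ) throws {
        let id = ObjectIdentifier(object).debugDescription
        var reference = nestedContainer(keyedBy: ReferenceKey.self, forKey: key)
        try reference.encode(id, forKey: .id)
        // Write the full object only the first time this identifier is seen.
        if seen.ids.insert(id).inserted {
            try reference.encode(object, forKey: .payload)
        }
    }
}

extension KeyedDecodingContainer {
    func decodeShared<T: Decodable & AnyObject>(
        _ type: T.Type, forKey key: Key, decoded: DecodedObjects
    ) throws -> T {
        let reference = try nestedContainer(keyedBy: ReferenceKey.self, forKey: key)
        let id = try reference.decode(String.self, forKey: .id)
        if let existing = decoded.objects[id] { return existing as! T }
        // First occurrence: the payload is expected to be stored here.
        let value = try reference.decode(T.self, forKey: .payload)
        decoded.objects[id] = value
        return value
    }
}

final class Item: Codable {
    var title: String
    init(title: String) { self.title = title }
}

struct Document: Codable {
    var first: Item
    var second: Item

    init(first: Item, second: Item) {
        self.first = first
        self.second = second
    }

    private enum CodingKeys: String, CodingKey { case first, second }

    func encode(to encoder: Encoder) throws {
        let seen = encoder.userInfo[SeenObjects.key] as! SeenObjects
        var container = encoder.container(keyedBy: CodingKeys.self)
        try container.encodeShared(first, forKey: .first, seen: seen)
        try container.encodeShared(second, forKey: .second, seen: seen)
    }

    init(from decoder: Decoder) throws {
        let decoded = decoder.userInfo[DecodedObjects.key] as! DecodedObjects
        let container = try decoder.container(keyedBy: CodingKeys.self)
        first = try container.decodeShared(Item.self, forKey: .first, decoded: decoded)
        second = try container.decodeShared(Item.self, forKey: .second, decoded: decoded)
    }
}

let shared = Item(title: "Blue")
let encoder = JSONEncoder()
encoder.userInfo[SeenObjects.key] = SeenObjects()
let data = try! encoder.encode(Document(first: shared, second: shared))

let decoder = JSONDecoder()
decoder.userInfo[DecodedObjects.key] = DecodedObjects()
let restored = try! decoder.decode(Document.self, from: data)
print(restored.first === restored.second)  // true
```

This removes the duplicated payloads. It still cannot restore strong reference cycles, because `init(from:)` must finish creating an object before anything else can point to it.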

itaiferber · February 13, 2019, 5:54pm

Unfortunately, I don't have time at the moment to respond to all of your points here, but I will note that we do eventually intend for encoders like JSONEncoder and PropertyListEncoder to support reference semantics, and this is mostly the scheme that they would use.

The intention is really not to force you to write this type of boilerplate yourself.

Jeroen537 · February 15, 2019, 9:53am

The intention is really not to force you to write this type of boilerplate yourself.

Thanks, good to hear that!

I have read through much of the "Codable with reference?" thread, which seems closely related to this one. There is a lengthy discussion between yourself (@itaiferber) and @QuinceyMorris on how to implement reference semantics with Codable, which I found very interesting.

The briefest conclusion from that for me would be "It is not obvious, nor easy to do". At any rate, an implementation would be rather complex, and probably not yet completely satisfactory.

It is my experience that at such points in a discussion it may be useful to take a step back and consider the What and the Why, before going back to the How.

This is what I propose to do in this post. Anyone with me?

The What

The What asks for a definition of what a proposed scheme exactly should make possible. In my view, at this point of the discussion, the question would be: What exactly must be storable using the scheme, and: What is expected of the developer who uses it?

My own answer here goes along the lines of: I want to store and retrieve what I call "data carrying objects", without me having to write custom code for it.

What I mean by "data carrying object" I have tried to explain, by example, in an earlier post. Making this notion technically precise is necessary and an interesting challenge, but I do not think it is impossible.

What I mean by "store and retrieve" I also tried to explain. In its barest form, access to the stored item would be restricted to the program itself, the item never leaving the Swift world. This is very different in intention from data interchange. Of course, if the latter is also possible: so much the better.

What I mean by "no custom code" should, I think, be clear. If the scheme involves adhering to some protocol, I have no problem declaring conformance on my custom types, but I do not want to have to write code to make that conformance happen. Either the scheme can accommodate an object without custom code, or it cannot. If the range of objects it can accommodate this way encompasses all data carrying objects as defined, all is well.

This is my provisional definition of the What. I would be interested to know whether there is an explicit, commonly agreed definition in the Swift world, and what it is.

The Why

The Why explains the reason the feature benefits users.

This raises the question: why discuss the Why if the What has already been defined? For two reasons, I think:

To help decide whether the What is worth the implementation effort. If the Why concerns only a few users, the case is very different from one where a majority of users would benefit.
To help tailor the What to its most important core, in case implementation of the full What is impossible, or undesirable for some reason, for example having to do with the stated principles for the language and its evolution.

My own definition of the Why is: Like most developers, I am structuring my application as an MVC. In MVC terms, I want to store and load my Models without hassle.

Why? In all but the most trivial programs there is a model. Be it a game situation, a managed inventory of CDs, or the state of a simulation, there is almost always a need to store it for later reference or to continue working with it. Having to put work and thought into making that possible detracts from my real job, which is developing the model and the user's and/or environment's interaction with it.

Note that, for this scenario, it is not part of the Why that I need to share the stored object with other applications. That is why that is also not part of the What for me.

I would be interested to learn if others share this What and Why, and if not, what are your thoughts?

jrose · February 15, 2019, 5:34pm

I don't really want to jump in so much—I haven't been following closely enough for that—but any sort of "save everything so you can restore everything" solution falls down as soon as you release a new version of your app. I appreciate that init(from:) and init(coder:) both allow you to be explicit in how to handle things that might have changed from your previous app. So even if there's a solution that doesn't require writing custom code, it can't be a solution that depends on not writing custom code, such as using reflection exclusively.
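Here is an illustration of that point, using a hypothetical `Album` model that is not from this thread. Suppose version 2 of an app adds a property. A hand-written `init(from:)` can supply a default when it reads an archive written by version 1. A purely automatic, reflection-based restore would have no way to know what that default should be.

```swift
import Foundation

// Version 2 of a model that gained `rating` after version 1 shipped.
struct Album: Codable {
    var title: String
    var rating: Int  // new in version 2

    init(title: String, rating: Int) {
        self.title = title
        self.rating = rating
    }

    private enum CodingKeys: String, CodingKey { case title, rating }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        title = try container.decode(String.self, forKey: .title)
        // Archives written by version 1 have no "rating" key; fall back to a default.
        rating = try container.decodeIfPresent(Int.self, forKey: .rating) ?? 0
    }
}

// JSON as a version 1 build would have written it.
let oldArchive = Data(#"{"title":"Kind of Blue"}"#.utf8)
let album = try! JSONDecoder().decode(Album.self, from: oldArchive)
print(album.rating)  // 0
```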