“Three way optionals” / Distinguishing unknown and absent values

kibigo · March 7, 2021, 10:33pm

In some cases, it is important to distinguish between “unknown value” (because it has not been allocated; because it is not determinable from the given information) and “absent” (known not to be present), two cases which ordinarily would be treated as nil using Swift optionals.

One option would be to use a double‐optional (Optional<Optional<A>>), where the outer optional represents known or unknown and the inner represents present or absent.
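A minimal sketch of how the double‐optional reads in practice. The variable names and the `describe` helper are hypothetical, chosen only for illustration.

```swift
// Hypothetical values, named only for illustration.
let unknown: Int?? = .none             // outer nil: not known
let absent: Int?? = .some(.none)       // known, and known to be absent
let present: Int?? = .some(.some(42))  // known, and present

/// Spells out which of the three states a double optional is in.
func describe(_ value: Int??) -> String {
  switch value {
  case .none: return "unknown"
  case .some(.none): return "absent"
  case .some(.some(let wrapped)): return "value \(wrapped)"
  }
}
```

Note that nothing in the types says which level means what; that reading is a convention the caller has to remember.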

Another would be to effectively re‐implement Optional-like logic in a three‐valued context on a custom enum; i.e.

enum PossiblyUnknownOrAbsentValue <Wrapped> {

  /// It is not known whether the value is present, and if so, what its value might be.
  case unknown

  /// The value is known not to exist.
  case absent

  /// The value is known to exist and have the given value.
  case value (Wrapped)

}

(Yet, it seems less‐than‐ideal to me for such a fundamental data structure to be implemented [and potentially re‐implemented] via module code.)
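For what it is worth, the two representations carry the same information, so bridging between them is mechanical. The sketch below assumes a hypothetical initializer and a hypothetical `nested` property; neither is part of any existing library.

```swift
enum PossiblyUnknownOrAbsentValue<Wrapped> {
  case unknown
  case absent
  case value(Wrapped)
}

extension PossiblyUnknownOrAbsentValue {
  /// Hypothetical bridge: outer `nil` means unknown, inner `nil` means absent.
  init(_ nested: Wrapped??) {
    switch nested {
    case .none: self = .unknown
    case .some(.none): self = .absent
    case .some(.some(let wrapped)): self = .value(wrapped)
    }
  }

  /// Hypothetical inverse conversion back to a double optional.
  var nested: Wrapped?? {
    switch self {
    case .unknown: return .none
    case .absent: return .some(.none)
    case .value(let wrapped): return .some(.some(wrapped))
    }
  }
}
```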

I am curious whether the Swift community has encountered situations like this before, and whether there are existing conventions regarding representing these kinds of values.

Anachron · March 7, 2021, 11:05pm

I haven't met that particular problem before, but I'd definitely be in favour of a dedicated enum, because how would you interpret Optional<Optional<Optional<A>>> then? Also, nested optionals can be collapsed using flatMap { $0 }, or even better: don't let nested Optionals arise in the first place, by using flatMap in whichever computation would otherwise give you a nested optional. Your enum is definitely better suited for this situation.
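A short sketch of both suggestions, using only standard library calls; the variable names are illustrative. It also shows why collapsing does not help when the two kinds of nil need to stay distinct.

```swift
// Collapsing an existing nested optional.
let nested: Int?? = .some(.some(7))
let collapsed: Int? = nested.flatMap { $0 }

// Collapsing loses the distinction between the two kinds of `nil`.
let unknown: Int?? = .none
let absent: Int?? = .some(.none)
let collapsedUnknown: Int? = unknown.flatMap { $0 }
let collapsedAbsent: Int? = absent.flatMap { $0 }

// Avoiding the nesting in the first place: `flatMap` instead of `map`.
let text: String? = "12"
let viaMap: Int?? = text.map { Int($0) }
let viaFlatMap: Int? = text.flatMap { Int($0) }
```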

Maybe also consider Result<Success, Failure> (which ships with Swift) with a dedicated enum for failure:

enum DataError : Error {
   case absent
   case unknown
}
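A sketch of how that might look at a call site. The `thickness(ofBook:)` function and its in-memory table are made up for illustration; only `Result` and the `DataError` enum above come from the suggestion itself.

```swift
enum DataError: Error {
  case absent
  case unknown
}

/// Hypothetical lookup over a small in-memory table, for illustration only.
/// A key mapped to `nil` means "known to have no thickness".
func thickness(ofBook id: String) -> Result<Double, DataError> {
  let known: [String: Double?] = ["A": nil, "C": 2.5]
  guard let entry = known[id] else { return .failure(.unknown) }
  guard let value = entry else { return .failure(.absent) }
  return .success(value)
}
```

Callers can then `switch` over `.success`, `.failure(.absent)` and `.failure(.unknown)` explicitly.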

kibigo · March 8, 2021, 3:48am

Allow me to give a concrete example, which may better illustrate the problem. Suppose we are cataloguing :Books, which may be either physical or digital, and we are interested in the size of these books in physical or digital space. For physical books, we might use a :thickness property, to give the thickness of the book as it sits on a shelf. For digital books, we might use a :fileSize property to indicate their size in bytes, as they might be stored in some medium.

A :Book (as we are defining it) may be either physical or digital, but not both, so :thickness and :fileSize are disjoint properties. So suppose we are provided the following information:

<#BookA> a :Book ;
    dc:title "My Book" ;
    :fileSize 1024 .

Because <#BookA> has a :fileSize, we know it is digital. We can consequently conclude that it does not have a :thickness; i.e. :thickness is absent. Given that a :Book may either have a :thickness or not, we might represent this in Swift as an Optional<Thickness> value.

/// A book which is either digital or physical.
struct DigitalOrPhysicalBook {

  /// Digital filesize in bytes; `nil` if physical.
  var fileSize: Int?

  /// Physical book thickness; `nil` if digital.
  var thickness: Thickness?

  /// Whether this is a digital book.
  var isDigital: Bool
  { fileSize != nil }

}

Now suppose we are presented with the following information:

<#BookB> a :Book ;
  dc:title "My Other Book" .

With this information, we cannot conclude whether <#BookB> is physical, and consequently has a :thickness, or not. So the :thickness of <#BookB> is unknown. This is clearly a different situation from the :thickness being absent, which implies a digital book.

Given that we have already decided on Optional<Thickness> as a way of representing the thickness of books, hopefully it is clear where Optional<Optional<Thickness>> comes from.

/// A book, physicality not necessarily known.
struct Book {

  /// The file size of the book; `nil` if unknown, `.some(nil)` if not digital.
  var fileSize: Optional<Int?>

  /// The thickness of the book; `nil` if unknown, `.some(nil)` if not physical.
  var thickness: Optional<Thickness?>

  /// Whether this book is digital; `nil` if unknown.
  var isDigital: Bool?
  { fileSize.map { $0 != nil } }

}
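To make the mapping concrete, here is a sketch of how the two records above might be constructed. `Thickness` is never defined in this thread, so the struct below is a hypothetical stand-in, and the third book is invented for contrast.

```swift
/// Hypothetical stand-in; the thread never defines `Thickness`.
struct Thickness: Equatable { var millimetres: Double }

struct Book {
  var fileSize: Int??
  var thickness: Thickness??
  var isDigital: Bool? { fileSize.map { $0 != nil } }
}

// <#BookA>: has a file size, so the thickness is known to be absent.
let bookA = Book(fileSize: 1024, thickness: .some(nil))

// <#BookB>: nothing is known about either property.
let bookB = Book(fileSize: nil, thickness: nil)

// An invented physical book: the file size is known to be absent.
let bookC = Book(fileSize: .some(nil), thickness: Thickness(millimetres: 30))
```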

(To answer your question, Optional<Optional<Optional<Thickness>>> would be read as “it is possibly not known whether it is known whether :thickness is present or absent” under this scheme; i.e. :thickness is possibly unprovable [??], which is not a situation that I would ever want to deal with, but uh. Who can say.)

The point: Being able to distinguish between whether something is known and whether something is present is not a problem typically encountered in “closed‐world” programming domains (where datasets can be assumed to be complete), but it is an absolutely essential distinction in “open‐world” domains, to avoid drawing incorrect conclusions from incomplete data. Hence this question!

I agree that Optional<Optional<A>> is rather opaque, which is why I am fishing for alternatives :P . Result<Success, Failure> is an interesting approach and not one I had considered; although the semantics of treating the absence of a value as an Error rubs me the wrong way, it has behaviour similar to what I’m looking for :P .

ExFalsoQuodlibet · March 8, 2021, 7:10am

I'm neutral towards the idea of .unknown cases, but your example doesn't seem a good one to me, because the correct tool to represent alternative options for a data structure in Swift is an enum.

A better representation of Book in Swift would be:

/// A book, physicality not necessarily known.
struct Book {  
  enum Category {
    case digital(fileSize: Int)
    case physical(thickness: Thickness)
  }
  
  /// Which kind of book we're dealing with: `nil` if unknown.
  var category: Category?

  /// Whether this book is digital; `nil` if unknown.
  var isDigital: Bool? { 
    switch category {
      case nil:
        return nil
      
      case .digital?:
        return true
      
      case .physical?:
        return false
    }
  }
}

In this case, nil for category means unknown.

Mordil · March 8, 2021, 7:26am

A use case for this situation is commonly found with JSON.

nil could mean a property on an object is missing, or it could be a null value.

ExFalsoQuodlibet · March 8, 2021, 7:33am

That's a specific "feature" of how JSON represents data. But what's the practical difference when modeling that object in Swift? In what way could that information (missing property vs. property = null) be used?

hisekaldma · March 8, 2021, 12:07pm

Whenever I run into these situations, I try to use a generic enum similar to Optional, but specific for the states that I want to represent. For example:

enum Knowable<T> {
  case known(T)
  case unknown
}

This makes it clear what Knowable<Int?> represents, and that .known(nil) is a known missing value, whereas .unknown is a completely unknown value. Compare that to Int?? where it isn’t immediately clear whether it’s .some(nil) or just nil that represents an unknown value.
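A small sketch of that reading. The `describe` helper is hypothetical and exists only to spell out the three states.

```swift
enum Knowable<T> {
  case known(T)
  case unknown
}

/// Hypothetical helper that names the three states of `Knowable<Int?>`.
func describe(_ pages: Knowable<Int?>) -> String {
  switch pages {
  case .unknown: return "unknown"
  case .known(.none): return "known to be missing"
  case .known(.some(let count)): return "known: \(count)"
  }
}
```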

I’ve also found that developers who don’t come from a functional background can get frustrated by the concept of double optionals and .some(nil) – sometimes to the point of "but that doesn’t really mean anything!" Giving the cases domain-specific names like .known(nil) both clarifies what each level of optionality means and documents why the whole thing can’t just be collapsed to a simple Optional. (And if you can’t find good names for each level of optionality, that might be a good sign that the whole thing could in fact be collapsed....)

Tino · March 8, 2021, 12:23pm

Looking at Result: Especially right now, when async is the big topic, it might be a good time to talk about cancellation as a third option.

erikstrottmann · March 8, 2021, 6:41pm

I’ve used JSON APIs that treat missing (undefined) and explicitly null fields differently. For example, in a request that modifies an object, a missing field would mean “don’t change the value” and null would mean “set the optional field to null”.

The Apollo iOS GraphQL client generates Swift code that matches a GraphQL schema. It generates Optional<T?> for nullable fields for exactly this reason.
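As an illustration of the missing-versus-null distinction with plain Codable (this is not Apollo's generated code), the sketch below decodes into a double optional by hand. `BookPatch`, its `subtitle` field and `decodePatch` are hypothetical names.

```swift
import Foundation

/// Hypothetical request body; the type and field names are assumptions.
struct BookPatch: Decodable {
  /// Outer `nil`: key missing. `.some(nil)`: key present with value `null`.
  var subtitle: String??

  enum CodingKeys: String, CodingKey { case subtitle }

  init(from decoder: Decoder) throws {
    let container = try decoder.container(keyedBy: CodingKeys.self)
    if container.contains(.subtitle) {
      // `decodeIfPresent` yields `nil` for an explicit `null`.
      let decoded = try container.decodeIfPresent(String.self, forKey: .subtitle)
      subtitle = .some(decoded)
    } else {
      subtitle = .none
    }
  }
}

/// Hypothetical convenience wrapper around `JSONDecoder`.
func decodePatch(_ json: String) throws -> BookPatch {
  try JSONDecoder().decode(BookPatch.self, from: Data(json.utf8))
}
```

The synthesized `Decodable` conformance would not preserve this distinction, which is why the initializer is written out by hand.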

kibigo · March 8, 2021, 10:15pm

The problem is that this assumes that the data structure, and all forms it might take, is known ahead‐of‐time :P . While this works in my simple example with only two, known cases, it is easy to add complexity to the dataset to the point where this becomes untenable (to say nothing of reasoning about data structures which are not known at compile‐time).

While I’m personally not dealing with JSON, it is relevant to the conversation in the sense that people who have to interface with JSON APIs often do not know the exact structure of the dataset they will be receiving at compile‐time, or else will have to parse potentially‐incomplete datasets without failure.

I’ve been leaning towards this solution myself; my major concerns are just:

1. Portability across API boundaries (enums are cumbersome here), and
2. Certain Swift behaviours which are only available to Optionals (i.e., I don’t think there would be a way to express an unowned Knowable the way that you can with Optionals, as they are value types).

It still might be the best solution, though.

John_McCall · March 8, 2021, 10:17pm

The idea of an Optional-like type that can carry other information besides "not present" is definitely useful. The question is whether it is general, or at least general enough for inclusion in the standard library. I think the answer is clearly "no", and you should instead use an enum that precisely encodes the possibilities in your specific situation.

ExFalsoQuodlibet · March 9, 2021, 7:37am

That seems the dual problem, that is, communicating intent with a data structure (instead of modeling data): I agree that an optional in itself doesn't cut it for this use case, but neither does a completely general unknown case. In the example you're referring to I'd use a generic type that better conveys the intent, like (this is something I actually use in production code):

enum Update<A> {
  case unchanged
  case set(A)
}

In your specific example, A would be an Optional. The fact that those JSON APIs infer a particular meaning from the structure of the JSON object is an implementation detail of the APIs themselves: JSON is just a particular serialization strategy and, in the case of those APIs, the Update value would be translated accordingly.
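A sketch of how such an Update value might be consumed, with A being an Optional. The `applied(to:)` helper and the sample values are hypothetical additions for illustration.

```swift
enum Update<A> {
  case unchanged
  case set(A)
}

extension Update {
  /// Hypothetical helper: the value after applying this update to `current`.
  func applied(to current: A) -> A {
    switch self {
    case .unchanged: return current
    case .set(let newValue): return newValue
    }
  }
}

let current: String? = "My Book"
let kept = Update<String?>.unchanged.applied(to: current)    // keeps the old value
let cleared = Update<String?>.set(nil).applied(to: current)  // deliberate nil
let renamed = Update<String?>.set("My Other Book").applied(to: current)
```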

A comment in the thread says:

In future codegen we'll be working with a custom enum that makes this clearer, but for what we've got now, the double-optional is the best way to represent it.

That's the point. For representing that use case (a dual case of domain modeling) the best solution is a custom enum that suits that domain-specific logic.

The distinction here is in how you model a domain entity vs how you model the server output data: I agree that the latter could be anything, and if you (like everyone, really) use JSON, you unfortunately must pay the price of the limited power of JSON to model any complex data structure.

For example, drawing from your example:

struct Book {
  var name: String
  var category: Category

  enum Category {
    case physical(thickness: Thickness)
    case digital(fileSize: Int)
  }
}

This is, to me, the correct way to model such a domain entity. But when getting a book from the server, you're likely going to have a flat object with optional fields, and maybe a field that represents which case of the enum you're dealing with, for example:

{
  "name": String,
  "category": physical|digital,
  "thickness": Number?,
  "fileSize": Number?
}

To represent this with a Swift type, you could use something like the following:

struct RawBook: Decodable {
  var name: String
  var category: String
  var thickness: Double?
  var fileSize: Int?
}

This is just a raw representation, and because it models a JSON object, it's going to be a flat data structure with optionals. When decoding this, the fact that, for example, thickness is null or is absent is irrelevant.

You would then have, maybe, an initializer on Book that takes a RawBook, like the following:

extension Book {
  init(raw: RawBook) throws {
    // Here you can `switch` on `raw.category`, and `throw` if the category is unrecognized
    // or the non-null properties don't match their category, or maybe use a "sensible" default.
  }
}
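One possible way to fill in that initializer, as a sketch only. `Thickness` is not defined anywhere in this thread, so the struct here is a stand-in, and `BookError` and the exact category strings are assumptions.

```swift
/// Hypothetical stand-in; the thread never defines `Thickness`.
struct Thickness: Equatable { var value: Double }

struct RawBook: Decodable {
  var name: String
  var category: String
  var thickness: Double?
  var fileSize: Int?
}

struct Book {
  var name: String
  var category: Category

  enum Category: Equatable {
    case physical(thickness: Thickness)
    case digital(fileSize: Int)
  }
}

/// Hypothetical error for raw values that do not fit the domain model.
enum BookError: Error { case inconsistent }

extension Book {
  init(raw: RawBook) throws {
    // Accept only combinations where the populated field matches the category.
    switch (raw.category, raw.thickness, raw.fileSize) {
    case ("physical", let thickness?, nil):
      self.init(name: raw.name, category: .physical(thickness: Thickness(value: thickness)))
    case ("digital", nil, let fileSize?):
      self.init(name: raw.name, category: .digital(fileSize: fileSize))
    default:
      throw BookError.inconsistent
    }
  }
}
```

This version throws on any mismatch; falling back to a default instead would be an equally valid choice, depending on the domain.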

In this case, a "3-way optional" wouldn't be useful because for the domain entity it makes no sense, and for the raw representation (which again is a consequence of how JSON works) it doesn't matter.

Definitely agree: the presented examples, up to this point, don't suggest, to me, the need for a standard library type.

hisekaldma · March 9, 2021, 9:49am

unowned or weak don’t work with this type, that’s true. If you need either of those you can box the reference in an Unowned<T> or Weak<T> struct. Doable, but a bit clunky.
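A sketch of the boxing approach, assuming a hand-written `Weak` struct (the standard library does not provide one). The `Author` class is made up for the example.

```swift
/// Hypothetical box holding a weak reference, so that it can be an enum payload.
struct Weak<Object: AnyObject> {
  weak var object: Object?
  init(_ object: Object) { self.object = object }
}

enum Knowable<T> {
  case known(T)
  case unknown
}

/// Hypothetical reference type, for illustration only.
final class Author {
  let name: String
  init(name: String) { self.name = name }
}

let owner = Author(name: "Some Author")

// The enum payload refers to `owner` without keeping it alive.
let author: Knowable<Weak<Author>> = .known(Weak(owner))
```

Every read then has to go through `box.object`, which is the clunky part.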

Some other things I’ve found useful to make this approach less painful are:

1. Add conditional conformances for the `ExpressibleBy` protocols

extension Knowable: ExpressibleByFloatLiteral where T: ExpressibleByFloatLiteral {
    init(floatLiteral: T.FloatLiteralType) {
        self = .known(T(floatLiteral: floatLiteral))
    }
}

extension Knowable: ExpressibleByIntegerLiteral where T: ExpressibleByIntegerLiteral {
    init(integerLiteral: T.IntegerLiteralType) {
        self = .known(T(integerLiteral: integerLiteral))
    }
}

etc.

This lets you write let value: Knowable<Int> = 2, which can help improve readability immensely in code that sets many of these values (e.g. tests or SwiftUI previews).

2. Add a getter for the value-case

extension Knowable {
    var known: T? {
        switch self {
        case .known(let value): return value
        case .unknown: return nil
        }
    }
}

This lets you easily convert to an optional when you really do want one, e.g. to use with if let or when interacting with APIs that take optionals.

3. Add map

extension Knowable {
    func map<U>(_ transform: (T) throws -> U) rethrows -> Knowable<U> {
        switch self {
        case .known(let value): return .known(try transform(value))
        case .unknown: return .unknown
        }
    }
}

I always end up needing it, anyway.
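Putting the three helpers together, a usage sketch might look like the following. The type and extensions are repeated so the snippet stands alone; the sample values are made up.

```swift
enum Knowable<T> {
  case known(T)
  case unknown
}

extension Knowable: ExpressibleByIntegerLiteral where T: ExpressibleByIntegerLiteral {
  init(integerLiteral: T.IntegerLiteralType) {
    self = .known(T(integerLiteral: integerLiteral))
  }
}

extension Knowable {
  var known: T? {
    switch self {
    case .known(let value): return value
    case .unknown: return nil
    }
  }

  func map<U>(_ transform: (T) throws -> U) rethrows -> Knowable<U> {
    switch self {
    case .known(let value): return .known(try transform(value))
    case .unknown: return .unknown
    }
  }
}

let pageCount: Knowable<Int> = 2                       // literal conformance
let doubled = pageCount.map { $0 * 2 }                 // stays `.known`
let missing = Knowable<Int>.unknown.map { $0 * 2 }     // stays `.unknown`

// The getter bridges to `if let` where a plain optional is more convenient.
var label = "unknown"
if let count = doubled.known { label = "\(count) pages" }
```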

viktorcode · March 11, 2021, 6:09pm

In my personal experience, writing logic around vars that can be unknown at execution time is a way of complicating the code. I have found it useful to put effort into separating the code into states, in each of which everything is known at any given time. Swift has beautiful facilities to help with that. Of course, your experience may vary, but I think that encouraging 'unknown' values would lead to bad coding practice.

Another thing is that, to me, the idea of handling 'unknown' values in the language seems uncomfortably close to uninitialised var access, as in C++ and the like. Swift of course will prevent that, so it doesn't really apply, but I can't rid myself of that impression.
