[Pitch] Nullable and Non-Optional Nullable Types

Feature: Nullable and Non-Optional Nullable Types

Overview

Swift currently provides an optional type system to handle values that may be absent. However, there are cases where a non-optional value may still be null at runtime, leading to potential bugs and crashes. This feature introduces two new type annotations, Nullable and NonOptionalNullable, that provide a way to express nullability for non-optional values.

Goals

The goals of this feature are to:

  • Provide a clear and explicit way to handle null values for non-optional types in Swift.
  • Reduce the potential for null-related bugs and crashes in Swift code.
  • Integrate with the existing Swift type system and syntax in a natural and intuitive way.

Design

Type Annotations

The two new type annotations, Nullable and NonOptionalNullable, are parameterized types that can be used to annotate variables, properties, and function parameters and return types.

  • Nullable<T>: A type that can hold either a value of type T or a nil value.
  • NonOptionalNullable<T>: A type that can hold either a value of type T or a nil value, but not a default value of T.

For example, a variable of type Nullable<Int> can hold either an integer value or a nil value, while a variable of type NonOptionalNullable<String> can hold either a non-empty string or a nil value.

Type Inference and Overload Resolution

The Swift type checker will be modified to allow Nullable and NonOptionalNullable types to be used in place of their non-nullable counterparts wherever they are expected. Type inference and overload resolution will be updated to support these types.

  • T? will continue to represent an optional T.
  • T! will continue to represent an implicitly unwrapped optional T.
  • Nullable<T> will be used to represent a non-optional value that can be null at runtime.
  • NonOptionalNullable<T> will be used to represent a non-optional value that can be null at runtime.

Syntax

The syntax for Nullable and NonOptionalNullable types will be similar to the existing syntax for optional types.

  • T? will continue to represent an optional T.
  • T! will continue to represent an implicitly unwrapped optional T.
  • Nullable<T> will be written as T? or Nullable<T>.
  • NonOptionalNullable<T> will be written as T! or NonOptionalNullable<T>.

Runtime

The Swift runtime will be updated to support Nullable and NonOptionalNullable types.

  • A separate bit will be allocated to track whether a value is nil or not for each nullable and non-optional nullable type.
  • Memory layout and alignment will be adjusted to accommodate the null tracking bit.
  • Null-checking logic will be added to the generated code for any operations that may read or write nullable values.
  • Conversion between nullable and non-nullable types will be supported.

Pseudocode:

func foo(x: Nullable<Int>) {
    if let value = x {
        print("x has value: \(value)")
    } else {
        print("x is nil")
    }
}

let a: Nullable<Int> = 1
foo(x: a) // Prints "x has value: 1"

I don't understand the difference between nullable and optional. For most structs, this proposed Nullable type seems to behave identically to Optional, though Optional has a smaller memory footprint for classes, unsafe pointers, and most enums. (There also seems to be a special case that String and Optional<String> are the same size, and I don't fully understand why.)
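
The footprint difference mentioned above can be checked with MemoryLayout. This is a minimal sketch, assuming a 64-bit platform; Box and Direction are hypothetical types invented for the illustration. As far as I understand, Optional can store nil in a bit pattern the wrapped type never uses (such as the null address of a class reference), while Int uses every bit pattern and so needs an extra byte. String appears to have spare bit patterns as well, which would explain why the two sizes match.

```swift
// Hypothetical types, for illustration only.
final class Box {}
enum Direction { case north, south, east, west }

// Each pair is (size of T, size of Optional<T>), in bytes.
let classSizes = (MemoryLayout<Box>.size, MemoryLayout<Box?>.size)
let pointerSizes = (MemoryLayout<UnsafeRawPointer>.size, MemoryLayout<UnsafeRawPointer?>.size)
let enumSizes = (MemoryLayout<Direction>.size, MemoryLayout<Direction?>.size)
let stringSizes = (MemoryLayout<String>.size, MemoryLayout<String?>.size)
let intSizes = (MemoryLayout<Int>.size, MemoryLayout<Int?>.size)

print("class reference:", classSizes)
print("raw pointer:", pointerSizes)
print("small enum:", enumSizes)
print("String:", stringSizes)
print("Int:", intSizes) // the Optional needs one extra byte here
```

A separate "null tracking bit" for every value, as the pitch describes, would give up the cases where Optional costs nothing.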

I'm especially confused by NonOptionalNullable. What exactly does "but not a default value" mean?

Thank you for reading.

I apologize if my explanation was not clear enough. I will try to clarify my idea. I hope it will be helpful.

Currently, Swift's optional type system provides a way to represent values that may be absent, but it doesn't have a way to express the idea of a non-optional value that could still be null at runtime. This can lead to bugs where developers forget to check for null values.

That is why I humbly suggest a new type annotation that indicates that a non-optional value may be null at runtime.

Unless you’re bridging from another language and the bridged API is incorrect (as we see in Apple’s frameworks occasionally), how can any Swift value that isn’t an optional be null?

Yeah, values imported from C APIs or unsafe APIs are the only ways I can think of for this to happen:

class C {}

let validObjectReference = C()
print(validObjectReference)

let nullReference = unsafeBitCast(nil as UnsafeRawPointer?, to: C.self)
print(nullReference) // Boom

Hence the unsafe in unsafeBitCast.

I believe you're misconstruing "null" to mean "empty". An empty String isn't considered null in Swift, it's just a string that happens to be empty. Similarly for Int, 0 is not considered null. In general, primitive values in Swift don't have "defaults" like other languages do.
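
A small sketch of the distinction, using only the standard library (the variable names are made up for the example): an Optional that holds an empty string or a zero still holds a value, and only nil means "no value".

```swift
let maybeEmpty: String? = ""   // holds a value that happens to be empty
let missing: String? = nil     // holds no value at all
let zero: Int? = 0             // holds the value 0, which is not nil

let emptyIsNil = (maybeEmpty == nil)
let missingIsNil = (missing == nil)
let zeroIsNil = (zero == nil)

print(emptyIsNil, missingIsNil, zeroIsNil)

// Emptiness is a property of the value and is checked separately from nil.
if let text = maybeEmpty, text.isEmpty {
    print("present, but empty")
}
```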

If you're looking for a reliable way of representing non-empty strings and arrays at runtime, I suggest you check out the pointfreeco/swift-nonempty library on GitHub, which provides a compile-time guarantee that a collection contains a value.

i do understand the idea here, and it is useful in serialization/encoding tasks to be able to elide values that are "empty" (but non-nil). to avoid confusion with Optional, i prefer to refer to this concept as Elidable.

i disagree with the proposed direction, to model this as a generic Elidable<T> type. Elidable should be a protocol instead.

protocol Elidable
{
    init()
    var isEmpty:Bool { get }
}

i think this could be very useful, surely more useful than Identifiable. (how often do you declare that conformance but never really use it?)
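
As a rough sketch of how such a protocol might be adopted: the standard collections already provide both requirements, so the conformances can be empty. The nonEmpty and orEmpty helpers are hypothetical names invented for this example and are not part of the suggestion above.

```swift
protocol Elidable {
    init()
    var isEmpty: Bool { get }
}

// String, Array, and Dictionary already have init() and isEmpty.
extension String: Elidable {}
extension Array: Elidable {}
extension Dictionary: Elidable {}

extension Elidable {
    /// Hypothetical helper: nil when the value is empty, so it can be elided.
    var nonEmpty: Self? { isEmpty ? nil : self }
}

extension Optional where Wrapped: Elidable {
    /// Hypothetical helper: restores an elided value as an empty instance.
    var orEmpty: Wrapped { self ?? Wrapped() }
}

let elidedTags = [String]().nonEmpty        // nil: nothing worth writing out
let keptTags = ["swift"].nonEmpty           // Optional(["swift"])
let restoredTags = (nil as [String]?).orEmpty
print(elidedTags as Any, keptTags as Any, restoredTags)
```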

Pretty much every SwiftUI view relies heavily on Identifiable—that's why it was added, no?
It's often a good basis for Hashable as well.

that’s fair, my experience is with more of the linux ecosystem, where there are fewer large frameworks that use Identifiable a lot.

I'm still not sure I get it, can you give an example?

When deserialising, for example, JSON, I've often wanted a way to distinguish between 'this field was explicitly NULL in the JSON' and 'this field was not present in the JSON' but I don't think this is the same thing, and could probably be handled with nested optionals.

Nulls are not preserved with JSONEncoder / JSONDecoder, but you can use JSONSerialization:

import Foundation

let string = #"{"foo":null}"#
let obj = try! JSONSerialization.jsonObject(with: string.data(using: .ascii)!) as! [String: Any]
let data = try! JSONSerialization.data(withJSONObject: obj)
let string2 = String(data: data, encoding: .ascii)!
print(obj) // ["foo": <null>]
print(string2) // {"foo":null}

if let value = obj["foo"] {
    if value is NSNull {
        print("nil") // prints: nil
    } else {
        print("non nil: \(value)")
    }
} else {
    print("absent")
}

when encoding, if an array field is empty, you often want to elide the field entirely, rather than encode an empty array.
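
One way to get this today is a hand-written encode(to:). The sketch below assumes a made-up Playlist type and a small json helper; only JSONEncoder and the keyed container calls are real API.

```swift
import Foundation

// Hypothetical model type, for illustration only.
struct Playlist: Encodable {
    var name: String
    var tracks: [String]

    private enum CodingKeys: String, CodingKey { case name, tracks }

    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        try container.encode(name, forKey: .name)
        // Elide the key entirely instead of writing "tracks": [].
        if !tracks.isEmpty {
            try container.encode(tracks, forKey: .tracks)
        }
    }
}

// Hypothetical helper; sorted keys keep the output deterministic.
func json<T: Encodable>(_ value: T) throws -> String {
    let encoder = JSONEncoder()
    encoder.outputFormatting = .sortedKeys
    return String(decoding: try encoder.encode(value), as: UTF8.self)
}

let emptyJSON = try json(Playlist(name: "Empty", tracks: []))
let fullJSON = try json(Playlist(name: "Mix", tracks: ["a", "b"]))
print(emptyJSON) // {"name":"Empty"}
print(fullJSON)  // {"name":"Mix","tracks":["a","b"]}
```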

I don't want to stray too far from the thread, but:

It isn't quite true that nulls are not preserved with JSONEncoder / JSONDecoder. The Codable APIs consider null values first-class citizens, and do offer support for reading and writing null values directly:

  1. KeyedEncodingContainer.encodeNil(forKey:), UnkeyedEncodingContainer.encodeNil(), and SingleValueEncodingContainer.encodeNil() can all be used to explicitly write out a null value into an archive
  2. KeyedDecodingContainer.contains(_:)/.decodeNil(forKey:), UnkeyedDecodingContainer.decodeNil(), and SingleValueDecodingContainer.decodeNil() can all be used to explicitly check for the presence of a value, or a null value
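
As a minimal sketch of the encoding side, a hand-written encode(to:) can call encodeNil(forKey:) to write an explicit null. Profile is a hypothetical type made up for the example.

```swift
import Foundation

// Hypothetical type, for illustration only.
struct Profile: Encodable {
    var nickname: String?

    private enum CodingKeys: String, CodingKey { case nickname }

    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        if let nickname {
            try container.encode(nickname, forKey: .nickname)
        } else {
            // Write the key with an explicit null instead of omitting it.
            try container.encodeNil(forKey: .nickname)
        }
    }
}

let explicitNullJSON = String(
    decoding: try JSONEncoder().encode(Profile(nickname: nil)),
    as: UTF8.self
)
print(explicitNullJSON) // {"nickname":null}
```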

However, Codable synthesis in the compiler opts to use the more permissive encodeIfPresent(_:forKey:) and decodeIfPresent(_:forKey:) variants as defaults for Optional properties. In their default implementations, encodeIfPresent writes nothing when the value is nil, and decodeIfPresent returns nil if either the key isn't present or the value is null. These are effectively shorthands for

if let value {
    try container.encode(value, forKey: ...)
}
// otherwise nothing is written for the key

and

if container.contains(...), try !container.decodeNil(forKey: ...) {
    return try container.decode(..., forKey: ...)
} else {
    return nil
}

If you do care about the specifics of these cases, you can distinguish them by implementing Encodable and Decodable manually to check precisely the fields you need.
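
A sketch of such a manual Decodable implementation, assuming a made-up Patch type with a single field. It keeps "key absent", "explicit null", and "has a value" apart by checking contains(_:) and decodeNil(forKey:) itself.

```swift
import Foundation

// Hypothetical type, for illustration only.
struct Patch: Decodable {
    // nil = key absent, .some(nil) = explicit null, .some(.some(x)) = value
    var title: String??

    private enum CodingKeys: String, CodingKey { case title }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        if !container.contains(.title) {
            title = nil
        } else if try container.decodeNil(forKey: .title) {
            title = .some(nil)
        } else {
            let value = try container.decode(String.self, forKey: .title)
            title = .some(.some(value))
        }
    }
}

// Hypothetical helper to keep the call sites short.
func decodePatch(_ json: String) throws -> Patch {
    try JSONDecoder().decode(Patch.self, from: Data(json.utf8))
}

let absent = try decodePatch("{}")
let explicitNull = try decodePatch(#"{"title": null}"#)
let present = try decodePatch(#"{"title": "Hi"}"#)
print(String(describing: absent.title))
print(String(describing: explicitNull.title))
print(String(describing: present.title))
```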

Good to know, thanks.

Note that it's often inconvenient to write custom encoding/decoding, because you can't do it for just the field in question: you have to throw away all the good work the system already did for you and start over, implementing it for every field.

I believe it could be a built-in "nil coding" strategy.

For the encoder alone it's trivial; for both the encoder and the decoder it could be achieved with something like TriStateOptional, or simply:

struct ExampleStruct {
    var foo: Int??
}
// nil - the field is absent in JSON
// Optional(nil) - the field is present in JSON, but nil
// Optional(Optional(42)) - the field is present in JSON and not nil
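
For illustration, the three states of a nested optional can be told apart with a switch. This sketch only shows how Int?? values are distinguished in plain Swift; it makes no claim about what synthesized Codable does with such a field. The describe function is a hypothetical helper.

```swift
struct ExampleStruct {
    var foo: Int??
}

// Hypothetical helper that names each of the three states.
func describe(_ value: Int??) -> String {
    switch value {
    case .none:
        return "absent"
    case .some(.none):
        return "present, but nil"
    case .some(.some(let number)):
        return "present: \(number)"
    }
}

let absentField = describe(ExampleStruct(foo: nil).foo)
let nullField = describe(ExampleStruct(foo: .some(nil)).foo)
let valueField = describe(ExampleStruct(foo: 42).foo)
print(absentField, nullField, valueField, separator: "\n")
```

Note that a bare nil literal always means the outer "absent" case, so the inner nil has to be spelled .some(nil).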

straying even further :)

Hmmm, I thought this could be done with property wrappers, but I couldn't figure it out.

import Foundation

@propertyWrapper struct PotentiallyMissing<T> {
	var wrappedValue: T??
	
	init(wrappedValue: T??) { self.wrappedValue = wrappedValue }
}

extension PotentiallyMissing: Decodable where T: Decodable {
	init(from decoder: Decoder) throws {
		let container = try decoder.singleValueContainer()
		
		if let value = try? container.decode(T.self) {
			self.init(wrappedValue: value)
		} else {
			let isExplicitlyNull = container.decodeNil()
			self.init(wrappedValue: isExplicitlyNull ? .some(nil) : nil)
		}
	}
}

extension PotentiallyMissing: Encodable where T: Codable {
	func encode(to encoder: Encoder) throws {
		switch wrappedValue {
		case .some(.some(let value)):
			var container = encoder.singleValueContainer()
			try container.encode(value)
		case .some(nil):
			var container = encoder.singleValueContainer()
			try container.encodeNil()
		case nil:
			break // no-op, don't even encode a key with null value
		}
	}
}

struct S: Codable {
	@PotentiallyMissing var i: Int??
}

let hasValue = S(i: 123 as Int??)
let hasExplicitNil = S(i: .some(nil))
let hasNoKeyAtAll = S(i: nil)

let encoder = JSONEncoder()
print(String(decoding: try encoder.encode(hasValue), as: UTF8.self))
print(String(decoding: try encoder.encode(hasExplicitNil), as: UTF8.self))
print(String(decoding: try encoder.encode(hasNoKeyAtAll), as: UTF8.self)) // Not what I expected: {"i":{}}

let decoder = JSONDecoder()
print(try decoder.decode(S.self, from: #"{ "i": 123 }"#.data(using: .utf8)!))
print(try decoder.decode(S.self, from: #"{ "i": null }"#.data(using: .utf8)!))
print(try decoder.decode(S.self, from: #"{ }"#.data(using: .utf8)!)) // Boom

Straying further still, but fixing it for you

@AlexanderM You can make this work by introducing the following extensions that hook into the type's Codable synthesis:

extension KeyedDecodingContainer {
    func decode<T: Decodable>(_: PotentiallyMissing<T>.Type, forKey key: Key) throws -> PotentiallyMissing<T> {
        guard contains(key) else { return PotentiallyMissing(wrappedValue: nil) }
        guard try !decodeNil(forKey: key) else { return PotentiallyMissing(wrappedValue: .some(nil)) }
        return PotentiallyMissing(wrappedValue: try decode(T.self, forKey: key))
    }
}

extension KeyedEncodingContainer {
    mutating func encode<T: Encodable>(_ value: PotentiallyMissing<T>, forKey key: Key) throws {
        guard let optional = value.wrappedValue else { return }
        if let value = optional {
            try encode(value, forKey: key)
        } else {
            try encodeNil(forKey: key)
        }
    }
}

…only the latter fails with error: ambiguous use of 'encode(_:forKey:)' unless you also loosen the conditional Encodable conformance requirement to what it probably should've been to start with:

-extension PotentiallyMissing: Encodable where T: Codable {
+extension PotentiallyMissing: Encodable where T: Encodable {

With these changes, the program prints what you'd expect:

{"i":123}
{"i":null}
{}
S(_i: main.PotentiallyMissing<Swift.Int>(wrappedValue: Optional(Optional(123))))
S(_i: main.PotentiallyMissing<Swift.Int>(wrappedValue: Optional(nil)))
S(_i: main.PotentiallyMissing<Swift.Int>(wrappedValue: nil))

One last (🚨 important 🚨) note:

The reason a property wrapper can't directly handle this case is that the check for the presence of a key is done one level up in the containing type, and when you wrap a variable in a property wrapper, it's the type of the property wrapper that gets encoded and decoded.

In the case of S.i, the actual type being encoded and decoded is PotentiallyMissing<Int>, not the Int?? wrapped value. This means that when the compiler synthesizes Encodable and Decodable conformances, the calls being made are encode(_:forKey:) and decode(_:forKey:), not the IfPresent variants, because the type isn't actually Optional.

This means that struct S actually looks like:

struct S: Codable {
    @PotentiallyMissing var i: Int??

    private enum CodingKeys: String, CodingKey {
        case i
    }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        _i = try container.decode(PotentiallyMissing<Int>.self, forKey: .i)
    }

    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        try container.encode(_i, forKey: .i)
    }
}

If you want additional justification for why this behavior is necessary, consider:

  1. If the compiler encoded and decoded the wrapped value instead of the property wrapper itself, the property wrapper's own init(from:) and encode(to:) would never get called, making it impossible to use a property wrapper to customize Codable conformance in the first place
  2. Even if a property wrapper wraps an Optional value, the property wrapper itself is not optional — if you were to decodeIfPresent(PotentiallyMissing<Int>.self, forKey: .i) and got back nil, what value could be reasonably assigned to _i? (The compiler could theoretically know to try to assign PotentiallyMissing(wrappedValue: nil), but this gets really subtle and tricky)

With this in mind, you can see that:

  1. On decode, the field is asserted to be present (even if nil): the decode call checks for the .i key before ever calling into PotentiallyMissing<T>.init(from:), so an error is thrown before any decoding actually happens
  2. On encode, the field is also asserted to be present (even if nil), which means that if nothing at all is encoded by PotentiallyMissing<T>.encode(to:), the key still needs to be written out, and so an empty object is inserted to allow the key to be present

The reason @pyrtsa's suggestion works is that it interjects an overload for PotentiallyMissing<T> one level up at the actual encode/decode callsites, and does the work there, so that you can map a missing key to an actual PotentiallyMissing<T> value and vice versa.

⚠️ HOWEVER: ⚠️

You must be extremely careful with this approach — both in using it, and suggesting it to others. These method additions are statically dispatched, and so will only work for PotentiallyMissing<T> within the modules where these overloads are visible (and cannot work "retroactively"). In other words, they're likely to only work within your module.

Consider:

  1. Module A provides PotentiallyMissing<T>
  2. Module B imports Module A, and has a type which encodes and decodes using PotentiallyMissing<T>
  3. Module C (your module) imports Module B, and also has a type which encodes and decodes using PotentiallyMissing<T>, and adds the listed overloads

Code from Module B will encode PotentiallyMissing<T> values one way, and your module will encode those same values another way! This is extraordinarily subtle and fragile, and can lead to inconsistent results, even within the same type hierarchy.
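
The static-dispatch effect can be reproduced in a single file. The sketch below is a deliberately simplified stand-in, not the Codable machinery itself: Wrapper, Archive, and writeFromGenericCode are hypothetical names, and the generic function plays the role of code compiled where the more specific overload cannot be selected.

```swift
// Hypothetical stand-in for PotentiallyMissing<T>.
struct Wrapper<T> {
    var value: T
}

// Hypothetical stand-in for an encoding container.
struct Archive {
    // Stand-in for the general-purpose encode(_:forKey:).
    func write<T>(_ value: T) -> String { "generic path" }
}

// Stand-in for the extra overload added in "your module".
extension Archive {
    func write<T>(_ value: Wrapper<T>) -> String { "Wrapper-specific path" }
}

// Stand-in for code that resolved its call without the overload applying:
// here the compiler only knows T, so it binds to the generic write(_:).
func writeFromGenericCode<T>(_ value: T, to archive: Archive) -> String {
    archive.write(value)
}

let archive = Archive()
let direct = archive.write(Wrapper(value: 1))
let indirect = writeFromGenericCode(Wrapper(value: 1), to: archive)
print(direct)   // Wrapper-specific path
print(indirect) // generic path
```

The same value takes two different paths depending on where the call was compiled, which is the inconsistency described above.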

Use at your own risk.

Clarification

Does the issue still remain if PotentiallyMissing<T> and the listed overloads are all made public and defined in A?

I can't see how that would make B and C behave differently anymore.

That's correct. Static overloads can be used from any module where those overloads are visible — if they are exposed publicly from Module A, and modules B and C are compiled with those overloads present, they can be used.

However, the scenario I posed is far from hypothetical:

  1. Module A is the stdlib, and the type in question is an stdlib type
  2. Module B is a package you use, which uses the stdlib
  3. Module C is your app, which uses both the stdlib and that package

If you're not careful, one archive can contain multiple subtly- (or not so subtly-!) different representations of the same type.
