On purity grades of value types [was: ValueType protocol]

Nickolas_Pohilets · September 19, 2021, 12:55pm

I think the very notion of value semantics is inseparable from equality. Even “equivalence”, because currently Equatable in Swift is a pure-mans proxy solution for that. Ideally it should be an inherent attribute of all types, including functions and weak references.

This probably belongs to ValueSemantic protocol, but I’m too lazy to go over entire discussion (currently on 20/126). I’ll try to digest it and join the discussion later. Meanwhile let me comment here.

I think I can provide a formal theoretical definition for value semantics:

Given expression E, let {x1, x2, … , xn} be a set of variables touched by expression E of types {T1, T2, …, Tn}. This includes let’s and var’s which are lexically used in E, but also all global variables and constants touched by a transitive closure of functions (including initializers, getters, setters, subscripts and deinitializers) callable from E. Including global variables outside Swift code. For example, behind Date.init() there is some hidden global variable which stores current time. Such variable might even reside in external hardware.

Let’s consider evaluating E twice - first when variables have values {v1, v2, … , vn} it evaluates to ve, and second time when variables have values {u1, u2, … , un} it evaluates to ue.

All types in set of types S have value semantics, if “equivalence” of all pairs vi <-> ui, implies equivalence of ve <-> ue for all E touching only variables of {Ti} ⊆ S.

Type Tk does not have value semantics if there exists expression E, touching variables of types {T1, T2, …, Tn}, where all types {Ti | i != k} have value semantics, vi is equivalent to ui for all i, but ve is not equivalent to ue.

If we have an expression E where types of touched variables include type Tk of unknown semantics, and type Tj known to have reference semantics, we cannot say anything about semantics of Tk based on E.

Based on this definition:

Date has value semantics. If you can somehow control global variable holding current time, Date.init() will give you perfectly reproducible results. Similar for Int.random().
Array has reference semantics, because of Array.capacity. But for the practical purposes, I would mark it with some attribute, so that “remaining Array without capacity property” still has value semantics.
Classes have value or reference semantics, depending on how you define equivalence on them. If you compare them only by address, then classes have reference semantics. If you compare them by address and values of all the stored properties, they become value semantics, because now values of properties become part of the value of the class type and part of the corresponding vi/ui. If you compare them only by values of properties, but ignore the address - you still have reference semantics, because, you can get two different ObjectIdentifiers out of two equivalent values.
If you add extra variable for entire memory (or even entire hardware state), every type has value semantics.
If you don’t add such variable, but have an API which allows to read hardware state based on Int, all types having at least two values (so, except Void and Never) have reference semantics.

Also, I suspect that transitive closure of callable functions is not computable. Even if we disregard control flow because of halting problem and do conservative analysis, still because of closures, protocols and subclassing, such set would include almost every function.

Transitive closure of callable functions is needed only for global variables. So maybe instead it should be an all or nothing approach. We can track if functions is referentially pure or touches global variables. It is possible to make purity part of function type. Then we could limit value semantics analysis only to referentially pure expressions.