On purity grades of value types [was: ValueType protocol]

further thoughts on the subject:

eventually we may blend "clean" and "fair" purity grades together: after all, a constant 123 allowed at level "clean" is in no way superior to a global constant "let colors = ["red", "green", "blue"]". however, for the sake of this discussion i'd like to keep these levels separate, if for nothing else then to lay out the stages of future work (first the "absolute" stage, then the "clean" stage, then the "fair" stage).

can "absolute" and "clean" grades blend together? the short answer is no, that would be unwise. consider hash: at all three levels (absolute, clean and fair) it won't be physically possible to write code that violates the main hash axiom (if a == b then a.hash == b.hash), so on the correctness front these three levels are equal. but at levels clean/fair it would be trivial to implement a bad hash function, e.g. one always returning 0, which would open the app to hash collision attacks.
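a minimal sketch of the degenerate hash described above, in today's Swift. the type `Key` is hypothetical and only for illustration; the point is that the hash axiom still holds while every key collides:

```swift
// Hypothetical type for illustration; it does not come from the thread.
struct Key: Hashable {
    var id: Int

    static func == (lhs: Key, rhs: Key) -> Bool { lhs.id == rhs.id }

    // Legal but degenerate: every key feeds the same data to the hasher,
    // so all keys end up with the same hash value.
    func hash(into hasher: inout Hasher) {
        hasher.combine(0)
    }
}

let first = Key(id: 1)
let second = Key(id: 2)

// Unequal keys, identical hashes: the axiom (equal values hash equally) is not violated.
let keysDiffer = first != second
let hashesCollide = first.hashValue == second.hashValue

// A Set built from such keys still behaves correctly, it just has to resolve
// a collision on every insertion and lookup.
let keys = Set((0..<100).map(Key.init(id:)))
let keyCount = keys.count
```

correctness is preserved here; only performance degrades, which is the distinction the paragraph above draws.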

likewise it would be trivial to write a bogus "lessThan" (or EQ) function that, say, ignores some of the bits of the operands, or simply always returns false. in this case the levels will differ not just on the performance front, but on the correctness front as well! thus it makes total sense to have "absolute" as a separate "absolutely safe" level where nothing can possibly go wrong (at least with the value-typeness of pure value type objects) no matter how hard a malicious or inexperienced developer tries!
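as an illustration of such a bogus EQ, here is a sketch with a hypothetical type `Masked` whose == ignores some bits of one operand only. it compiles fine today, yet the relation is neither symmetric nor reflexive:

```swift
// Hypothetical type for illustration; it does not come from the thread.
struct Masked: Equatable {
    var bits: UInt8

    // Bogus: the low four bits of the right-hand operand are ignored,
    // while the left-hand operand is compared in full.
    static func == (lhs: Masked, rhs: Masked) -> Bool {
        lhs.bits == (rhs.bits & 0xF0)
    }
}

let x = Masked(bits: 0x10)
let y = Masked(bits: 0x1F)

let xEqualsY = x == y   // 0x10 == (0x1F & 0xF0), i.e. 0x10 == 0x10
let yEqualsX = y == x   // 0x1F == (0x10 & 0xF0), i.e. 0x1F == 0x10
let yEqualsY = y == y   // 0x1F == (0x1F & 0xF0), i.e. 0x1F == 0x10
```

the compiler accepts the conformance because it only checks the interface, not the laws Equatable documents.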

when it comes to practice absolute level would be required for code certified to work in, say, nuclear reactors, life support, or avionics environments. perhaps banking apps. clean/fair purity grade - anything else. dirty purity grade - e.g. to inter-op with other languages / foundation.

answering your question (statement) above "what makes EQ & hash special": the fact that dictionary is part of the language, and the wish to make the absolute level more capable... array/dictionary/set are all part of the language, and the established practice is for the last two to be hash based -> hence hash is special. and so is EQ. LessThan would also be special if we had "TreeDictionary" in the language. I'd say - let's also make it part of this special "absolute" purity grade team. going forward we may add other things to this special domain to make the absolute purity domain more capable. without eq/hash/lessThan - the absolute domain would be safe, but like asm... totally inconvenient.

we may also need another level, in some ways even more restrictive than absolute!

a very special level, not like the others of this scheme, but more like another dimension or perhaps a separate trait: realtime. it won't be able to call anything that can block or cause memory allocation. on the other hand accessing global memory even read-write is not a problem on this level. realtime level would be required, say, for realtime audio thread code. let's not discuss this level in this thread, just keep it in mind.

FWIW, I agree with the other things you’ve written, but inasmuch as the main purpose of the term is to make a distinction from value semantics, I don’t think the ability to do identity comparison is actually crucial.

I think the very notion of value semantics is inseparable from equality. Even “equivalence”, because currently Equatable in Swift is a poor man’s proxy solution for that. Ideally it should be an inherent attribute of all types, including functions and weak references.

This probably belongs to ValueSemantic protocol, but I’m too lazy to go over entire discussion (currently on 20/126). I’ll try to digest it and join the discussion later. Meanwhile let me comment here.

I think I can provide a formal theoretical definition for value semantics:

Given expression E, let {x1, x2, … , xn} be a set of variables touched by expression E of types {T1, T2, …, Tn}. This includes lets and vars which are lexically used in E, but also all global variables and constants touched by a transitive closure of functions (including initializers, getters, setters, subscripts and deinitializers) callable from E, including global variables outside Swift code. For example, behind Date.init() there is some hidden global variable which stores the current time. Such a variable might even reside in external hardware.

Let’s consider evaluating E twice - first when variables have values {v1, v2, … , vn} it evaluates to ve, and second time when variables have values {u1, u2, … , un} it evaluates to ue.

All types in set of types S have value semantics, if “equivalence” of all pairs vi <-> ui, implies equivalence of ve <-> ue for all E touching only variables of {Ti} ⊆ S.

Type Tk does not have value semantics if there exists expression E, touching variables of types {T1, T2, …, Tn}, where all types {Ti | i != k} have value semantics, vi is equivalent to ui for all i, but ve is not equivalent to ue.

If we have an expression E where types of touched variables include type Tk of unknown semantics, and type Tj known to have reference semantics, we cannot say anything about semantics of Tk based on E.

Based on this definition:

  1. Date has value semantics. If you can somehow control global variable holding current time, Date.init() will give you perfectly reproducible results. Similar for Int.random().

  2. Array has reference semantics, because of Array.capacity. But for the practical purposes, I would mark it with some attribute, so that “remaining Array without capacity property” still has value semantics.

  3. Classes have value or reference semantics, depending on how you define equivalence on them. If you compare them only by address, then classes have reference semantics. If you compare them by address and values of all the stored properties, they have value semantics, because now the values of properties become part of the value of the class type and part of the corresponding vi/ui. If you compare them only by values of properties, but ignore the address, you still have reference semantics, because you can get two different ObjectIdentifiers out of two equivalent values.

  4. If you add an extra variable for entire memory (or even entire hardware state), every type has value semantics.

  5. If you don’t add such a variable, but have an API which allows reading hardware state based on Int, all types having at least two values (so, except Void and Never) have reference semantics.
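Item 3 of the list above can be sketched in code. The class `Celsius` is a hypothetical example, not something from the thread; it compares by stored property only, so two equivalent instances can still be told apart through ObjectIdentifier:

```swift
// Hypothetical class for illustration; equality looks only at the stored property.
final class Celsius: Equatable {
    let degrees: Double

    init(_ degrees: Double) {
        self.degrees = degrees
    }

    static func == (lhs: Celsius, rhs: Celsius) -> Bool {
        lhs.degrees == rhs.degrees
    }
}

let morning = Celsius(20)
let evening = Celsius(20)

// Equivalent according to the author-defined ==...
let equalByValue = morning == evening

// ...yet an expression touching only these two variables can still distinguish them.
let sameIdentity = ObjectIdentifier(morning) == ObjectIdentifier(evening)
```

Under the definition above, the differing identifiers are what put this type on the reference-semantics side for this choice of equivalence.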

Also, I suspect that transitive closure of callable functions is not computable. Even if we disregard control flow because of halting problem and do conservative analysis, still because of closures, protocols and subclassing, such set would include almost every function.

Transitive closure of callable functions is needed only for global variables. So maybe instead it should be an all or nothing approach. We can track whether a function is referentially pure or touches global variables. It is possible to make purity part of the function type. Then we could limit value semantics analysis only to referentially pure expressions.

Getting value semantics wrong can easily lead to a race condition. All you have to do is share mutable state but advertise it as part of a well-defined value.

I still believe that I have already posted a useful formal definition of value semantics that is consistent with how we've used the term casually. In fact @Alvae, @shabalin, @dan-zheng, @saeta and I have written several papers based on this definition. Unfortunately can't share them yet since we're still trying to get them accepted somewhere…

One more thing before I go: while I very much support continued thinking about the meaning and consequences of value semantics, I am actually opposed to dissecting the concept of value semantics into multiple grades (especially before you've settled on a simple definition). The power of concepts like this one lies in their ability to simplify reasoning about code. I haven't absorbed what you're trying to accomplish by this classification, but it seems inevitable that thinking about correctness will be complicated by such division.

Okay. I don't know what exactly "equivalence" means, when it comes to Swift entities. Please bear with me while I try to figure it out below.

My intuition is that two entities of type T held by two variables a and b are "equivalent" if there is no way to define a function f that differentiates between them. This is a very loose definition, as I don't know what "differentiates" means -- I expect it involves equivalence of the function's results, which makes this a circular definition.

Perhaps we can resolve this by axiomatically defining equivalence for a simple type like Bool, and only allowing functions that return this type.

  1. The "equivalence" relation over instances of type Bool is defined the obvious way, that is, by equality of the boolean values they represent.
  2. Two instances of the type T, held in variables a and b, are "equivalent" iff for every possible¹ Swift function f of type (T) -> Bool it holds that f(a) == f(b).

Once we learn "equivalence" for more types, I believe we can safely loosen (2) to any function that returns a type with a known "equivalence" relation. (By a tiny lemma.)
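Here is a rough sketch of that idea, restricted to a finite list of observer functions (the real definition quantifies over every possible function, which code cannot enumerate). The helper `indistinguishable` and the chosen observers are my own illustration:

```swift
/// Returns true when no predicate in `observers` tells `a` and `b` apart.
/// This only approximates "equivalence": it checks a finite sample of functions.
func indistinguishable<T>(_ a: T, _ b: T, by observers: [(T) -> Bool]) -> Bool {
    observers.allSatisfy { f in f(a) == f(b) }
}

let observers: [(Double) -> Bool] = [
    { $0 > 0 },
    { $0.isFinite },
    { $0.sign == .minus },
]

let posZero: Double = 0
let negZero: Double = -1 / .infinity   // negative zero

// The first two observers cannot separate the two zeros...
let looksEquivalent = indistinguishable(posZero, negZero, by: Array(observers.prefix(2)))

// ...but adding the sign observer does, even though == reports them equal.
let stillEquivalent = indistinguishable(posZero, negZero, by: observers)
let equatableSaysEqual = posZero == negZero
```

A single distinguishing function is enough to show non-"equivalence"; no finite list of agreeing functions can prove "equivalence".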

(¹: This is probably too strict. For one thing, functions that have undefined behavior should not be considered.)

For example, I expect this definition of "equivalence" considers Int values "equivalent" iff they represent the same integer. This is good, because it matches my intuition about them. (I don't know how I'd go about proving this, though.)

As it happens, Int's Equatable conformance defines an equivalence relation that exactly matches this definition of "equivalence". But this seems like the exception, not the rule.

In actual Swift practice inside and outside the stdlib, Equatable simply defines an arbitrary equivalence relation; that is all it does. (And that's only when we're lucky enough that the conformance doesn't violate requirements -- looking at you, floating point types.) It most definitely isn't being used to define the "Equivalence"; at best, it just defines an equivalence; and usually a pretty lousy one at that.

All we can say for sure is that if a is "equivalent" to b, then it is guaranteed that Equatable.== will return the same value for them, no matter what value we pass in the second operand. But this statement doesn't give us anything; it follows directly from the definition. The implication is strictly one way only, and Equatable.== isn't special at all -- the same holds for literally every other operation.

Clearly, the sign property differentiates between the Double values representing "positive zero" and "negative zero".

let a: Double = 0
let b: Double = -1 / .infinity
print(a == b) // true
print(a.sign, b.sign) // plus, minus

This suggests to me that these values aren't "equivalent", even though they belong in the same equivalence class under Equatable. (Ignoring the reflexivity issue with NaNs, of course.)

Notably, my intuition is that Double is obviously a "value type". The fact that it has a loosey-goosey == definition does not preclude it from being one.

(The same goes for String, despite capacity, utf8 and the multitude of other ways we can differentiate between Equatably equal, but not "equivalent" instances.

To me, as a naive but enthusiastic student of Swift, Array feels like a "value type" as long as its Element is a "value type". capacity feels completely irrelevant to me, and so are things like ObjectIdentifier(array as AnyObject) -- their behavior seems just fine, perfectly consistent with what I imagine a "value type" to be.)

This is a nice formal definition of something. I don't understand what it defines -- I can feel bits and pieces of some shape, but what I'm feeling seems different to my intuitive notion of "value typiness" in Swift. (Which, as I explained in a previous post, largely revolves around the lack of spooky action at a distance -- because that is the one thing I really do need to care about as a Swift programmer when I decide whether a type is appropriate for use as a Set element or a LazyFilterSequence base.)

This definition only specifies the notion of "value type" relative to some set S. What is S? Is "value typiness" a global property of a type, or does it depend on the choice of S? I'm assuming it is meant to be a global property (otherwise the passage about the type Tk does not make sense to me), but I don't think this obviously follows from the definition.

More to the point, if we took the highlighted passage seriously, then literally every type would be a "value type".

  • Array would be a "value type".

    • capacity isn't problematic at all -- its input state includes the value of the _capacityAndFlags field in the array's storage representation, which is a variable of type Int, imported from C.

    • If the theory is that capacity proves that Array isn't a "value type" because it allows us to distinguish between two "equivalent" arrays, then that's not true, either. By the definition above, two "equivalent" arrays must have the same capacity.

      Array.init involves a storage allocation, so the transitive closure of its input state includes the state of the heap. When run on equivalent input state, Array.init will always give us an array with a predetermined capacity.

    • Similarly, the expression ObjectIdentifier(array as AnyObject) consistently returns the same value, as long as we can faithfully reproduce its input state. (Which includes the array's storage address in memory, or (depending on the Element type) the state of the heap.)

  • NSMutableArray would be a "value type". The major difference between it and Array is that NSMutableArray's mutation operations do not implement copy on write; but this detail is irrelevant from the perspective of the formalism. (Mutation operations include the original value as their input.)

  • SystemRandomNumberGenerator would probably be considered a "value type" too. There is no way to produce two instances of SystemRandomNumberGenerator that aren't equivalent -- it's a unit type, after all. I expect the input state of SRNG.next() includes some aspect of the quantum state of the universe (whatever that means), which makes it rather difficult to replicate.

I don't understand this definition, either, but my vague intuitive concept of what should be a "value type" in Swift feels to be a much better match with this shape!

(What does it mean to "observe" the value of a variable a? What does it mean to "alter" the value of a? I look forward to reading those papers.)

there's a catch though... if you start insisting on a single simple definition of value semantics it is unlikely you'll ever end up with several grades of value type purity. and if you start with a line of thought allowing several different grades of purity - you may end with several different definitions for it.. the end result is very different depending upon where you start.

i agree but at the same time you'll potentially throw the baby out with the bath water and end up with a lowest common denominator compromise definition. consider a few "user stories" like this very simple one:

let a = some value type expression 1
let b = some value type expression 2
let eq1 = a == b
let eq2 = a == b
assert(eq1 == eq2) // assert 1
let eq3 = b == a
assert(eq1 == eq3) // assert 2

let hash1 = a.hashValue
let hash2 = a.hashValue
assert(hash1 == hash2) // assert 3

let lt1 = a < b
let lt2 = !(b >= a)
assert(lt1 == lt2) // assert 4

can this assert or not in the "value semantics" definition you are trying to define? can your value semantics definition even start answering simple questions like that? will that definition depend upon developer "playing fair" and always "following the rules" which are "attached" to the definitions or will it work "no matter what"? the above grade system at least gives answers to questions like those (assert1 and assert3 will never trigger in absolute/clean/fair purity grade code but might trigger in dirty grade code, assert2 and assert 4 will never trigger in absolute purity grade code but might trigger in other grades).
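for a concrete instance of the user story, here is a sketch using a standard library type. picking Double with a NaN operand is my own choice of example; the thread does not say which grade Double would belong to:

```swift
let a = Double.nan
let b = 1.0

let eq1 = a == b        // false: NaN is not equal to anything
let eq2 = a == b        // false again, so "assert 1" holds
let eq3 = b == a        // false, so "assert 2" holds as well

let lt1 = a < b         // false: every ordered comparison involving NaN is false
let lt2 = !(b >= a)     // true: b >= a is false, and negating it gives true

// "assert 4" from the user story, evaluated instead of asserted.
let assert4Holds = lt1 == lt2
```

so "assert 4" can already fail for a type that ships with the language, without any custom code involved.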

I can perhaps paraphrase @dabrahams's definition of value semantics, repeating some of what we've written in our papers.

One way to understand that definition is to first think about what we can and can't do with types that have (mutable) value semantics. More precisely:

  1. If the type T of a variable x has value semantics, then there is no way to observe its value (i.e., read its contents) with an expression that does not mention x.
  2. If the type T of variable x has value semantics, then its value is independent: its contents cannot change due to an operation on another variable.

Both points uphold local reasoning [1] and suggest that reference semantics must not surface. If the value of x can be observed without mentioning x, then there must be a (possibly immutable) reference on x somewhere. Similarly, if the value of x can change via an operation on another variable, then x must share mutable state.
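A small sketch of the independence property in point 2, contrasting a struct with a class. The types `PointValue` and `PointRef` are hypothetical names used only for this illustration:

```swift
// Hypothetical types for illustration.
struct PointValue { var x = 0 }
final class PointRef { var x = 0 }

// Value semantics: v2 is an independent copy, so mutating it cannot affect v1.
let v1 = PointValue()
var v2 = v1
v2.x = 10
let v1AfterMutation = v1.x

// Reference semantics: r1 and r2 name the same object, so a write through r2
// changes what is observed through r1, an operation on "another variable".
let r1 = PointRef()
let r2 = r1
r2.x = 10
let r1AfterMutation = r1.x
```

With the struct, reasoning about `v1` never requires looking at code that only mentions `v2`; with the class it does.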

For a theoretical foundation about the idea of value independence, one can look into external uniqueness [2, 3], which I think formally expresses @tera's clean level.

From points 1 and 2, it naturally follows that:

  3. Concurrent access to the values of distinct variables of a type T with value semantics cannot cause a data race.

For a theoretical justification of that assertion, one can look into concurrent separation logic [4], which states that parallel composition of separate environments is race-free. Intuitively, if x has an independent value (in the sense of points 1 and 2), then one can reason locally about its value, without any concern for concurrent execution.

When you add the first point of @dabrahams's definition about notional values, you can derive other interesting properties.

  4. Composition of value types expresses whole/part relationships.
  5. Equality and hashing are well-defined and synthesizable for every value type.

Note for the purists: formally, hashing requires an ad-hoc definition for at least some predefined types, acting as the base cases of an inductive definition.

The fourth point is particularly important, because it answers recurring questions of "depth" in object-oriented languages (e.g., deep vs shallow copy/equality/immutability).

Notional values are what lets us discard capacity when we're talking about the value of an Array, because the notional value of an array is made of its elements. It follows that equality and hashing do indeed derive from the elements contained in the array, not the current state of its internal storage. In other words, the concept of notional value is what allows us to "ignore some bits" in a type's representation. Further, you can represent the notional value of a value type with all sorts of reference shenanigans, as long as you satisfy points 1 and 2. That bit explains why CoW does not violate value semantics.
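To make the notional-value idea concrete, here is a short sketch with two arrays that share the same elements but differ in internal storage. The variable names are mine:

```swift
let compact = [1, 2, 3]
var roomy = [1, 2, 3]
roomy.reserveCapacity(100)   // changes the storage, not the notional value

// Equality and hashing derive from the elements only.
let sameValue = compact == roomy
let sameHash = compact.hashValue == roomy.hashValue

// The representation still differs and remains observable through capacity.
let roomyCapacity = roomy.capacity

// Copy on write: the copy may share storage with `roomy` until the first mutation,
// yet appending to the copy never changes the original's value.
var copy = roomy
copy.append(4)
let originalUnchanged = roomy == [1, 2, 3]
```

The shared storage between `roomy` and `copy` is a reference under the hood, but since it can neither be observed nor altered through the other variable, points 1 and 2 are still satisfied.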

Note that synthesizability is not a requirement. You can have a value type that does not want to conform to Equatable for any justified reason in a given codebase. The definition only says that (value) equality is well-defined and derives from the type's notional value.

AFAICT, the other points in @dabrahams's definition relate to the mutable part of a value and are more specific to the way Swift expresses mutation (e.g., via assignment, mutating methods, etc.). Indeed, if we outright ban mutation, all types trivially have value semantics and there isn't any useful distinction between structs and classes (as far as value typedness is concerned) [5]. So, we can add the following point:

  6. We use the term mutable value semantics meaning “value semantics with in-place, part-wise mutation.”

The terms "part-wise" and "in-place" are in opposition to pure functional updates, i.e., the fact that any mutation of a variable x can be substituted by the rebinding of x to a brand new value. While that approach is perhaps justifiable as a theoretical model for value semantics, it clashes with performance predictability.


Yes, I think it does.

As it's been made clear already in this thread, substituting struct for class in a type definition is far from sufficient to satisfy value semantics, without even considering unsafe code.

There are multiple ways to unintentionally create references in Swift (which is a bit unfortunate, if you ask me). In particular, global variables and (by default) closure captures have reference semantics. That makes it difficult to establish static rules that would enforce "value typedness" according to the definition I have presented.
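A minimal sketch of how a closure capture lets reference semantics leak into a struct. The `Ticker` type and `makeTicker` function are hypothetical, written only for this illustration:

```swift
// Hypothetical struct for illustration: a value type that stores a closure.
struct Ticker {
    let tick: () -> Int
}

func makeTicker() -> Ticker {
    var count = 0   // captured by reference by the closure below
    return Ticker(tick: {
        count += 1
        return count
    })
}

let t1 = makeTicker()
let t2 = t1                  // copies the struct, but both copies share `count`

let firstTick = t1.tick()    // increments the shared counter
let secondTick = t2.tick()   // observes the change made through t1
```

Although `Ticker` is declared as a struct, an operation on `t1` changes what `t2` produces, so the type does not satisfy point 2 of the definition.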

Nonetheless, I think that is possible. We would "only" have to be more careful about the way a value can capture references. Swift already proposes one mechanism in that direction, in the form of the @escaping annotation on function parameters. An extension of that idea has been studied in the context of Scala [6].

I think that the idea of extending the capability of @escaping (or rather @nonescaping) relates to @tera's fair level.

Purity levels are not in opposition or even orthogonal to the definition of value semantics above. Nonetheless, I am not yet convinced that these distinctions are as useful as they may seem, at least for the user. IMHO, local reasoning is enough.

I also fear that these levels might fail to accurately characterize all possible implementations of a value type. For example, it looks like implementing CoW would immediately put a type into dirty territory. It could be tempting to add yet another level, but that's a slippery slope.


[1] Peter W. O'Hearn et al.: Local Reasoning about Programs that Alter Data Structures. CSL, 2001
[2] Dave Clarke, Tobias Wrigstad: External Uniqueness Is Unique Enough. ECOOP, 2003
[3] Philipp Haller, Martin Odersky: Capabilities for Uniqueness and Borrowing. ECOOP, 2010
[4] Stephen Brookes, Peter W. O'Hearn: Concurrent separation logic. ACM SIGLOG, 2016
[5] Dimitri Racordon, Didier Buchs: Featherweight Swift: a Core calculus for Swift's type system. SLE, 2020
[6] Leo Osvald et al.: Gentrification gone too far? Affordable 2nd-class values for fun and (co-)effect. OOPSLA, 2016

I didn’t explain it well, but my idea was that the definition of equivalence is left to the type author. Effectively, the implementation of == is the type-specific definition. And the way the authors of the type implement it affects whether the type is a value type or a reference type.

You don’t need to know if a type is a value type for that. All you need is an implementation of Hashable conformance consistent with mutability analysis in Swift. If at some point of the execution of the Swift program x == y and x.hashValue == z, there must be no operation on x which changes that but is not a mutation from the type-checker perspective. This property must hold regardless of value/reference semantics. Types for which this property does not hold are buggy.

Which leads to a question: do you even care about value/reference semantics? Sounds like property you are interested in is “correct implementation of Hashable conformance”.

It is a set of types of variables used in an expression. We can prove that one of the types is not a value type, but actually it looks like we cannot even know which one. If all expressions using variables of a single type T behave as if T is a value type, and all expressions using variables of a single type U behave as if U is a value type, but there exists an expression using a mix of variables of both types that violates this, which type is to blame - T or U?

Yep, totally agree.

In practice, the distinction between value types and reference types is based on semantics, not the implementation. In practice, all data works with references: to CPU registers, for instance. Like GOTO, reference semantics are a reflection of how computers actually operate.

Current best practice is that things Swift calls value types implicitly use value semantics, things Swift calls reference types implicitly use reference semantics, and every deviation from that is carefully documented.

Container types like Array are self-documenting, more or less: Array<AnyObject> obviously (or rather, implicitly) uses value semantics while containing elements that do not. Same with Optional<AnyObject>. Note that the actual implementation is irrelevant: Array, for one, uses references internally to implement CoW.
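A short sketch of that split between the container and its elements, using a hypothetical `Box` class:

```swift
// Hypothetical class for illustration.
final class Box {
    var value: Int
    init(_ value: Int) { self.value = value }
}

let original = [Box(1), Box(2)]
var copy = original

// The array itself has value semantics: growing the copy leaves the original alone.
copy.append(Box(3))
let originalCount = original.count
let copyCount = copy.count

// The elements do not: both arrays hold references to the same Box objects,
// so a change made through the copy is visible through the original.
copy[0].value = 99
let seenThroughOriginal = original[0].value
```

The container's value here is the sequence of references it holds, not the state of the objects behind them.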

As others have mentioned, a form of “purity grade” is actually in Swift 5.5 as Sendable. It makes weaker guarantees, of course; it promises thread safety, not value semantics. Even so, it can’t actually be compile-time checked, beyond basics like requiring final on classes. In fact, most protocol conformance cannot be checked for validity beyond having the right interface. In the end, you simply need to adopt norms as much as possible, document deviations from that norm, and trust others to do the same.

There's been enough confusion around value semantics that I don't mind breaking these ideas down further in the interest of clarity. A caveat, though: if you are looking for something deep in that breakdown you may be disappointed. At some point we will use plain human language to describe everything (even math), and I've used these words with their plain English meaning.

(Switching from a to v for clarity; my mistake for using a in the first place)

Code that observes the value of a variable v has semantics that depend on the value of v, and code that alters the value of v would affect the semantics of some hypothetical code-to-be-executed that observes (reads) only the value of v.

I can think of more elaborate ways to write these definitions that use fork() and specify program outputs for identical inputs and prohibit non-determinism such as random number generators, but although these approaches may seem to be founded on understood mechanisms, ultimately they fall back on simple English words such as “identical” that you'd be just as entitled to ask me to define.

Exactly! That's a feature, not a bug. Why do we want these different definitions? Will it help us to write or understand code? If so, how?

When you have a bunch of ideas competing for your attention it is tempting to compromise by validating all of them. I consider doing the work to discover the one essential concept and fearlessly label it as such to be a form of “not compromising.”

Technically, my definition is designed to get at the minimal necessary property for local reasoning (the hard part of the problem) and thus it deals with equality and hashing only as a convention, in the “Documenting Value” section. If a and b in your example have a well-behaved value semantic type by those conventions, of course the code would not assert. The properties you're testing with those asserts are covered by Equatable, Hashable, and Comparable protocols (you haven't said whether your type conforms to these) and are traditionally united in a refined protocol/concept called Regular. They are definitely important, but are not strictly necessary in order to solve the local reasoning problem, which is what everyone has struggled with for years.

the above grade system at least gives answers to questions like those (assert1 and assert3 will never trigger in absolute/clean/fair purity grade code but might trigger in dirty grade code, assert2 and assert 4 will never trigger in absolute purity grade code but might trigger in other grades)

The question is, do we want to encourage people to require, implement, document, and reason about shades of Regular-ity or should we just give them one simple tool, Regular? A crucial part of the generic programming process involves clustering the possible fine-grained distinctions among requirements into “Concepts”. The only reason to have two concepts instead of one is when the distinction enables useful expressivity that would otherwise be impossible or inefficient (we distinguish bidirectional collections from random access collections because, e.g., Set and Array are both useful and have inherently distinct properties). I claim that a type satisfying any of your “grades” can be made to satisfy Regular without any loss of expressivity or efficiency.

It would of course be fair to argue that you want to “cluster” the local reasoning property into Regular as well, and that a separate ValueSemantic property is unimportant. I'd certainly consider that idea.

I find that a bit shocking. Unless things have gone badly wrong while I wasn't looking, for most types, that relation depends only and entirely on the values of the types involved.

This is the problem with using “equivalence” and failing to define “value.” We should decide whether the value of a floating point zero includes its sign. If we want to claim it is Regular, we'd have to say no, that the sign of a floating point zero is a non-salient attribute.

Notably, my intuition is that Double is obviously a "value type". The fact that it has a loosey-goosey == definition does not preclude it from being one.

I'm guessing you're focused on the local reasoning part, which I'm calling “value semantics,” and not on Regular-ity, which seems to be @tera's concern.

i'm not saying that's a bug... it's a missing opportunity!

i hope you can agree with me that a language like swift minus the ability to override EQ/hash/lessThan would be "safer" than swift - if you can't customise those basic ops you can't make them bogus, mistakenly or maliciously. so a == b will always be b == a, a < b will always be !(b >= a), hashValue, ==, < will always deliver the same results for the same values because built in functions can't be wrong, they can't hang, spend an unreasonable amount of time, touch globals, block for an unbounded amount of time on a mutex, and so on and so forth..

and i would definitely agree with you that this version of language would be quite restrictive to be useful, as ability to customise those basic ops opens many interesting opportunities!

the purity grading system i'm talking about combines these levels into one language - you now have a choice whether you want absolute safety at the price of inflexibility (absolute) on one end of the spectrum, or you want maximum flexibility at the price of safety (dirty) on the other end of the spectrum, or you have some reasonably safe and flexible "middle ground" (clean/fair). you can have different purity grades for different parts of the app following some simple (and enforced!) rules: when at a given purity level you'll only be able to call code at the same or a cleaner purity level.
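a rough model of that calling rule, just to pin down what it says. nothing here is an existing swift feature: the `PurityGrade` enum, the `mayCall` function and the strict ordering absolute, clean, fair, dirty are assumptions made for this sketch:

```swift
// Hypothetical model of the grades named in this thread.
// Assumption: a lower raw value means a cleaner grade.
enum PurityGrade: Int, Comparable {
    case absolute = 0, clean, fair, dirty

    static func < (lhs: PurityGrade, rhs: PurityGrade) -> Bool {
        lhs.rawValue < rhs.rawValue
    }
}

/// Code at grade `caller` may call code at the same grade or a cleaner one.
func mayCall(from caller: PurityGrade, to callee: PurityGrade) -> Bool {
    callee <= caller
}

let fairCallsAbsolute = mayCall(from: .fair, to: .absolute)   // calling cleaner code
let cleanCallsClean = mayCall(from: .clean, to: .clean)       // calling the same grade
let absoluteCallsDirty = mayCall(from: .absolute, to: .dirty) // calling dirtier code
```

in a real design the check would happen at compile time; the runtime function here only spells out the rule.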

Not in the sense that we use the term around here, i.e. to indicate memory, type, or thread-safety.

and i would definitely agree with you that this version of language would be quite restrictive to be useful, as ability to customise those basic ops opens many interesting opportunities!

But I didn't make that claim, so there's nothing to (dis)agree with.

the purity grading system i'm talking about combines these levels into one language - you now have a choice whether you want absolute safety at the price of inflexibility ( absolute ) on one end of the spectrum, or you want maximum flexibility at the price of safety ( dirty ) on the other end of the spectrum, or you have some reasonably safe and flexible "middle ground" ( clean/fair ). you can have different purity grades for different parts of the app following some simple (and enforced!) rules: when at purity level you'll only be able calling through the same or cleaner purity level.

Yes, and I'm saying that having many such choices is not necessarily empowering for programmers. Unless you already have examples of many real-world types that fit into each of these categories but couldn't be made “absolute,” and can demonstrate how you would use the categories to describe algorithms or to reason about code that is currently hard to understand, I don't see any point in breaking the world up this way.

I'm also curious about what this buys the average swift programmer.

I wouldn't want to have to annotate every one of my functions with one of these purity levels. It also seems like the only backwards compatible option for a default would be dirty.

Let's say that I've now re-written my app and annotated the entirety of my architecture to support something that is absolute. What happens when I need some 3rd party library that either has no annotations (must be assumed to be dirty) or has a level below my level, do I need to rewrite my whole app in response?

The level of changes and invasiveness of the change don't seem like they match the benefits of what you get from it. As a trade-off for the effort involved here, what exactly is a developer getting in return?

not necessarily every function: there could be a reasonable system where a function inherits the purity level of the class/struct it is defined in unless it wants to use a different level.

it's very similar to the situation you have today when calling third party code from a realtime audio I/O proc ²: if that third party code can take a lock or call malloc (among many other dangerous things!) - you just can't call it directly. you either put it onto a different non-realtime thread/queue and communicate with it by merely reading/writing memory, or you bin it altogether and use different, safer code when you prefer direct callouts from your I/O proc. in a more traditional use case you'd probably call that "dirtier" code on a dispatch queue. or, if that's not a big concern to you, you make the relevant part of your code that deals with that third party code dirtier than it was before.

if a purity grading system is introduced - it will first be adopted by library authors. apps that don't care can leave purity unspecified.

what you get is value safety (the opposite of value fragility). a dictionary or a set whose invariants are broken because of duplicated keys is a broken value, and we can break them by providing custom code for EQ/hash for the things we put into these containers. ditto for a mere struct or enum. that custom code can live in a third party library you are using or in your own code. that code can be there either mistakenly or maliciously. you can't break "Int" or "Double" value types; Dictionary, Set or your custom struct/enum (*) are value types too, and they shall be as unbreakable as primitive value types. imho.

(*) those that adhere to david's definition of value semantics.

a much more rudimentary example that illustrates the problem without involving collection types (here xxx is some struct/enum):

let a = xxx
let h1 = a.hashValue
let h2 = a.hashValue
assert(h1 == h2) // can fail at the dirty purity level due to a non-pure "hash" or non-pure "EQ" implementation.
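one way such a failure could come about, as a minimal sketch: assume a struct that holds a reference to shared mutable state and mixes it into its hash. `Counter` and `Leaky` are hypothetical names made up for this illustration; nothing here is a proposed language feature.

```swift
// Hypothetical illustration: a struct whose hash depends on shared mutable state.
final class Counter {
    var value = 0
}

struct Leaky: Hashable {
    let id: Int
    let counter: Counter  // reference to memory outside the value itself

    static func == (lhs: Leaky, rhs: Leaky) -> Bool {
        lhs.id == rhs.id
    }

    // Impure: writes to external memory and feeds it into the hasher,
    // so the same value is hashed from different input on every call.
    func hash(into hasher: inout Hasher) {
        counter.value += 1
        hasher.combine(id)
        hasher.combine(counter.value)
    }
}

let counter = Counter()
let leaky = Leaky(id: 1, counter: counter)
let first = leaky.hashValue
let second = leaky.hashValue

// The two hashes were computed from different input, so they will
// almost certainly differ even though the value never changed.
print(first == second)
```

today's compiler accepts this without complaint; under the grading idea such a `hash(into:)` would presumably only be allowed at the dirty level.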

to avoid this situation you do these things: stick to discipline, introduce coding standards in your team, carefully audit the third party code you are using to see what it's doing, repeat this audit on every update of that third party code, and live dangerously, as every now and then things go wrong even without third party code involvement. or you introduce a purity system to your world and let the compiler do the relevant checks for you.

² - realtime audio code explanation

realtime audio code is a good example that illustrates when a grading system is important. to oversimplify - you write code that can't call malloc, can't lock on a mutex or semaphore, can't call dispatch async, etc. of course you can't use swift containers. imagine you need to write such code - every step is dangerous, you are in a minefield, and you are on your own as (currently) the compiler can't help you. if you make a mistake, it might not be immediately obvious that anything is wrong until your app is already in the field and misbehaves on some user systems but not others. with a "realtime" grade your life would be much easier as the compiler would automatically reject anything unsafe!

we can have a less derogatory name instead of dirty if that's a concern.

Apologies in advance as things like this in the forums can go a bit over my head, but I found this thread intriguing even though I don't know a ton about some of the lower level stuff like malloc, having come from a nontraditional background to programming.

I guess I'm imagining that this would propagate similarly to throws except that semantically I don't think it makes sense to have the purity version of a do-catch handling. Wouldn't that defeat the whole purpose of having something marked as absolute?

I think part of what I'm wondering here is that either the xxx type is unambiguous and the compiler can synthesize it (this is the approach I usually take when I need to have eq/hashable code in my own code by just marking the relevant members with the relevant protocols) or it would need to be written out manually. Is the idea that manually written code can't be absolute or how can you guarantee that manually written code for a Hashable or Equatable conformance fulfills the requirement?

sorry, i don't understand the question.

correct, "absolute" can't have custom EQ/hash at all. "clean" can't read/write external memory in its EQ/hash, "fair" can read but not write external memory (clean/fair can be blended together without much harm), "dirty" can do everything. an interesting feature of clean/fair is that a custom "hashValue" (no matter how poorly written!) will always return the same result for a given value. this "mathematically bulletproof" aspect is what makes it so fascinating.
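to illustrate the clean/fair case, here is a minimal sketch of a custom hash that touches nothing but the value itself and is still a bad hash (the "always returning 0" case mentioned earlier). `BadKey` is a hypothetical type invented for this example.

```swift
// Hypothetical key type: custom EQ/hash that only looks at the value itself.
struct BadKey: Hashable {
    let id: Int

    static func == (lhs: BadKey, rhs: BadKey) -> Bool {
        lhs.id == rhs.id
    }

    // Feeds nothing into the hasher: legal and deterministic,
    // but every key ends up with the same hash.
    func hash(into hasher: inout Hasher) {}
}

let k1 = BadKey(id: 1)
let k2 = BadKey(id: 2)

// Equal values still hash equally, so the hash axiom holds...
let sameValueSameHash = BadKey(id: 1).hashValue == k1.hashValue
// ...but unequal values hash equally too, so lookups degrade to linear search.
let everythingCollides = k1.hashValue == k2.hashValue
// The set itself stays correct: duplicates are still detected via ==.
let distinctCount = Set([k1, k2, BadKey(id: 1)]).count

print(sameValueSameHash, everythingCollides, distinctCount)
```

the containers stay correct here and only performance suffers, which is the distinction drawn above between the correctness front and the collision-attack front.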

Maybe I can clarify with an example:

func embeddedFunction() {
  // Does something
}

func callingFunction() {
  embeddedFunction()
}

Given the above functions, we have a recourse in the future if embeddedFunction needs to be converted to throwing but the calling one cannot.

func embeddedFunction() throws {
  // Does something
}

func callingFunction() {
  do {
    try embeddedFunction()
  } catch {
    // do-catch lets me bail out if I need to call a throwing thing from
    // a context that won't throw, and I can log or otherwise handle the error
  }
}

There's an escape hatch here for interfacing with a throwing thing that lets the developer control the flow of errors throughout the app and recover, log, or otherwise not handle. Given a function that would be marked as absolute, there would be seemingly no way to call a dirtier function. In fact, that's kind of the whole point, right?

dirty func embeddedFunction() {
  // Does something
}

absolute func callingFunction() {
  embeddedFunction() // not callable here and no way to do so
}

If the embedded function needs to change absolute to less pure there's no other option but for that to propagate all the way up the call hierarchy. Is this the trade-off you were referencing between strictness and flexibility?

indeed, absolutePureFunction() will not be able to call a function which is dirtier than absolute. (like kernel space code can't call user space code. or realtime safe code can't call non-realtime safe code. like low level code generally shall not call high level code (i still remember that odd looking "reinsert ejected disk or press cmd+. to abort" alert window when the app attempted to read a file on the ejected disk)).
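the call rule itself is simple enough to model in today's Swift. the sketch below is only a runtime model of the ordering, assuming the four grades discussed in this thread; `PurityGrade` and `canCall` are hypothetical names, and a real implementation would be a compile-time check rather than a function you call.

```swift
// Hypothetical model of the four grades, ordered from cleanest to dirtiest.
enum PurityGrade: Int, Comparable {
    case absolute, clean, fair, dirty

    static func < (lhs: PurityGrade, rhs: PurityGrade) -> Bool {
        lhs.rawValue < rhs.rawValue
    }

    // A caller may only call code at the same or a cleaner grade.
    func canCall(_ callee: PurityGrade) -> Bool {
        callee <= self
    }
}

print(PurityGrade.absolute.canCall(.dirty))   // absolute calling dirty: rejected
print(PurityGrade.dirty.canCall(.absolute))   // dirty calling absolute: allowed
print(PurityGrade.fair.canCall(.clean))       // fair calling clean: allowed
```

unlike throws, there is no do-catch style escape hatch in this model: the only way for a caller to reach dirtier code is to become dirtier itself.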

you can add that as a tradeoff that "dirty is unsafe but can call anything". basically i meant that the cleaner you go the more mathematically bulletproof your app is, but at the same time the fewer things you can do. as an example you can have a non customisable always right bitwise a < b on the absolute level but to make it customisable you'll need to go to clean/fair level, and to base it, say, on system locale comparison rules (that can dynamically change) you'll need to go to the dirty level. and the dirtier you go the less mathematically bulletproof the app is (irt value type axioms, app correctness, halting, crashes).

I think my vague uneasiness comes from a few things overall here as someone who usually works with Swift at a higher abstraction level:

  1. Are there any cases where it would be either valuable or appropriate to override the compiler here? This discussion makes me think of when I've done some physics work in the past for school and you have to assume that the object is a sphere with zero friction to get it to work out for the calculations. Is it possible in the real world to make these kinds of guarantees about absolute code, or will it be very rare in practice?

  2. It seems likely that purity levels have the potential to cause a lot of code churn as they display similar propagation patterns to others that we've seen in some of the other effectful systems in Swift (try/throw, async/await). You've said that there could be some sensible defaults, but I'm not sure what that would mean or how that could help mitigate some of the issues here.

  3. How would this play with progressive disclosure? So far all of the examples that you've added here seem to be at lower level of abstraction or more niche code. With other features that are powerful and niche, they tend to have a smaller footprint and "scarier" names whereas this seems to be at a similar level of visibility and prominence as something like access control levels or throws and the like. I suspect that someone writing a Vapor app or working on one of the Apple platforms may not necessarily want to have to consider the same guarantees or matrix of choice presented here. How do you envision something like this working with progressive disclosure?
