Strict Value Semantics

Joe_Groff · January 26, 2019, 6:42pm

Even though FloatingPoint.== treats non-identical values as equal and identical values as non-equal, all floating point values nonetheless have well defined identity, and the semantics of floating point operations ought to be pure. Likewise with different encodings of strings. It doesn’t make sense to me to formalize Equatable as the condition for referential transparency, since there are types that can be used in a pure context that aren’t Equatable at all, such as functions, that nonetheless can be reasoned about as referentially transparent.

michelf · January 26, 2019, 9:20pm

Yeah, that's pretty much what I was getting at by pilling up examples of things that wouldn't work with Equatable as the currency for referential transparency. I just didn't write the conclusion.

anandabits · January 26, 2019, 10:01pm

In theory I agree with this, however I think in practice escape hatches are a necessary compromise, especially in a language as focused on pragmatism as Swift is. Swift already includes escape hatches of various kinds - it supports raw memory access, has unsafeBitCast, has withoutActuallyEscaping, etc. These exist because sometimes they are necessary in library code. Without an escape hatch there would be some libraries that just can't be written, or can't be written efficiently. As Joe pointed out, even a language as uncompromisingly pure as Haskell includes an escape hatch.

A further example, imagine what it would like to use Swift on Apple platforms if it wasn't possible to annotate and trust Objective-C code for nulllabiity. A similar issue exists for purity. Should we be unable to call C code known to be pure after careful auditing (where the authors are wiling to commit to the contract of purity)? If we can't then calling into any C code at all will become viral upstream and we lose the ability to accurately recognize the semantics of not only this C code but also all upstream Swift code.

Given the large volume of C libraries at the foundation of our systems today I think this is a real concern. We want to be able to build on top of existing C implementations, even if they are eventually replaced by native Swift implementations (that perhaps wouldn't need to use an escape hatch).

I agree with this. I think escape hatches should be taken very seriously. They are expert tools intended to enable the implementation of libraries that would not otherwise be possible. Library authors making use of them should have clearly defined semantics they are promising to uphold and should take that promise extremely seriously. They definitely are not a tool that should be used in application level code (a linter rule banning escape hatches is very appropriate in that context).

What happened here is that the string type supports more than one internal representation. The representation is hidden from users, and when strings are compared for equality the comparison is done in a representation-independent fashion so two strings representing the same logical value compare equal despite possibly having different representations. So far so good.

Then, as you observed, when generating the HTTP response buffer (yes also a value), the incidental encoding of the strings becomes an essential part of the buffer. It is exactly this step of extracting the string's internal buffer directly rather than requesting data in a specific encoding that violates referential transparency. That is why response buffers created in this way using strings with different internal encodings will not compare equal. If instead you had generated the response buffers using a specified encoding (usually utf-8) then you would not have violated referential transparency and the resulting buffers would compare equal.

Your example is actually a really good demonstration of why referential transparency (and knowing when it may be violated) matters a lot!

As someone who highly values referential transparency this one really bothers me. We had an extensive discussion of this issue a few months ago and I hope there is movement on it someday.

I agree with this for now (and perhaps we're stuck with it in Swift forever), but it would be nice if we didn't have to make this compromise. As we've discussed previously, it would in theory be possible for functions to support intentional equality derived from the identity of the source implementing the function when all captured values are also equatable and immutable.

But that's a conversation for another day. For now we just need to accept that it isn't possible to implement Equatable for everything that is semantically a pure value. Instead, we have to reason using a conceptual equivalence that cannot be implemented.

DaveZ · January 27, 2019, 3:22pm

I tried to focus my example on the pre-I/O data transforms of a hypothetical web server, so I don't think Haskell's unsafePerformIO applies (or D's unsafe blocks), right?

More so, I think your observation that Equatable isn't the right protocol for "pure" is correct. That protocol has too much baggage, especially given IEEE 754.

anandabits · January 27, 2019, 4:11pm

Referential transparency is meaningless without some notion of equivalence. Since we can't implement Equatable everywhere we need a more general notion, but where Equatable can be implemented it should be and it should be the measure of equivalence used in those cases.

I don't understand this statement. I don't see any problems with Equatable itself. The problems are types that can't conform because of language limitations and types that conform without meeting the semantics of the protocol (reflexivity, symmetry and transitivity). The floating point issue has been discussed extensively and I favor a resolution that would allow floating point types to uphold these and assign different names to the IEEE 754 operations. We shouldn't blame Equatable for issues that originate elsewhere.

xwu · January 27, 2019, 4:29pm

The discussion with respect to floating-point types concerns the issue that NaN != NaN. However, even if floating-point comparison operations were changed to meet the semantic requirements of Equatable, they still would not be appropriate for the semantics you want here: certain mathematical operations would give very different results when the argument is -0 or +0. This illustrates how the semantics of Equatable are not sufficient for what you want to do here.

anandabits · January 27, 2019, 4:47pm

As far as I can tell there are two ways to handle this. One is to define equality such that that -0 != +0. The other is to say that these operations are not reverentially transparent. Do you see any alternatives?

The semantics of Equatable define a canonical equivalence relation for a type. Such an equivalence relation is exactly what is necessary for referential transparency. What other notion of equivalence would we use?

Joe_Groff · January 27, 2019, 4:52pm

Whether you can substitute an expression a with another expression b without any impact on the semantics of the program seems like the concept you're looking for. It seems to me unnecessarily confining to require a source-level operation to enshrine that concept, since it isn't always practical or even possible to compute at runtime.

xwu · January 27, 2019, 5:01pm

The alternative is not to consider the equivalence semantics of 'Equatable' to be sufficient for determining what is or isn't referentially transparent.

This is what @DaveZ, @Joe_Groff, and I are suggesting.

anandabits · January 27, 2019, 5:02pm

Agree, but that doesn’t mean we can’t define the concept such that when a reasonable implementation of Equatable is possible it should be provided and should be consistent with substitutability. And when an implementation of Equatable Is not possible the type should still define a conceptual equivalence relation. If the author of the type isn’t able to do this it isn’t possible to think clearly about upholding the property of substitutability.

anandabits · January 27, 2019, 5:05pm

I was asking how you would define equivalence for the purpose of referential transparency.

I agree that it isn’t always possible to provide an implementation of Equatable but the conceptual semantics of a canonical equivalence relation seem to be exactly what is necessary for referential transparency to be meaningful.

xwu · January 27, 2019, 5:16pm

Some equivalence relation is necessary for referential transparency, but not necessarily the same relation that is meaningful for Equatable.==.

Many fundamental types in Swift simultaneously model two or more related things. For instance:

An integer type simultaneously models a mathematical integer and the sequence of two's-complement bits that represent it
A floating-point type simultaneously models a mathematical real value and its IEEE representation in four ways enumerated by the standard
'String' simultaneously models a sequence of human-readable characters and the Unicode scalars, UTF-16 code units, etc., that encode that sequence

In each case, Equatable.== considers equivalence from the standpoint of one model; in the case of 'String', for example, it is the sequence of human-readable characters as recognized in the form of Unicode extended grapheme clusters. What matters for you is equivalence from the standpoint of all models.

anandabits · January 27, 2019, 5:43pm

@scanon what’s your take on this? What would it even mean for floats to be equivalent as models of reals in the presence of NaN? Isn’t failing to recognize that floats are not reals exactly what gets people into trouble?

DaveZ · January 27, 2019, 5:55pm

Please forgive me, but the floating point signed zero debate (and its implications) was settled long before Swift. We can't revisit it.

We just need a new protocol. Call it something like Substitutable, where as @xwu aptly points out, types that conform promise that all models are considered, not just one (i.e.Equatable).

As an aside, and back to the patch in discussion, what does referentially transparent even mean if reference types are opaque? How is a reference any different than a value at that point? Said differently, if "pure" functions can't do I/O, then how could they notice that a reference type destructor did any I/O unless the "pure" function violates the "pure" model?

Joe_Groff · January 27, 2019, 6:24pm

I don't think a new protocol is necessary. Part of the purpose of this feature as I see it is to define referential transparency. 'Pure' is saying that the result of invoking a function with identical inputs produces identical outputs, and a means of enforcing through the compiler that that is true. As such, it should be the language's job to define what 'identical' means for the fundamental types and pure operations the standard library provides, and allow the definition to compose through propagation of the 'pure' constraint. As a starting point, one could use the implementation-level concept of "bitwise takable" types, which are types whose values do not depend on their storage address in memory. For such types (in practice, almost all Swift types that exist today), values with identical bit-level representations are guaranteed to be semantically equivalent.

DaveZ · January 27, 2019, 7:35pm

Great! In that light, I think my patch might be closer to a “pure” implementation than I thought.

anandabits · January 27, 2019, 10:15pm

It was settled for IEEE yes, but we have some options in Swift. See Comparable and FloatingPoint types. Whether we will revisit this is yet to be seen, but the door is not completely closed.

This thread is the first time I’ve encountered this notion of needing to consider multiple models. I don’t fully grasp the implications, but it seems like the desire to consider floating point numbers a model of real numbers is the only problematic example thus far. It isn’t clear to me that this is even possible in a reasonable way as floating point numbers are able to represent values that are not reals.

I would find it very surprising if this definition was ever inconsistent with Equatable. Is it fair to say that they will always be consistent when an Equatable conformance is provided?

Do you have any thoughts on how this would be defined for floating point types and operations? Would floating point == and < be considered pure?

It doesn’t seem like this approach is sufficient in order to suppor a third party data structure implemented from scratch using unsafe concepts. Wouldn’t the library author need to define ‘identical’ for this type? Wouldn’t we want this definition to be consistent with the definition applied by the language to fundamental types? Or did you mean that the language would also provide a clear definition of the semantics that third party libraries could also adhere to?

This makes sense as a starting point, but it obviously doesn’t address types which may use different internal representations to model the same value.

Lantua · January 28, 2019, 4:53am

Implicitly relying on user’s definition(Equatable) would be dangerous, they can first-and-foremost be buggy, and user will likely not define it with this implication (this patch) in mind. While of course they should be able to explicitly opt-in.

Even if the Equatable is consistent with this propagated definition, wouldn’t it then be better to always use the latter?

michelf · January 28, 2019, 12:55pm

It does so in a sense simply by treating them as different values.

This way of defining purity covers the case where the internal representation leaks some details (capacity, string encoding, sign of zero). It allows implementation of everything to be checked by the compiler (except when it comes to interacting with other languages). It's also all you need for correctness in multithreading (anything more would be unnecessary restrictive from this point of view).

That's obviously not the kind of pure you're looking for, but in the end I think it's the best solution. I wish our discussion was more filled with the use cases you want to enable. I realize now that all I can think of is a result cache that is "provably" correct. I use quotes because you still have to annotate all incidental members and you can't really prove they were annotated correctly since deciding what to make incidental is more a judgment call than a verifiable rule.

That said, I loved our exercice of going to the bottom of it. Perhaps that can be useful later.

DaveZ · January 28, 2019, 1:09pm

Here is another way of looking at it. In an abstracted system, the definition of "equal" can change at each abstraction layer as details are either added or hidden. If a type allows extraction of implementation details, then multiple definitions of equality are potentially exposed. String is a great example. It is a simplifying abstraction of top of various Unicode primitives, which have independent equality rules. If we ban "pure" functions from getting at underlying abstractions in order to make "pure" hinge on "equality", then we'll have a fairly unrealistic and unworkable system.