Strict Value Semantics

anandabits · January 24, 2019, 4:24pm

It's not quite as simple as that because strict values with sophisticated implementations require private references in the implementation. Strict value semantics are guaranteed by using CoW, etc in the implementation. It isn't clear to me what the best way to model this at the language level is.

Direct support for CoW in the language and / or standard library would go a long way but I still think we want the ability to drop all the way down to "unsafe" implementations with a programmer promise to maintain strict value semantics. So a hybrid approach of CoW support that handles the majority of cases combined with an escape hatch that's available when necessary is probably where we need to end up.

Something along these lines makes sense to me as long as we have a way to constrain types to be strictly pure in generic code, so @pure types automatically conform to AnyValue (or something like that).

Keep in mind that this annotation is only used in the context of an @pure or @pure(access) type. I don't see it as implying a moral judgement as much as highlighting semantics that are out of the ordinary for members of this type. The goal is semantic clarity, not making people feel bad!

michelf · January 24, 2019, 7:22pm

As an aside, maybe you should take a look at pure functions in D for an example of this.

anandabits · January 24, 2019, 7:52pm

I have looked at them quite a bit. This models pure functions but does not model pure values.

michelf · January 24, 2019, 8:38pm

This is a distinction you make I'm not entirely sure I grasp well. My understanding is that for a value to be pure the way you want the equality operator should take into account all the observable states. For instance, this type would be pure (value type):

struct NoArray: Equatable {
    var count: Int
    static func == (a: FakeArray, b: FakeArray) -> Bool {
        return a.count == b.count
    }
}

While this second one isn't pure because now you can observe capacity even though it is totally independent of equality:

struct NoArray: Equatable {
    var count: Int
    var capacity: Int
    static func == (a: FakeArray, b: FakeArray) -> Bool {
        return a.count == b.count
    }
}

If capacity was made unobservable then it would become a pure value type again. Is that a correct interpretation of your wishes?

anandabits · January 24, 2019, 8:48pm

No, it is always going to be necessary to support incidental members that are necessary for the implementation of the representation but not essential to the meaning of the value. I also think it's fine to expose incidental members to users when they can be used in meaningful ways. However, I think these members should be annotated with @impure (or whatever syntax is used) and not be available in pure functions without the programmer using an escape hatch which requires them to promise a referentially transparent result.

Incidental members are orthogonal to whether a value is pure or not. A pure value is independent, i.e. free of entanglements with the rest of the program. This property allows you to do things like cache the value, memoize functions returning the value, etc without needing to consider anything other than the space used by the representation.

michelf · January 24, 2019, 9:20pm

This is exactly what I was implying by "make unobservable": make them unobservable whenever the context cares about being pure. Looks like I figured you out after all.

It sounds tedious and error prone however. It's not like the compiler can check what is incidental and what isn't. You have to trust types and their properties to be all labeled properly and you have to trust everyone to use the same definition of what is "incidental".

anandabits · January 24, 2019, 9:43pm

Gotcha, I missed that you were talking about only a pure context.

Most user-defined values don't have incidental members as they are usually just compositions of more primitive values. Library developers who are writing value types with sophisticated implementations should already be thinking about this as it is important to a correct implementation of Equatable. It's not foolproof but will help users of correctly annotated libraries. And who knows, maybe someday the compiler (or a linter) would be able to notice if you wrote a custom implementation of Equatable that ignores some of your stored properties and warn you that maybe they should be marked as incidental.

IMO, this is an important tool if we are interested in clear, referentially transparent semantics. Without a way to model incidental members true referential transparency goes out the window. I'm wiling to accept the possibility that something isn't annotated correctly. Outside of having a formally verified compiler we have to have a baseline set of primitives where we trust at least some programmers. It's good to narrow the scope of when we need to do that but it isn't possible to eliminate entirely.

DaveZ · January 25, 2019, 12:19pm

I'm not worried about the user experience of @impure. Most of the time it'll be a viral attribute and therefore not error prone. For example, if you design a "pure" type that uses an underlying array and one of the type's methods calls capacity(), then the compiler will complain about that method not being marked @impure due to capacity() being called.

anandabits · January 25, 2019, 3:25pm

Right. It wouldn't need to be an array stored in a property either - the same thing would happen if capacity was used on a local array or one received as an argument.

However ideally there would be an escape hatch that allows the programmer to use capacity while promising their use of that property will not influence the result of operation. For example, if a library knows enough about the usage of a specific array it might wish to implement custom growth strategy as described in the documentation of reserveCapacity (Array — SwiftDoc.org). It should be possible for advanced users to do this without everything becoming impure.

Joe_Groff · January 25, 2019, 4:06pm

reserveCapacity itself could be considered pure because it doesn’t have any semantic effect on the value.

anandabits · January 25, 2019, 5:03pm

Agree, but if a library was implementing a custom growth strategy it may need to read capacity before calling reserveCapacity and should be able to do so for that purpose without the method becoming impure (using an advanced feature such as an impure escape hatch that expects the programmer to uphold the promise of referential transparency).

Joe_Groff · January 25, 2019, 5:07pm

IIRC @Ben_Cohen and his team did a survey recently and couldn't find or think of any legitimate use cases for directly reading capacity. reserveCapacity is already a no-op if you ask for capacity that already exists, and makes no guarantee that you get exactly the capacity you asked for if it does reallocate.

Ben_Cohen · January 25, 2019, 5:09pm

There are also performance traps to fall into if you try and manually expand capacity in a loop instead of doing it all up-front, which I think adding a capacity field would bait.

anandabits · January 25, 2019, 5:13pm

Absolutely. I brought this up because of the caution in the documentation about naive uses of reserveCapacity. Maybe it's not a great example here.

The broader point is that an escape hatch is probably necessary if we want to support implementation techniques that take advantage of incidental / representational information in ways that do not violate referential transparency.

Joe_Groff · January 25, 2019, 5:16pm

Yes, that's one of many reasons an escape hatch is almost definitely necessary.

michelf · January 25, 2019, 8:16pm

I agree too that there should be an escape hatch, but to me escaping all purity checks for the benefit of observing capacity seems a bit drastic. This leads me to think we should have two levels of purity (with a different name for each), each protecting against different kind of dangers:

can only write to local values or values behind pointers that are uniquely referenced (works with CoW)
can only observe values that take part in the equality check of their underlying type (referentially transparent)

#1 prevents a function from modifying things it shouldn't have access to, while #2 prevents a function from being influenced by things that would break equivalence. #1 is good for thread-safety and isolation and #2 is good for memoization.

A good escape hatch could let you get out of #2 but still be subject to #1.

(An interesting observation of this formulation is that purity #2 can't use types that are not equatable. That sounds drastic, especially if it forbids closures. So maybe that definition doesn't work.)

anandabits · January 25, 2019, 8:37pm

We wouldn't have to design the escape hatch to be all or nothing. For example, maybe we have something like this:

let capacity = impure(unchecked) myArray.capacity

With an escape hatch like that everything else is still checked for purity.

I would like to see an effect hierarchy. For example, another effect people reasonably care about is whether a function might trap or not. We can't check totality but we can check for explicit traps for precondition violations, etc.

Sure, the escape hatch could be parameterized to specify exactly what is being allowed to get by unchecked.

#2 also relies on the type being a pure value (i.e. just data). You definitely wouldn't want to cache / memoize [UIView] for example.

Requiring equality for pure values would be great, but we need some language fixes in order to support that. Tuples would need to be conditionally Equatable, metatypes would need to be Equatable, functions that only capture pure values would need to be Equatable (using intensional equality on the body of the function), etc. All of this seems doable, but also not something that will happen soon. So requiring pure values to be Equatable doesn't seem viable in the short term.

So I think purity #2 would need to support types that are not Equatable, perhaps without the ability to perform some checks on those types. The good news is that being unable to implement Equatable doesn't prevent you from documenting the intended equivalence for the type and annotating incidental members accordingly if necessary, so checks relying on those annotations would still work.

DaveZ · January 26, 2019, 6:06pm

As a general rule, I don't like escape hatches. They point to weaknesses in design. If people insist on having them, then an honest effort to define the rules should be made first, otherwise one ends up with the C++ "const method" problem, where const methods can't be optimized due to the const_cast escape hatch.

Separably, I'm not sold on a definition of "pure" that excludes incidental methods/data. For example, imagine that one tries to write a web server using "pure" logic/values as much as possible. Now imagine two equal string values are incidentally encoded as UTF-8 and UTF-16. When generating the HTTP response buffer (arguably also a value), the "incidental" encoding of the strings become an essential part of the buffer – nevertheless, both the strings and response buffers meet the definition of values, and arguably strict ones at that ("pure" data, no references, etc). So what gives? How do functional programmers deal with this kind of problem?

michelf · January 26, 2019, 6:37pm

I agree strongly with this. This is why we need to support making new CoW types without an escape hatch. Nevertheless, anything imported from a C API will need to trusted by the compiler to behave as annotated, so the escape hatch is probably inescapable.

I'm not sure if that is a good example case because the String API abstracts the encoding pretty well as far as I am aware. However, using combining characters you can indeed compose two equal strings with a different sequence of unicode scalars, and thus observing unicode scalars (or any derived encoding) would break referential transparency.

Another troublemaker would be the IEEE floating point types. There is the +0 vs. -0 situation, NaN values, and the NaN payload that comes to mind as probably not referentially transparent.

Joe_Groff · January 26, 2019, 6:40pm

There’s prior art to look at here, such as Haskell’s unsafePerformIO or D’s modeling of unsafe blocks within pure scopes. At the interface level, a pure function in these systems is generally subject to the semantic flexibility pureness implies, namely that the function could in reality be run zero, 1, or N times independent of how many times it’s explicitly written to run in source code, and that it may be called out of order with other pure operations. The side effects thus may not happen, or may happen multiple times in arbitrary order, and it’s the programmer’s responsibility to ensure that those effects don’t have a lasting impact on the rest of the program if that happens.