Strict Value Semantics

Please forgive me, but the floating point signed zero debate (and its implications) was settled long before Swift. We can't revisit it.

We just need a new protocol. Call it something like Substitutable, where, as @xwu aptly points out, conforming types promise that all models are considered, not just one (i.e., Equatable).

As an aside, and back to the patch in discussion, what does referentially transparent even mean if reference types are opaque? How is a reference any different than a value at that point? Said differently, if "pure" functions can't do I/O, then how could they notice that a reference type destructor did any I/O unless the "pure" function violates the "pure" model?

I don't think a new protocol is necessary. Part of the purpose of this feature as I see it is to define referential transparency. 'Pure' is saying that the result of invoking a function with identical inputs produces identical outputs, and a means of enforcing through the compiler that that is true. As such, it should be the language's job to define what 'identical' means for the fundamental types and pure operations the standard library provides, and allow the definition to compose through propagation of the 'pure' constraint. As a starting point, one could use the implementation-level concept of "bitwise takable" types, which are types whose values do not depend on their storage address in memory. For such types (in practice, almost all Swift types that exist today), values with identical bit-level representations are guaranteed to be semantically equivalent.
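
To make the bitwise angle concrete, here is a small sketch in plain standard-library Swift (no hypothetical attributes) showing that Equatable equality is coarser than bit-level identity:

```swift
let plusZero = 0.0
let minusZero = -0.0

// Equatable (IEEE ==) considers the two zeros equal...
assert(plusZero == minusZero)

// ...but their bit-level representations differ, so a bit-identity
// notion of "same value" can distinguish them.
assert(plusZero.bitPattern != minusZero.bitPattern)

print(String(minusZero.bitPattern, radix: 16)) // 8000000000000000
```

So "identical bits implies semantically equivalent" is a strictly safer starting point than "== returns true" for any bitwise-takable type.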


Great! In that light, I think my patch might be closer to a “pure” implementation than I thought.

It was settled for IEEE, yes, but we have some options in Swift. See the Comparable and FloatingPoint types. Whether we will revisit this remains to be seen, but the door is not completely closed.

This thread is the first time I’ve encountered this notion of needing to consider multiple models. I don’t fully grasp the implications, but it seems like the desire to treat floating-point numbers as a model of the real numbers is the only problematic example thus far. It isn’t clear to me that this is even possible in a reasonable way, as floating-point numbers can represent values, such as NaN and the infinities, that are not real numbers.

I would find it very surprising if this definition was ever inconsistent with Equatable. Is it fair to say that they will always be consistent when an Equatable conformance is provided?

Do you have any thoughts on how this would be defined for floating point types and operations? Would floating point == and < be considered pure?
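
For what it’s worth, floating-point == fails substitutability in both directions, which a few lines of Swift make plain:

```swift
let a = 0.0
let b = -0.0

// == says the zeros are equal, yet division can tell them apart:
assert(a == b)
assert(1.0 / a == .infinity)
assert(1.0 / b == -.infinity)

// And NaN is bit-identical to itself, yet == says it isn't equal to itself:
let nan = Double.nan
assert(nan.bitPattern == nan.bitPattern)
assert(nan != nan)
```

So a 'pure' floating-point == would have to be pure in the sense of "deterministic over bit patterns", not in the sense of "respects substitutability of equal values".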

It doesn’t seem like this approach is sufficient to support a third-party data structure implemented from scratch using unsafe constructs. Wouldn’t the library author need to define ‘identical’ for such a type? Wouldn’t we want that definition to be consistent with the one the language applies to fundamental types? Or did you mean that the language would also provide a clear definition of the semantics that third-party libraries could adhere to?

This makes sense as a starting point, but it obviously doesn’t address types which may use different internal representations to model the same value.

Implicitly relying on the user’s definition (Equatable) would be dangerous: it can, first and foremost, be buggy, and users will likely not have defined it with this patch’s implications in mind. Of course, they should still be able to opt in explicitly.

Even if the Equatable is consistent with this propagated definition, wouldn’t it then be better to always use the latter?

It does so in a sense simply by treating them as different values.

This way of defining purity covers the case where the internal representation leaks some details (capacity, string encoding, sign of zero). It allows the implementation of everything to be checked by the compiler (except when interacting with other languages). It's also all you need for correctness in multithreading (anything more would be unnecessarily restrictive from this point of view).
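
Capacity is a concrete case of the representation leaking. A short Swift illustration (the exact capacity numbers are implementation details, but the inequality holds in practice):

```swift
var a = [1, 2, 3]
var b = a
b.reserveCapacity(100)  // copy-on-write kicks in; b gets its own storage

// Equal as values...
assert(a == b)

// ...but the internal representation leaks through `capacity`.
// (Exact values are implementation details; b's is at least 100.)
assert(a.capacity != b.capacity)
```

Under the "different bits, different values" definition, a and b are simply different values that happen to compare ==, and no purity rule is violated by observing capacity.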

That's obviously not the kind of pure you're looking for, but in the end I think it's the best solution. I wish our discussion was more filled with the use cases you want to enable. I realize now that all I can think of is a result cache that is "provably" correct. I use quotes because you still have to annotate all incidental members and you can't really prove they were annotated correctly since deciding what to make incidental is more a judgment call than a verifiable rule.

That said, I loved our exercise of getting to the bottom of it. Perhaps that can be useful later.

Here is another way of looking at it. In an abstracted system, the definition of "equal" can change at each abstraction layer as details are either added or hidden. If a type allows extraction of implementation details, then multiple definitions of equality are potentially exposed. String is a great example. It is a simplifying abstraction on top of various Unicode primitives, which have independent equality rules. If we ban "pure" functions from getting at underlying abstractions in order to make "pure" hinge on "equality", then we'll have a fairly unrealistic and unworkable system.
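
String makes the layering concrete: canonically equivalent strings compare equal at the String layer, while the scalar layer underneath exposes a finer equality:

```swift
let precomposed = "\u{E9}"   // "é" as a single scalar
let decomposed = "e\u{301}"  // "e" + combining acute accent

// At the String abstraction they are the same value...
assert(precomposed == decomposed)

// ...but dropping to the Unicode scalar layer distinguishes them.
assert(precomposed.unicodeScalars.count == 1)
assert(decomposed.unicodeScalars.count == 2)
```

A "pure" function over the unicodeScalars view is perfectly deterministic, yet it maps ==-equal Strings to different results.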


A few thoughts I’d like to throw in:

This proposal is useful even without a notion of equality. It would still be able to discern and enforce exclusivity (which, from what I understand, was the original goal of this proposal).
To that end, a class instance that is created inside a function call, does not escape, and does not have or rely on any side effects should not by itself promote the function to reference semantics. This may even require the ownership model, but it would be interesting to see how the two notions interact.

I don’t understand how defaulting a function to value semantics works. Doesn’t the compiler still need to prove that the function body doesn’t violate the contract? And shouldn’t any function that doesn’t violate the contract automatically gain value semantics?

Any public-facing API should require explicit opt-in for value semantics, while any function provably value-semantic, even when not explicitly annotated, should still benefit from this under whole-module optimization.

Strongly agree with everything Dave said here. The notion of substitutability only makes sense relative to a model, and in practice in computing there are almost always some operations that need to be placed "outside" the model in order for it to be useful. Strings and floating-point have obvious examples where it's "easy" to get outside the model that's most relevant to most users, but unsafe operations take you outside essentially any useful models for essentially all types.

There are a couple of ways to look at this problem. If you can prove a class instance is uniquely referenced, then what is at the other end of the pointer might very well be considered a value type.

Another is to prove the data in the class is immutable. Data that will never change is a constant, and therefore pure.

Then you can mix the two concepts with copy-on-write: the data is immutable if there is more than one reference to it (you have to make a copy to change it), or mutable if there is only one reference.

Possible compiler-enforced COW based on "mutating" and "@pure" (both the @pure attribute and mutating on a class are hypothetical here):
class CopyOnWrite {
    @pure var pureValue: Int = 1
    @pure mutating func resetToZero() {
        guard pureValue != 0 else { return } // already zero: return early to avoid triggering a copy
        pureValue = 0 // check if uniquely referenced; if not, automatically clone and reassign self
    }
}

obj.resetToZero() // might reassign obj because resetToZero is mutating

Lastly, there's the case where you use multiple references to the same object within your pure function, but since the references can't escape, your outer function stays perfectly pure. The compiler can't optimize away repeated function calls because they have an effect, but since the effect can't escape into the wild, the function can still be a useful building block within a perfectly pure function. (D calls that "weak pure".)
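
A sketch of what such a "weak pure" function might look like; the @pure annotation is hypothetical, but the body is ordinary Swift and observably pure because the reference never escapes:

```swift
final class Accumulator {
    var total = 0
    func add(_ x: Int) { total += x }
}

// Hypothetically annotatable @pure: the class instance is created,
// mutated, and discarded entirely within the call.
func sum(_ values: [Int]) -> Int {
    let acc = Accumulator()         // local reference
    for v in values { acc.add(v) }  // mutation is invisible to callers
    return acc.total                // only the value escapes
}

print(sum([1, 2, 3])) // 6
```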

Whatever you do however, if you allocate inside a pure function and take a look at the numeric value of the address for the allocated memory you might as well have a poor random number generator, which isn't very pure.

[EDIT] – Please note this patch is not proposing functional programming; therefore, "strict" in this context does not mean referential transparency, etc.

This is confusing. I thought that referential transparency was the key property that we need for optimization. The proposal doesn't seem complete without it.

Whatever the definition of "equal" or "substitutable", it needs to be legal for the compiler to do the substitution behind your back.

Structs/enums methods/getters/setters/etc. default to “value semantics”.

So, this is source breaking for any non-trivial struct or struct method that accesses globals right?

When you say "value semantics" do you always mean "strict value semantics"?

Global functions need default semantics. Source compatibility would dictate that reference semantics should be the default. I think a case could be made that value semantics should be the default if source compatibility doesn’t matter. This opinion is based on contemporary programming conventions around value types and reference types; and reasonable people may disagree.

I was very interested in defining value semantics and function effects prior to Swift 3, just to make sure we had the right defaults. Once it became clear that the defaults would allow shared mutable state, and that there would be no way to reverse that decision, the urgency disappeared. It seems much too late to make the case for defaults now.

Since there's no longer any urgency in adding strict value semantics it seems prudent not to settle on a model without first designing some of the most important features that will require those semantics (I'm not arguing against adding some in-tree support, just saying I wouldn't want to prematurely commit to them.)

I'm not sure what the right behavior is for "no escape" closures. We might be able to allow mutable captures if we're careful. On the other hand, this makes "no escape" closures semantically different from escaping closures, which increases the complexity of the language in an unfortunate way.

No-escape closures do need to be able to mutate captured values without violating value semantics. We've designed so much around closure-taking APIs that they need to be a zero-cost abstraction without any extra restrictions.

No-escape closures are definitely semantically different from @escaping closures. I think we should make the distinction much more clear throughout the implementation. Surfacing that distinction to users with diagnostics will help them write better code.

Inout parameters are reference semantic

I don't see why that would be the case, and I think making that claim will just confuse people about the semantics of inout. I don't think programming style evangelism needs to be mixed up in this if that's the motivation.

Hi Andy –

That "EDIT" was added earlier while I was trying to manage expectations. In particular, I did not set out to have a functional programming discussion, and I was only trying to manage expectations accordingly. Additionally, I lack formal training in functional programming terminology, and I felt that I was getting mixed signals about what "referentially transparent" means in this thread. Since that "EDIT", and if I understand @Joe_Groff correctly, this patch might be close to an implementation of "pure" from a functional programming perspective, so I'm willing to learn more about functional programming to see if it is compatible with my goals.

I'm going to drop that idea in the next patch update. It was mostly there to smoke out people's opinions about how most structs/enums behave.

Ideally? No. I may have slipped here or there. I was trying not to conflate Swift's existing "by copy" value semantics with "strict value semantics" in my proposal because I assumed that people might be confused.

Well, I am working on a dependent feature (an atomicity/concurrency/reentrancy model), but I'm not ready to talk about it yet. Nevertheless, strict value semantics is independently useful, so I don't see why it shouldn't go in ahead of other features.

I'm well aware of this perspective, hence my concern. What bothers me is that no-escape closures are, conceptually and visually, a form of reentrant programming, where the callee "reenters" the caller's scope. That sure looks like reference semantics to me, and adding mutable captures only complicates the reentrancy.

Even if we set aside the mutable captures discussion, a no-escape closure that is callable by "pure" / strict value semantics code can't access reference types, so that means that captured class variables are useless (except for pointer comparisons), which might surprise users and force people to design code differently.

Fair enough. My fear is that if there is a hole in the design of this patch, it'll have to do with either explicit inout variables or implicit mutable captures.

I think the inout and no-escape closure questions are in principle the same. You could think of a no-escape closure that mutates as shorthand for passing the caller's variable inout. My point is that the compiler should have a clear model for that mutation--it should never appear as if the capture escapes. In other words, the compiler can always see where a no-escape closure is invoked or passed as an argument and for our purposes treat that like it's passing the captures inout.
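
The equivalence can be sketched in today's Swift. Both helpers below (hypothetical names, for illustration) perform the same mutation, one via a mutable capture in a no-escape closure and one via an explicit inout parameter:

```swift
func applyTwice(_ body: () -> Void) {
    body()
    body()
}

func applyTwiceExplicit(_ x: inout Int, _ body: (inout Int) -> Void) {
    body(&x)
    body(&x)
}

var a = 0
applyTwice { a += 1 }               // mutable capture, no-escape closure
var b = 0
applyTwiceExplicit(&b) { $0 += 1 }  // the same mutation, spelled inout

assert(a == 2 && b == 2)
```

Because applyTwice's parameter is non-escaping by default, the compiler can see every call site of the closure and treat the capture of a exactly like an inout pass.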

With Swift 5, programmers shouldn't be able to write code that acts "reentrantly". It could all be modeled as copy-in, copy-out, which I think is what you're after. Modulo these bugs:

[SR-8546]: Enforce @escaping for nested functions named inside conditional expressions.
[SR-8546] Enforce @escaping for for all but the most trivial uses of captured variables. · Issue #51064 · apple/swift · GitHub.

[SR-9043]: Missing checks of inout/mutating captures
[SR-9043] Missing checks of inout/mutating captures · Issue #51546 · apple/swift · GitHub.

It would probably be fair to say that implementation support for inout/no-escape is staged in, I just want to be clear about the goals.

@Andrew_Trick I have run into cases where I need to capture a closure in a struct that never escapes the stack and have used withoutActuallyEscaping to do that. How would use cases like that be handled?

FWIW, the use cases have primarily been in cases where the struct is necessary to emulate higher-rank types and would ideally not have been necessary at all.

I couldn't agree more. It is hard not to have that perspective after hacking on the code. :-)

Right. I remember talking about the copy-in/copy-out model during the design of Swift, but the fact that it hasn't ever been modeled this way in practice and the resulting bugs make me nervous. What's the plan?

There are two answers depending on what you're asking.

  1. When the withoutActuallyEscaping closure's scope ends, it checks the closure's reference count to ensure that it didn't actually escape. So you're not going to be able to access your stack allocated struct after it's freed, and value semantics can rely on that.

  2. Exclusivity enforcement ensures that your stack allocated variable cannot be "reentrantly" accessed...

It is hard to prove that the compiler catches all the possible violations involving withoutActuallyEscaping, but I'm claiming that it does! If you can find any holes in the diagnostics, please file bugs.

It's also hard for me to explain why it works. I'll try briefly...

Let's consider @escaping and @noescape to be part of the function type. Any closure can be passed as either type by upcasting to @escaping or downcasting to @noescape using withoutActuallyEscaping. So, the closure's function type at any given point is independent of how exclusivity is enforced.

Instead, at the point the closure is defined, we need to decide whether it is obviously-never-escaping. SILGen currently makes an overly conservative guess. If this check fails, users aren't allowed to capture inout.

Later, the SIL compiler more aggressively determines if the closure is locally-never-escaping. As long as this is a superset of the obviously-never-escaping closures, the representation is still correct. The locally-never-escaping closures are statically checked for exclusivity. The rest will be emitted with dynamic checks whenever they access a captured variable.

So, I can define an obviously-never-escaping closure like this:

func definesClosure(arg: inout Int) {
  escapesClosure(&arg) { arg = 3 }
}

And actually escape it like this:

func takesEscaping(_ f: @escaping () -> ()) { f() }

func escapesClosure(_ x: inout Int, _ f: () -> ()) {
  withoutActuallyEscaping(f) {
    x = 4
    takesEscaping($0)
    x = 4 // compiler can delete this assignment
  }
}

But, since the closure locally never escaped, you will get a compile-time error:

./without_actually_escaping.swift:13:18: error: overlapping accesses to 'arg', but modification requires exclusive access; consider copying to a local variable
  escapesClosure(&arg) { arg = 3 }
                 ^~~~~~~~~
./without_actually_escaping.swift:13:40: note: conflicting access is here
  escapesClosure(&arg) { arg = 3 }
                      ~~~~~~~~~^~~

Well, to me it depends on the level of semantics you care about. If you're asking about whether a variable is substitutable or can observe "reentrant mutation", then inout behaves just like copy-in, copy-out as long as exclusive access is 100% enforced, which we now claim.

I'm working on writing up the current implementation status, and I'll try to remember to post a link here when it's ready.

If you're talking about semantics at the level of what I would call the zero-cost abstraction guarantee, rather than correctness, then inout is actually prohibited from making a copy. Otherwise, isKnownUniquelyReferenced would never work. Keep in mind, though, that it's actually illegal for your program's correctness to depend on the result of isKnownUniquelyReferenced.
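
For reference, here is the standard copy-on-write pattern built on isKnownUniquelyReferenced; the point above is that the check may only affect performance, never the observable result:

```swift
final class Storage {
    var value: Int
    init(_ value: Int) { self.value = value }
}

struct CoWBox {
    private var storage = Storage(0)

    var value: Int {
        get { storage.value }
        set {
            // Clone only when the buffer is shared. Correctness must not
            // depend on which branch runs; only performance may.
            if !isKnownUniquelyReferenced(&storage) {
                storage = Storage(storage.value)
            }
            storage.value = newValue
        }
    }
}

var x = CoWBox()
var y = x       // shares storage
y.value = 42    // clones first; x is unaffected
assert(x.value == 0 && y.value == 42)
```

If inout were allowed to introduce a hidden copy, the uniqueness check in the setter could spuriously fail, defeating the optimization while (correctly) preserving the value-semantic result.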

I’m sorry that I’m repeating myself, but how does defaulting semantics work here? It sounds like a function’s value/reference semantics are something the compiler would safely check and the user could unsafely assert. So defaulting struct/enum functions to value semantics and class functions to reference semantics baffles me. Or am I misinterpreting the proposal?

Hi everybody,

I've created a new GitHub branch where I've refactored the code to default to source compatibility with @pure and @impure attributes. More work needs to be done. In particular, I'm trying to disentangle the standard library and mark code as @pure where possible. And I need to update the proposal.

@Joe_Groff, @Andrew_Trick, and friends – I'd appreciate feedback if you have the time.

After some more research, I think this feature/goal is no longer a prerequisite for other work that I'm doing and I'll no longer be updating the branch on GitHub.

I'll leave the branch there as a starting point for anybody interested in the work. Thank you all for the feedback, and if things change, then I'll revive the branch.
