Strict Value Semantics

DaveZ · January 21, 2019, 1:08pm

Hello,

In my free time and as a part of a larger research project, I’ve been maintaining a proof-of-concept patch that implements strict value semantics. At this point in my project, I’d like to see what it would take to get strict value semantics integrated into mainline Swift.

What are strict value semantics?

Strict value semantics ensures that value types do not create or depend on side effects. This makes reasoning about the behavior of values easier, both for programmers and compiler optimizations. Strict value semantics also enables future language features. In my case, I’m researching a simple and efficient atomicity/concurrency/reentrancy model (but I’d like to not discuss it right now).

Casually speaking, strict value semantics means that value types “cannot” use reference semantics. “Cannot” is in quotes because reality is slightly more complicated. For example, the standard library needs reference semantics to implement fundamental value types like collections. Also, value types continue to use inout internally to model mutating functions.
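As a small illustration of the last point, a mutating method in today's Swift behaves like a function that takes self as an inout parameter. The sketch below uses a hypothetical `Tally` type and a hypothetical free function `increment(_:)`; neither comes from the patch.

```swift
// Hypothetical type used only for illustration.
struct Tally {
    var count = 0

    // `mutating` gives the method inout access to `self`.
    mutating func increment() {
        count += 1
    }
}

// The same operation spelled as a free function with an explicit inout parameter.
func increment(_ tally: inout Tally) {
    tally.count += 1
}

var viaMethod = Tally()
viaMethod.increment()

var viaInout = Tally()
increment(&viaInout)

// Both spellings leave the value in the same state.
print(viaMethod.count, viaInout.count)
```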

[EDIT] – Please note this patch is not proposing functional programming; therefore "strict" in this context does not imply referential transparency, etc.

Where to get the patch

On GitHub. Please note that until the standard library is built with strict value semantics enabled, there isn't anything end users can do to play with the feature outside of running only the type checker on test examples.

Details of the patch in progress

Functions now have an additional dimension of formal behavior: value semantic or reference semantic. This allows the compiler to ensure that value semantic functions cannot call reference semantic functions. This is the essence of the patch.

Class methods/getters/setters/etc. default to “reference semantics”.
Struct/enum methods/getters/setters/etc. default to “value semantics”.
Protocols, by default, are semantic free. (See the next line.)
Protocols need a way to guarantee reference or value semantics, otherwise strict value types cannot use any existential types. I’ve gone with “protocol Foo : !class” to mean a value semantic protocol, but reasonable people might dislike the syntax.
Globals
1. Global variables are a form of reference semantics. In this implementation of strict value semantics, strict value types can access global constants (“let” variables), but not mutable variables.
2. Global functions need default semantics. Source compatibility would dictate that reference semantics should be the default. I think a case could be made that value semantics should be the default if source compatibility doesn’t matter. This opinion is based on contemporary programming conventions around value types and reference types; and reasonable people may disagree.
An attribute exists to override the default semantics of a given function (for example, a global function). Not all functions can have their semantics overridden. For example, strict value types cannot have reference semantic methods.
When it comes to function type conversion, value semantic functions implicitly convert to non-value-semantic functions (i.e. “reference semantics”). This may sound backwards, so think of it this way: a function that promises to never access reference semantics (i.e. a “strict value semantics” function) is convertible to a function type with the same signature that doesn’t care (i.e. a “reference semantics” function).
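The direction of that conversion has a close analogue in today's Swift: a non-throwing function converts implicitly to a throwing function type, but not the other way around. The sketch below shows only that analogy (it does not use the patch's attribute, whose spelling is not given in this thread); `twice` and `mayThrow` are hypothetical names.

```swift
// A function that makes the stronger promise: it never throws.
func twice(_ x: Int) -> Int {
    return x * 2
}

// The stronger promise converts implicitly to the weaker function type.
let mayThrow: (Int) throws -> Int = twice

// The reverse conversion is rejected by the compiler:
// let neverThrows: (Int) -> Int = mayThrow   // error

// Callers of the weaker type must handle the possibility of an error.
let result = try! mayThrow(21)
print(result)
```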

Tradeoffs

Closures created by strict value types or passed to strict value types cannot mutate captured variables because mutable captures are reference semantic and the closure must be value semantic in order to be callable by the strict value type. This change is incompatible with some programming conventions, but there are workarounds.
1. I'm not sure what the right behavior is for "no escape" closures. We might be able to allow mutable captures if we're careful. On the other hand, this makes "no escape" closures semantically different from escaping closures, which increases the complexity of the language in an unfortunate way.
Inout parameters are reference semantic. While I assume that the exclusivity checker is bug free, a case could be made that inout is an ObjC++ compatibility feature and that ObjC++ doesn’t have strict value types. Therefore we could just not allow strict value types to vend or use explicit inout and therefore we could define away this case of reference semantics in a value semantic context. Also, one can use tuple return values instead of inout to mutate multiple values. For example, instead of x = foo(&y), one can write (x, y) = foo(y).
Generic functions cannot infer their type and therefore they cannot infer value/reference semantics. As a consequence, programmers will need to duplicate code that needs to work with both semantic worlds. That being said (and practically speaking), people probably won’t do this, and the natural divide between reference types and value types will become more clear.
What should be done about logging/debug APIs? Strictly speaking, these APIs create side effects and therefore strict value types cannot use these APIs. That being said and in practice, programs cannot rely on the side effects of logging/debug APIs and therefore I’d imagine that a new attribute could allow strict value types to call these kinds of APIs.
Similarly, APIs like “random()” and system calls are not usable by strict value types. I’d argue that’s a good thing, but it may surprise people.
Non-default property mutation. Mutating getters are getters with intentional side effects and therefore are incompatible with strict value semantics. Additionally, immutable setters are pointless no-ops under strict value semantics, because the setter cannot have side effects.
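Two of the tradeoffs above can be sketched in ordinary Swift: a closure with a mutable capture shares state with its enclosing scope, and an inout call can be rewritten with a tuple return. All names here (`total`, `add`, `bumpInout`, `bumpTuple`) are hypothetical and only stand in for the foo(&y) example.

```swift
// 1. Mutable capture: the closure and the enclosing scope share `total`,
//    which is why the post treats such captures as reference semantic.
var total = 0
let add: (Int) -> Void = { total += $0 }
add(5)
add(7)

// 2. The inout spelling: x = foo(&y)
func bumpInout(_ y: inout Int) -> Int {
    y += 1
    return y * 10
}

// The tuple-return spelling: (x, y) = foo(y)
func bumpTuple(_ y: Int) -> (Int, Int) {
    let newY = y + 1
    return (newY * 10, newY)
}

var y1 = 1
let x1 = bumpInout(&y1)

var x2 = 0
var y2 = 1
(x2, y2) = bumpTuple(y2)

// Both spellings produce the same pair of results.
print(total, x1, y1, x2, y2)
```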

Work Remaining

Turn strict value semantic errors into warnings if strict value semantics aren’t enabled.
Add attribute to allow "safe/unobservable side effect" APIs (logging/debugging) to be callable from strict value types.
Audit existing SIL optimizations. In particular, it is really easy for the compiler to drop ExtInfo bits.
Add new SIL optimizations that take advantage of strict value semantics.
Compile the standard library with strict value semantics enabled.
Compile miscellaneous libraries included with the compiler with strict value semantics enabled.
Figure out the ObjC++ interoperability story. C structs do not have strict value semantics, so what should be done about C structs imports?
Figure out the source compatibility story. Should people need to opt into strict value semantics? This can work, but the programming experience is like “const correctness” all over again.

Finally, and I cannot stress this enough, I wholeheartedly believe that strict value semantics is a prerequisite to simple and efficient concurrency models. I hope that we can make this feature happen.

Cheers,
Dave

michelf · January 21, 2019, 2:00pm

I'm glad to see someone implementing these ideas.

This sounds similar to what I envisioned in Exploratory proposals: Pure & Concurrent. Except in my case I avoided completely the concept of a "value type".

I find it closer to reality to consider that each type can sometimes behave as a reference and at other times as a value. Case in point: a pointer can be treated as a value as long as you're not dereferencing it. An array of object references has value semantics if all you're doing is getting its count. What matters really is not the type or the data it stores, but whether what you do with it breaks value semantics or not.
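A rough sketch of that distinction in current Swift, using hypothetical `Node` and `Tagged` types: the same struct behaves as a value for some operations and as a reference for others.

```swift
// Hypothetical reference type.
final class Node {
    var visits = 0
}

// Hypothetical struct that stores a reference.
struct Tagged {
    let node: Node
    var tag: String
}

let original = Tagged(node: Node(), tag: "a")
var copy = original

// Value-like operation: only the copy changes.
copy.tag = "b"

// Reference-like operation: the change is visible through `original` too.
copy.node.visits += 1

// An array of references still behaves as a value for operations such as count.
let nodes = [Node(), Node()]
var fewer = nodes
fewer.removeLast()

print(original.tag, original.node.visits, nodes.count, fewer.count)
```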

George · January 21, 2019, 2:07pm

I’m interested in how this work overlaps with shared values (swift/OwnershipManifesto.md at main · apple/swift · GitHub) which are a Swifty re-imagining of Rust’s borrow checking mechanisms. They are substantively different from strict value semantics, but achieve similar results; namely, they protect from unexpected modifications to owned values.
Shared values also have the potential to unify the ideas of strict value semantics with the reference-based mechanisms which are used throughout the standard library (since an immutable share is essentially a strict value type).

DaveZ · January 21, 2019, 2:23pm

In this patch, that is what you get. Strict value types can pass/return/hold reference types but strict value types cannot call reference type members/getters/setters/etc. Thus strict value semantics are preserved, and popular generic value types continue to work (collections, Optional, etc).

DaveZ · January 21, 2019, 2:26pm

Unless I'm missing something, this patch is complementary to the "shared" parameter specifier proposal.

anandabits · January 21, 2019, 5:20pm

It's awesome to see that someone has been exploring this topic to the point of implementing a proof-of-concept! I spent quite a bit of time exploring how we could add language support for value semantics to Swift a couple years ago. As I'm sure you are well aware, there are a lot of subtleties involved in getting this right.

If you're interested in seeing the draft I had been working on you can find it here: ValueSemantics.md · GitHub. This draft has flaws (which is why it was never shared on SE), but may also have some points of interest.

What syntax do you propose for overriding the default semantics?

I think this is the right choice. In my experience, most Swift programmers don't understand all of the subtleties involved in value semantics and preserving referential transparency. By choosing this default programmers would be able to know that referential transparency is preserved if their value types compile without error or additional annotation.

However, there are plenty of use cases for value types that do not involve strict value semantics. I think we also need a way to flip the default for a specific value type. This would be a somewhat advanced feature, but without it implementing these value types would be more cumbersome than necessary. It would also be a huge aid in source migration - all existing value types would be migrated to not have strict value semantics. Users would opt-in to value semantics for their types at their leisure.

The way I had approached this was to require a type with strict value semantics to provide value-semantic implementations of all protocol requirements unless those requirements had specifically been annotated as not having value semantics (see my question above re: syntax for modifying the default).

As far as syntax, I would prefer to see a magic AnyValue protocol providing the value-semantic analogue of AnyObject. AnyValue would carry strict value semantics: value types without strict value semantics would not conform.

This is unnecessarily restrictive and would not support all of the APIs in the standard library, for example Int.random overloads that don't take an inout generator. Another good example from Foundation is Date(). I think it's fine for a type with strict value semantics to have some APIs that do not have strict value semantics as long as those are explicitly annotated and differentiated in the type system.
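To make the Int.random distinction concrete, here is a minimal sketch contrasting the overload that takes an inout generator with the one that uses the system generator. `SeededGenerator` is a hypothetical toy generator (a simple linear congruential step, not suitable for real use); only `RandomNumberGenerator` and the `Int.random` overloads are standard library APIs.

```swift
// Hypothetical deterministic generator, for illustration only.
struct SeededGenerator: RandomNumberGenerator {
    var state: UInt64

    mutating func next() -> UInt64 {
        // Wrapping arithmetic keeps the step well-defined on overflow.
        state = state &* 6364136223846793005 &+ 1442695040888963407
        return state
    }
}

var g1 = SeededGenerator(state: 42)
var g2 = SeededGenerator(state: 42)

// With an explicit inout generator, equal inputs give equal outputs.
let r1 = Int.random(in: 0..<100, using: &g1)
let r2 = Int.random(in: 0..<100, using: &g2)

// Without a generator argument, the result depends on hidden system state.
let r3 = Int.random(in: 0..<100)

print(r1 == r2, (0..<100).contains(r3))
```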

How are you differentiating these syntactically? You haven't mentioned pure functions at all. Do you see any distinction between "value semantic" functions and "pure" functions?

FWIW, @Joe_Groff has in the past suggested we might introduce an => arrow with stricter semantics than the -> arrow we have today.

There has also been quite a bit of discussion about the fact that people seeking stricter semantics aren't always looking for the same thing. For example, data race free is not the same as referentially transparent. It may be useful to model these behaviors more generally as effects (or absence of effects).

This is not true. inout parameters of a type that has strict value semantics are simply an alternate way to return a result. They do not break referential transparency. Language support for value semantics must be able to understand this.

I don't follow what you mean here exactly. Are you talking about functions like map? What we need in that context is something like rethrows which says that map preserves value semantics / referential transparency, but that its semantics ultimately depend on those of the transform.

I believe a similar mechanism may be necessary for generic types. I view a type such as Array as preserving value semantics rather than having value semantics. The elements of the array are a fundamental part of the value an array represents and if they do not have value semantics the array doesn't really have value semantics either.

I'm skeptical of creating an escape hatch like this for specific APIs. However, I think it's unlikely that we will be able to add language support for value semantics and be able to implement foundational data structures like Array as a type with strict value semantics unless there is an escape hatch along the lines of Rust's unsafe.

Can you go into more detail on how you would recognize standard library collections as having value semantics in the presence of an implementation that allocates memory, uses CoW, etc? Identifying a model that works for sophisticated implementations of strict values such as these is difficult, yet essential. It isn't clear from what you've written thus far how that would work.

This view is true in some sense, but I also think from a pragmatic point of view it is extremely useful to acknowledge that many types have members that predominantly have strict value semantics. Saying Int is a value (i.e. has strict value semantics) is an extremely useful cognitive shorthand. Modeling this directly in the language would also help to highlight members which have semantics that are exceptions to the default semantics of the type (which is something that I think is pretty important).

Further, value semantics is about more than just referential transparency of operations. The state associated with a value matters.

A value should only store state that is essential to the representation of its meaning (as defined by equality). This may include state that is incidental to the meaning if that state is essential to the implementation, however incidental state must be handled carefully. For example, Array.capacity is not referentially transparent as it could have different results for equal arrays. We need to model types with strict value semantics in the language in a way that understands incidental members if we wish to have referentially transparent value semantics in Swift.
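The Array.capacity point can be observed directly. In this sketch the two arrays compare equal, yet their capacities can differ; the exact capacity of `a` is an implementation detail, so the code only relies on the documented guarantee that reserveCapacity provides at least the requested amount.

```swift
let a = [1, 2, 3]
var b = [1, 2, 3]

// Ask for extra storage; this does not change the array's elements.
b.reserveCapacity(100)

// Equal by value...
let sameValue = (a == b)

// ...but an incidental property may differ between the two.
print(sameValue, a.capacity, b.capacity)
```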

Beyond the issue of incidental state, if I store a value of a type with strict value semantics internally in a library I should not have to worry about unexpectedly retaining a handle to a resource, etc. If you say that types like Array unconditionally have strict value semantics then you are forced to deal with the concern that a so-called strict value may actually retain arbitrary resources.

DaveZ · January 22, 2019, 3:35am

At the moment, it's an attribute. I'm open to suggestions.

This was implied by some points that I was raising at the top of the thread, but let me say so more directly. Whether structs/enums default to strict value semantics is a policy decision. Whether an option exists to flip the default is a policy decision. For whatever it may be worth, I experimented with these policies during the evolution of this patch. Personally speaking, I think strict value semantics is the "right" default for a language that cares about having reasonable defaults (like Swift). However, making structs/enums have strict value semantics isn't source compatible and making strict value semantics be opt in is almost as bad as not having the feature (it's like C++ "const correctness" all over again).

Unless I'm missing something, that doesn't solve the problem I'm outlining above. When the compiler type checks the usage of existential types, it doesn't/can't know what the backing type is at compile time, therefore a semantic free protocol could be backed by a reference semantic type.

That seems like something people might want if this patch gains traction.

I think allowing static methods on strict value types to opt into reference semantics seems reasonable. This would, of course, make the static methods inaccessible to the strict value type itself. However, I don't think allowing instance methods on strict value types to have reference semantics makes any sense.

The patch doesn't expose this at the syntactic level at the moment. This is easily fixable (and it wasn't blocking my research).

Under this strict value semantic model, strict value type getters and immutable functions could and should be configured as "pure" functions that LLVM could then optimize accordingly after IRGen. However, optimizing strict value type setters and mutable instance methods will probably require SIL optimization passes.

Cute, but that doesn't really help functions that don't use the arrow syntax. For example "func foo() {}" is completely valid and doesn't use ->.

As I mentioned at the start of this thread, this patch exists as a prerequisite to larger research I'm doing into an atomicity/concurrency/reentrancy model. While referential transparency is interesting, it seems separable from my research goals.

I know that is how inout might feel from a user experience perspective, but in practice, inout is way more complicated. Ensuring exclusive access requires static analysis and dynamic analysis checks rather than basic "type checking".

Unless I'm missing something, the "rethrows" feature works because it largely doesn't impact the type checker. Strict value semantics on the other hand dramatically changes whether a generic function type checks or not.

I don't think that is the right approach. Array and friends can and should be strict value types because they simply don't care about the semantics of the values stored therein.

The standard library is privileged and trusted. It can access the Builtin namespace, where all sorts of "raw" and "unsafe" APIs exist. This is how strict value types in the standard library can access reference semantics.

lukasa · January 22, 2019, 1:01pm

To be clear, you would be proposing that it is not possible to implement standard-library-like data structures outside the standard library. That is, the standard library would be privileged over third-party code. Is that correct?

DaveZ · January 22, 2019, 1:39pm

I'm not proposing any change here. The standard library has always been privileged over other libraries. And yes, some data types make efficiency promises that depend on raw memory or raw memory management, therefore the code must live in the standard library. This patch doesn't change that.

lukasa · January 22, 2019, 1:56pm

I agree that the standard library has always been privileged.

But I disagree with this.

One of the few ways in which the standard library is not privileged over other libraries is the ability to construct CoW container value types. This is entirely implementable outside of the standard library, and in fact regularly is. As an example, SwiftNIO implements ByteBuffer, a CoW byte storage object. This is used extensively, and the fact that ByteBuffer is a value provides a number of really pleasant programming conveniences for networked programming.

The "magic" by which you implement one of these things is extremely straightforward: a combination of a struct storing a class instance as a stored property, and diligent use of isKnownUniquelyReferenced. Doing this is very easy, and allows third-parties to build safe CoW data structures easily.
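A minimal sketch of that pattern, assuming a toy `ByteBox` type backed by a hypothetical `Storage` class (this is not SwiftNIO's ByteBuffer implementation); `isKnownUniquelyReferenced` is the standard library function mentioned above.

```swift
// Hypothetical heap storage shared between copies until one of them mutates.
final class Storage {
    var bytes: [UInt8]
    init(bytes: [UInt8]) { self.bytes = bytes }
}

// Hypothetical copy-on-write value type.
struct ByteBox {
    private var storage: Storage

    init(_ bytes: [UInt8] = []) {
        storage = Storage(bytes: bytes)
    }

    var bytes: [UInt8] { storage.bytes }

    mutating func append(_ byte: UInt8) {
        // Copy the storage only if another value still refers to it.
        if !isKnownUniquelyReferenced(&storage) {
            storage = Storage(bytes: storage.bytes)
        }
        storage.bytes.append(byte)
    }
}

let first = ByteBox([1, 2])
var second = first      // shares storage with `first` for now
second.append(3)        // triggers the copy; `first` is unaffected

print(first.bytes, second.bytes)
```

Every mutating member has to perform the uniqueness check before writing, which is the "diligent use" the post refers to.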

I think it's reasonable to say that the ability to provide safe CoW value types that provide this optimisation without merging everything into the standard library is extremely valuable. I think it's possible to argue the case that it is more valuable than strict value semantics. I would be pretty strongly opposed to any proposal that makes it impossible to write CoW value types outside of the standard library.

DevAndArtist · January 22, 2019, 2:03pm

I can only second this, I have written custom CoW types in the past and would be against a change that breaks my code and on top of that removes the ability to express the same functionality in a different way.

To be clear, I'm not against breaking changes, but I'm against removing the ability of writing CoW types outside the stdlib.

DaveZ · January 22, 2019, 2:05pm

I'd much rather see the compiler and/or stdlib vend an automatically managed CoW value buffer. This is independently useful, good, and compatible with strict value semantics. The fact that code outside of the standard library needs to know about, let alone call, isKnownUniquelyReferenced is brittle and deeply unfortunate.

lukasa · January 22, 2019, 2:14pm

Sure, so would lots of people. Less work is almost always appealing!

As I said, I only object to a proposal that makes it impossible to build CoW value types. If your proposal includes a compiler enhancement to automatically build these data types, then I would not object in the slightest. I was only noting that any proposal that does not include solving this issue is highly unlikely to progress through swift-evolution.

michelf · January 22, 2019, 3:32pm

This is probably the most challenging part about designing the feature. Function annotations quickly become viral, and you can see this in D where there are plenty of those viral attributes for various effects. Take a look at this function prototype for instance:

const pure nothrow @nogc @safe int opCmp(Duration rhs);

Picking the right defaults for structs could make things simpler, but is source breaking. Maybe it wouldn't be so bad however. It's hard to gauge how severe the source break will be without an implementation to test with, so it's good that you're making one.

I've made a design for managed CoW in my exploratory proposal in case you want to take a look. But even then, I left an unsafe mechanism for breaking out of this, similar to how you can use unsafe pointers even outside of the standard library.

anandabits · January 22, 2019, 3:51pm

Personally, I think Joe's idea of piggybacking on throws and the proposed async as a general form of effect notation is a good one, as is his suggestion of introducing a new => arrow that has stricter effect semantics by default than -> does. This approach scales well to support finer-grained modeling of effects while still being convenient in the common case if we choose the semantics for the new arrow syntax carefully.

The point you're missing is that with the approach I suggested, users would be able to use protocol composition P & AnyValue with any protocol and immediately know that non-annotated requirements were required to be implemented with value semantics, just as we can do the same with AnyObject even when a protocol does not refine AnyObject.
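The AnyObject half of that comparison works in Swift today and can be sketched as follows; the AnyValue half is hypothetical and appears only in a comment. `Named`, `Person`, `Pet`, and `describeObject` are illustrative names.

```swift
// A protocol that does not refine AnyObject.
protocol Named {
    var name: String { get }
}

final class Person: Named {
    let name: String
    init(name: String) { self.name = name }
}

struct Pet: Named {
    let name: String
}

// The composition adds the class constraint at the use site.
func describeObject(_ value: Named & AnyObject) -> String {
    return "object named \(value.name)"
}

let description = describeObject(Person(name: "Ada"))

// Rejected by the compiler, because Pet is not a class:
// describeObject(Pet(name: "Rex"))

// The suggestion in the post is that a hypothetical `Named & AnyValue`
// would constrain conforming types to strict value semantics in the same way.
print(description, Pet(name: "Rex").name)
```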

I think value semantics vs reference semantics is the wrong way to look at this. The important distinction at the operation level is referential transparency IMO. See Collection.randomElement() for an example of an instance method that is not referentially transparent but is usefully available on types which may have strict value semantics.

I wasn't asking about LLVM here. I was asking in terms of the higher level semantics. Setters and other forms of mutation such as inout (including mutating which is just sugar for inout self) are semantically just another way to return a value.

What purpose does a non-mutating value-semantic function returning Void serve?

That's good to know. I agree that these are distinct topics. I hope as you move forward you try to leave the right space in the design to support referential transparency in Swift in the future.

I understand that the implementation is much more complex. I'm focused on user-facing semantics rather than implementation details.

I don't quite follow you here. throws is most definitely part of a function's type. The type of map itself is conditional depending on the transform provided. If the transform throws, so does map. If it doesn't then neither does map. I don't see why a value-semantic effect would behave any differently.
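For readers less familiar with rethrows, a small sketch of the conditional behavior described here. `transformAll` is a hypothetical stand-in for map, and `NegativeInput` is a hypothetical error type.

```swift
// Throws only if the supplied transform throws.
func transformAll<T, U>(_ items: [T], _ transform: (T) throws -> U) rethrows -> [U] {
    var output: [U] = []
    for item in items {
        output.append(try transform(item))
    }
    return output
}

struct NegativeInput: Error {}

// Non-throwing transform: no `try` is needed at the call site.
let doubled = transformAll([1, 2, 3]) { $0 * 2 }

// Throwing transform: the same function must now be called with `try`.
var didThrow = false
do {
    _ = try transformAll([1, -2]) { (n: Int) throws -> Int in
        if n < 0 { throw NegativeInput() }
        return n
    }
} catch {
    didThrow = true
}

print(doubled, didThrow)
```

The argument in the post is that a value-semantic effect could propagate from the closure to the generic function in the same conditional way.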

They don't, but their users most definitely do.

In the language as it exists today we can build our own value-semantic collection types using most of the same techniques (CoW buffers, etc) used by the standard library. IMO any proposal for adding support for value semantics to the language that cannot support such user-defined types is a non-starter. @lukasa has addressed this point in detail so I won't elaborate further.

This sounds like a good idea, although I am not convinced that it is sufficient to support everything third party libraries might need to do. Have you explored anything along the lines of Rust's unsafe escape hatch?

DaveZ · January 22, 2019, 9:23pm

Yes, syntax like throws and async feels right. I'm still skeptical about => though, but that doesn't matter yet. If you don't mind, I'd appreciate it if we could defer syntax discussions until the fundamentals are discussed first. Specifically, is this soft proposal even something the core team is open to or interested in?

You can express AnyValue with this patch like so: protocol AnyValue : !class {}; and for whatever it may be worth, AnyObject used to be defined similarly until early 2017.

Also, please note that this patch verifies that protocol compositions and inheritance clauses are not in conflict, i.e. not simultaneously "class bound" (reference semantics) and "!class bound" (value semantics).

If it is truly important that randomElement() be expressed as an instance method instead of a generic function over collections, then sure, this patch can allow strict value type methods to opt out of the strict world. Of course, strict value semantics methods won't be able to call the method.

That's easy to say, but not at all how the Swift compiler and language are fundamentally designed. Setters return Void and mutating functions are allowed to do so as well. I'm not proposing any changes in this regard, nor do I need to for my research goals.

I agree that a Void returning immutable function under strict value semantics is a no-op. That being said, Void returning mutable functions should work, and without needing to write something like "=> Void" to get strict value semantics.

To be totally honest, this wasn't blocking my research, so I haven't spent much time looking into it. Perhaps it can work.

I think we might be talking past each other. Please let me use different words: there is no reason why the semantics of a collection type should change as a consequence of its elements unless the collection type has intrusive knowledge about the elements within. In other words, Array<T> can have strict value semantics because T is completely opaque. In contrast, Array<T : Foo> implies that Array wants to access members of Foo. If Foo has reference semantics, then Array must as well.
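The opaque versus constrained distinction can be sketched with generic functions in current Swift (Array<T : Foo> itself is not valid syntax, so generic functions over arrays stand in for it). `Resettable`, `Gauge`, `countElements`, and `resetAll` are hypothetical names.

```swift
// Never touches the elements, so their semantics cannot matter here.
func countElements<T>(_ items: [T]) -> Int {
    return items.count
}

// A class-bound protocol, i.e. one with reference semantics.
protocol Resettable: AnyObject {
    func reset()
}

final class Gauge: Resettable {
    var level = 5
    func reset() { level = 0 }
}

// The constraint exists so that members of T can be called, which means
// this function inherits whatever semantics those members have.
func resetAll<T: Resettable>(_ items: [T]) {
    for item in items {
        item.reset()
    }
}

let gauges = [Gauge(), Gauge()]
let gaugeCount = countElements(gauges)   // no observable effect on the gauges
resetAll(gauges)                          // mutates the shared objects

print(gaugeCount, gauges.map { $0.level })
```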

DevAndArtist · January 22, 2019, 9:28pm

Is that a typo? I'd assume you meant this instead:

protocol AnyValue: !class {}

Because in the current Swift you can think about AnyObject as something that could look like this:

typealias AnyObject = class

which would make your type:

typealias AnyValue = !class

DaveZ · January 22, 2019, 9:29pm

Yes. Fixed. Thanks

DevAndArtist · January 22, 2019, 9:52pm

Take it with a grain of salt, but I don't believe that AnyValue should imply value semantics, because any AnyObject can potentially also have value semantics (e.g. immutable classes), which would ultimately mean you could have a class that is both AnyObject and AnyValue, which from my point of view is pure nonsense. Viewing AnyValue as a type that is !class (not a class / not an object) and which can conform to protocols is the constraint I'm looking for here. That said, in the current Swift all structs/enums would implicitly be subtypes of AnyValue, but classes would not. If you would like to express value semantics as a type I wouldn't recommend mixing it with AnyValue.

anandabits · January 22, 2019, 10:53pm

michelf:

This is probably the most challenging part about designing the feature. Function annotations quickly become viral, and you can see this in D where there are plenty of those viral attributes for various effects. Take a look at this function prototype for instance:
const pure nothrow @nogc @safe int opCmp(Duration rhs);

Thanks for posting this example. This demonstrates one of the advantages of building on the effect typing notation. One of the other ideas that has been discussed is that of effectalias and associatedeffect which would help keep the wordiness under control as well as support abstraction over effects.

That's totally fine with me. FWIW, I'm not interested in the specifics of syntax as much as its scalability (but also avoiding keyword soup as @michelf showed from D).

Sure, but that doesn't have the compositionality that I mentioned if protocols that aren't declared with !class are able to be conformed to by strict value-semantic types with implementations that don't have strict value semantics.

I think we need to assume standard library APIs like that won't change. Banning them from being called by strict value-semantic methods is a feature!

I didn't mean to imply changes to this either. I'm only talking about the fact that all three statements at the end of this example are equivalent in terms of user-facing semantics:

struct Foo {
   var name = "Nobody"
   var age = 42

   mutating func setName(_ newName: String) {
       name = newName
   }
}
var x = Foo()
x.name = "Bob"
x.setName("Bob")
x = Foo(name: "Bob", age: x.age)

I agree with this.

I think this comes down to precisely defining terms and goals. IMO, the elements stored in an array are an essential part of its value and therefore the value represented by the array inherently depends on the value represented by its elements. This is why I think of Array as preserving value semantics while not necessarily having value semantics.

However, this hinges on exactly what "value semantics" means. It sounds like your goals are more in the direction of "data race free" or something like that, which is not how I would choose to define "strict value semantics". I would require referential transparency and a clear delineation of essential vs incidental members of the type.

So I think you're right that we're talking past each other. Using your definition and goals it may well make sense to consider arrays to have strict value semantics, while using my definition and goals it would not.