[Pitch] Modify and read accessors

Ben_Cohen · November 1, 2024, 2:21pm

Please don't ask these kinds of questions on pitch or review threads. They aren't relevant to the discussion, which should be focused on the language design.

duan · November 2, 2024, 1:17am

I put forward "edit" as a candidate alternative for modify:

- it has the quality of describing an ongoing activity.
- most importantly, it has 4 letters, same number as "read". With get/set being a pair of 3-letter words, and borrow/mutate being a pair of 6-letter words, we'd achieve ultimate harmony in modifiers (let's pretend the ones with the word "address" don't and won't exist!)

taylorswift · November 2, 2024, 4:00am

one thing we should think about is how we’re going to teach these features. of all of them, modify is probably the easiest to understand, it’s pretty straightforward to get a feel for when it is needed and when it is not, and develop an eye for what source patterns would be calling a modify.

the other accessors are harder to untangle and today we have

get
mutating get
consuming get

and with the proposal we would have

read
mutating read

for a total of five “immutable” flavors of accessor.

they’re not redundant, and they all have their use cases, but it would help if we had some examples for when to use each one. in particular, the differences between modify and mutating read could use some explanation as many implementations will look identical save some slightly different syntax.

nate_chandler · November 4, 2024, 3:53pm

The proposal includes using different symbols for read and modify from those used for _read and _modify. This allows changing how calls to such coroutines are made. A goal here is to eliminate most malloc calls.

Even with non-mallocing read and modify, such a blanket statement is still not the best advice. These accessors ought not become the first tool to be reached for. They are best suited to fields of noncopyable type and fields of especially-expensive-to-copy type (copy-on-write types like Dictionary, for example) when profiling indicates the usual accessors are a bottleneck.

fclout · November 5, 2024, 8:04am

When is get/set the better choice? What overhead do you pay, or risk paying, when you use read/modify?

dnadoba · November 5, 2024, 11:16pm

I'm really excited about this! I have used the underscored version since the very beginning to allow in-place mutations of CoW (Copy on Write) types, especially for generic wrappers that may or may not provide access through a computed property to a CoW type. This is an essential feature for creating data structures, generic wrappers, and algorithms which have computed properties or subscripts where the return type is potentially a CoW type or any other type that is expensive to copy, even temporarily.

One question though. Will key paths be able to use these accessors?
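A minimal sketch of the generic-wrapper use case described above. It uses the underscored `_modify` accessor that ships in current compilers as a stand-in for the pitched `modify`; `ListStorage`, `CowList`, `CopyingWrapper`, and `InPlaceWrapper` are hypothetical names invented for this illustration. The hand-rolled copy-on-write list counts how often it has to duplicate its storage, so the difference between the two wrappers is observable.

```swift
// Reference-typed backing storage for a tiny hand-rolled copy-on-write list.
final class ListStorage {
    var items: [Int]
    init(items: [Int]) { self.items = items }
}

// Hypothetical CoW type that records how many times it copied its storage.
struct CowList {
    private var storage = ListStorage(items: [])
    private(set) var copyCount = 0

    var items: [Int] { storage.items }

    mutating func append(_ value: Int) {
        // Copy only when the storage is shared with another value.
        if !isKnownUniquelyReferenced(&storage) {
            storage = ListStorage(items: storage.items)
            copyCount += 1
        }
        storage.items.append(value)
    }
}

// Wrapper exposing the value through a classic get/set pair.
struct CopyingWrapper<Value> {
    private var stored: Value
    init(_ value: Value) { stored = value }
    var value: Value {
        get { stored }
        set { stored = newValue }
    }
}

// Wrapper exposing the value through get plus the underscored `_modify`.
struct InPlaceWrapper<Value> {
    private var stored: Value
    init(_ value: Value) { stored = value }
    var value: Value {
        get { stored }
        _modify { yield &stored }
    }
}

var copying = CopyingWrapper(CowList())
copying.value.append(1)   // get hands out a second reference, so append copies

var inPlace = InPlaceWrapper(CowList())
inPlace.value.append(1)   // the storage is yielded in place and stays unique

print(copying.value.copyCount, inPlace.value.copyCount)
```

The wrapper itself knows nothing about `Value`, which is the point of the use case: whether the wrapped type is copy-on-write or not, the `_modify` version never forces a temporary copy.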

jrose · November 6, 2024, 6:52am

I’m not totally up on the implementation, but some things I know:

- Generally speaking, a (non-inlinable) coroutine call turns into two function calls (for accessors, anyway), which is slightly less efficient and slightly more code size. (Even if there’s nothing explicitly after the yield point, there could still be cleanup necessary of any locals.)
- From the caller side, if you need an owned value anyway (because you plan to modify or consume it), you might as well use get; otherwise you’re either paying the code size for the copy call yourself, or you’re doing a redundant copy of a computed value anyway.
- Similarly, if you are setting a value, you might not need to waste time materializing the old value first in order to use modify. And of course, computed properties might not be backed by storage that supports modify, in which case only providing modify would be inflicting the above pain on everybody.
- read has law-of-exclusivity implications that get does not, since the access isn’t “instantaneous”.
- And of course consuming get is still relevant for non-Copyable types, though you can always imitate that with consuming functions so maybe it’s not such a big deal.

There might be some I missed, or I might be over-worried about some I listed (most of the time they’re probably not that big a deal), but it can indeed be relevant.
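To make the point about setting versus modifying concrete, here is a small sketch, again assuming today's underscored `_modify` as a stand-in for the pitched accessor. `Trace` and `Settings` are hypothetical names. Each accessor logs its own name so that the accessor chosen at each call site can be inspected afterwards.

```swift
// Hypothetical log used only to observe which accessor runs.
final class Trace {
    var events: [String] = []
}

struct Settings {
    let trace = Trace()
    private var names: [String] = []

    var list: [String] {
        get {
            trace.events.append("get")
            return names
        }
        set {
            trace.events.append("set")
            names = newValue
        }
        _modify {
            trace.events.append("modify")
            yield &names
        }
    }
}

var settings = Settings()
settings.list = ["x"]         // whole-value assignment: the old value is not needed
settings.list.append("y")     // in-place mutation of the existing value
let snapshot = settings.list  // the caller wants its own owned copy

print(settings.trace.events, snapshot)
```

Providing all three lets each call site use the cheapest path: assignment never materializes the old value, and in-place mutation never goes through a get followed by a set.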

vanvoorden · November 6, 2024, 7:50am

I feel like something super useful (maybe not directly from swift™… maybe a community repo) would be something like a "benchmarks playground" that actually lets engineers see for themselves how these accessors stack up against one another when different data structures are delivered.

arennow · November 6, 2024, 2:18pm

Those are all reasonable and important things, but I'm skeptical that real-world devs would both know these fairly minor, hard-to-see differences and assess them properly. My guess is that most people who are even aware of the new read accessor will use it for everything, largely due to novelty or out of cargo-cult reasoning that "it's faster".

Shifting gears slightly, would it be possible (without backward-compatibility issues) to not have a distinct read block, but instead to use current (get/copying) behavior under all current circumstances, but use read behavior if the implementation of the get block uses the yield keyword? (And disallow both return and yield in the same implementation). And the same for set. E.g.:

var arr: Array<BigThing> {
    get {
        yield self._innerArray
    }
    set {
        yield &self._innerArray
    }
}

That way, we can:

- Avoid having to bikeshed a new accessor name
- Avoid the honestly very confusing state of having both get and read. (I realize that there are use-cases for having both, but I expect they could be handled heuristically with a compiler-inserted copy as needed.)

vanvoorden · November 8, 2024, 6:15pm

One more small idea: "Alternatives considered" could include a short answer to the question "why won't (or can't) the compiler do this for me?" We plan to present product engineers with something like a decision tree for when "classic" accessors are still the preferred tool, so they might reasonably ask why the compiler isn't just formalizing that decision tree itself.

KeithBauerANZ · November 13, 2024, 10:53pm

Aside from the comment that read and get can't coexist in a protocol, I don't see any other discussion of how these interact with protocols. Can a protocol require modify? If so, can modify be witnessed by a get/set pair? Can we fill out this table?

       |                  protocol                ||   concrete type   |
       | incompatible with |    witnessable by    || incompatible with |
-------+-------------------+----------------------++-------------------+
get    | read              | stored, get, ...     || ?                 |
set    | modify?           | modify?              || ?                 |
read   | get               | stored, get, ...     || ?                 |
modify | set?              | stored, get+set, ... || ?                 |
...
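For one cell of that table there is a data point from current compilers, though only for the underscored accessor: a `get set` requirement can be witnessed by a `get` plus `_modify` pair, with no explicit `set` written. The sketch below shows that and nothing more; it says nothing about how the pitched `read` and `modify` would behave as protocol requirements. `Container`, `Bag`, and `addOne` are hypothetical names.

```swift
// A protocol with a classic get/set requirement.
protocol Container {
    var items: [Int] { get set }
}

// The requirement is witnessed by get plus the underscored `_modify`.
struct Bag: Container {
    private var storage: [Int] = []

    var items: [Int] {
        get { storage }
        _modify { yield &storage }
    }
}

// Generic code only sees the protocol's get/set requirement.
func addOne<C: Container>(to container: inout C) {
    container.items.append(1)
}

var bag = Bag()
bag.items = [7, 8]   // plain assignment works without an explicit set
addOne(to: &bag)     // mutation through the protocol requirement
print(bag.items)
```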

KeithBauerANZ · November 13, 2024, 11:38pm

It feels like this kinda "changes the default" for protocols; where previously we might have written var t: T { get }, we should now basically always write var t: T { read } — it's potentially more performant, doesn't constrain us if T is Copyable, and is necessary if we want to allow T to be ~Copyable.

Likewise for var t: T { get set } we should probably always write var t: T { read modify set } now?

Likewise for any concrete generic type, it changes the default from providing a get/set property to providing a read/modify/set property, with potentially significant increases in implementation complexity?

That seems to "do quite a lot of damage" to existing Swift code, both in causing churn in mature codebases and in making the story for inexperienced developers that much harder?

Is there no path here where we keep get and set, but change the semantics to borrow by default (with implicit copying for Copyable types meaning that existing code continues to work), and allow the new accessor types only as optimizations, and never in protocol requirements?

tbkka · November 14, 2024, 12:37am

I just posted a proposed "Vision for Accessors in Swift" that should hopefully clarify how we envision the final complete set of accessors working.

Here's the

KeithBauerANZ · November 14, 2024, 1:06am

Thanks, this is super-helpful in understanding the proposal this thread is actually about, and I believe a significant portion of it should be copied to (or at least referenced by) the actual proposal.

KeithBauerANZ · November 14, 2024, 1:13am

One more thought — the proposal doesn't seem to discuss how the scope of the coroutine is determined. The only example in the spec for modify is object.modifiable.append(...), a single expression calling a mutating method. What about:

let x = self.readable
x.foo()
x.bar()

Is x copied, or borrowed? If borrowed, what if I overlap borrows:

let x = self.readable1
let y = self.readable2
x.foo()
y.bar()

What if the expression is async? Is this legal?

await self.readable.longRunningSomething()

With modify, can I pass the borrowed value to an inout function, at least if the value is copyable?

file.readToEnd(into: &self.modifiable)

etc.
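On the last of those questions, the existing underscored `_modify` already allows passing the property to an inout parameter, which may be a useful baseline when the proposal spells out its own rules. The sketch below demonstrates only that current behavior and is not a statement about the pitched `modify`; `Document` and the free function `readToEnd(into:)` are hypothetical stand-ins for the names in the snippet above.

```swift
struct Document {
    private var bytes: [UInt8] = []

    var modifiable: [UInt8] {
        get { bytes }
        _modify { yield &bytes }
    }
}

// Hypothetical stand-in for file.readToEnd(into:): fills the caller's buffer.
func readToEnd(into buffer: inout [UInt8]) {
    buffer.append(contentsOf: [0x68, 0x69])
}

var document = Document()
// The inout argument is an access to the computed property; with `_modify`
// the callee works on the yielded storage for the duration of the call.
readToEnd(into: &document.modifiable)
print(document.modifiable)
```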

grynspan · November 14, 2024, 1:19am

My three cents:

1¢: read should be spelled some way that involves the word borrow so that it's clear there's a relationship between it and borrowing values. I see that the pitch doc talks about a future borrow accessor, although I'm not entirely sure how it would differ from read.

2¢: I'm fine with modify. We already have mutating and nonmutating in the language so maybe mutate is better?

3¢: I do understand yield and return are not equivalent, but I kind of wish we could just say return instead and have the compiler understand that in the context of read and modify, it does something different.

Extra bonus 4¢: I think I'd feel more warmly about yield if we were also getting other coroutine functionality from it, such as on-the-fly sequences à la C#, but I don't think I saw that as a future direction.

scanon · November 14, 2024, 1:37am

The lifetime of the value yielded from a read access is bounded by the access; when the access ends and the “bottom half” of the routine runs, its lifetime ends.

The lifetime of a borrow access can extend beyond that access; it would usually be tied to the lifetime of the object that provided it.
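The first point, that a read access has a "bottom half" which runs when the access ends, can be observed with the underscored `_read` accessor in current compilers. This is a sketch under that assumption, with hypothetical names `Trace` and `Holder`; the pitched `read` may differ in detail.

```swift
// Hypothetical log used only to observe the order of events.
final class Trace {
    var events: [String] = []
}

struct Holder {
    let trace = Trace()
    private var text = "hello"

    var contents: String {
        _read {
            trace.events.append("top half")     // runs when the access begins
            yield text                          // caller uses the yielded value
            trace.events.append("bottom half")  // runs when the access ends
        }
    }
}

let holder = Holder()
let length = holder.contents.count   // the access begins and ends in this statement
holder.trace.events.append("after access")

print(length, holder.trace.events)
```

Both halves have run by the time the next statement executes, which is the sense in which the yielded value cannot outlive the access.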

grynspan · November 14, 2024, 1:38am

I wonder if that could be expressed with @lifetime() along with borrowing get, or would that be too confusing?

KeithBauerANZ · November 14, 2024, 1:47am

Thinking about this some more, it says we need both borrow and read, because

borrow doesn't allow situations where the storage doesn't match the property type (eg. Dictionary's Optionals)
read doesn't work with noncopyable types, because the implicit coroutine must have a limited scope.

But I think the first limitation is actually lifted by nonescaping types? We've just accepted Span<T: ~Copyable>, which represents contiguous storage of many Ts, why not have Ref<T: ~Copyable>, essentially a 1-element Span?

Then we don't need read at all, and borrowing composes across types; rather than Dictionary having to return borrowed Value? aka Ref<Optional<Value>> which it can't because it doesn't store it, it can return Optional<Ref<Value>>, which trivially can exist.

This then extends to cover mutate by adding RefMut<T: ~Copyable>.

Some compiler magic to allow using Ref and RefMut values as if they're lvalues of the types they point to does the rest of making the syntax natural.

And read and modify just aren't needed any more?

(Yes, this is just Rust. But it seems better and way less confusing to have 2 new accessors that work with nonescapable types, than to have 4 that work with coroutines and implicitly-borrowed values?)

KeithBauerANZ · November 14, 2024, 3:04am

Expanding that to a matched pair of counterproposal sketches:

Nonescapable Reference Types

Add Reference<T: ~Copyable>: ~Escapable {} and MutableReference<T: ~Copyable>: ~Escapable {} types to the language. They have no API surface; you always interact with them via the language's usual syntax.
MutableReference<T> is a subtype of Reference<T> and upcasts silently
[Mutable]Reference<T> is a subtype of [Mutable]Reference<U> if T is a subtype of U, upcasts silently, and downcasts with as?
probably need to allow projecting stored properties out of references?
&lvalue syntax creates Reference/MutableReference as appropriate.
If T is Copyable
- A value of type Reference<T> can be used wherever a value of type T can (copied out if necessary).
- A value of type MutableReference<T> can be used wherever an lvalue of type T can.
borrowing T parameters become syntax sugar for regular parameters of type Reference<T>
(we gain additional new functionality because now we have mutable borrowing parameters too)

Reference Accessors

(deliberately picking new keywords here to encourage discussing this without trampling over the previous mentions of borrow accessors, though I expect we'd choose borrow if this proposal were actually adopted)

Add reference and mutableReference accessors.
- var t: T { reference { ... } } returns Reference<T>
- var t: T { mutableReference { ... } } returns MutableReference<T>.
- providing reference synthesizes get for Copyable types
- providing mutableReference synthesizes get and set for Copyable types
- read-only stored properties provide get and reference
- mutable stored properties provide get, reference, set and mutableReference
- reference set is a legal combination

dict[key] // borrows value, returning Reference<Value>?, all good
dict[key] = nil // calls `set` with `Value?`, works fine
dict[key]?.append(3) // works for copyable types, but not for noncopyables
  // because we have to copy out of Reference<Value> to be able to call
  // mutating append.

// but, we can now
// assuming func slot(key: Key) -> MutableReference<Value>?
dict.slot(key: key)?.append(3) // in-place update of the value

// or if `reference mutableReference set` were allowable,
// with the compiler picking contextually between
// `mutableReference` and `set`
// I think we could make both work:
dict[key] = nil // calls `set` 'cos we're replacing the whole value
dict[key]?.append(3) // calls `mutableReference` 'cos we don't need the whole value

These proposals don't address the "damage to existing Swift code" criticism; they effectively split the ecosystem. A protocol using { reference mutableReference } for efficiency can't interoperate with a type with get set. On the other hand, it does encourage people to stick with get set unless they really need the efficiency or have noncopyables...