How would you later rename read to align it as well? I personally like the clarity of the modify keyword.
Apparently that is a bug & it needs fixing: [SR-1437] defer block in init triggers property observers · Issue #44046 · apple/swift · GitHub
Seems useful. Some discussion in the pitch around multithreaded code would be helpful, i.e., what happens if another thread accesses the property while a yield is pending? How can/will the compiler help? How will we debug these issues if the compiler can't catch them statically? And so on.
I concur that yield made me think of multithreaded code and felt unfamiliar, though that's not a blocking concern for me.
Thanks for the detailed description, it's always entertaining to read your explanations.
I think this is a nice low-level feature to have, but I can see myself in the future writing boilerplate modify implementations just to avoid the performance penalty.
Can't the compiler detect this get/modify/set pattern and optimize it for me? I guess it would be possible with value types. What about annotating the getter with an @modifiable attribute to generate a yield-by-reference-only implementation of modify?
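For concreteness, here is a minimal sketch of the kind of boilerplate I mean, assuming the pitched modify syntax (not shipping Swift) and a made-up wrapper type; the hand-written accessor exists only to sidestep the get-then-set copy:

struct Wrapper {
    private var _values: [Int] = []

    var values: [Int] {
        get { return _values }
        set { _values = newValue }
        // Boilerplate written purely for performance: yield the storage
        // in place instead of returning a copy and writing it back.
        modify {
            yield &_values
        }
    }
}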
About yield, it has a place in my mind for generators like the for loop in the example. Not sure if it's the right word. Maybe borrow?
Could someone kindly define what a "coroutine" actually is in this context? It's a term I've heard before but never really knew what it meant. (The syntax/behaviour reminds me of Python generators, and I couldn't nail down what a coroutine is in general via Google either.)
As far as I understand, a coroutine is a function that has one or more suspension points. This differs from a normal function, which is called, initializes its local state, executes, and terminates. A coroutine starts, and at each suspension point it saves its local state (the values of variables declared in its scope) and gives control back to the caller, remaining suspended until it is resumed. This happens at each suspension point until the function ends, i.e. there are no more suspension points. So in this context a yield is a suspension point: the coroutine suspends and the caller continues execution until the coroutine is called again, which resumes it from where it was suspended, until there are no more suspension points and the coroutine terminates.
It's a complicated question, because people mean different things by "coroutine".
When a function is running, we say there's an execution of that function. That execution can be suspended (in which case it can later be resumed) or ended (in which case that's it, nothing more can happen with that execution).
The most basic model of functions is that they can only suspend their executions to start new executions (a call), and they can only end their executions by resuming the execution that started them (a return). Both of these can include transfers of data, called arguments and results respectively. Note that this creates a simple nesting effect where (1) every execution (except the first) was started by another execution (a caller) that is currently suspended and (2) an execution can only be resumed when every execution that it started (its callees) has ended, as well as every execution those executions may have started transitively. This permits the basic implementation concept of a call stack.
A common extension is to add a second, special kind of resumption: an execution can declare that it is currently resumable in a special way, and a function can end by either resuming its caller in this special way or (if it is not so-resumable) ending it and continuing on to its caller's caller. This is called catch and throw, and when it transfers data, that data is called an exception. This is still compatible with a call stack because it preserves the basic rule that executions are only resumed when they have no callees.
All of that should be familiar; I've covered it just to make you comfortable with using these more formal concepts of an execution, suspension, etc.
A coroutine is a function that can suspend itself to resume an existing execution. So you call a function, it runs for a bit, it resumes back to you, you run for a bit, you resume it, and so on. This one formal concept can be used to describe generators, cooperative threads, and a number of other things.
Coroutines are not directly compatible with a single call stack and so need a different implementation. One implementation is to give every coroutine its own call stack, but this can be expensive if call stacks need to be pre-allocated, which is typical. Some implementations address this in a targeted way by allowing stacks to dynamically grow, usually at the cost of some extra overhead per call; I believe this is how goroutines are implemented. However, a very common alternative is to allocate space for the execution "elsewhere", in memory that can be reserved for the duration of the coroutine's execution. Often this is combined with a transformation that splits the coroutine function up into separate sub-functions, each of which picks up the coroutine's execution starting from some suspension point (or its beginning) and runs until the coroutine suspends itself again (or it ends).
That splitting is such a well-known implementation technique that a function subject to it is often called a "coroutine" even if it's not really behaving formally like a general coroutine at all. For example, async functions suspend themselves only to perform normal calls or to wait for other async functions to complete, which is essentially just the ordinary function model; however, because async functions are usually subjected to splitting at the implementation level (in order to avoid occupying a call stack long-term), they're often called coroutines anyway. (There is one other way that async functions aren't quite like ordinary routines: a function can start an async execution without suspending itself.)
Coroutines are really more a language concept than a language feature. The most common language features built on coroutines as a concept (rather than just on the implementation technique of function-splitting) are examples of a specific kind of coroutine called a semicoroutine. A semicoroutine is a coroutine that can only suspend itself to resume the last execution that resumed it; this is usually called a yield. (This works well with function splitting. Each resumption just becomes an ordinary function call to the next sub-function, passing the execution record as an argument. That sub-function then returns when it wants to suspend or end the coroutine, somehow reporting back whether the coroutine is still active and (if so) what sub-function should be called next.) The modify accessor is a semicoroutine that can only yield once and then must end when resumed. A generator is a semicoroutine that can yield an arbitrary number of times.
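To make that distinction concrete, here is a rough sketch in the same straw-man syntax used elsewhere in this thread (storage and count are placeholders; none of this is shipping Swift): the modify accessor yields exactly once, while a generator-style function may yield once per element.

var first: Int {
    modify {
        // Suspend here, give the caller mutable access to storage[0],
        // then resume exactly once and finish.
        yield &storage[0]
    }
}

func generateMutably() {
    for i in 0..<count {
        // Suspend once per element; the caller resumes us each time.
        yield &storage[i]
    }
}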
You present a lot of good options. The main drawback of them is that they complicate the simple case (which is expected to be the vast majority of cases) where cleanup is either non-existent (i.e. you just yield access to storage and you're done) or not necessary (as in the non-unsafe implementation of first).
In the clear majority of uses (I think, anyway), the yield &something will be the last line. To force someone to write _ = yield &something or guard yield &something else { return } would be a real shame. Maybe we could carve out exceptions for when it's the last line of the modify. How would this extend to generators, though? A clean syntax for basic modifiers and generators is really important (unsafe ones are an expert feature, but hopefully regular modifiers and generators won't be), so keeping the syntax clean needs to weigh against the added safety for the 1% case.
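Sketched out with the same straw-man syntax used in this thread (storage is a placeholder; none of this is real Swift), the contrast is between the common trivial accessor and what a mandatory result check would force everyone to write:

// The common case: yield is the last line, nothing to clean up.
var element: Int {
    modify {
        yield &storage
    }
}

// What a mandatory success check would force even in the trivial case.
var element2: Int {
    modify {
        guard yield &storage else { return }
    }
}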
What do you think of this?
func modify<T>(
    _ value: inout T,
    aborted: (() -> Void)? = nil
)
(I know that modify is not a real function; it's a special language construct.)
It would tell people that there is a special abort control flow, and it can easily be ignored until people need it.
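One way this idea might look at the point of use, written with a placeholder helper and entirely hypothetical syntax (this is just my reading of the suggestion, not part of the pitch):

var text: String {
    // 'aborted' is the optional handler from the suggested signature above;
    // it would run only if the caller abandons the access (e.g. by throwing).
    modify(aborted: { undoPartialChanges() }) {
        yield &x
    }
}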
What if we took inspiration from do-try-catch blocks by creating do-yield-finally blocks:
- In the simple case, you can have a naked yield in the last line of the modify implementation.
- If you want to do something else after yielding, then yield must be inside a do block that's followed by a finally block.
- You can put stuff after yield inside the do block, which only executes if the yielded-to function doesn't throw.
- You can put stuff inside finally, which is executed whether or not the yielded-to function throws.
- (As @beccadax enumerates for his proposed syntax, you cannot yield, throw, or pull other control flow shenanigans inside finally.)
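A rough sketch of the non-trivial case under this proposed (entirely hypothetical) syntax, with placeholder helper functions:

var value: String {
    modify {
        do {
            yield &x
            recordSuccess()   // runs only if the yielded-to code didn't throw
        } finally {
            tearDown()        // runs whether or not the yielded-to code threw
        }
    }
}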
I like @xwu's solution a lot, but I don't think we should introduce a "finally" keyword. Swift already uses "defer" instead of "finally" for error handling, and I think it would be confusing to have two similar keywords with different meanings in Swift. What if we adopted this solution, but required the user to write "try yield" when yielding inside a do block, and made it an error for that do block to have a following clause? This would indicate to users that there might be some unusual control flow, while also pointing them towards the solution they would reach for when cleaning up after errors: "defer".
This would look like
do {
    try yield &x
}
Hopefully this indicates that defer should be used if unconditional cleanup is desired, without needing to add a finally keyword.
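A hedged sketch of that combination, with a placeholder cleanup helper and the same not-yet-real modify syntax:

var value: String {
    modify {
        defer { cleanUp() }   // unconditional cleanup, whether or not the caller throws
        do {
            try yield &x
        }
    }
}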
One argument against this is that yielding is not really a throwing operation in a formal sense, but I don't think it would be all that confusing.
Overall though, I'm a very strong +1 on the pitch. I think the exact cleanup syntax we use is a relatively minor issue.
I agree we shouldn't introduce a construct redundant with defer only to support throwing yield. Have you thought of just requiring the yield statement to be in the function-return position, i.e. no further logic could ever be written below it? For example, the proposal's example would need to be written as
modify {
    print("Yielding", x)
    defer { print("Post yield", x) }
    yield &x
}
as placing that second print call in the last line would now be a compiler error. But now defer would be the expected, and only, way to place logic after the yield.
yield can be called an arbitrary number of times in a generator. While it is called only once in the modify case, denying code after yield would be pure madness:
func generator() {
    print("Yielding", x)
    defer {
        defer { /* cleanup code */ }
        yield &y
    }
    yield &x
}
As with @johannesweiss, I'm strongly +1 on this proposal. I accept all its sharp edges and I'm willing to pay for them, because I honestly think the perfect may be the enemy of the good here. I'm open to seeing the language evolve in this space over time regarding cleanup, but I agree with @Ben_Cohen that the overwhelming majority of _modify accessors will be entirely trivial. There are a number of places in the SwiftNIO ecosystem where we have worked around the absence of this feature, and every single one of them is a trivial case.
I'm +1 on this, and I'm looking forward to coroutines being introduced in Swift in general.
Having used coroutines in C#, I think the yield keyword is good as pitched, and its interaction with defer is, even if not completely obvious, just the natural way that these two features work together. Most of the alternatives suggested in this thread seem worse to me, as they introduce special cases instead of composing two orthogonal features.
That said, I think the proposal could do a better job of introducing coroutines and the overall vision for coroutines in Swift. @John_McCall's explanation above is great; perhaps something like it could be incorporated somehow? And perhaps a "coroutine manifesto" would be good (unless it already exists somewhere)?
Thank you :)
What if the compiler simply rejects any code after yield, explaining that you have to use defer for any cleanup? It could do that in modify/read accessors as well as in generators.
This would allow "simple" accessors and generators to remain simple and not have to include any error handling syntax whatsoever, just as in the examples of the OP.
If you need to do anything after the yield, you'd have to do it in the safe way:
func generateMutably2() {
    for i in 0..<count {
        yield &self[i]
        print("Did yield", i) // Error
    }
}

func generateMutably2() {
    for i in 0..<count {
        defer { print("Did yield", i) } // OK
        yield &self[i]
    }
}

func generateMutably3() {
    for i in 0..<count {
        yield &self[i]
    }
    print("Finished loop") // Error
}

func generateMutably3() {
    defer { print("Finished loop") } // OK
    for i in 0..<count {
        yield &self[i]
    }
}
Although that doesn't address the ugliness of yielding multiple times without a loop. I haven't used other languages that use yield, so I don't know how common it is to write generators that do this.
I somehow forgot to write the other half of my argument. Here it goes:
What if we combine the "no code after yield" approach with the "yield returns a success/failure Bool" approach, but make that error handling optional? This would mean that: (a) a normal yield must be the last code executed and no cleanup code could be skipped, or (b) the error must explicitly be handled and cleanup code can exist where it makes sense.
I'm using a yield … else { … } straw-man syntax here that's pretty much the same as guard … else { … }. I didn't put any thought into whether it makes more sense to do it this way or to make yield have a "return value" that you can use in if/guard.
var string1: String {
    modify {
        yield &x else {
            print("failed")
            return
        }
        print("success") // OK
    }
}

var string2: String {
    modify {
        yield &x // OK
    }
}

var string3: String {
    modify {
        yield &x
        print("did yield") // Error: Use 'yield/else' if you need cleanup code.
    }
}

func generateMutably4() {
    for i in 0..<count {
        yield &self[i] else { return }
        print("Did yield", i) // OK
    }
}

func generateMutably5() {
    for i in 0..<count {
        yield &self[i] else { return }
    }
    print("Finished loop") // OK
}

func generateMutably6() {
    for i in 0..<count {
        yield &self[i] // OK
    }
}
It is something done in most languages supporting generators (if not all), as it is the basis for creating a generator that wraps two or more generators.
fun join(list1: Collection, list2: Collection) {
    for (item in list1) { yield item }
    for (item in list2) { yield item }
}

// Or in Python:
def join(list1, list2):
    yield from list1
    yield from list2
I might be completely off here and suggesting something impossible, but to me it seems logical to model it similarly to passing in a throwing closure as an argument:
// Modifier that does not support throwing
var property: String {
    modify {
        yield &x
    }
}

// Modifier that supports throwing
var property: String {
    modify rethrows {
        try yield &x
    }
}
Now, if someone writes a modify that does not support throwing, the compiler can generate an error when it's used with a throwing call. This makes it clear that more work is required to properly support throwing, which in trivial cases is just a matter of adding two keywords. A big advantage is that it makes you think about it as soon as you need it.
Perhaps 'rethrows' is technically not correct here since the modify body does not participate in the error handling, but the only semantic difference seems to be that the error cannot be caught in the body of the modify.