How would you later rename read to align it as well? I personally like the clarity of the modify keyword.
Apparently that is a bug & it needs fixing: [SR-1437] defer block in init triggers property observers · Issue #44046 · apple/swift · GitHub
Seems useful. Some discussion in the pitch around multithreaded code would be helpful, i.e., what happens if another thread accesses the property while a yield is pending? How can/will the compiler help? How will we debug these issues if the compiler can't catch them statically? And so on.
I concur that yield made me think of multithreaded code and felt unfamiliar, though that's not a blocking concern for me.
Thanks for the detailed description, it's always entertaining to read your explanations.
I think this is a nice low-level feature to have, but I can see myself in the future writing boilerplate modify implementations just to avoid the performance penalty.
Can't the compiler detect this get/modify/set pattern and optimize it for me? I guess it would be possible with value types. What about annotating the getter with an @modifiable attribute to generate a yield-by-reference-only implementation of modify?
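For concreteness, here is a minimal sketch of the kind of boilerplate I mean, assuming the pitched modify syntax (not shipping Swift) and a made-up wrapper type; the hand-written accessor exists only to sidestep the get-then-set copy:

struct Wrapper {
    private var _values: [Int] = []

    var values: [Int] {
        get { return _values }
        set { _values = newValue }
        // Boilerplate written purely for performance: yield the storage
        // in place instead of returning a copy and writing it back.
        modify {
            yield &_values
        }
    }
}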
About yield, it has a place in my mind for generators like the for loop in the example. Not sure if it's the right word. Maybe borrow?
Could someone kindly define what a "coroutine" actually is in this context? It's a term I've heard before but never really knew what it meant. (The syntax/behaviour reminds me of Python generators, and I couldn't nail down what a coroutine is in general via Google either.)
As far as I understand, a coroutine is a function that has one or more suspension points. This differs from a normal function, which is called, initializes its local state, executes, and terminates. A coroutine starts, and at each suspension point it saves its local state (the values of variables declared in its scope) and gives control back to the caller, remaining suspended until it is resumed. This happens at each suspension point until the function ends, i.e. there are no more suspension points. So in this context a yield is a suspension point: the coroutine suspends and the caller continues execution until the coroutine is called again, which resumes it from where it was suspended, until there are no more suspension points and the coroutine terminates.
It's a complicated question, because people mean different things by "coroutine".
When a function is running, we say there's an execution of that function. That execution can be suspended (in which case it can later be resumed) or ended (in which case that's it, nothing more can happen with that execution).
The most basic model of functions is that they can only suspend their executions to start new executions (a call), and they can only end their executions by resuming the execution that started them (a return). Both of these can include transfers of data, called arguments and results respectively. Note that this creates a simple nesting effect where (1) every execution (except the first) was started by another execution (a caller) that is currently suspended and (2) an execution can only be resumed when every execution that it started (its callees) has ended, as well as every execution those executions may have started transitively. This permits the basic implementation concept of a call stack.
A common extension is to add a second, special kind of resumption: an execution can declare that it is currently resumable in a special way, and a function can end by either resuming its caller in this special way or (if it is not so-resumable) ending it and continuing on to its caller's caller. This is called catch and throw, and when it transfers data, that data is called an exception. This is still compatible with a call stack because it preserves the basic rule that executions are only resumed when they have no callees.
All of that should be familiar; I've covered it just to make you comfortable with using these more formal concepts of an execution, suspension, etc.
A coroutine is a function that can suspend itself to resume an existing execution. So you call a function, it runs for a bit, it resumes back to you, you run for a bit, you resume it, and so on. This one formal concept can be used to describe generators, cooperative threads, and a number of other things.
Coroutines are not directly compatible with a single call stack and so need a different implementation. One implementation is to give every coroutine its own call stack, but this can be expensive if call stacks need to be pre-allocated, which is typical. Some implementations address this in a targeted way by allowing stacks to dynamically grow, usually at the cost of some extra overhead per call; I believe this is how goroutines are implemented. However, a very common alternative is to allocate space for the execution "elsewhere", in memory that can be reserved for the duration of the coroutine's execution. Often this is combined with a transformation that splits the coroutine function up into separate sub-functions, each of which picks up the coroutine's execution starting from some suspension point (or its beginning) and runs until the coroutine suspends itself again (or it ends).
That splitting is such a well-known implementation technique that a function subject to it is often called a "coroutine" even if it's not really behaving formally like a general coroutine at all. For example, async functions suspend themselves only to perform normal calls or to wait for other async functions to complete, which is essentially just the ordinary function model; however, because async functions are usually subjected to splitting at the implementation level (in order to avoid occupying a call stack long-term), they're often called coroutines anyway. (There is one other way that async functions aren't quite like ordinary routines: a function can start an async execution without suspending itself.)
Coroutines are really more a language concept than a language feature. The most common language features built on coroutines as a concept (rather than just on the implementation technique of function-splitting) are examples of a specific kind of coroutine called a semicoroutine. A semicoroutine is a coroutine that can only suspend itself to resume the last execution that resumed it; this is usually called a yield. (This works well with function splitting. Each resumption just becomes an ordinary function call to the next sub-function, passing the execution record as an argument. That sub-function then returns when it wants to suspend or end the coroutine, somehow reporting back whether the coroutine is still active and (if so) what sub-function should be called next.) The modify accessor is a semicoroutine that can only yield once and then must end when resumed. A generator is a semicoroutine that can yield an arbitrary number of times.
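To make that distinction concrete, here is a rough sketch in the same straw-man syntax used elsewhere in this thread (storage and count are placeholders; none of this is shipping Swift): the modify accessor yields exactly once, while a generator-style function may yield once per element.

var first: Int {
    modify {
        // Suspend here, give the caller mutable access to storage[0],
        // then resume exactly once and finish.
        yield &storage[0]
    }
}

func generateMutably() {
    for i in 0..<count {
        // Suspend once per element; the caller resumes us each time.
        yield &storage[i]
    }
}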
You present a lot of good options. The main drawback of them is that they complicate the simple case (which is expected to be the vast majority of cases) where cleanup is either non-existent (i.e. you just yield access to storage and you're done) or not necessary (as in the non-unsafe implementation of first).
In the clear majority of uses (I think, anyway), the yield &something will be the last line. To force someone to write _ = yield &something or guard yield &something else { return } would be a real shame. Maybe we could carve out exceptions for when it's the last line of the modify. How would this extend to generators, though? A clean syntax for basic modifiers and generators is really important (unsafe ones are an expert feature, but hopefully regular modifiers and generators won't be), so keeping the syntax clean needs to weigh against the added safety for the 1% case.
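Sketched out with the same straw-man syntax used in this thread (storage is a placeholder; none of this is real Swift), the contrast is between the common trivial accessor and what a mandatory result check would force everyone to write:

// The common case: yield is the last line, nothing to clean up.
var element: Int {
    modify {
        yield &storage
    }
}

// What a mandatory success check would force even in the trivial case.
var element2: Int {
    modify {
        guard yield &storage else { return }
    }
}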
What do you think of this?
func modify<T>(
    _ value: inout T,
    aborted: (() -> Void)? = nil
)
(I know that modify is not a real function; it's a special language construct.)
It would tell people that there is a special abort control flow, and it can easily be ignored until people need it.
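One way this idea might look at the point of use, written with a placeholder helper and entirely hypothetical syntax (this is just my reading of the suggestion, not part of the pitch):

var text: String {
    // 'aborted' is the optional handler from the suggested signature above;
    // it would run only if the caller abandons the access (e.g. by throwing).
    modify(aborted: { undoPartialChanges() }) {
        yield &x
    }
}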
What if we took inspiration from do-try-catch blocks by creating do-yield-finally blocks:
- In the simple case, you can have a naked yield in the last line of the modify implementation.
- If you want to do something else after yielding, then yield must be inside a do block that's followed by a finally block.
- You can put stuff after yield inside the do block, which only executes if the yielded-to function doesn't throw.
- You can put stuff inside finally, which is executed whether or not the yielded-to function throws.
- (As @beccadax enumerates for his proposed syntax, you cannot yield, throw, or pull other control flow shenanigans inside finally.)
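A rough sketch of the non-trivial case under this proposed (entirely hypothetical) syntax, with placeholder helper functions:

var value: String {
    modify {
        do {
            yield &x
            recordSuccess()   // runs only if the yielded-to code didn't throw
        } finally {
            tearDown()        // runs whether or not the yielded-to code threw
        }
    }
}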
I like @xwu's solution a lot, but I don't think we should introduce a "finally" keyword. Swift already uses "defer" instead of "finally" for error handling, and I think it would be confusing to have two similar keywords with different meanings in Swift. What if we adopted this solution, but required the user to write "try yield" when yielding inside a do block, and made it an error for that do block to have a following clause? This would indicate to users that there might be some unusual control flow, while also pointing them towards the solution they would reach for when cleaning up after errors: "defer".
This would look like
do {
    try yield &x
}
Hopefully this indicates that defer should be used if unconditional cleanup is desired, without needing to add a finally keyword.
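A hedged sketch of that combination, with a placeholder cleanup helper and the same not-yet-real modify syntax:

var value: String {
    modify {
        defer { cleanUp() }   // unconditional cleanup, whether or not the caller throws
        do {
            try yield &x
        }
    }
}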
One argument against this is that yielding is not really a throwing operation in a formal sense, but I don't think it would be all that confusing.
Overall though, I'm a very strong +1 on the pitch. I think the exact cleanup syntax we use is a relatively minor issue.
I agree we shouldn't introduce a construct redundant with defer only to support throwing yield. Have you thought of just requiring the yield statement to be in the function-return position, i.e. no further logic could ever be written below it? For example, the proposal's example would need to be written as
modify {
    print("Yielding", x)
    defer { print("Post yield", x) }
    yield &x
}
as placing that second print call in the last line would now be a compiler error. But now defer would be the expected, and only, way to place logic after the yield.
yield can be called an arbitrary number of times in a generator. While it is called only once in the modify case, denying code after yield would be pure madness:
func generator() {
    print("Yielding", x)
    defer {
        defer { /* cleanup code */ }
        yield &y
    }
    yield &x
}
As with @johannesweiss, I'm strongly +1 on this proposal. I accept all its sharp edges and I'm willing to pay for them, because I honestly think the perfect may be the enemy of the good here. I'm open to seeing the language evolve in this space over time regarding cleanup, but I agree with @Ben_Cohen that the overwhelming majority of _modify accessors will be entirely trivial. There are a number of places in the SwiftNIO ecosystem where we have worked around the absence of this feature, and every single one of them is a trivial case.
I'm +1 on this, and I'm looking forward to coroutines being introduced in Swift in general.
Having used coroutines in C#, I think the yield keyword is good as pitched, and its interaction with defer is, even if not completely obvious, just the natural way that these two features work together. Most of the alternatives suggested in this thread seem worse to me, as they introduce special cases instead of composing two orthogonal features.
That said, I think the proposal could do a better job of introducing coroutines and the overall vision for coroutines in Swift. @John_McCall's explanation above is great; perhaps something like it could be incorporated somehow? And perhaps a "coroutine manifesto" would be good (unless it already exists somewhere)?
Thank you :)
What if the compiler simply rejects any code after yield, explaining that you have to use defer for any cleanup? It could do that in modify/read accessors as well as in generators.
This would allow "simple" accessors and generators to remain simple and not have to include any error handling syntax whatsoever, just as in the examples of the OP.
If you need to do anything after the yield, you'd have to do it in the safe way:
func generateMutably2() {
    for i in 0..<count {
        yield &self[i]
        print("Did yield", i) // Error
    }
}

func generateMutably2() {
    for i in 0..<count {
        defer { print("Did yield", i) } // OK
        yield &self[i]
    }
}

func generateMutably3() {
    for i in 0..<count {
        yield &self[i]
    }
    print("Finished loop") // Error
}

func generateMutably3() {
    defer { print("Finished loop") } // OK
    for i in 0..<count {
        yield &self[i]
    }
}
Although that doesn't address the ugliness of yielding multiple times without a loop. I haven't used other languages that use yield, so I don't know how common it is to write generators that do this.
I somehow forgot to write the other half of my argument. Here it goes:
What if we combine the "no code after yield" approach with the "yield returns a success/failure Bool" approach, but make that error handling optional? This would mean that: (a) a normal yield must be the last code executed and no cleanup code could be skipped, or (b) the error must explicitly be handled and cleanup code can exist where it makes sense.
I'm using a yield … else { … } straw-man syntax here that's pretty much the same as guard … else { … }. I didn't put any thought into whether it makes more sense to do it this way or to make yield have a "return value" that you can use in if/guard.
var string1: String {
    modify {
        yield &x else {
            print("failed")
            return
        }
        print("success") // OK
    }
}

var string2: String {
    modify {
        yield &x // OK
    }
}

var string3: String {
    modify {
        yield &x
        print("did yield") // Error: Use 'yield/else' if you need cleanup code.
    }
}

func generateMutably4() {
    for i in 0..<count {
        yield &self[i] else { return }
        print("Did yield", i) // OK
    }
}

func generateMutably5() {
    for i in 0..<count {
        yield &self[i] else { return }
    }
    print("Finished loop") // OK
}

func generateMutably6() {
    for i in 0..<count {
        yield &self[i] // OK
    }
}
It is something done in most languages supporting generators (if not all), as it is the basis for creating a generator that wraps two or more generators.
fun join(list1: Collection, list2: Collection) {
    for (item in list1) { yield item }
    for (item in list2) { yield item }
}

// Or in Python:
def join(list1, list2):
    yield from list1
    yield from list2
I might be completely off here and suggesting something impossible, but to me it seems logical to model it similarly to passing in a throwing closure as an argument:
// Modifier that does not support throwing
var property: String {
    modify {
        yield &x
    }
}

// Modifier that supports throwing
var property: String {
    modify rethrows {
        try yield &x
    }
}
Now, if someone writes a modify that does not support throwing, the compiler can generate an error when it's used with a throwing call. This makes it clear that more work is required to properly support throwing, which in trivial cases is just a matter of adding two keywords. A big advantage is that it makes you think about it as soon as you need it.
Perhaps 'rethrows' is technically not correct here since the modify body does not participate in the error handling, but the only semantic difference seems to be that the error cannot be caught in the body of the modify.