Clarifying the semantics of abnormally terminating a storage access

First, thank you for taking the time to explain these concepts. It is much appreciated :slight_smile:

It strikes me that part of the problem here might stem from that "if the storage isn't simply stored" clause, which makes these things behave differently depending on the implementation.

What if we say that this isn't an optimization we can make when a function contains a throwing call (i.e. it actually marks something with try)? In other words, we would still use a temporary in those cases where the function could throw. Then, if we abort the access, we would at least have the consistency that inout variables are never written to when the function throws. No more API/ABI breakage when switching to computed vars.

Or...

If we need to keep the current behavior (where inout vars are written to even when there is a throw), then, when we are using a temporary, we could call the inout setter on any path where the temporary has been written to, just before any call involving try. We would potentially call setters multiple times in throwing functions, but the behavior would be consistent and would match the user's mental model: "I wrote to this variable, and then this function threw." If the setter throws, that still makes sense, since it threw before the other throwing function could be called. Throwing happens in the same order it appears in the code.
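
Roughly, using today's syntax but with comments describing the proposed behavior (the names here are invented for illustration):

func checkClearance() throws { /* may throw */ }

struct Door {
    private var rawAngle = 0.0
    var angle: Double {
        get { rawAngle }
        set { rawAngle = newValue }
    }
}

func swing(_ angle: inout Double) throws {
    angle += 10          // written to the temporary
    try checkClearance() // proposed: since the temporary has been written,
                         // call Door.angle's setter just before this call, so
                         // a throwing setter would throw here, in source order
    angle += 10          // proposed: the setter is called again when the access ends
}

var door = Door()
do {
    try swing(&door.angle)
} catch {
    // under the proposal, door.angle reflects whatever was flushed before the throw
}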

Note, when I say try, I am excluding both try? and try!.

Greg Titus suggested that here, and I responded to it here.

Hmm. This would still be a rather complicated redesign of inout parameters that, whatever its merits, is definitely not in the cards to deliver before ABI stability. The "ground rules" I tried to lay out above were designed around what I thought was reasonably feasible without radical changes to the basic semantics and implementation model.

We're interested in your response to the API/ABI breakage, too ;-)

We're not going to consider any language design which (possibly absent some strange future annotation) causes the implementation of the storage to break source or ABI compatibility. That's why I keep talking about modify: it's the basis of how we can emit accesses that are ABI-resilient to structural changes in the implementation.

I think I've been convinced by the arguments here that we can't avoid writing back to storage just because an error was thrown.

The main question still open is whether the principles of the language design we're laying out are reasonably future-proof against other expected changes to storage declarations, most importantly the ability to throw from an access — because if they aren't, and we're going to have to revise them later, this decision might end up looking short-sighted.

I would answer that question as follows. We seem to have rejected all the approaches that would allow us to define away the problem of throwing errors while other errors are already propagating. Some of those approaches have been rejected because they have semantic effects we don't like, like exposing the stored vs. computed difference in the dynamic behavior of the program. Others have been rejected because they require major and problematic changes to the access model, like tracking dynamically whether accesses have occurred and/or passing around callbacks that flush the current state of an inout back whenever a write occurs. But for whatever reason, they've all been rejected.

So we're stuck with dealing with the conflicts. Any decision for how to resolve the conflict is going to seem arbitrary. I think the best decision is probably to stick with the first error that was thrown and just ignore errors thrown when ending accesses while another error is already being propagated. I think that's an acceptable solution: it doesn't paint us into a corner, so we can go forward. But I also suspect it would be a controversial solution.

Note that if we can accept this solution, we can probably also accept it for defer, with the same arbitrary resolution rule.

The other open question is whether it still makes sense to include in the model the possibility of aborting an access without calling the setter. Since we wouldn't be able to use that for the lower-level semantic purpose of unwinding the access without causing possible further control flow, we could maybe instead use it for the higher-level purpose of, well, not calling the setter when nothing was ever done that might have modified the value. I have three concerns with this:

  • It would be an obscure "gotcha" for modify implementations, since it would only trigger if something provably short-circuited all modifications from the caller's perspective — e.g. if ? short-circuited (see the sketch after this list) or if an error was thrown by the start of a later inout argument.
  • It might be totally useless for read implementations. What could we conceivably want to skip doing at the end of a read, given that we've already materialized the value and it was just provably never used?
  • "Provably" can quickly get complicated, especially since it would be a static and non-interprocedural analysis. As soon as the function with the inout argument is called, we'd have to consider the storage to be modified. And would we want statically-known information — like whether a property is stored or implemented with a certain kind of accessor — to affect our decision about whether to finish an outer access with an end or an abort?
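
For concreteness, here is the ? case with invented names:

struct Wrapper {
    private var storage: [Int]? = nil
    var values: [Int]? {
        get { storage }
        set { storage = newValue }   // observable side effect in general
    }
}

var wrapper = Wrapper()
wrapper.values?.append(1)
// The getter yields nil, so append never runs. Aborting the access would mean
// the setter is skipped here; completing it normally would mean the setter is
// called just to write the nil back.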

One question I have is: what code can cause errors to be thrown while errors are already propagating?

I think I have managed to think of an example, but …
The potential case I have is passing a set throws property inout into a throws function.
But then I don't see why we should permit directly passing a throws property to a function that does not expect it.
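
Spelled out, using purely hypothetical syntax (set throws does not compile today, and all of the names are invented):

struct ValidationError: Error {}
struct AdjustmentError: Error {}

struct Settings {
    private var rawTimeout = 30
    var timeout: Int {
        get { rawTimeout }
        set throws {                        // hypothetical throwing setter
            guard newValue >= 0 else { throw ValidationError() }
            rawTimeout = newValue
        }
    }
}

func adjust(_ value: inout Int) throws {
    value = -1
    throw AdjustmentError()     // the first error starts propagating...
}

var settings = Settings()
try adjust(&settings.timeout)   // ...then the write-back calls the throwing
                                // setter, which throws a second error while
                                // the first is still in flight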

That too would have implications: a subtle, seemingly innocuous change in a function could change the analysis and make the compiler no longer call the setter. Since the setter can have important side effects (even when called with the same value), it might not be advisable to skip it on the whim of the compiler.

I'll bring up the idea of modifiers. Here are some of the possibilities:

  • @always inout: value always mutated or setter always called, regardless of throws or if there was an actual change (what we have now).

  • @success inout: value mutated or setter called only on success, never on failure.

  • @frail inout: value mutated or setter called on success if the compiler thinks the value might have changed, but this might or might not happen on failure (whatever is convenient for optimization).

Making @frail the default makes me uneasy. It's the most optimizable variant, but also the least predictable one. It's a hole in the contract of a function (something is left undefined).

But here's an idea: maybe make @frail the default, but make the compiler track, in the caller, accesses to values written through a @frail inout and emit a warning/error if they are used on the non-success path (where the value is not well-defined). You could add @always or @success in the signature of the callee to make the value well-defined for the error path and avoid the warning/error.

Edit: I realize now this is a nice solution for keeping the value predictable on abnormal termination (throws), but it does not make the side effects of the setter more deterministic in the @frail case. If we had @pure properties, we could restrict @frail inout to them, but that restriction wouldn't be good for source compatibility.

What are the downsides to calling the setter multiple times (when the value is changed multiple times)? Is it just inefficient, or are there other issues? We seem to do it now for direct storage (but that is fast).

If it is just inefficient, we could have the rule be that throwing setters are always called immediately (even if that means we lose the optimization of only calling them once when they are set multiple times). Non-throwing setters would be dealt with as you describe above (but we wouldn't have to worry about throwing during throws)...

I've been following this discussion off-and-on, and so I've only partially digested the content at best, but wouldn't your first option preserve the most options going forward in terms of not painting ourselves into a corner? That is:

  • Throwing setters are illegal. More generally, read and modify can no longer throw once they've yielded.

Would you mind going over the drawbacks of this option that have caused you to reject it as the preferred approach?

Well, in some sense, not allowing throwing accessors at all preserves the most options. My assumption is that not allowing setters to throw is ultimately not going to satisfy the people who are asking for throwing accessors.

Suppose I want to plant a garden and you want to plant a tree. If I'm worried that your tree is going to cast shade on my garden, I can't honestly just say, unilaterally, that it won't be a problem as long as you only plant a very small tree. That's more wishful thinking than a reasonable interpretation of your expressed desire.


I don't know what you mean here. We seem to do what multiple times for direct storage?

I can list downsides, but there really isn't a point. We're not going to change the semantics of inout so that it becomes possible to "flush" results every time a modification happens. Even if it were semantically reasonable, which I don't think it is, we just do not have time for that kind of massive overhaul.


Swift's inout model does not require functions to understand anything about where an inout argument (or mutable self) came from. Writing to an inout parameter does not immediately modify the original storage, so the assignment doesn't immediately throw. This is quite important for the programming model of inout; otherwise it would be almost impossible to understand how all the side-effects interacted.

We could certainly try to ban this from the caller side — basically saying that you can't do anything that might throw in the middle of accessing throwing storage — but I think that's a much stronger restriction than you're giving it credit for.


I must be misunderstanding something.

My understanding is that with:

struct FooError: Error {}

func foo(x: inout Int) throws {
    x = 1
    x = 2
    throw FooError()
    x = 3 // never reached
}

If the value passed as x is a stored variable (not a getter/setter), then it is set to 1, then 2, before FooError is thrown. But if the value passed has a getter/setter instead, then a temporary is used and written back once at the end.

If that isn't what is happening, then I missed something, and I apologize.

If the value passed to x is stored, you can't access it while foo is running, so when exactly the storage is written back to is unobservable.
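
For example, trying to observe the storage from inside the call runs into exclusivity enforcement (a minimal example; the names are invented):

var number = 0

func observe(_ x: inout Int) {
    x = 1
    print(number)   // conflicting access: number is already being modified
                    // through x, so exclusivity enforcement traps here
    x = 2
}

observe(&number)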

I see. So it is (sometimes) being written to as an optimization, but it has no visible side effects, so it is still equivalent to the other way. Whereas calling a setter is an observable side effect that might break things.

Do I have that right?

Right. The optimizer can treat an inout parameter as unaliased memory, but that's an "as if" optimization: it always has to honor the basic semantics of the language if the difference is legally observable, so the last value assigned to the parameter has to actually be the value there when the function exits, no matter how it exits. That's why changes to the optimizer don't have to be reviewed by evolution: they preserve observable semantics.

Somebody intentionally adding bugs to the optimizer that broke basic language semantics would need to go through evolution, though. :smile:

But yes, anything that would change whether and how often a setter is called is a semantics change that has to be reviewed.


Okay, I think I'm leaning back in the direction of the original pitch. The main thing (re-)convincing me is that the semantic confusion here from the caller's perspective only comes up when you're passing something inout to a throwing function, which is (1) rare and (2) usually a situation where the function doesn't make strong guarantees about the exact value it's leaving in the variable anyway — q.v. MutableCollection.partition. The secondary factor is that I realized that we can pretty easily add affordances for getting back the current behavior for library maintainers worried about changing behavior; those affordances could include:

  • an attribute to request the setter to be called even during an abort (which you wouldn't be able to add to a throwing setter), and/or
  • more generally, a way to define storage as an alias for other storage, which could be both more convenient and more efficient than defining a getter/setter.
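
For the second point, the kind of hand-written forwarding such an alias would replace looks like this (illustrative names):

struct Legacy {
    private var _total = 0
    var total: Int {              // hand-written forwarding to other storage;
        get { _total }            // a dedicated alias feature could declare
        set { _total = newValue } // this relationship directly
    }
}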

So the thrust of the argument is:

  1. It gives us a nice model which justifies not calling set/willSet/didSet in some cases where people generally don't want us to (most prominently, short-circuits from ?).
  2. It admits some possible confusion, but only in situations (code evolution of a throwing, mutating function) where strong guarantees about how much mutation has been done generally aren't made anyway.
  3. It makes the overall access model cleaner and more general. It can easily support non-standard semantics that might be desirable in some situations, e.g. properties that do guarantee to call their setters during abort (like the current behavior), or "transactional" properties that restore their original values during abort despite being stored.
  4. It pre-empts major predictable semantic problems with throwing accessors.

I encourage further discussion, but I might start writing up a proposal along these lines.

Can you spell out exactly what the pitch is? Under what circumstances is "if the storage is simply stored" actually true? At one point (in a non-public conversation) you mentioned using "is declared as a stored-without-observers property" as the condition, but that seems like it either won't match up with your correction about class instance properties above (a dynamic decision), or that it won't work for cases where the stored property needs to be reabstracted with a temporary to be passed inout (a static decision).

I think there's also something weird going on with protocol witnesses behaving differently than their concrete conforming types if the abstraction changes, but I can't put my finger on it.

Your concern about abstraction is reasonable; let me try to explain why it works. IIUC, you’re concerned that the abstraction might force us into a different access pattern, and so we might use different rules for abstracted and unabstracted accesses. Actually, abstracted read-write accesses all go through modify (unless they’ve been specifically forced to go through objc, which does force a somewhat different semantic model). modify is basically exactly an outlined access to the unabstracted storage, which (as an implementor) I can best convince you of by asking you to look at the synthesized AST for one: it’s exactly a yield to the direct-to-implementation storage. All of the temporary creation, reabstraction, and so on is done implicitly in SIL generation, exactly like it would be for a direct access. And modify already supports this idea of aborting the access; it just doesn’t skip calling the setter, because neither does aborting an unabstracted access.
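
For example, written by hand with the underscored spelling that exists in the implementation today (not official syntax, and subject to change), such an accessor pair over stored storage is just a pair of yields:

struct Box {
    var stored: Int = 0

    // The synthesized accessors for `stored` itself have the same shape:
    // a single yield of the storage.
    var exposed: Int {
        _read { yield stored }
        _modify { yield &stored }
    }
}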

As for the impact of reabstraction, I think you might be falling into the trap of thinking of our accessors as basically get/set but with a special case to make accesses more efficient when they’re direct to storage. Our accessors are abstractions over the patterns of code needed to perform accesses, whatever they might be. Reabstraction — and thus not being able to use the storage address — doesn’t actually affect this at all.

Sorry, but this still doesn't answer my question. Under what circumstances is the access guaranteed to be observably completed, if any? Under what circumstances is the access guaranteed to not be completed, if any?