Clarifying the semantics of abnormally terminating a storage access

Maybe yield here can return a boolean success/failure value; then the discarded result warning would prompt you to specify abort behavior, and if it turned out you need to do the same thing in both cases, it's perfectly valid to say _ = yield &value.
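A purely hypothetical sketch of what that could look like (yield does not return a value today, and _modify is still an underscored, unofficial spelling; the surrounding type is made up):

struct Cell {
    private var raw = 0

    var value: Int {
        get { raw }
        _modify {
            var temp = raw
            let completed = yield &temp   // imagined Bool: did the access end normally?
            if completed {
                raw = temp                // normal end: write back
            } else {
                // aborted: decide here what, if anything, to persist
            }
            // An accessor that wants the same behavior in both cases could
            // write _ = yield &temp to silence the discarded-result warning.
        }
    }
}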

Okay, I think that's definitely just something we should discuss when I pitch generalized accessors, since it's no longer changing anything about the interaction with the code performing the access.


I will discuss this sentence below.

For example, exclusivity is enforced on accesses to memory, not storage. The exact details of how you define storage can change what accesses to memory actually occur and when.

Yes. Yet, users are surprised when a software component changes its behavior. If I understand correctly, turning a stored property into a computed property becomes a breaking change (unless we decide to call it undefined behavior that no one should rely on, which is pretty rare in Swift).

Let's look again at the innocuous piece of code from @gregtitus, transformed just a little bit:

import Library

func foo(_ i: inout Int) throws {
    i = 1
    try ...
}

var s = Library.SomeType(x: 0)
do {
    try foo(&s.x)
} catch {
    print(s.x) // 1 or 0 depending on the version of Library
}

And let's imagine that the Library author changes Library.SomeType.x from a stored property to a computed property.

The Client of the library upgrades Library. The app's behavior changes: s.x used to be 0 in the catch block, but now it is 1. The app crashes, data is lost, and the semantic versioning contract is broken. The Client reports the issue to the author of Library.

My question is: is it possible for the library author to fix it? That is, to keep the property computed (because Library needs to evolve) while preserving compatibility (because Clients don't want their app to break)?

  • If the library author can fix it, the responsibility of learning about your pitch rests on the shoulders of the library developer. This pitch should then come with the technique the library author should apply when a property turns from stored to computed. That's OK. Not nice, but OK.
  • If the library author can't fix it (for whatever reason), then we have a big problem that must be considered (even if it is not addressed right now).

If progressive disclosure and semantic versioning are still things in Swift, then this pitch may well contain a bomb. Even though I can feel, here and in other threads, how much energy is being put into fascinating, very high-level surgery on Swift internals, I can sometimes see when things might go awry. Here is one such case.

If you write a throwing function which is mutating or accepts an inout parameter, I think it is reasonable to tell the implementor to be careful about the state they leave that value in (as in the worst case, it may be directly written to memory). That would suggest that we should always call the setter.

Yes, sometimes it is reasonable to ask people to be careful. But my question was more about control. If undefined behavior can't be avoided, if control is impossible, if turning a stored property into a computed property becomes a breaking change (not only ABI-wise, but also API-wise), if being careful implies avoiding throwing mutations, if throwing subscripts and properties are banned by linters because of their dangers, and so on, then something has been missed.

Hence my attempt at showing the readers of this thread the big trouble that may be lurking in this pitch. Sorry if I disturb you; I won't insist further. Please take care not only of the runtime, but also of evolving programs. We all write programs that evolve; it's our most fundamental task. ABI stability is hard, but API stability is supposed to be easy. It won't be if library authors can no longer turn stored properties into computed ones without introducing unavoidable changes in their clients' code.


I think that the question comes down to the capability for a coder to intentionally modify an inout param and then throw.

With the proposed aborting behavior, that becomes an anti-pattern. Sometimes the change happens, sometimes it doesn't. If we went forward with aborting, the compiler should at least warn and possibly error if the CFG has any path where an inout write happens before a throw. Perhaps we can extend @discardableResult to mark inout params where you are aware that this can happen, in order to silence the warning/error.

Given such a warning/error so that the compiler tells people what's up, I'd be okay with aborting.
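To make that concrete, here is a minimal sketch of the pattern such a diagnostic would flag, and one way to avoid it (the warning itself is hypothetical, and mightFail is just a placeholder throwing call):

struct SomeError: Error {}
func mightFail() throws { /* placeholder for any call that can throw */ }

// The shape the hypothetical warning would flag: an inout write followed
// by a path that can still throw.
func update(_ value: inout Int) throws {
    value = 1
    try mightFail()
}

// One way to sidestep it: stage the work locally and assign only on success.
func updateSafely(_ value: inout Int) throws {
    let staged = 1
    try mightFail()
    value = staged
}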


Thanks, Greg, for both helping @John_McCall and folks move forward, and for alleviating a trouble I could show but not fix!


I tried to cover that in my long response; modify is expressive enough to describe any of these behaviors, so we can talk about this as if we were simply implementing modify. In fact, that's quite important, because there are a number of reasons that we do sometimes have to implement these accesses by actually calling a synthesized modify accessor; solutions that can't be expressed by modify are problematic because their semantics will change when modify is used. So let's run through possibilities for what we can do.
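(For readers who haven't used it, a synthesized modify for a get/set property behaves roughly like the hand-written sketch below; the underscored _modify spelling is implemented today but not an official feature, and the surrounding type is made up.)

struct Box {
    private var backing = 0

    var value: Int {
        get { backing }
        // Roughly what a synthesized modify does for a get/set property:
        // read into a temporary, yield it for in-place mutation, write back.
        _modify {
            var temp = backing      // what the getter would return
            yield &temp             // the caller mutates temp in place
            backing = temp          // what the setter would store
        }
    }
}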

I want to lay down some ground rules here. I think it's acceptable to expose differences like aborting vs. ending to the implementation of modify, but I don't think it's okay to expose why an abort happened or allow modify to capture errors from its caller or anything like that. In other words, we have two basic tools at our disposal, both taken from the perspective of the caller of modify:

  • Given that something has happened in the caller that needs to terminate the access, we have to decide whether it's an abnormal "abort" or a normal "end". This needs to be a static decision and can't really be decided by the programmer; it needs to be a basic semantic rule of the language that's always done the same way.
  • Given that we've terminated the access in some way, we have to decide whether it's possible to get an abnormal result from the termination, and if so we have to decide what to do with it, for any particular reason why we're terminating the access. Again, this needs to be a static semantic rule of the language.

It sounds like people generally feel that modify for a computed property should always write back if any modifications were actually made to the temporary. I see two ways of formalizing this with the tools at our disposal:

  • The first is that there's just no difference between aborts and ends for computed properties. This is nice and elegant, and reserves some room for properties to distinguish them if they need to, which among other things might be important for throwing setters.
  • The second is that aborts don't call the setter, but an abort is only triggered when the caller hasn't done anything to modify the storage yet. This might seem like an appealing rule, but I don't think it really works very well in our model. If you call a mutating method on something, we have to assume in the caller that it's actually done some modifications. We're not going to dynamically track that. So we'd only be aborting accesses if we short-circuit before the mutation, or maybe if we throw before it. And what counts as "before the mutation"? In something like computedProperty.optionalProperty?.mutatingMethod(), can we abort the access to computedProperty if the ? short-circuits, even though projecting out optionalProperty might have caused mutations, depending on how it's defined?

In both cases, we're left with the problem of what to do with throwing setters, because we're rejecting the ruleset that would have let us define the problems away (namely, that the abnormal termination cases always cause setters to be skipped, so they're only ever run in a more typical evaluation context where throwing an error doesn't cause a semantic conflict). So we have several options:

  • Throwing setters are illegal. More generally, read and modify can no longer throw once they've yielded.

  • Throwing setters aren't called when aborting an access. The abort path of a modify just wouldn't call throwing setters. I think this is unacceptable; marking a setter as throwing would radically change its semantics even if it never actually throws.

  • Errors thrown by setters are ignored when aborting an access. The existing reason for terminating the access always takes precedence.

  • Errors thrown by setters when aborting an access cause process termination. This is what C++ does with exceptions (an exception thrown during unwinding terminates the process); I don't think it's actually a good idea.

  • Errors thrown by setters take precedence over the existing reason for terminating the access. This is what Java does if you e.g. throw during a finally block.

  • Errors thrown by setters cause a new error to be thrown which combines the two existing errors. This is what Ada does. Note that this is not consistent with typed throws, or at least greatly complicates it.

The first four rules allow a general principle that the abort path out of a read or modify never throws an error. Aborting an access just causes yield to "return", exiting scopes and running any active defer blocks, which are of course already forbidden from throwing. Ignoring errors during abort (or terminating the process, if we're taking that seriously) can be done completely internally to the accessor implementation. But note that the second and third rules aren't good enough if any of the abnormal termination paths causes active accesses to be ended normally instead of aborted, because the normal end can still throw, and so we still need a rule for deciding what to do. (That rule could still be "ignore the thrown error".)

The last two rules don't allow that general principle, and so understanding the dynamic behavior of the program gets pretty complicated when errors are being thrown left and right.
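As an illustration of the earlier point that ignoring errors during an abort can be handled entirely inside the accessor implementation, here is a minimal sketch, again with the underscored _modify spelling; validate is a hypothetical throwing check:

struct Record {
    private var raw = 0

    private func validate(_ newValue: Int) throws { /* hypothetical check */ }

    var value: Int {
        get { raw }
        _modify {
            var temp = raw
            defer {
                // Runs on a normal end and on an abort alike; defer blocks
                // cannot throw, so any write-back error is swallowed here
                // rather than escaping the accessor.
                do {
                    try validate(temp)
                    raw = temp
                } catch {
                    // Keep the old value; the termination proceeds unchanged.
                }
            }
            yield &temp
        }
    }
}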

I think this is being complicated by conflating the three distinct problems of "throwing while an accessor is in progress", "throwing from an accessor" and "aborting an access".


My opinions on the cases:

"throwing while an accessor is in progress", e.g:

func foo(_ i: inout Int) throws {
    i = 1
    try // …
}

should be handled as a normal end of the access, as it is now.


"throwing from an accessor", e.g:

var foo: T {
  get throws {
    // either returns a T, or throws an error.
  }
}

// in throwing context:
let bar: T = try foo
print(bar)

Returning would be like a normal access; throwing would propagate the error, with the access officially never having started. Additionally, accessing this property would require try, etc.


"aborting an access":

struct T { var optionalProperty: U? = nil }
var computedProperty: T
computedProperty.optionalProperty?.mutatingMethod()

In this case, if optionalProperty was nil, then the accesses on optionalProperty and then computedProperty would be aborted.
I see three possible definitions (though there could be more) of this 'aborting': it could be identical to a normal 'end access', as it is now; it could do nothing; or it could be property-implementation-defined behavior.


The interesting case is throwing during a setter, not a getter. In that case, the access has definitely started.

I was thinking less about each of the three in particular and more about showing that I see them as different things. However, I should have mentioned set throws.

As I see set throws

var foo: T {
  get { /**/ }
  set throws {
    // either returns Void, or throws an error.
  }
}

// in throwing context:
let bar: T = // ...
try foo = bar

The get could throw or not; if the get threw, control wouldn't reach the set. Otherwise:
Returning from set would be like a normal access.
Throwing from set would propagate the error, and I see two possible definitions for end conditions of the property itself (though there could be more):

  • It could be implicitly, officially ended, as if the calling code had ended the access (either as an 'end access' or as an 'abort', the choice being property-implementation-defined behavior), or
  • It could remain 'open', requiring the calling code to 'end' or 'abort' the access itself.

Additionally, setting this property would require try, etc.

I'm not sure you're actually grappling with any of the open questions here; you're just telling us how to deal with the easy cases that we weren't bothering to discuss because they seemed obvious.

I think everyone will hate this idea, but I will include it for completeness.

We could ask/require throwing setters to handle the error. In addition to the implicit newValue parameter, we also pass an error parameter which contains an optional Error. The setter can then do what it thinks is appropriate with the error, which would mean either ignoring it, rethrowing it when it deems appropriate, or throwing an error which combines them in some way.

The compiler could warn when it sees a throwing setter block which hasn't referenced the error parameter in some way...

The disadvantage of this is that you would get a bunch of different behavior from different APIs around this based on what the programmer decided to do with the error...
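Purely as a sketch of the shape being described (none of this syntax exists today; the error-parameter spelling and write(_:) are invented for illustration):

var foo: T {
  get { /* ... */ }
  // Invented syntax: the setter also receives the error, if any, that is
  // forcing the access to terminate, and decides what to do with it.
  set(newValue, abortError: Error?) throws {
    if let abortError = abortError {
      throw abortError           // rethrow it, ignore it, or wrap both errors
    }
    try write(newValue)          // hypothetical throwing write-back
  }
}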

Perhaps, but I did not see that the easy cases were being handled.

That's a little bit difficult given that modify and coroutines are not an official part of the language yet. I know they're implemented and used in the standard library, but most people here have no experience using them. It's easier to talk about getters/setters.

If you have a modify accessor, can you omit writing a setter?

Yes, but it doesn’t exclude writing a setter.
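For those who haven't used one, a property with a modify accessor and no setter looks like this today (with the underscored, not-yet-official spelling):

struct Buffer {
    private var storage = [0, 0, 0]

    // A mutable property implemented with get plus modify and no setter.
    var first: Int {
        get { storage[0] }
        _modify { yield &storage[0] }
    }
}

var buffer = Buffer()
buffer.first += 1   // mutates in place through the modify accessor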


Would it be possible to allow throwing getters and modify, but not setters? Would that fix the issue?

If so, and someone asks for a throwing setter, we can just point to a throwing modify as the answer...

If a throwing modify can throw after it yields, it is exactly the same problem as a throwing set.


I think the part I am finding confusing here is: in an example like foo above, why isn't the setter called before the throw?

If it were, wouldn't that solve the issue?

inout parameters are not, and have never been, direct abstractions over the original storage where loads and stores to the parameter immediately trigger the corresponding operations on the original storage. That would not be a good model for a lot of reasons: the value of the parameter could potentially change arbitrarily throughout the function, getters and setters could be triggered repeatedly, and so on. Instead, the parameter is bound to a temporary (if the storage isn't simply stored) which is filled by a single read and, eventually, written back with a single write. This is a slight loss if the function never reads or writes to its parameter and a huge win in basically every other case.
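A small self-contained illustration of that model (the names are made up for the example):

struct Wrapped {
    private var raw = 0

    var x: Int {
        get { print("get"); return raw }
        set { print("set"); raw = newValue }
    }
}

func bump(_ value: inout Int) {
    value += 1      // operates on the temporary, not on the original storage
    value += 1
}

var wrapped = Wrapped()
bump(&wrapped.x)
// Prints "get" once before the call and "set" once after it returns,
// however many times value is read or written inside bump.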
