SE-0223: Accessing an Array's Uninitialized Buffer

No, the initializer has just as much of a problem. You need to deinitialize the elements of the array that have already been initialized regardless of whether there was anything in the array at the start of the callback.

On the other hand, we do have a hack which higher-level operations could use to support throwing operations even if this initialiser doesn’t support throwing. It’s a general problem not specific to this API.

Speaking of which, it would be nice if Collection had non-throwing overloads of operations like “map”. There are situations where you can write simpler, faster implementations when you know the transform will always succeed; but even if you write them, they won’t be available in a generic context.

The core team has been discussing this proposal, and particularly the issue of the behavior when the closure throws that @beccadax raised. We acknowledge that supporting throwing closures via rethrows is important for many algorithms to be able to take advantage of this feature in the language today. The proposed rule, that initializedCount must accurately represent the state of the buffer on exit regardless of whether that exit was a normal return or an error, is consistent and easy to explain. For algorithms where it's feasible to keep initializedCount updated as progress is made, this just works. Unlike languages such as C++, Swift also explicitly marks propagation points with try, making it evident in source code where early exits must be taken into account. It is, however, a hazard for algorithms that don't contiguously initialize the buffer and neglect to take error propagation into account.
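
For example, here is a minimal sketch (the mapped helper and its transform parameter are purely illustrative) of a closure where the rule "just works": initializedCount is bumped only after each element is initialized, so it is accurate at the one place a throw can exit the closure early.

func mapped<T>(_ values: [T], _ transform: (T) throws -> Int) rethrows -> [Int] {
    return try Array(unsafeUninitializedCapacity: values.count) { buffer, initializedCount in
        for value in values {
            let element = try transform(value)  // the only early-exit point
            (buffer.baseAddress! + initializedCount).initialize(to: element)
            initializedCount += 1               // stays in sync with the buffer
        }
    }
}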

The core team wanted to offer an alternative rule for discussion: if the closure throws, then the entire buffer is considered to be uninitialized. This means that code that doesn't handle errors correctly may leak, but will at least never encounter undefined behavior. This doesn't eliminate the issue, but may mitigate the potential fallout of incorrect code. There is also a good selection of leak-detection and memory-analysis tools available for finding these sorts of bugs. On the other hand, silent leaks can easily slip by when developers don't think to check for them. This also presupposes that there aren't clients of withUnsafeMutableBufferPointerToStorage that would want to throw and also update the subject array to a specific valid state.

I'm going to extend the review period to August 28 to allow for some more discussion of this idea, and to allow the ongoing discussion about throwing behavior to continue. Thanks everyone!

5 Likes

In particular, the amendment under discussion means that initialization closures which throw while constructing elements of arbitrary type should "manually" deinitialize any elements they have already initialized to avoid leaking (see the sketch after this list). This manual deinitialization:

  • can be omitted without opening a type/memory safety hole
  • would incur a small additional burden for the author of the initialization closure
  • could be omitted without leaking for types known to have trivial deinitialization (e.g. Int)
  • would save a small amount of overhead inside the standard library's implementation of the unspecialized Array.init that could be specialized away in user code for many element types
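
A minimal sketch of that manual step, assuming the proposal's initializer and using a generic make function as an illustrative stand-in for "constructing elements of arbitrary type":

func makeElements<Element>(_ n: Int, _ make: (Int) throws -> Element) rethrows -> [Element] {
    return try Array(unsafeUninitializedCapacity: n) { buffer, initializedCount in
        do {
            for i in 0..<n {
                (buffer.baseAddress! + i).initialize(to: try make(i))
                initializedCount += 1
            }
        } catch {
            // The manual cleanup under discussion: release what was built so far.
            // Omitting it only leaks; it never breaks type or memory safety.
            buffer.baseAddress!.deinitialize(count: initializedCount)
            initializedCount = 0
            throw error
        }
    }
}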

I'm not arguing strongly for or against either choice, FWIW. This is merely clarification.

5 Likes

+1 for entire buffer uninitialized. This is a much easier behavior to understand, and incidentally more likely to be what callers actually want to do when an error is thrown, in addition to the other advantages already mentioned.

My feeling is, in order to decide between the different options, we need some sample code showing how you'd write either option without leaking – some short examples could show how simple or fiddly writing leak-free code could be. I think this probably belongs as part of the proposal.

6 Likes

I think I would go for something like this to clean up the proposal's stablyPartitioned(by:) example (removing the initializedCount assignment currently at the end):

defer {
    if high > low {
        // The loop exited early: the buffer has an initialized prefix up to
        // `low` and an initialized suffix from `high`, with an uninitialized
        // gap in between. Deinitialize both runs and report an empty buffer.
        buffer.baseAddress!.deinitialize(count: low - buffer.baseAddress!)
        high.deinitialize(count: buffer.baseAddress! + count - high)
        initializedCount = 0
    } else {
        // The partition ran to completion; the whole buffer is initialized.
        initializedCount = count
    }
}

I would be more worried about cleaning up this way if the second option were selected, since I would have to make sure that a throw couldn't happen while the buffer was fully initialized; otherwise my attempt to set initializedCount properly would be ignored. With the first option, I only have to make sure that the defer puts the buffer in a valid state, and I know that it'll get properly cleaned up.

2 Likes

It's definitely not you! That example leaks memory right now and needs a do/catch block that cleans up in order to comply with either the semantics described in the proposal or the alternative described by Joe. I've updated the example in this PR: Add correct throw-handling to stablePartition example by natecook1000 · Pull Request #899 · apple/swift-evolution · GitHub

When considering how you write a closure for initialization, this alternative is simpler than what's proposed, but only by a bit. In either case, you have to account for the number and position of elements that you've initialized and have the benefit of knowing that number started at zero. You also don't have any expectation about the resulting array, since a throw means that no array is returned from the initializer. From the perspective of the caller, both what's in the proposal and the alternative have the same behavior.

The semantics of the mutating method are trickier, since the array likely already has some initialized elements. I was curious about the behavior of Array's other mutating + rethrows methods—we only have a few:

  • sort(by:) — If the comparison closure throws, the order of the array is unpredictable, but we guarantee in the documentation that no elements are lost. (See the short demo after this list.)
  • removeAll(where:) — If the predicate throws, the order is unpredictable, but no elements are lost, including any that should have been removed. This isn't currently documented. (For non-MutableCollection collections, the collection is left entirely unchanged if the predicate throws.)
  • withUnsafeMutableBufferPointer(_:) — If the closure throws, the count of the array is unchanged. The closure has UnsafeMutablePointer access to the array's elements, so if elements have been deinitialized and not yet reinitialized there's room for undefined behavior of the type we're trying to avoid here.
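
A quick demonstration of the sort(by:) guarantee (Interrupted is just an illustrative error type):

struct Interrupted: Error {}

var numbers = [3, 1, 4, 1, 5, 9, 2, 6]
do {
    try numbers.sort { a, b in
        if a == 9 || b == 9 { throw Interrupted() }
        return a < b
    }
} catch {}
// The order is unspecified here, but all eight elements survive.
assert(numbers.count == 8)
assert(numbers.sorted() == [1, 1, 2, 3, 4, 5, 6, 9])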

Looking at those, my first thought is that mutating operations that throw are really weird. Throwing is supposed to represent an alternate way of returning from a method—you get back either a value or an error—but when used in a mutating method you can observe the side effects of whatever portion of the algorithm ran before the throw.

Second, I think it would surprise me greatly to have the array I'm modifying disappear out from under me. Certainly throwing is most often used when an error occurs, but it can also be used for flow control (to break out of a forEach closure, for example). Since the alternative throwing semantics don't match up with the other mutating + rethrows methods in the stdlib, it would make it harder to use the new method as a building block for algorithms like those above or other, new ones.
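
Concretely, that flow-control use of throwing looks something like this (FoundIt and containsNegative are just illustrative names):

struct FoundIt: Error {}

func containsNegative(_ numbers: [Int]) -> Bool {
    do {
        try numbers.forEach { if $0 < 0 { throw FoundIt() } }
        return false
    } catch {
        return true
    }
}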

The most compelling point in favor of the alternative approach is that it's impossible to end up with an array that lets you access uninitialized memory (a higher standard than we hold withUnsafeMutableBufferPointer(_:) to). I don't think it simplifies the code you have to write to handle throwing correctly. Comparing the postconditions:

  • Proposed: On exiting the closure, the elements in buffer[..<initializedCount] must be initialized, and those in buffer[initializedCount...] must be uninitialized.
  • Alternative: On exiting the closure, the elements in buffer[..<initializedCount] must be initialized, and those in buffer[initializedCount...] must be uninitialized. If the closure throws, initializedCount must be zero.

That's not the way I would document the postconditions, but it gives a hint as to the similarity of implementations they require. Either way, you need to be able to correctly handle any funny stuff you're doing within the closure; the added step of deinitializing buffer[..<initializedCount] required by the alternative is trivial.

How does that rule apply to withUnsafeMutableBufferPointerToStorage(capacity:_:)? Does it mean that this test passes?

var someArray = [1, 2]
someArray.withUnsafeMutableBufferPointerToStorage(capacity: 10) { buffer, count in
  throw MyError.someError
}
XCTAssert(someArray.isEmpty)

I don't necessarily think that's a bad idea—it's certainly the safest default we can choose, and you can always catch the error inside the closure and rethrow it outside if you want to do something else. But it will be very surprising. At least it'll be consistently and safely surprising, though.

An alternative would be to have only the new capacity be considered uninitialized—that is, the changes to count are ignored. That's more dangerous, but probably less surprising, and again you can always catch and rethrow if it's not what you want.

Another alternative would be to provide a different method entirely—call it append(unsafeUninitializedCapacity:initializingWith:). The block for this method would only have access to the new, uninitialized capacity, so that's the only part that would need to be assumed uninitialized after throwing. If you want to access the existing capacity, you'd need to use withUnsafeMutableBufferPointer(_:) instead. If we think it's critical to be able to access the existing capacity, we could perhaps provide that through a second buffer pointer parameter to the closure—we could even decide to make this one immutable for safety.

(I do notice, though, that all of these behaviors would be less surprising if the block returned the final count instead of setting an inout parameter. If you haven't returned a count, of course it can't know how many elements you've initialized.)

Would it really be that safe though?

var someArray = [MyClass(), MyClass()]
_ = try? someArray.withUnsafeMutableBufferPointerToStorage(capacity: 10) { buffer, count in
  throw MyError.someError
}
// MyClass instances were leaked

Leaking instances is type-safe (it doesn't interpret a value of one type as an incompatible type) and memory-safe (it doesn't access or modify memory that doesn't belong to it). It's still a bug, of course, but it's not the kind of bug Swift is concerned with when it talks about "safety".

Is this not true in either case?

I also don't see how the amendment would improve type/memory safety compared to the originally-proposed semantics. AFAICT, it just forces me to do more bookkeeping and special-casing of error conditions.

If it's a tie, I would rather go no-throwing. We could always add a rethrowing version later.

Would it be reasonable to say that “unsafe” APIs in general are not rethrowing?

I am not an expert here, but it seems as though the unsafe territory exists, in some sense, at a “lower level” than throws and rethrows.

1 Like

I think that whether it's simpler will depend on the initialization pattern used by the closure: the proposal optimizes for simplicity when the initialization pattern is linear from start to end (e.g. not for something like an eager reversed()), provided that you are updating initializedCount as you go. The alternative will be very slightly simpler/less-error-prone in the other cases, since you'll have to deinitialize the initialized elements anyway, and won't have to remember to set initializedCount to zero before returning.
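
For example, here's a sketch of such a non-linear pattern (an eagerly reversed copy that fills the buffer back to front), where a front-counted initializedCount can't be kept accurate as you go; the eagerlyReversed name is just for illustration:

func eagerlyReversed<T>(_ source: [T]) -> [T] {
    return Array(unsafeUninitializedCapacity: source.count) { buffer, initializedCount in
        var write = buffer.count
        for element in source {
            write -= 1
            (buffer.baseAddress! + write).initialize(to: element)
        }
        // Only meaningful once every element is in place.
        initializedCount = buffer.count
    }
}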

This doesn't seem like a very compelling difference either way. Other factors that are potentially more interesting:

  • How simple is the resulting API documentation (which goes to the API's understandability)?
  • How much code will be generated in client applications
    • When the Array.init is fully specialized
    • When the Array.init is unspecialized
  • What are the performance implications
    • For the straight-line (non-error) case
    • For the case where cleanup is required

FWIW, that guarantee is not really important in practice; we did it because we could, but C++ gives no such guarantee and nobody complains. The only two really interesting points in the design space for when an error can occur are the basic guarantee (“all invariants are preserved,” which is sort of tautological, and there are no leaks, which can be viewed as a kind of invariant) and the strong guarantee (“in case of an error, there are no effects”). Since invariants must always be preserved, that means you document the strong guarantee when you can provide it without compromising performance.
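
One common way to provide the strong guarantee is to do the fallible work off to the side and commit only on success; a sketch (strongMap is an illustrative name, not a proposed API):

extension Array {
    mutating func strongMap(_ transform: (Element) throws -> Element) rethrows {
        let updated = try map(transform)  // if this throws, self is untouched
        self = updated                    // commit only on success
    }
}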

I was always concerned about the precedent set by documenting that no elements are lost, because it has not been shown to be useful in practice, and it makes loss of elements a new dimension for people to consider in establishing and documenting error-handling behavior, which most people have a hard enough time doing already.

…and the chickens find a roost :man_shrugging:

That parenthesized part is a red herring because that overload shouldn't exist.

Weird or not, mutating methods sometimes need to fail, and there's a principled way of thinking about and documenting them: invariants always hold (of course), you treat the function's postcondition as only applying in the non-error case, and you look for opportunities to offer the strong guarantee.

Not sure what you mean here. A thing that throws from its initializer never fully existed anyway, and will disappear forthwith. That's a feature, not a bug.

@gribozavr always told me adding forEach was a mistake :wink:. <opinion> I understand that you can (ab)use errors that way, but it doesn't mean that you should. If you need an algorithm that stops partway through a collection, it should be written with a termination predicate like firstIndex(where:).</opinion> The design of errors makes them good for propagating termination through many layers of the call stack to an unknown place, but that use case doesn't fit the pattern, and using errors that way will make code harder to understand.
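
For instance, an early-exit search written with a termination predicate instead of a thrown sentinel error (containsNegative is just an illustrative name):

func containsNegative(_ numbers: [Int]) -> Bool {
    return numbers.firstIndex(where: { $0 < 0 }) != nil
}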

Are you sure it does that? Can't you do that by failing to initialize part of the array and not throwing an error?

I don't think those are postconditions on the array initializer. At least, they don't read like guarantees the array initializer gives. They seem like requirements on the closure parameter. But I don't think the latter one is accurate either. In the alternative, there's no requirement on initializedCount when the closure throws. It is the closure's responsibility to deinitialize any elements it initialized in case of an error, but that's a part of the standard not-leaking contract that is part of the basic guarantee: if you allocate a resource you must either deallocate it or pass ownership to someone who will. Exiting normally passes ownership to the Array.

I remain unconvinced that this is not useful in practice. If I started with an unsorted array and am trying to sort it, I may still have a use for the unsorted array afterwards, even if it's a differently-ordered unsorted array. Saving a copy just in case would have a performance penalty.

(Dave and I have talked about this before in person and mostly just reached a standstill, so I'm mostly just providing this as documentation.)

Huh. I didn't notice that method, and it occurs to me that it probably shouldn't have a capacity argument because you can always reserveCapacity as an initial step. That said, I think you need a try and you don't reach the assertion because the method rethrows. But even if you did reach the assertion, I don't see why that would reliably produce an empty array. That's not a particularly useful result and producing it makes extra work for the library.

Avoiding surprises with error handling requires establishing a consensus about how error handling behaviors are documented and a framework for thinking about the consequences. So far we have not done that for Swift.

It seems to me that the burden of proof should be on those advocating the supposedly-extra guarantee. One real example of an application taking advantage of it would be enough to prove that it's useful, whereas it's not possible to prove the opposite.

Of course, my real concern is that the number of real applications that can benefit from "no loss" is so small that it doesn't justify the complexity added to what you need to consider when implementing and documenting your error handling behavior. You can dive into an arbitrary level of detail in documenting the effects of a mutating operation when it fails. It is not obvious to me that "no loss" is significantly more useful than the basic guarantee, and it's not obvious to me that it would be enough detail for every application that needs something stronger than the basic guarantee and less than the strong guarantee.

Powerfully understanding error handling requires having a simple set of distinctions with which to work. Adding a not-sufficiently-useful shade of reasoning is counterproductive to that goal, which is part of the reason Sutter's application of ACID to exception handling didn't catch on.

Only the first paragraph of my post was about the initializer — everything after (including the quoted bit) is in reference to the throwing semantics of the withUnsafeMutableBufferPointerToStorage(capacity:) method. I'll have a bit more soon, but wanted to clarify that most of my concern is around what happens if someone throws an error from the closure in that method, not the initializer.

Oops, you're right—that sample was off the cuff. I guess I mean something more like:

var someArray = [1, 2]
_ = try? someArray.withUnsafeMutableBufferPointerToStorage(capacity: 10) { buffer, count in
  throw MyError.someError
}
XCTAssert(someArray.isEmpty)

The rule under consideration is "if the closure throws, then the entire buffer is considered to be uninitialized." Applied literally to this method, I think that would mean the existing elements would be cleared out. But if we want to only assume the new elements are uninitialized, I think this design might be more clear:

public init(
    unsafeUninitializedCapacity: Int,
    initializingWith initializer: (
        _ buffer: UnsafeMutableBufferPointer<Element>
    ) throws -> Int
) rethrows

public mutating func append(
    unsafeUninitializedCapacity: Int,
    initializingWith initializer: (
        // Includes only the uninitialized capacity
        _ buffer: UnsafeMutableBufferPointer<Element>
    ) throws -> Int
) rethrows

Then applying that rule to append(unsafeUninitializedCapacity:initializingWith:) would only mean the new elements would be cleared out, and the fact that you had not returned a count would suggest that none of the new elements should be considered initialized. I also like that the method and initializer are more symmetrical; you could probably even pass the same initializer function to both, as in the sketch below. It makes them feel like they're part of a single family.
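
A usage sketch, assuming the hypothetical signatures above (fillWithSquares is just an illustrative initializer function):

func fillWithSquares(_ buffer: UnsafeMutableBufferPointer<Int>) -> Int {
    for i in 0..<buffer.count {
        (buffer.baseAddress! + i).initialize(to: i * i)
    }
    return buffer.count
}

var squares = Array(unsafeUninitializedCapacity: 4, initializingWith: fillWithSquares)
// The append closure sees only the new, uninitialized capacity.
squares.append(unsafeUninitializedCapacity: 4, initializingWith: fillWithSquares)
// squares == [0, 1, 4, 9, 0, 1, 4, 9]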

Sorry, that was only supposed to apply to the initializer, since I didn’t know about the other one :wink: