ManagedBuffer.header and exclusivity

Karl · February 1, 2022, 5:23am

I've noticed a bit of a performance footgun with ManagedBuffer.

The header is written as a class property in the standard library's source code. This means it requires dynamic enforcement of the law of exclusivity, which can have significant performance consequences.

For example, typically users of ManagedBuffer will store the buffer's count and capacity within the header, and will want to ensure that accesses to the elements are bounds-checked. This means every access to an element must also access the header, resulting in runtime calls for exclusivity enforcement. In some loops, I've seen that enforcement account for up to 75% of the code's runtime.

But this enforcement is generally not necessary: ManagedBuffer is typically used to build
copy-on-write data structures with value semantics, so exclusivity is enforced at a higher level than the
compiler is able to reason about. Indeed, it is not even possible to access a MB's elements except through an unsafe pointer (which has no exclusivity enforcement), so developers already need to manually ensure that overlapping writes do not occur.

So I'm wondering what we should do about this. I have 2 ideas:

I notice that @Erik_Eckstein recently added an attribute to disable exclusivity enforcement on a per-property basis. We could perhaps annotate ManagedBuffer.header with this attribute. I'm not sure if that would technically count as a behaviour change -- it is a change in enforcement of a rule that developers should have been obeying anyway, and have to obey in order to safely use the ManagedBuffer's elements.
Alternatively, we should document that anybody implementing a COW type using ManagedBuffer should prefer to use withUnsafeMutablePointerToHeader rather than the header property, in the same way we advise them to store the capacity in the header rather than using the capacity property.

Thoughts?

Karl · February 1, 2022, 5:39am

Simple example: Godbolt

Notice that testViaProperty includes a call to swift_beginAccess, but testViaPointer does not.

Karl · February 2, 2022, 5:06pm

After some investigation, it appears that Deque in swift-collections also prefers to access the header via a pointer.

I'm not sure if that was a deliberate choice to avoid exclusivity checks, or a stylistic choice which just happens to be more efficient. @lorentey ?

lorentey · February 2, 2022, 10:54pm

This is just a stylistic choice. (It's sort of a consequence of the design choice to implement operations on Deque._UnsafeHandle rather than directly on the buffer class. Which is in turn mostly a consequence of my general distaste for the ManagedBuffer API, plus a vague step towards having a non-owning, unsafe, fixed-size deque implementation.)

I'd expect ManagedBuffer.withUnsafeMutablePointerToHeader to perform an exclusive access on the header property for the duration of the closure call.

My problem (as much as I have one) with exclusivity checking isn't really its performance impact -- it's just that the law of exclusivity is one reason Swift cannot currently model constructs (such as synchronization primitives) that specifically need to allow concurrent/overlapping mutations. But this seems a minor issue compared to the benefits of reducing data races.

This is unrelated to simple collection types like Deque. Performance can be important, of course, and I'm happy that library authors get some control over exactly what is checked at runtime -- @exclusivity(unchecked) seems like a really nice win. (I don't know if it would be a good idea to apply it to the header property, but your argument seems very reasonable!)

Karl · February 3, 2022, 10:57am

It does not. I've confirmed with Godbolt (above), benchmarks, and @_assemblyVision. And IMO, that's good; it's important to provide a way to opt-out of it.

My understanding is that exclusivity checking isn't really about reducing data races; it's important even for entirely synchronous code, and is closely related to alias analysis. If you have a function parameter of type inout T, the compiler needs to know that variable is not aliased anywhere else in the program, which allows it to avoid making temporaries.

If you have a COW data type, mutating operations must first create a unique copy of the buffer (header and elements), which eliminates most possibilities of aliasing. But you can still get in to trouble if, after making that unique copy, your implementation aliases the new buffer locally:

// - Ensures this function has the only reference to 'bufferCopy'.
let bufferCopy = buffer.ensureUnique()

// - This is a non-instantaneous mutating access to the new buffer's header.
bufferCopy.header.someMutatingFunction {

  // - But we can perform an overlapping access if we reach-around and
  //   access the data via 'bufferCopy' again!
  if bufferCopy.header.data == 0 { ... }
}

// But this should be fine. ManagedBuffer is a class so these properties
// live in different 'exclusivity domains' (term I made up).
bufferCopy.mutateElements {

  if bufferCopy.header.data == 0 { ... }
}

I don't think I've ever seen a ManagedBuffer header with non-instantaneous mutating operations, but it can technically happen.

The current .header property catches this at runtime; but going through a pointer does not (which is a strong argument for not relaxing the current behaviour, although perhaps it could be a debug vs. release mode thing).

There isn't really a satisfactory way to get static checking of this, without special language support. Ideally, we'd be able to tag header-accessing/mutating operations as being in exclusivity zone "A", and element-accessing/mutating operations as being in zone "B", and operations on both as being in both zones. Then we'd be able to make a blanket statement that all COW types are safe from an exclusivity standpoint, and won't need dynamic enforcement.

Although perhaps something like that would be better as a true, native COW type (think something like an actor - a new kind of entity). It could guarantee you never forget to copy your data to a unique reference (reducing some of the boilerplate), ensure exclusivity, and generally just be a better ManagedBuffer. They would also be implicitly Sendable.

@Andrew_Trick, do you have any thoughts on the idea of having exclusivity "zones" or an entity for COW types?

Andrew_Trick · February 3, 2022, 8:52pm

My problem is the performance impact
The assumption was always that move-only types would give us a shared-mutable type for synchronization. So passing an inout reference would effectively be a read of the property.

I don't think exclusivity protects us against races, but it does protect us from the unpredictable effects of undefined behavior. And makes it a lot easier to implement mutable data structures without worrying about the effects of code running during the mutation.

I would like to have a @cowStorage annotation on the field that references the CoW storage class

So the compiler can infer a mutable-if-unique constraint, for sendability and various optimizations.
So the compiler could give CoW values more aggressive lifetime semantics based on a promise that the container not expose weak references or unsafe pointers to the storage.

We could also say that a cowStorage object promises not to violate exclusivity on any of its properties. But it's not hard to just add @exclusivity(unchecked) to those properties.

Honestly, it's probably not worth trying to protect some parts of the header struct. We could just add @exclusivity(unchecked) and document that extensions promise not to violate exclusivity. Long term, there may be a better way to let people write their own CoW types than extending ManagedBuffer.

lorentey · February 3, 2022, 11:31pm

Beware, verbatim bridging currently violates this promise for the standard bridged collection types -- casting to AnyObject (or NSObject/NSArray/NSDictionary/etc) exposes the raw storage reference in that case. This can be used to violate value semantics in a limited sense -- a weak/unsafe reference to the storage object may suffer action-at-a-distance effects.

lorentey · February 3, 2022, 11:39pm

Sure; I meant semantically it should count as an exclusive access. But thinking on this a little bit more, even though that may be true, it's probably no longer easily enforceable at this point. If withUnsafeMutablePointerToHeader implemented an exclusive access check, then this API would no longer be safe to use in read-only contexts. I know I've written many collection types that indiscriminately call it -- so adding the check would be source (and perhaps ABI) breaking at this point.

I'm counting this as yet another thing to dislike about ManageBuffer.

Andrew_Trick · February 4, 2022, 12:29am

Thanks, good point. The compiler would not rely on this assumption for correctness. It would just lead to a 'nil' weak ref a bit earlier in a situation where the user model is already basically broken.