ManagedBuffer
has a public stored property called header
. I'm complaining about this particular stored property.
At the time the freshly allocated object is passed to the factory closure, this stored property isn't initialized. The closure is expected to not to access it. From the docs:
During instance creation, in particular during
ManagedBuffer.create
’s call to initialize,ManagedBuffer
’sheader
property is as-yet uninitialized, and therefore reading theheader
property duringManagedBuffer.create
is undefined.
Vending invalid objects to the closure is quite bad enough already, but it gets worse: the closure is allowed to throw. If it does, then the header
ivar stays uninitialized, but the custom deinit
will still run. In most cases the custom deinit needs to access the header
ivar to figure out what parts of the buffer need to be deinitialized. When it does so, things tend to blow up.
struct Foo: Error {}
class Bar {
func greet() {
print("Hello from Bar.greet")
}
}
class Baz: ManagedBuffer<Bar, Int> {
deinit {
print("Hello from Baz.deinit")
header.greet()
}
}
_ = try? Baz.create(minimumCapacity: 100) { _ in throw Foo() }
// ⟹ Hello from Baz.deinit
// ⟹ 💥 Segmentation fault: 11
The only way to fix this I can see is to stop allowing the closure to throw, or to get rid of the closure altogether.
(The fact that custom subclasses must not define additional ivars is inconvenient, but I'm not complaining about that -- and sadly I don't expect we'd find a way to lift that restriction any time soon. I do think we need to explicitly verify this constraint at runtime, though, at least in debug builds.)
(I'm also not complaining about all the unsafe memory operations within any properly written use of ManagedBuffer
. Those are a feature, not a bug.)
If we could find a way to make create
an actual initializer, then that would be awesome! It would also resolve the naming problem that we have for its replacement. I'm hoping we would be able to do it without introducing a throwaway protocol for it, though.
My intention here isn't to ridicule the ManagedBuffer
design -- I'm sorry if it comes though that way. I'm trying to point out practical problems with it in order to get a picture how people feel about maybe fixing them. ManagedBuffer
is objectively not a joy to work with; I don't think we can fix that entirely without language changes, but at least we can fix some of the lower hanging API issues.
If we were to design this from scratch, then with the benefit of hindsight I'd probably consider fixing problems #4 and #5 in the new design. I don't think we're at that stage yet, but I do think we need to seriously consider getting rid of the capacity
property. (I.e., deprecating it everywhere based on a language version, and not carrying it forward to new platforms.)
This is good information, but it's not something I can practically act on. The buffer cannot just be a dumb storage class with no knowledge about its concrete element type -- it must be subclassed to customize deinit
. Most other operations can be moved out of the class (and in fact that's often how things naturally fall out during the implementation); but the class instance must know how to clean up after itself.
If we had language features such as a type system-level way of modeling methods that must only be called on a unique instance, then we'd probably want to forward to a class method more often, just to reap the benefits of static uniqueness checking. I expect this would put far more pressure on this bug.
Yes, yes, a thousand times yes! However, I suspect we cannot model heterogeneous buffers in a satisfying way yet. (E.g., variadic generics sound like something we would want to use.) It would be interesting to see how far we can get, though.
ManagedRawBuffer
would be a way to allow people to manually partition their buffer storage, in a more palatable way than rebinding memory vended by ManagedBuffer
. This would at least allow us to manually implement heterogeneous buffers without having to fight against the stdlib every step along the way.
But I think before we attempt to tackle these problems, we should spend some time to see how much we can fix in the APIs we already have. A strongly typed, flat ManagedBuffer
is not a useless construct -- flat buffers have many use cases. Its API is obviously deeply, deeply flawed because of some language-level constraints that don't seem easily resolvable, even today. But it also has some minor API wrinkles that are very much resolvable!
This is ~universally done by retrieving the size during initialization and then storing the actual usable capacity in the Header
. (I haven't seen any production code that uses ManagedBuffer.capacity
instead of this. If someone is doing that, I'm pretty sure they haven't thought it through.)
Making use of any extra "free" capacity needs to be an opt-in thing, because it isn't always desirable. In fact, the desire to use it seems to be the exception rather than the rule.
In practice, this would lead to the capacity being stored twice, which I don't think would be reasonable.