As uncovered in PR #31686, the API of ManagedBuffer (and it's less-liked sibling ManagedBufferPointer) implicitly assume that it's possible to easily retrieve the size of a dynamically allocated chunk of memory. Platforms often allow this through functions like malloc_size, malloc_usable_size or _msize -- but it turns out not all of them do. In particular, OpenBSD evidently doesn't support this.
This makes these stdlib APIs suboptimal -- on OpenBSD, the implementation would need to bend over backwards store the allocated capacity within the class instance, even though it is practically always duplicated in Header. The extra storage would waste memory and complicate element access in every ManagedBuffer instance, even in the (hopefully) vast majority of cases where ManagedBuffer.capacity is never called outside of the closure passed to the create method.
I believe the stdlib's APIs shouldn't be needlessly difficult to implement on particular platforms. ManagedBuffer's implicit requirement for a working malloc_size seems unreasonable.
This seems like as good an excuse as any to start discussing a revision of these APIs.
Here is a list of problems I have run across while using the current APIs over the years:
-
(This is the motivating problem above.) It isn't possible to efficiently implement
ManagedBufferon some platforms, like OpenBSD. -
ManagedBuffer.createworks by passing a half-initialized object to its closure, and hoping that the closure will only access members that happen to work properly in this invalid state. This leaves me with a distinctly unclean feeling every time I have to call it -- Swift code should never need to deal with half-initialized class instances. -
The return type of the
ManagedBuffer.createclass function is declared as the concrete typeManagedBuffer<Header, Element>rather thanSelf, even though the function is documented to create a "most-derived" class instance. This forces us to manually insert a dynamic downcast after practically every single invocation of it. -
ManagedBufferis designed to be subclassed, but it is polluting the namespace of its custom subclasses with public member names that could just as well be left available. (By e.g. moving them to a utility struct likeManagedBufferPointer. In fact, look, they're already duplicated there!)This is especially problematic in the case of
capacity, which would often be a desirable member name for a custom buffer.ManagedBuffer'sfinaldefinition for it relies onmalloc_size, which makes it "nontrivial to compute", and thus inappropriate for use in high-performance code. Managed buffer instances often end up with at least two capacity properties,capacityandheader.capacity. Using the wrong one leads to sadness and bugs. (It doesn't help that the one less often useful has the more attractive name!) -
We also need a
ManagedRawBuffer, patterned afterUnsafeMutableRawBufferPointer. -
The name
ManagedBufferarguably needs anUnsafeprefix, as this construct requires manual memory management. For example,ManagedBufferdoesn't by itself know how to deinitialize its tail-allocated storage -- this must be very carefully done in a subclass.
Problem #4 would probably deserve a high-level proposal that could also complete the lower-level raw pointer APIs into a well-integrated subsystem. I think the ship has sailed for #5 at this point, unless we want to deprecate the entire ManagedBuffer API family and replace them with a far better design, which seems unlikely.
But I believe we can tweak things to resolve #0, #1, #2, and #3. First of all, we should consider deprecating the create function we have now:
extension ManagedBuffer {
@available(*, deprecated: ...)
public final class func create(
minimumCapacity: Int,
makingHeaderWith factory: (ManagedBuffer<Header, Element>) throws -> Header
) rethrows -> ManagedBuffer<Header, Element> { ... }
}
Instead, we should introduce the pair of functions below:
extension ManagedBuffer {
/// Create a new instance of the most-derived class, calling `headerFactory`
/// with a buffer pointer to the storage space that will become part of the
/// returned class instance.
///
/// The buffer's `count` reflects the actual capacity that was allocated.
/// Depending on platform-dependent allocation behavior, this may sometimes be
/// larger than requested. (On some platforms, determining the exact
/// allocation size may incur nontrivial costs; if you can't make use of such
/// "free" storage space, then it's better to call the
/// `create(exactCapacity:headerFactory:)` variant instead.)
///
/// The factory closure is allowed to initialize items in the buffer; however,
/// if it needs to throw an error, then it must return the buffer to its
/// default uninitialized state before doing so.
public final class func create(
minimumCapacity: Int,
headerFactory: (UnsafeMutableBufferPointer<Element>) throws -> Header
) rethrows -> Self { ... }
/// Create a new instance of the most-derived class, calling `headerFactory`
/// with a buffer pointer to the storage space that will become part of the
/// returned class instance.
///
/// The buffer's `count` is guaranteed to be equal to `exactCapacity`, even if
/// the allocation resulted in more storage space than requested.
///
/// The factory closure is allowed to initialize items in the buffer; however,
/// if it needs to throw an error, then it must return the buffer to its
/// default uninitialized state before doing so.
public final class func create(
exactCapacity: Int,
headerFactory: (UnsafeMutableBufferPointer<Element>) throws -> Header
) rethrows -> Self { ... }
}
As far as I can judge, these functions are exactly as powerful as the original create(minimumCapacity:makingHeaderWifth:), but they are far more convenient (and less dangerous) to use.
(ManagedBufferPointer's initializer will need to be similarly updated. This is left as an exercise to the reader for now.)
To fix the largest name pollution issue as well as the portability problem, we should consider deprecating ManagedBuffer.capacity and ManagedBufferPointer.capacity with no replacements:
extension ManagedBuffer {
@available(*, deprecated)
public final var capacity: Int { ... }
}
extension ManagedBufferPointer {
@available(*, deprecated)
public final var capacity: Int { ... }
}
I suggest leaving the rest of the ManagedBuffer members as is to minimize source compatibility issues.
Speaking of source compatibility -- the new create(minimumCapacity:headerFactory:) method would introduce an ambiguity for typical use sites that use the trailing closure syntax:
.../swift/stdlib/public/core/Hashing.swift:135:18: error: ambiguous use of 'create(minimumCapacity:makingHeaderWith:)'
let buffer = self.create(minimumCapacity: hashTable.bucketCount) { _ in
^
.../swift/stdlib/public/core/ManagedBuffer.swift:62:27: note: found this candidate
public final class func create(
^
.../swift/stdlib/public/core/ManagedBuffer.swift:85:27: note: found this candidate
public final class func create(
^
Suggestions to work around / fix this would be most welcome! E.g., can we find another name for the new variant to prevent a clash?
Do you think these problems would be worth fixing? Like and comment below, etc. etc.