Performing an unbalanced retain on an Array

is there a way to safely convert a [UInt8] array to a tuple of (UnsafeRawBufferPointer, AnyObject?), where the object pointer keeps the buffer allocation alive? Array<UInt8> itself is not a class type, so i want to retain a reference to its underlying storage, if it has any. (not a box that retains an array value)

the goal is to get some sort of neutral ABI that can abstract over [UInt8] and ByteBuffer.withUnsafeReadableBytesWithStorageManagement(_:) and does not have the overhead of a generic RandomAccessCollection<UInt8>.

I guess you want to do this without an extra allocation?

What would you expect to happen if the source array is changed?

var items: [UInt8] = ...
let (pointer, objectKeeper) = items.xxxx()
items.append(...)

Not sure if this is a good idea, but you could bridge with (items as NSArray) and grab the underlying NSArray buffer.

if the source array is mutated, it should COW, same as any other time an array has a second reference.
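To make the expected behaviour concrete, here is a minimal sketch using only documented Array semantics. The name `keeper` is just an illustration; it stands in for whatever second reference an owner object would represent.

```swift
var items: [UInt8] = [1, 2, 3]

// A second value sharing the same storage, standing in for the owner reference.
let keeper = items

// The storage is no longer uniquely referenced, so the array copies
// its elements to fresh storage before performing the write.
items.append(4)

// The earlier value still sees the original contents.
assert(keeper == [1, 2, 3])
assert(items == [1, 2, 3, 4])
```

Whether a raw retain of the underlying storage object would trigger the same uniqueness check is exactly the open question here; the sketch only shows the behaviour for an ordinary second array value.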

Array does not have this interface, and there's no real way you can build it without that interface.

well, that's a problem because without a neutral ABI for Array and ByteBuffer, we need to use some RandomAccessCollection<UInt8>, and that means the whole BSON library has to be sprayed with @inlinable. which worked for a while, but it has gotten to the point where it is seriously impacting compilation speed.

i see few alternatives to falling back to totally unsafe raw buffer pointers.

If anybody asks, you didn't hear it from me. 🤫

As with all underscored APIs, it is not overly generous when it comes to making documented guarantees. In practice, I believe retaining the returned owner will be detected as sharing by COW (so it will still protect you against overlapping writes and enforce value semantics on the Swift side), but Array does lots of magic that isn't available to other Swift types, so it's hard to know for sure.

I think it would be cool to have a (pointer, owner) abstraction for sharing contiguous buffers without copying. Ideally generics would make things agnostic to specific data types, but in practice lots of things are still written in terms of Array, or String, and it would be nice to pass data to them without copying.
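As a rough sketch of what such a (pointer, owner) pair could look like on the consuming side: `SharedBytes`, `ByteStorage`, and `makeShared` below are hypothetical names, not existing APIs. The owner here is a plain class that allocates its own memory, so this does copy once; it only stands in for whatever storage object a real producer would hand out.

```swift
/// Hypothetical owner: allocates raw memory and frees it when released.
final class ByteStorage {
    let bytes: UnsafeMutableRawBufferPointer

    init(copying source: [UInt8]) {
        bytes = UnsafeMutableRawBufferPointer.allocate(
            byteCount: source.count, alignment: 1)
        bytes.copyBytes(from: source)
    }

    deinit { bytes.deallocate() }
}

/// Hypothetical non-generic pair: a raw view plus whatever keeps it alive.
struct SharedBytes {
    let bytes: UnsafeRawBufferPointer
    let owner: AnyObject?

    /// Runs `body` while guaranteeing the owner is not released early.
    func withBytes<R>(_ body: (UnsafeRawBufferPointer) throws -> R) rethrows -> R {
        try withExtendedLifetime(owner) { try body(bytes) }
    }
}

/// Builds a pair whose owner is the only strong reference to the allocation.
func makeShared(_ source: [UInt8]) -> SharedBytes {
    let storage = ByteStorage(copying: source)
    return SharedBytes(bytes: UnsafeRawBufferPointer(storage.bytes), owner: storage)
}

let shared = makeShared([1, 2, 3])
let total = shared.withBytes { bytes in
    bytes.reduce(0) { $0 + Int($1) }
}
assert(total == 6)
```

The consumer never needs to know what kind of object `owner` is, which is the property that would let one non-generic type cover several producers.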

Nothing stops Array from having a representation that is neither an object nor static memory (except perhaps on Apple platforms, where its overall representation is partially locked-down); perhaps a compact inline representation, like String. Even the interface Karl found is used only for & in practice, and may in fact do an allocation when used.

NSArray would work for this, and even be cheap…if your elements were already objects. No good here.


That said, I think it’s very unlikely that Array, carefully squished to fit multiple representations in a single machine word, will have a representation like the one I described above. So you could indeed propose the addition of this API instead, along with the implicit limitations about Array that it requires, as an evolution proposal.

For now, there’s a chance you can get pretty far with withContiguousStorageIfAvailable and storing offsets instead of pointers in your intermediate types, but I haven’t looked specifically at your code, so I don’t know for sure.
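A minimal sketch of that idea, under the assumption that the bytes can be held as a plain [UInt8]: `OffsetView` and `nullTerminatorOffset` are made-up names for illustration, not part of the BSON library. The non-generic function does the real work on a raw buffer, the generic shim only finds contiguous storage and forwards to it, and the view stores offsets so no pointer escapes a closure.

```swift
/// Non-generic core: returns the offset of the first null byte,
/// or the byte count if there is none.
func nullTerminatorOffset(in bytes: UnsafeRawBufferPointer) -> Int {
    bytes.firstIndex(of: 0) ?? bytes.count
}

/// Thin generic shim: uses contiguous storage when the collection has it,
/// otherwise copies into an array first.
func nullTerminatorOffset<Bytes>(inCollection bytes: Bytes) -> Int
    where Bytes: RandomAccessCollection, Bytes.Element == UInt8
{
    let fast: Int? = bytes.withContiguousStorageIfAvailable {
        nullTerminatorOffset(in: UnsafeRawBufferPointer($0))
    }
    if let offset = fast { return offset }
    return Array(bytes).withUnsafeBytes { nullTerminatorOffset(in: $0) }
}

/// Hypothetical non-generic view: keeps the array alive by value and
/// addresses a sub-range by offset instead of by pointer.
struct OffsetView {
    let storage: [UInt8]
    let range: Range<Int>

    init(storage: [UInt8], range: Range<Int>) {
        precondition(range.lowerBound >= 0 && range.upperBound <= storage.count)
        self.storage = storage
        self.range = range
    }

    /// The pointer is only valid inside `body`.
    func withBytes<R>(_ body: (UnsafeRawBufferPointer) throws -> R) rethrows -> R {
        try storage.withUnsafeBytes { all in
            try body(UnsafeRawBufferPointer(rebasing: all[range]))
        }
    }
}

let document: [UInt8] = [0x10, 0x61, 0x00, 0x2A, 0x00]
let view = OffsetView(storage: document, range: 1 ..< 5)
let keyLength = view.withBytes { nullTerminatorOffset(in: $0) }
let sliceLength = nullTerminatorOffset(inCollection: document[3...])
assert(keyLength == 1 && sliceLength == 1)
```

This does not by itself cover ByteBuffer, since the view holds an array; it only shows how offsets plus a closure-scoped pointer keep the generic surface small.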

the BSON code is open-sourced, the types look like:

extension BSON
{
    /// A BSON document. The backing storage of this type is opaque,
    /// permitting lazy parsing of its inline content.
    @frozen public
    struct DocumentView<Bytes> where Bytes:RandomAccessCollection<UInt8>
    {
        /// The raw data backing this document. This collection *does not*
        /// include the trailing null byte that typically appears after its
        /// inline field list.
        public 
        let slice:Bytes

        /// Stores the argument in ``slice`` unchanged.
        ///
        /// >   Complexity: O(1)
        @inlinable public
        init(slice:Bytes)
        {
            self.slice = slice
        }
    }
}

it is common when decoding BSON to escape an unparsed DocumentView<some RandomAccessCollection<UInt8>>, in fact that is pretty much the point of using BSON instead of JSON - you can skip decoding things you don’t care about, or more commonly, don’t know how to decode because you haven’t modeled the schema, or want to delegate the decoding to some component that does know how to decode it.

when the BSON types are specialized, the Bytes parameter is almost always one of:

  • ArraySlice<UInt8>, which is three pointers long
  • ByteBuffer, which is also three pointers long

so it is really motivating to me to get rid of the generics entirely and store a raw buffer pointer (2 pointers long) + an object reference (1 pointer long).
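The sizes can be checked with MemoryLayout rather than taken on faith. This is a small sketch; `BytesAndOwner` is a hypothetical typealias for the proposed pair, and the ArraySlice figure is printed rather than asserted because it depends on the standard library's internal layout.

```swift
/// Hypothetical name for the proposed non-generic representation.
typealias BytesAndOwner = (bytes: UnsafeRawBufferPointer, owner: AnyObject?)

// Size of one pointer on the current platform.
let word = MemoryLayout<UnsafeRawPointer>.size

let pointerWords = MemoryLayout<UnsafeRawBufferPointer>.size / word
let ownerWords = MemoryLayout<AnyObject?>.size / word
let pairWords = MemoryLayout<BytesAndOwner>.size / word

print("UnsafeRawBufferPointer:", pointerWords, "words")
print("AnyObject?:", ownerWords, "words")
print("(pointer, owner) pair:", pairWords, "words")

// Printed for comparison; measure on your own toolchain.
print("ArraySlice<UInt8>:", MemoryLayout<ArraySlice<UInt8>>.size / word, "words")
```

ByteBuffer can be measured the same way in a project that imports NIOCore.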