Idea: Bytes Literal

Karl · February 8, 2021, 5:21pm

I think there are kind of 2 discussions happening here - one about data literals (which presumably would look/work something like StaticString, support @compilerEvaluable when that's a thing, etc), and another about our general handling of owned data.

WRT the first discussion: it's good. We should do it.

WRT the second discussion:

lorentey:

I agree! However, the bucket-o-bytes type I have in mind is only intended to go the other way -- it would be able to work with whatever storage representation you have, but there is no expectation that other types would directly adopt it as a storage representation.
struct ByteBucket {
  let owner: AnyObject?
  let buffer: UnsafeRawBufferPointer
}
Note how this is just an UnsafeRawBufferPointer that maintains an owning reference to its storage.

It's interesting - I have quite a pressing need for something like this, so I'd be happy to help in any way I can. Similar issues came up in the shared string thread, and in discussion I've had with others about how to drive that feature forward. I also kind of touched on that topic when Array.init(unsafeUninitializedCapacity:, initializingWith:) was proposed, and again when the string version was proposed -- so it's something that has irritated me a bit over the years.

IMO, I should be able to go from ManagedBuffer<Header, UInt8> -> Array<UInt8> -> String without copying. At least for read-only use cases.

A simple (pointer, owner) pair object would be really great as a starting point, to represent "some pointer kept alive by an ARCed object". I wonder if the owner object could be anything more than a dumb ARCed thing, and maybe conform to protocols which we could as? cast to enable more functionality, such as checking for uniqueness or setting the count of initialized elements.

Most use-cases I have would require the buffer to be typed, though. For shared strings, I think we'd probably want to use a buffer of UInt8s rather than a raw buffer (UInt8 of course being the UTF8 codepoint type). Unfortunately this would lead to another RawByteBucket/ByteBucket<T> split

lukasa:

We have a common currency type: UnsafeMutableRawBufferPointer . As a sketch, we could imagine the protocol being this:
protocol AppendableRawBuffer: ContiguousBytes {
    mutating func withUnsafeUnitializedTrailingBytes<ReturnType>(minimumCapacity: Int, _ block: (UnsafeMutableRawBufferPointer, inout Int) throws -> ReturnType) rethrows -> ReturnType

    var count: Int { get }
}
This is probably the minimal viable feature set required to serialise data into a buffer. It's a bit of a pain to perform some operations (e.g. back-to-front serialization), but a basic forward-moving serialisation can be cheaply performed using this abstraction.

We could definitely use better APIs for inserting contiguous data in to contiguous collections! I had a need for something like this recently (for a similar reason; I'm simplifying a path string and serialising the result in a generic container, starting at the last path component and working towards the front), so I've been using the following family of functions. They've been incredibly useful.

/// Appends space for the given number of objects, but leaves the initialization of that space to the given closure.
///
/// - important: The closure must initialize **exactly** `uninitializedCapacity` elements.
///
mutating func unsafeAppend(
  uninitializedCapacity: Int, 
  initializingWith initializer: (inout UnsafeMutableBufferPointer<Element>) -> Int
)

mutating func unsafeReplaceSubrange(
  _ subrange: Range<Index>,
  withUninitializedCapacity newSubrangeCount: Int,
  initializingWith initializer: (inout UnsafeMutableBufferPointer<Element>) -> Int
)

Basically, it inserts some uninitialised capacity at the given place, and gives you a closure to initialise it out-of-order, in a similar fashion to Array.init(unsafeUninitializedCapacity: initializingWith:).

(The Int return value is supposed to be an independently-calculated version of how many elements the closure actually wrote. The implementation traps if you fail to initialise the entire inserted capacity).