How to create a long-lived pointer from Data.withUnsafeBytes

I have a Data object that represents some serialized data. How do I create a pointer to the raw bytes that can be used over and over without having to always perform the work inside a call to withUnsafeBytes?

The documentation for withUnsafeBytes says the pointer passed to the closure should not be used outside of the closure. I am calling these read() functions potentially millions of times, and Instruments suggests there's some overhead in constantly calling withUnsafeBytes. I was hoping I could improve that by "reusing" the pointer to the underlying bytes, but I can't figure out the Swift syntax for that.

Alternatively, rather than relying on calls to .load(fromByteOffset:as:) which requires aligned memory, is there another way to read from the bytes in chunks or one byte increments? Using Data subscript[] works, but is it very efficient?

struct Reader {
  var data:Data
  var ptr:UnsafeRawBufferPointer // What type should this be?

  var offset = 0

  init(_ data:Data) {
    self.data = data
    self.ptr  = ?    // What can be used here? 
  }

  mutating func readInteger() -> Int {
    let value = ptr.load(fromByteOffset:offset, as:Int.self)
    offset += ...

    return value
  }

  mutating func readBool() -> Bool {
    let value = ptr.load(fromByteOffset:offset, as:Bool.self)
    offset += ...

    return value
  }
}

You can also allocate memory separately with allocate(byteCount:alignment:) and copy the data into it.
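A minimal sketch of that approach, assuming a hypothetical wrapper class named CopiedBytes (not from this thread) that owns the copy and frees it in deinit. The 8-byte alignment is also an assumption, chosen so that aligned loads of common scalar types work at suitable offsets.

```swift
import Foundation

/// Hypothetical wrapper that owns a private, aligned copy of a Data's bytes.
final class CopiedBytes {
    let buffer: UnsafeMutableRawBufferPointer

    init(copying data: Data) {
        // Allocate a buffer the same size as the data, aligned for 8-byte scalars.
        buffer = UnsafeMutableRawBufferPointer.allocate(
            byteCount: data.count,
            alignment: MemoryLayout<UInt64>.alignment)
        // Copy every byte of the Data into the new allocation.
        buffer.copyBytes(from: data)
    }

    // Free the allocation when the wrapper goes away.
    deinit { buffer.deallocate() }
}

// Build a sample Data holding the raw bytes of one Int64.
let original: Int64 = 1234
let data = withUnsafeBytes(of: original) { Data($0) }

// The pointer inside `copy` stays valid for as long as `copy` is alive.
let copy = CopiedBytes(copying: data)
let decoded = copy.buffer.load(fromByteOffset: 0, as: Int64.self)
```

The cost is one full copy up front. After that, reads go straight through the stored buffer with no closure call per read.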

Could be an interesting one to benchmark against.

Agreed, though instinctively it feels a little odd to have to make a copy out of a Data and into a separately allocated buffer. (These are buffers of potentially millions of scalar values being sent from an XPC process to the main app.)

Is there any way to accomplish this without having to copy the data out of Data and into a second buffer?

Depending on the structure of your code, the generally recommended way to approach this would be to lower your loop into withUnsafeBytes, if possible, e.g., instead of

while /* condition */ {
    buffer.withUnsafeBytes { /* read next */ }
}

prefer

buffer.withUnsafeBytes {
    while /* condition */ {
        /* read next */
    }
}
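As a rough, self-contained illustration of the second form, the sketch below decodes a made-up payload (three Int64 values followed by one Bool byte) inside a single withUnsafeBytes call. The payload layout and variable names are assumptions for the example. It uses loadUnaligned(fromByteOffset:as:), which needs Swift 5.7 or later and does not require aligned offsets.

```swift
import Foundation

// Hypothetical sample payload: three Int64 values followed by one Bool byte.
var payload = Data()
for number in [Int64(1), 2, 3] {
    payload.append(contentsOf: withUnsafeBytes(of: number) { Array($0) })
}
payload.append(1)   // a Bool stored as a single byte

var numbers: [Int64] = []
var flag = false

// One closure call covers the whole decode loop.
payload.withUnsafeBytes { (bytes: UnsafeRawBufferPointer) -> Void in
    var offset = 0
    // Read as many whole Int64 values as fit in the buffer.
    while offset + MemoryLayout<Int64>.size <= bytes.count {
        numbers.append(bytes.loadUnaligned(fromByteOffset: offset, as: Int64.self))
        offset += MemoryLayout<Int64>.size
    }
    // Treat one leftover byte, if present, as the Bool.
    if offset < bytes.count {
        flag = bytes[offset] != 0
    }
}
```

The pointer never leaves the closure, so this stays within the documented contract of withUnsafeBytes.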

The reason you can't easily get a long-lived pointer to the underlying buffer without copying it out yourself is that one might not actually exist. As an implementation detail, Data may store its bytes inline (or not at all, in the case of empty data), so the pointer you receive can be truly temporary (for example, a pointer to data copied onto the stack). This is unlikely to apply to your case given the length of the data, but the safest way to get a long-lived pointer without relying on implementation details is to make a full copy yourself.

From what I know, in some specific cases you can't do that without copying the bytes to newly allocated memory. If I remember correctly, for the InlineData representation (fewer than 15 bytes on a 64-bit system), every time you call a withUnsafe*Pointer function on it, a new tuple is created and the pointer to that new tuple is used, not the original one, so you cannot get a pointer to the raw bytes in this case.

These are buffers of potentially millions of scalar values being sent from an XPC process to the main app.

Have you looked at how that memory is copied during the IPC? XPC should use out-of-line memory for such a large buffer, but you’ll want to make sure that Data doesn’t make a copy of that. If it does, I think that’s the first thing you need to address. You may, for example, find that bridging back to NSData might be your best option here (and NSData has a bytes property).
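A small sketch of the NSData route, assuming an Apple platform where Data bridges to NSData with `as`. The sample bytes and the little-endian UInt32 layout are made up for illustration. Whether bridging copies depends on how the Data is stored, so this is something to measure rather than assume.

```swift
import Foundation

// Hypothetical sample: the value 1234 encoded as a little-endian UInt32.
let data = Data([0xD2, 0x04, 0x00, 0x00])

// Bridge to NSData; this may or may not copy, depending on Data's storage.
let nsData = data as NSData

// `bytes` stays valid only while nsData is alive, so keep it alive during the read.
let value: UInt32 = withExtendedLifetime(nsData) {
    let base: UnsafeRawPointer = nsData.bytes
    return UInt32(littleEndian: base.loadUnaligned(as: UInt32.self))
}
```

Holding on to the NSData object (for example as a stored property) is what keeps the pointer usable across many reads.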

Share and Enjoy

Quinn “The Eskimo!” @ DTS @ Apple

No, not yet. And to be honest I'm not entirely sure how to use Instruments to measure that. The XPC process is written in Objective-C++ and uses a std::vector<uint8_t> as its buffer. The ultimate goal of this exercise is to efficiently send that buffer through XPC to the main app and then iterate through that buffer as raw bytes from within Swift.

The relationship between DispatchData, Data and NSData is still quite confusing to me.

I should probably move this part of the discussion to a separate post on the Apple Forums or Stack Overflow as it deals more with XPC and macOS technologies rather than Swift itself.

Erm, no, it absolutely isn't. Try this:

let reader = Reader { Data(intBytes + intBytes + intBytes + boolBytes) }

let int: Int = reader.read() /// 1234
let int2: Int = reader.read() /// 1234
let int3: Int = reader.read() /// 1234
let bool: Bool = reader.read() /// error: Execution was interrupted, reason: EXC_BAD_ACCESS

_getUnsafePointerToStoredProperties gives you the location of the data variable itself, within the instance of Container. That means you're writing over your class' instance variables (and beyond) with garbage data and corrupting them. It's purely accidental that your example (with a single Int and Bool) even works at all.

Briefly, you can think of a class as a heap-allocated struct with a secret "header" field containing various metadata, followed by the stored properties. _getUnsafePointerToStoredProperties knows where the stored properties are located within the class' layout and gives you that pointer.

Is this header the root class SwiftObject? I thought its ivar is listed after the Container.

I was thinking that, since MemoryLayout.offset(of: keyPath) always returns nil for classes, if data is the sole property of a class, its offset would naturally be 0. What I didn't think of is that a pointer to data is not a pointer to the data's raw bytes.

Clearly I am terribly wrong. I apologize for the daydream thought.

No, it doesn't have anything to do with SwiftObject, which is just an Obj-C bridging thing AFAIK.

In Obj-C, the header is a pointer called isa ("is a"), which points to the class of the object. In C++ it's a vpointer, pointing to a virtual method table (vtable) to support dynamic dispatch. I don't actually know what the class header in Swift is - I think it's also a vpointer, but it could also be a pointer to the class' metadata object :man_shrugging:

In any case, it's an implementation detail that you don't need to know the specifics of. It's enough to be generally aware that classes have more complex layouts than structs.

The relationship between DispatchData, Data and NSData is still quite confusing to me.

It’s not just you (-: This relationship is complex, and it’s evolved over time.

I should probably move this part of the discussion to a separate post on the Apple Forums or Stack Overflow as it deals more with XPC and macOS technologies rather than Swift itself.

Indeed. If you bounce on over to Core OS > Processes, I’d be happy to chat with you there.

I just found out why it works for a single Int and Bool. Together they take only 9 bytes, so the data is represented as InlineData, which is a tuple.

This post is just a pointer from here to kennyc’s threads on DevForums:
