Hi. Can someone briefly explain how to pick one vs the other? Also, can anyone clarify some of the ARC terminology, such as balanced vs unbalanced retain?
As an example of what I am doing - I am creating a mutable byte buffer which is passed to a C library as a pointer for further processing.
It’s hard to respond here without more info about the specific issue you’re wrangling. Consider this:
I am creating a mutable byte buffer which is passed to a C
library as a pointer for further processing.
The safest option here is to use one of the withXxx(…) methods. For example:
let d = Data("Hello Cruel World!".utf8)
d.withUnsafeBytes { buf in
    someCFunction(buf.baseAddress!, buf.count)
}
Note: This example uses Data, but the same technique works for [UInt8].
These withXxx(…) methods manage the lifetime of the unsafe pointer. As long as you don’t access the pointer after the closure has returned, you’re golden.
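For [UInt8] the shape is the same; here's a minimal sketch, with a Swift stand-in for the real C function:

```swift
// Stand-in for the imported C function; the real one would come from a header.
func someCFunction(_ p: UnsafeRawPointer?, _ count: Int) {
    print("got \(count) bytes")
}

let bytes: [UInt8] = Array("Hello Cruel World!".utf8)
bytes.withUnsafeBytes { buf in
    // `buf` is only guaranteed valid inside this closure.
    someCFunction(buf.baseAddress, buf.count)
}
```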
Where things get tricky is when you need to pass a buffer to C and have C maintain ongoing access to it. In that case the withXxx(…) methods aren’t appropriate and you have to start managing memory manually.
If that’s the case, please post more details about what you’re doing and we can explore your options.
To be honest, it is the latter case. I am passing a buffer around for reading/writing in a callback oriented system (libuv).
Well, I have a potential solution in mind that does not involve manual memory management.
(Create a container class with a buffer and a key on every new connection. Store this class in a 'global' array and pass a pointer to this object around. Once the 'close' callback is finished, remove the container class by key from the array.)
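Sketched out, that idea might look like the following. All names here are placeholders, not real libuv API:

```swift
final class ConnectionBox {
    let key: Int
    var buffer = [UInt8]()
    init(key: Int) { self.key = key }
}

// The dictionary holds the strong reference, so the pointer handed
// to C stays valid until onClose removes the entry.
var liveConnections: [Int: ConnectionBox] = [:]

func onNewConnection(key: Int) -> UnsafeMutableRawPointer {
    let box = ConnectionBox(key: key)
    liveConnections[key] = box
    // passUnretained is fine here: the dictionary keeps the box alive.
    return Unmanaged.passUnretained(box).toOpaque()
}

func onClose(key: Int) {
    liveConnections[key] = nil   // the last strong reference goes away here
}
```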
However, I am still curious how manual memory management should be done in this case. It might result in more efficient code too, if done well.
Furthermore, I am just wondering why would one prefer one type over the other. For example, UnsafeMutableRawPointer vs Data? Or Data vs Unmanaged? And lastly, passRetained vs passUnretained?
UnsafeMutableRawPointer is roughly equivalent to a C buffer pointer and length. It’s pretty much completely unsafe [1].
Data is a value type that uses CoW under the covers.
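A quick illustration of that CoW behaviour:

```swift
import Foundation

var a = Data([1, 2, 3])
let b = a       // no copy yet; both values share the same storage
a[0] = 9        // the first mutation triggers the actual copy
// a is now [9, 2, 3] while b still sees [1, 2, 3]
```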
Or Data vs Unmanaged?
Unmanaged is a mechanism to work with reference counts directly. Most Swift code uses automatic reference counting (ARC), which means the compiler takes care of the reference counts for you. Unmanaged is useful in situations where you’re escaping the ARC world in order to interoperate with manual retain/release environments, like a C library.
You use passRetained(…) when you need to pass an unmanaged value to C (for example) and you want to increment the retain count on that value. This is useful for escaping contexts, with the obvious caveat that you have to decrement the retain count at some point.
You use passUnretained(…) when you need to pass an unmanaged value to C and don’t want to retain that value. This is useful for non-escaping contexts.
There are similar takeRetainedValue() and takeUnretainedValue() routines for when C is passing you an unmanaged value and you want to bring it back into the ARC world.
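A minimal round trip, using a placeholder class, might look like this: passRetained(…) does the +1 on the way out and takeRetainedValue() does the balancing −1 on the way back in.

```swift
final class Connection {
    let id: Int
    init(id: Int) { self.id = id }
}

let conn = Connection(id: 42)

// Escape ARC: +1 retain, then an opaque pointer that C can hold on to.
let opaque = Unmanaged.passRetained(conn).toOpaque()

// … C calls back later with the pointer …

// Re-enter ARC: this consumes the +1 from passRetained(…) above.
let recovered = Unmanaged<Connection>.fromOpaque(opaque).takeRetainedValue()
```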
I went through the links you provided and now have a clearer understanding of the usage patterns for Unmanaged, passRetained, passUnretained, takeRetainedValue, and takeUnretainedValue.
Building on the OP's suggestion:
QUOTE
"Well, I have a potential solution in mind that does not involve manual memory management.
(Create a container class with a buffer and a key for each new connection. Store this class in a 'global' array and pass a pointer to this object around. Once the 'close' callback is finished, remove the container class by key from the array.)"
UNQUOTE
What if, instead of using a global array, I increase the reference count of this class object with Unmanaged.passRetained()? Then, I could get an opaque pointer and store it in C/C++ code. Once my C/C++ logic has finished processing (which might be asynchronous), I can release the object using takeRetainedValue().
As mentioned in a few other posts, and as highlighted by @John_McCall in his WWDC 2024 session "Explore Swift performance", making a copy of a value type can be expensive! So, does this approach help me achieve better performance in any way?
My general response to this, and to the other things we’ve been talking about over on DevForums, is that you’re spending a lot of time, and adding a lot of unsafe code, in the name of performance. Are you sure that’s worth it?
Modern computers are pretty darned fast, so unless you’re handling dozens of network connections at once the CPU is unlikely to be the bottleneck here. If I were in your shoes I’d create a prototype using the simplest possible code, measure the performance, and then decide whether all this work is actually worth it.
Regarding your suggestion, however, sure, if you want to manage memory manually, you can absolutely do that with Unmanaged. Like most unsafe constructs, it’s perfectly safe if you use it correctly (-:
Two things:
I can release the object using takeRetainedValue().
I could get an opaque pointer and store it in C/C++ code.
Retaining an opaque pointer to an object like this is just fine. You have to be careful with buffers though. For example, with Data, you can’t escape the pointer from withUnsafeBytes(_:) regardless of what guarantees you have about the lifetime of the data value.
If you want to manually manage a data buffer, use lower-level constructs. If you then want to ‘steal’ a buffer from your C++ code and promote it to Data, use the no-copy initialiser (but follow the rules we talked about above and on your DevForums thread).
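For reference, the no-copy initialiser is Data(bytesNoCopy:count:deallocator:). A minimal sketch of adopting a malloc’d buffer that, say, C++ handed over:

```swift
import Foundation

let count = 1500
let raw = malloc(count)!
memset(raw, 0, count)

// Data takes ownership of the buffer without copying it; the .free
// deallocator means free() runs when the last reference goes away.
let d = Data(bytesNoCopy: raw, count: count, deallocator: .free)
```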
@eskimo
I agree with your suggestion to use the no-copy initializer for all my sends. Considering my usage pattern and the high traffic volume I expect (millions of packets per second, each around 1500 bytes), my plan is to queue and construct the entire message over UDP and then process all the memory at once on the receiving side. Since Apple creates its own buffer and sends the data in that buffer, would it be better to store the data within the class (all these 1500-byte chunks, given that any message I send will be packetized based on the MTU and won’t exceed 1500 bytes), or should I copy it into my C++ memory?
Note: While I understand the advantages of processing everything in Swift, I’m developing a cross-platform app, and the receive logic is implemented in C++. Therefore, I need to access the underlying memory and process it in C++. I want to keep all the processing logic in one place, C++, which is why I’m spending time understanding and implementing these solutions. My goal is to maintain a single, sustainable processing path across the app.
public func RecvFunc() {
    connection.receive(..., completion: { (data, ....) in
        let myClassData = MyWrapper(data)
        CppMethodToStoreRetainedPointerOfClass(Unmanaged.passRetained(myClassData).toOpaque())
    })
}
and at a later point in time I invoke the method below from C++:
public func SwiftFunc(_ pClassObject: UnsafeMutableRawPointer) {
    let wrapper = Unmanaged<MyWrapper>.fromOpaque(pClassObject).takeUnretainedValue()
    let numBytesReceived = wrapper.data.count
    wrapper.data.withUnsafeBytes { buf in
        StaticCppMethodToProcessDataMemory(buf.baseAddress, numBytesReceived)
    }
}
Would this approach work and avoid copying the data in my buffer without violating any Swift memory management rules or mechanics?
These yield DispatchData values, which are the Swift equivalent of dispatch_data_t. You can pass these to C++, which can then access the bytes directly via a pointer; see dispatch_data_apply.
It’s important to note this quote from the doc comments in <dispatch/data.h>:
Each invocation of the block is passed a data object representing the current region and its logical offset, along with the memory location and extent of the region. These allow direct read access to the memory region, but are only valid until the passed-in region object is released.
The upshot of this is that it’s possible for your C++ code to keep the buffer pointer valid indefinitely, as long as it retains the region object.
To follow up on your suggestion: my requirement is to modify the buffer in place, without shrinking or growing the underlying buffer.
From what I understand, each region I receive for processing in the dispatch_data_apply method is essentially a temporary copy of each contiguous region on which my logic is invoked.
Could you provide some guidance on how to handle this scenario effectively?
From what I understand, each region I receive for processing
in the dispatch_data_apply method is essentially a temporary
copy …
No, that’s not right. In most cases (maybe even all) it’s a pointer to the buffer being managed by the Dispatch data. The only thing temporary about it is that the pointer is only guaranteed to remain valid while you hold a reference to region.
The idea behind Dispatch data is that it avoids any copies in the receive path of the networking stack [1]. If you need to mutate the buffer, you have to make a copy because you can’t go around mutating the networking stack’s buffers.
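Concretely, because DispatchData is a Collection of UInt8, making that copy is a one-liner. A sketch (the sample bytes are made up):

```swift
import Dispatch

let original: [UInt8] = [0x01, 0x02, 0x03]
let received: DispatchData = original.withUnsafeBytes { DispatchData(bytes: $0) }

// DispatchData is immutable; copy it into storage we own before mutating.
// This also flattens any discontiguous regions into one buffer.
var mutable = [UInt8](received)
mutable[0] = 0xFF
```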
Share and Enjoy
Quinn “The Eskimo!” @ DTS @ Apple
[1] Except the kernel to user space copy if Network framework is using BSD Sockets.
I was confused from the below statement in the document:
QUOTE
For each contiguous memory region, this function creates a temporary dispatch data object and passes it to the specified applier function. This new object, plus the other parameters to the block, provide direct access to the specific memory region being examined. Once the applier block returns, the temporary dispatch data object is released. (The original object in the data parameter is not touched.)
UNQUOTE
Link: dispatch_data_t | Apple Developer Documentation
can you please help me understand what was meant here?
I think you’re being confused by the ‘temporary object’ terminology. Remember that Dispatch data is immutable, and you can make a temporary copy of a value simply by incrementing its retain count. So imagine a Dispatch data value that has a single range. The implementation of the apply function would be something like this:
increment retain count
get buffer address and size
pass self, 0, buffer, size to callback
decrement retain count
In this design:
By default, buffer is no longer guaranteed to persist after the callback returns.
But if the callback retains region, then buffer will persist as long as it maintains that reference.
Now, the actual implementation gets way more complex than this once you start taking into account the full flexibility of Dispatch Data — things like dispatch_data_create_concat and dispatch_data_create_subrange — but I think it’s a reasonable model to keep in your head.
Also, don’t forget dispatch_data_create_map, which has a similar sort of model, where it returns a pointer and size and a ‘new’ object, guaranteeing that the pointer remains valid until the new object is released. In that case, if the data started out discontiguous, it may well end up creating a new object to keep track of the flattened memory.
I really appreciate the clarity you provided around the handling of Dispatch Data, its immutability, and the lifecycle of buffers in relation to retain counts.
Your breakdown of the apply function and the nuances of how buffers persist (or don't) based on whether the callback retains the region was especially helpful. I'll keep your points in mind, especially when dealing with more complex scenarios like dispatch_data_create_concat, dispatch_data_create_subrange, and dispatch_data_create_map.