Is it safe to cast DispatchData to Data this way?

Hi. I'm writing a program that used to use (NS)InputStream to read a file, and I'm upgrading it to use DispatchIO. WWDC 2013 #704 showed an example of how to do this, but of course that was pre-Swift. You pass an Objective-C block which accepts a dispatch_data_t.

In Swift, I've got DispatchData. The rest of my program expects Data. How do I convert this?

First, I tried the simplest possible solution: "my_dd as! Data". The compiler warns that this always fails, and that's fair. They're different types.

Then I found some random person on the internet doing "my_dd as AnyObject as! Data", and that seems to work for me so far. But DispatchData may be non-contiguous, and Data conforms to ContiguousBytes. Am I just getting lucky that my program, data, and memory usage happen to produce only contiguous blocks today?
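One way to tell whether you are relying on luck is to look at `regions` directly. Below is a minimal sketch, assuming Foundation's `DataProtocol` conformance for `DispatchData` (which is what provides `regions`). `makeTwoPartDispatchData()` and `isContiguous(_:)` are hypothetical helpers. Whether appending two buffers really yields two regions is a libdispatch implementation detail, so treat the printed region count as an observation, not a guarantee.

```swift
import Dispatch
import Foundation

// Hypothetical helper: builds a DispatchData by appending two separately
// allocated buffers. libdispatch usually concatenates rather than merges,
// so the result may (but is not guaranteed to) have two regions.
func makeTwoPartDispatchData() -> DispatchData {
    var dd = DispatchData.empty
    let first: [UInt8] = [1, 2, 3]
    let second: [UInt8] = [4, 5]
    first.withUnsafeBytes { dd.append($0) }
    second.withUnsafeBytes { dd.append($0) }
    return dd
}

// Hypothetical helper: true when all the bytes live in a single region.
func isContiguous(_ dd: DispatchData) -> Bool {
    return dd.regions.count <= 1
}

let sample = makeTwoPartDispatchData()
print(sample.count, sample.regions.count, isContiguous(sample))
```

Checking `regions.count` on the chunks your real reads return tells you what you are getting today, but not what you will get tomorrow.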

There's a note in WWDC 2013 #205 that (in Objective-C) dispatch_data_t can be cast to NSData. There's a note in the Swift.Data docs that it's bridged to NSData, so you can use those two types "interchangeably". But I'm having trouble finding details on exactly how casting and bridging works -- DynamicCasting.md has "XXX TODO EXPLAIN" in the section I care about!

I noticed that "my_dd as Any as! Data" also seems to work. Since dispatch_data_t and NSData are (Objective-C) NSObjects, and DispatchData and Data are (Swift) structs, are these two casting chains equivalent, differing only in whether the conversion happens before or after bridging?

Is "my_dispatch_data as AnyObject as! Data" safe? If not, what's the best way to do this?

Thanks!

At one time, dispatch_data_t could be cast to NSData in Obj-C (they are "toll-free bridged"), and of course NSData can be bridged to Data in Swift. That's why as Any as! Data seems to work.

I suspect that dispatch_data_t and NSData are still toll-free bridged in Obj-C, although I have a vague recollection that Apple backed away from recommending reliance on this a couple of years ago.

However, whatever the details, I suggest you stay away from trying to bridge like this. In Swift, Data was fairly recently revamped so that it no longer depends on NSData for its implementation. As a result, Data now requires an internal contiguous byte buffer. That means that if you bridged from dispatch_data_t into Data in Swift, the data would likely be copied as soon as you tried to access it (or sooner).

Having potentially large copies done at unpredictable times is bad news for your app's performance. If you've got a DispatchData object, you should probably copy it to a Data object at a controlled place of your choosing.

When copying, you probably want to avoid first extracting the entire contents of the DispatchData object as a single block of memory: for discontiguous data, that block is already one copy, and most ways of then creating the Data object from it would make a second one.

A better way is probably to create an empty Data object, then iterate through the DispatchData's regions array and append each contiguous region to the Data object. That should hold things down to a single copy of the original contents.
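The region-by-region approach might look like this. It is a sketch, assuming Foundation's `DataProtocol` conformance for `DispatchData` (which supplies `regions`, each of which is `ContiguousBytes`); the initializer name `init(appendingRegionsOf:)` is hypothetical.

```swift
import Dispatch
import Foundation

extension Data {
    // Hypothetical initializer name. Appends each contiguous region of `dd`
    // to one Data, so every byte is copied exactly once.
    init(appendingRegionsOf dd: DispatchData) {
        var result = Data()
        result.reserveCapacity(dd.count)   // avoid regrowing as regions are appended
        for region in dd.regions {
            // A region is contiguous, so its bytes can be viewed as one buffer.
            region.withUnsafeBytes { (buffer: UnsafeRawBufferPointer) in
                result.append(contentsOf: buffer)
            }
        }
        self = result
    }
}
```

Reserving `dd.count` up front is a small design choice: without it, `Data` may reallocate (and re-copy) as it grows.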

I'm not privy to the underlying implementation details of DispatchData or Data, and I'm not certain that the above advice doesn't overlook some detail that matters. However, I think you're much better off avoiding as Any as! Data, unless someone better informed jumps in with better advice. :slight_smile:

Thanks for the input, QuinceyMorris!

It sounds like the answer is that I shouldn't make any assumptions at all beyond exactly what the API documents today.

So, something like this? I'm not up on all the possible ways to copy byte buffers.

```swift
for region in dispatchData.regions {
    // Allocate a zero-filled Data of the region's size, then copy the bytes in.
    var data = Data(count: region.count)
    data.withUnsafeMutableBytes { _ = region.copyBytes(to: $0) }
    // ...now pass this Data to the func that processes them...
}
```

In practice, I'm getting a sequence of DispatchData each with 1 region of 1 MiB. Reading a large file this way is faster than using NSInputStream, and about the same speed as my casting/bridging attempt.

I'd still be curious to hear from any compiler folks on the status of casting/bridging. Swift's type system is a bit of a mystery to me, and I'd like to know what the various "as X as Y" expressions actually do.

Personally, I'd prefer to do it the other way around, so that you aren't responsible for not running off the end of the destination buffer:

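```swift
for region in dispatchData.regions {
    // Each region is contiguous; Data's initializer copies exactly its bytes.
    let data = region.withUnsafeBytes { Data($0) }
    // ...pass data along...
}
```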

(If you actually needed to get all the regions into a single Data object, I think I'd recommend Data.append instead.)

> I'm not up on all the possible ways to copy byte buffers.

You don’t need to iterate each region here; copyBytes(to:) will do that for you. This is what I’d do:

```swift
extension Data {

    init(copying dd: DispatchData) {
        var result = Data(count: dd.count)
        result.withUnsafeMutableBytes { buf in
            _ = dd.copyBytes(to: buf)
        }
        self = result
    }
}
```

> I'd like to know what the various "as X as Y" expressions actually do.

I’ve put some implementation details below. However, your goal seems to be to force the copy to happen at a specific point, and I don’t think you can use implementation details to guarantee that. Hence my suggestion above.
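To show what "a specific point" can mean in practice, here is a sketch of a DispatchIO read loop that copies each chunk into `Data` inside the read handler, using the same `copyBytes(to:)` technique as `init(copying:)` above. `readChunks(atPath:process:completion:)` is a hypothetical name, and the error handling is deliberately minimal.

```swift
import Dispatch
import Foundation

// Hypothetical helper: streams a file with DispatchIO and hands each chunk
// to `process` as a Data. The copy out of DispatchData happens here, in the
// read handler, rather than lazily through bridging.
func readChunks(atPath path: String,
                process: @escaping (Data) -> Void,
                completion: @escaping (Int32) -> Void) {
    let queue = DispatchQueue(label: "chunk-reader")   // serial queue
    guard let channel = DispatchIO(type: .stream, path: path, oflag: O_RDONLY,
                                   mode: 0, queue: queue,
                                   cleanupHandler: { _ in }) else {
        completion(EINVAL)
        return
    }
    channel.read(offset: 0, length: Int.max, queue: queue) { done, dispatchData, error in
        if let dd = dispatchData, !dd.isEmpty {
            // One copyBytes(to:) call walks every region into one contiguous Data.
            var chunk = Data(count: dd.count)
            chunk.withUnsafeMutableBytes { (buf: UnsafeMutableRawBufferPointer) in
                _ = dd.copyBytes(to: buf)
            }
            process(chunk)
        }
        if done {
            channel.close()
            completion(error)
        }
    }
}
```

Because both callbacks run on the same serial queue, `process` sees chunks in file order.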

Share and Enjoy

Quinn “The Eskimo!” @ DTS @ Apple

Some Implementation Details

> I'd like to know what the various "as X as Y" expressions actually do.

The answer to that is “It depends.” In the specific case of Data, however, the compiler pretty much passes the work off to Foundation. A quick trip to the disassembler shows that the heavy lifting is done by Foundation.Data._unconditionallyBridgeFromObjectiveC(…). Its implementation calls init(referencing:). That’s actually public API, although the docs leave a little to be desired )-: However, the doc comments are useful:

```swift
/// Initialize a `Data` by adopting a reference type.
///
/// You can use this initializer to create a `struct Data` that
/// wraps a `class NSData`. `struct Data` will use the `class
/// NSData` for all operations. Other initializers (including
/// casting using `as Data`) may choose to hold a reference or not,
/// based on a what is the most efficient representation.
///
/// If the resulting value is mutated, then `Data` will invoke the
/// `mutableCopy()` function on the reference to copy the contents.
/// You may customize the behavior of that function if you wish to
/// return a specialized mutable subclass.
///
/// - parameter reference: The instance of `NSData` that you wish to
/// wrap. This instance will be copied by `struct Data`.
```

The internals of that routine (in the same file) are lots of fun (-:
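To make that doc comment concrete, here is a small sketch calling `init(referencing:)` with a plain `NSData`. It checks only observable behavior: reads match the original, and mutating the `Data` leaves the original `NSData` unchanged. Whether storage is shared internally before the mutation is an implementation detail this example does not test.

```swift
import Foundation

// Wrap an existing NSData; per the doc comment, Data uses it for all operations.
let bytes: [UInt8] = [0xDE, 0xAD, 0xBE, 0xEF]
let reference = NSData(bytes: bytes, length: bytes.count)
var wrapped = Data(referencing: reference)

// Reading gives the NSData's contents.
let firstByte = wrapped[0]

// Mutating goes through mutableCopy(), so the original NSData is untouched.
wrapped[0] = 0x00
```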


As it turned out, it sounds like @ken prefers to have a Data per region, not a single combined one.

WWDC 2019: Advances in Foundation

- 01:58 Data Contiguity
- 03:53 Working with discontiguous data

Not sure if this is helpful to you...

The dynamic casting and bridging implementation is, for better or worse, an absolutely enormous amount of code, split between C++ in the Swift runtime, and Swift in the standard library. If you want to dig into it I'm happy to point you to various relevant pieces and answer questions, but be prepared for it to be a lot of work.

@QuinceyMorris: Great, that looks even better!

@young: I watched that again. I think I've seen it before (when trying to figure out NSListFormatter). Nothing too surprising after what I've read here -- just a confirmation that Data is-a ContiguousBytes, and other DataProtocol types generally aren't.

@eskimo: Always good to learn new ways to deal with byte buffers. That's definitely not my forte.

Though I'm not sure what benefit there is to copying a whole DispatchData into one Data. Either you've got an interface that can deal with chunks of bytes of any size (and you can feed it each Region), or you've got an algorithm that needs random access to the whole file (so you'll need to accumulate them all). You can't guarantee anything about the size of the Regions you receive.
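For the first case, a consumer that accepts arbitrary-size chunks can be fed each region directly, with no intermediate `Data` at all. This is a sketch: `FNV1a` and `fnv1a(of:)` are hypothetical names, and the 64-bit FNV-1a hash is only a stand-in for "any streaming algorithm". It again assumes Foundation's `DataProtocol` conformance for `DispatchData`.

```swift
import Dispatch
import Foundation

// Hypothetical incremental consumer: 64-bit FNV-1a over arbitrary chunks.
struct FNV1a {
    private(set) var value: UInt64 = 0xcbf29ce484222325   // FNV offset basis

    mutating func update(_ bytes: UnsafeRawBufferPointer) {
        for byte in bytes {
            value ^= UInt64(byte)
            value = value &* 0x100000001b3                  // FNV prime, wrapping multiply
        }
    }
}

// Feed each contiguous region in turn; region sizes don't matter.
func fnv1a(of dd: DispatchData) -> UInt64 {
    var hasher = FNV1a()
    for region in dd.regions {
        region.withUnsafeBytes { hasher.update($0) }
    }
    return hasher.value
}
```

Because the hash is computed byte by byte, the result is the same however the bytes happen to be split into regions.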

Anyway, it's a moot point today, since I've never seen DispatchIO ever return any DispatchData with more than 1 Region.

@David_Smith: This is going to sound wishy-washy, but I feel I still have no intuition for how types work in Swift. I've been using the language for 5 years and I can't explain what "as" does. Like in this case, why "x as Z" fails, when "x as Y as Z" works, or how that's different from "x as Y2 as Z". Or how these relate to their Objective-C types, or how many times they're copying data. Or how to make my own types run a function for "as", or if that's even possible.

Anyway, beyond the question in the subject line here, I don't know what to ask. (I'm not sure learning C++ is a great use of my time right now.) I'm just feeling a bit frustrated. If I come up with any other specific questions, I'll ask. Thanks.

You might be interested in this document describing the behavior of is, as? and as! (but not as).
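As a rough, non-authoritative illustration of the split that document draws: `as` is a compile-time conversion that cannot fail, while `is`, `as?` and `as!` inspect the value's dynamic type at runtime. Erasing the static type with `as Any` is what turns "my_dd as! Data" from a compile-time "always fails" warning into a runtime cast, which on Apple platforms can then go through the bridging machinery Quinn describes. The sketch below uses plain `Int` and `String` so it does not depend on that bridging.

```swift
// `as` is checked at compile time; upcasting to Any always succeeds.
let n: Int = 42
let boxed = n as Any

// `is`, `as?` and `as!` are runtime checks on the dynamic type.
let isInt = boxed is Int           // dynamic type is Int
let asInt = boxed as? Int          // succeeds, wrapped in an Optional
let asString = boxed as? String    // fails softly: nil
let forced = boxed as! Int         // succeeds; `as!` would trap on failure
```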


Yup, Tim did a LOT of work uncovering all the "organic" behaviors of the casting system that have emerged over time and reifying them into something with actual structure. That document is a great place to start.

I'll be more useful when you get to the question "what actually happens inside those bridging calls?"
