Data operations

In various cases I need to perform byte manipulation operations on Data, and I would like to verify whether my solutions to some common patterns are valid and whether there are simpler approaches.

Append from another Data, given an offset and size

extension Data {
    mutating func append(data: Data, offset: Int, size: Int) {
        // guard offset and size are valid ...
        data.withUnsafeBytes { buf in
            self.append(buf.bindMemory(to: UInt8.self).baseAddress!.advanced(by: offset), count: size)
        }
    }
}

Perform an operation on a portion of Data (e.g. SHA1 or compress)

func process(data: Data, offset: Int, size: Int) -> String {
    // guard offset and size are valid ...
    let segment: Data = data.withUnsafeBytes { buf in
        let mbuf = UnsafeMutablePointer(mutating: buf.bindMemory(to: UInt8.self).baseAddress!)
        return Data(bytesNoCopy: mbuf.advanced(by: offset), count: size, deallocator: .none)
    }
    return segment.sha1()
}

Copy from another Data

extension Data {
    mutating func copy(data: Data, size: Int) {
        self.withUnsafeMutableBytes { dstBuf in
            data.withUnsafeBytes { srcBuf in
                memcpy(dstBuf.baseAddress!, srcBuf.baseAddress!, size)
            }
        }
    }
}

Truncate to size

extension Data {
    mutating func truncate(size: Int) {
        // guard size is valid ...
        self.removeLast(self.count - size)
    }
}

Move bytes to the front

extension Data {
    mutating func move(offset: Int) {
        // guard offset is valid ...
        let size = self.count - offset
        self.withUnsafeMutableBytes { buf in
            memmove(buf.baseAddress!, buf.baseAddress!.advanced(by: offset), size)
        }
        self.truncate(size: size)
    }
}

Ideally I'd like to do these things without using unsafe that much, but without doing unnecessary allocations. Thanks for any feedback!

Append from another Data, given an offset and size

I would do this as follows:

mutating func append(data: Data, offset: Int, size: Int) {
    let start = data.startIndex + offset
    let end = start + size
    self.append(data[start..<end])
}

There’s no point monkeying around with unsafe pointers if you don’t have to (-:

Also, with your current code you should watch out for size being 0, in which case baseAddress might be nil.
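To illustrate why the snippet above starts from data.startIndex rather than from 0, here is a small sketch. A Data slice keeps the indices of the data it was cut from, so the source may not start at index 0. The sample values and the names whole, slice and out are made up for the example.

```swift
import Foundation

extension Data {
    // Same slice-based append as above, repeated so the example is self-contained.
    mutating func append(data: Data, offset: Int, size: Int) {
        let start = data.startIndex + offset
        let end = start + size
        self.append(data[start..<end])
    }
}

let whole = Data([10, 11, 12, 13, 14, 15])
let slice = whole[2..<6]   // bytes 12, 13, 14, 15; its indices are 2..<6, not 0..<4

var out = Data([1, 2])
// offset is counted from the first byte of the slice, so this appends 13 and 14.
out.append(data: slice, offset: 1, size: 2)
// A zero size simply appends nothing; there is no base address to unwrap.
out.append(data: slice, offset: 0, size: 0)
```

Using a plain offset as an index would read the wrong bytes, or trap, when the source is a slice like this one.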

Perform an operation on a portion of Data (e.g. SHA1 or compress)

Your current code is definitely not safe. It effectively transports buf outside of the withUnsafeBytes(_:) closure, which is not allowed.

My recommended solution would be as above, that is, use a range to work on a subdata.
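A minimal sketch of that approach follows. The byteSum function is a hypothetical stand-in for sha1() or a compressor, chosen only so that the example needs nothing beyond Foundation; the bounds check is one possible way to fill in the "guard offset and size are valid" step.

```swift
import Foundation

// Hypothetical stand-in for sha1() or compress(): sums the bytes, wrapping on overflow.
func byteSum(_ bytes: Data) -> UInt32 {
    return bytes.reduce(UInt32(0)) { $0 &+ UInt32($1) }
}

func process(data: Data, offset: Int, size: Int) -> UInt32 {
    precondition(offset >= 0 && size >= 0 && offset + size <= data.count,
                 "offset/size out of bounds")
    let start = data.startIndex + offset
    // The slice is passed as an ordinary Data value; no pointer escapes a closure.
    return byteSum(data[start..<(start + size)])
}

let payload = Data([1, 2, 3, 4, 5, 6])
let sum = process(data: payload, offset: 2, size: 3)   // covers bytes 3, 4, 5
```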

Copy from another Data

My first thought was to reach for subscript notation again:

mutating func copy(data: Data, size: Int) {
    let srcStart = data.startIndex
    let srcEnd = srcStart + size
    let dstStart = self.startIndex
    let dstEnd = dstStart + size
    self[dstStart..<dstEnd] = data[srcStart..<srcEnd]
}

but, depending on your exact needs, you might be better off calling replaceSubrange(_:with:).
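As a rough illustration of replaceSubrange(_:with:), the sketch below overwrites a prefix in place and then shows one way it can differ from a same-size copy: the replacement does not have to match the length of the range it replaces. The sample values and variable names are made up.

```swift
import Foundation

var buffer = Data([0, 0, 0, 0, 9, 9])
let source = Data([1, 2, 3, 4, 5])

// Overwrite the first four bytes of buffer with the first four bytes of source.
buffer.replaceSubrange(buffer.startIndex..<buffer.startIndex + 4,
                       with: source[source.startIndex..<source.startIndex + 4])
// buffer now holds 1, 2, 3, 4, 9, 9

// A replacement of a different length changes the size of the data.
var resized = Data([0, 0, 0, 0, 9, 9])
resized.replaceSubrange(resized.startIndex..<resized.startIndex + 4, with: Data([7]))
// resized now holds 7, 9, 9
```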

Truncate to size

Setting data.count does this.
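For example, a truncate(size:) helper could be reduced to a sketch like this (the precondition is just one way to do the validity check from the question):

```swift
import Foundation

extension Data {
    mutating func truncate(size: Int) {
        precondition(size >= 0 && size <= count, "size out of bounds")
        count = size   // shrinking count drops the trailing bytes
    }
}

var data = Data([1, 2, 3, 4, 5])
data.truncate(size: 2)
// data now holds 1, 2
```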

Move bytes to the front

You can do this with self.removeFirst(_:).
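A short sketch with made-up sample bytes:

```swift
import Foundation

var data = Data([1, 2, 3, 4, 5])
data.removeFirst(2)             // discards bytes 1 and 2
let head = data.first           // the first remaining byte
let remaining = Array(data)     // the remaining bytes, in order
```

After a call like this it seems safest to go through startIndex, first, or iteration rather than assume the remaining bytes start at index 0, for the same reason the snippets above use startIndex.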


If there’s one thing that the Swift 5 changes to Data have revealed, it’s that there are lots of folks using unsafe pointers when they don’t need to )-: Data has a pretty solid API, helped by the fact that it’s a collection of bytes and thus has all of the standard collection constructs available to it.

Share and Enjoy

Quinn “The Eskimo!” @ DTS @ Apple

Totally agreed, thanks for your answers!

I had assumed that using a range to get a subdata would do a copy. If it is referencing the original memory this is very nice and indeed solves a lot of my problems!

Can you elaborate on the replaceSubrange(_:with:) suggestion? What's the difference between d1[a..<b] = d2[c..<d] and d1.replaceSubrange(a..<b, with: d2)? Would there be a performance difference?

Nice! I hadn't realized you can set count.

The use case I was thinking was to have a fixed size Data buffer that gets consumed from the beginning (by increasing an offset variable). When I need to add more data I shift the unconsumed bytes to the start, add more at the end and set offset to 0. The goal is to avoid allocations.

The goal is to avoid allocations.

Right, but you have to check your assumptions here. For example, it seems obvious that withUnsafe[Mutable]Bytes(_:) wouldn’t allocate, but that’s not necessarily the case. If the data is discontiguous, then asking for a pointer to a buffer will trigger an allocation, whereas more abstract mechanisms may not.

Speaking of discontiguous data, historically that was feasible, by bridging a dispatch_data_t to an NSData to a Data, but this post indicates that no longer happens. Which really brings me to my main point here: If you absolutely want to avoid allocations, none of this stuff will work for you, because the allocation pattern of these calls is not documented API but an implementation detail.

You seem to have two choices here:

  • You can continue using Data, making sure to use code paths that avoid allocations, and then profile your usage to check that your assumptions are valid.

  • You can switch to a type that has documented memory allocation behaviour.

With regard to the second point, you might want to take a look at SwiftNIO’s ByteBuffer type. I’ve not used it myself, but from what I’ve read it seems to be well aligned with your requirements.
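Going back to the first option, here is a rough sketch of the consume-from-the-front buffer described in the question, built only on Data's collection API. The ConsumingBuffer type and its members are hypothetical names made up for this example, and nothing here guarantees that no allocation happens: as noted above, that is an implementation detail, so a design like this would still need profiling.

```swift
import Foundation

// Hypothetical helper type, not part of Foundation.
struct ConsumingBuffer {
    private var storage: Data
    private var offset = 0   // number of bytes already consumed

    init(capacity: Int) {
        storage = Data(capacity: capacity)
    }

    // The bytes that have not been consumed yet.
    var unread: Data {
        return storage[(storage.startIndex + offset)...]
    }

    mutating func consume(_ n: Int) {
        precondition(n >= 0 && n <= storage.count - offset, "cannot consume that many bytes")
        offset += n
    }

    // Drops the consumed prefix, then appends the new bytes at the end.
    mutating func refill(with more: Data) {
        if offset > 0 {
            storage.removeSubrange(storage.startIndex..<storage.startIndex + offset)
            offset = 0
        }
        storage.append(more)
    }
}

var buf = ConsumingBuffer(capacity: 16)
buf.refill(with: Data([1, 2, 3, 4]))
buf.consume(3)                        // 1, 2, 3 are done with
buf.refill(with: Data([5, 6]))
let pending = Array(buf.unread)       // 4, 5, 6
```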

Share and Enjoy

Quinn “The Eskimo!” @ DTS @ Apple

Thanks, this is very useful!