Getting at the bytes of Data

I am writing a tutorial and creating a wrapper on streaming compression. In the code, I setup the stream source pointer: (input is of type Data)

  // set up the stream with the input source
  input.withUnsafeBytes { (bytes: UnsafePointer<UInt8>) in
    stream.src_ptr = bytes
  }
  stream.src_size = input.count

Input is guaranteed to be alive for the entire lifetime of the stream. Everything seems to work fine but I thought I remember reading somewhere that it is bad form to let the bytes escape out of the withUnsafeBytes closure. Anyone know if I'm I dipping into undefined behavior here?

Thanks in advanced,
Ray

So letting the bytes escape from the closure could potentially be in that territory of “bad things may happen”. If you know for certain that the Data will not be mutated and it’s underlying storage will last for the duration of the usage then you should be able to get away with it.

The bytes pointer is an inner pointer return from the encapsulating object backing the struct Data. Since that backing object can potentially be deallocated via mutation or the buffer may be reallocated (which very often can cause the base address of the pointer to change) we decided it would be safer to use a closure to ensure the lifespan of the bytes over the access to the bytes.

Perhaps it might be best to understand more of what you are trying to do to offer a better solution.

···

On Dec 9, 2016, at 8:39 AM, Ray Fix via swift-users <swift-users@swift.org> wrote:

I am writing a tutorial and creating a wrapper on streaming compression. In the code, I setup the stream source pointer: (input is of type Data)

  // set up the stream with the input source
  input.withUnsafeBytes { (bytes: UnsafePointer<UInt8>) in
    stream.src_ptr = bytes
  }
  stream.src_size = input.count

Input is guaranteed to be alive for the entire lifetime of the stream. Everything seems to work fine but I thought I remember reading somewhere that it is bad form to let the bytes escape out of the withUnsafeBytes closure. Anyone know if I'm I dipping into undefined behavior here?

Thanks in advanced,
Ray

_______________________________________________
swift-users mailing list
swift-users@swift.org
https://lists.swift.org/mailman/listinfo/swift-users

Hi Philippe,

Thank you for your response!

So letting the bytes escape from the closure could potentially be in that territory of “bad things may happen”. If you know for certain that the Data will not be mutated and it’s underlying storage will last for the duration of the usage then you should be able to get away with it.

So far it is looking like I am getting away with it. I believe the conditions that you state hold.

The bytes pointer is an inner pointer return from the encapsulating object backing the struct Data. Since that backing object can potentially be deallocated via mutation or the buffer may be reallocated (which very often can cause the base address of the pointer to change) we decided it would be safer to use a closure to ensure the lifespan of the bytes over the access to the bytes.

Perhaps it might be best to understand more of what you are trying to do to offer a better solution.

I am trying to use the compression_stream C API in the Compression framework and wrap it so it is a bit more Swifty. :]

A full gist of the playground is here: Compression.swift · GitHub

See line 91.

Appreciate the hints.

Ray

···

On Dec 9, 2016, at 8:44 AM, Philippe Hausler <phausler@apple.com> wrote:

What you have is probably ok in common usage; but there is a chance that it would not properly do a copy on write change as you expect if the Data was somehow obtained from a multi-threaded environment. Here is a slight refactor (with a few extra fixes and comments for you that might save you some time later)

import Foundation
import Compression

public enum CompressionAlgorithm {
    case lz4 // speed is critical
    case lz4a // space is critical
    case zlib // reasonable speed and space
    case lzfse // better speed and space
}

private enum CompressionOperation {
    case compression
    case decompression
}

private func perform(_ operation: CompressionOperation,
                     on input: Data,
                     using algorithm: CompressionAlgorithm,
                     workingBufferSize: Int = 2000) -> Data? {
    
    // set the algorithm
    let streamAlgorithm: compression_algorithm
    switch algorithm {
    case .lz4: streamAlgorithm = COMPRESSION_LZ4
    case .lz4a: streamAlgorithm = COMPRESSION_LZMA
    case .zlib: streamAlgorithm = COMPRESSION_ZLIB
    case .lzfse: streamAlgorithm = COMPRESSION_LZFSE
    }
    
    // set the stream operation, and flags
    let streamOperation: compression_stream_operation
    let flags: Int32
    switch operation {
    case .compression:
        streamOperation = COMPRESSION_STREAM_ENCODE
        flags = Int32(COMPRESSION_STREAM_FINALIZE.rawValue)
    case .decompression:
        streamOperation = COMPRESSION_STREAM_DECODE
        flags = 0
    }

    let dstSize = workingBufferSize
    let dstPointer = UnsafeMutablePointer<UInt8>.allocate(capacity: dstSize) // you could technically use a Data for this too by accessing the mutable bytes ;)
    defer {
        dstPointer.deallocate(capacity: dstSize)
    }
    
    let inputCount = input.count
    return input.withUnsafeBytes { (bytes: UnsafePointer<UInt8>) in // the closure here can return a Data so the entire compression operation can be encapsulated in the lambda
        var output = Data()
        var stream = compression_stream(dst_ptr: dstPointer, dst_size: dstSize, src_ptr: bytes, src_size: inputCount, state: nil)
        var status = compression_stream_init(&stream, streamOperation, streamAlgorithm)
        guard status != COMPRESSION_STATUS_ERROR else {
            return nil // perhaps instead of making this nullable a throw?
        }
        defer {
            compression_stream_destroy(&stream)
        }
        
        while status == COMPRESSION_STATUS_OK {
            // process the stream
            status = compression_stream_process(&stream, flags)
            
            // collect bytes from the stream and reset
            switch status {
            case COMPRESSION_STATUS_OK:
                output.append(dstPointer, count: dstSize)
                stream.dst_ptr = dstPointer
                stream.dst_size = dstSize
                break // I am certain you probably meant to break here?
            case COMPRESSION_STATUS_ERROR:
                return nil // perhaps instead of making this nullable a throw?
            case COMPRESSION_STATUS_END:
                output.append(dstPointer, count: stream.dst_ptr - dstPointer)
                break // here too, this should break right?
            default:
                break // this probably should be a fatal error since it is an unexpected result.
            }
        }
        return output
    }
}

// Compressed keeps the compressed data and the algorithm
// together as one unit, so you never forget how the data was
// compressed.
public struct Compressed {
    public let data: Data // this was what was saving your state of the pointer
    public let algorithm: CompressionAlgorithm
    
    public init(data: Data, algorithm: CompressionAlgorithm) {
        self.data = data
        self.algorithm = algorithm
    }
    
    // Compress the input with the specified algorith. Returns nil if it fails.
    public static func compress(input: Data,
                                with algorithm: CompressionAlgorithm) -> Compressed? {
        guard let data = perform(.compression, on: input, using: algorithm) else {
            return nil
        }
        return Compressed(data: data, algorithm: algorithm)
    }
    
    // Factory method to return uncompressed data. Returns nil if it cannot be decompressed.
    public func makeDecompressed() -> Data? {
        return perform(.decompression, on: data, using: algorithm)
    }
}

// For discoverability, add a compressed method to Data
extension Data {
    // Factory method to make compressed data or nil if it fails.
    public func makeCompressed(with algorithm: CompressionAlgorithm) -> Compressed? {
        return Compressed.compress(input: self, with: algorithm)
    }
}

// Example usage:
let input = Data(bytes: Array(repeating: UInt8(123), count: 10000))

let compressed = input.makeCompressed(with: .lzfse)
compressed?.data.count // in most cases much less than orginal input count
let restoredInput = compressed?.makeDecompressed()
input == restoredInput // true

···

On Dec 9, 2016, at 9:04 AM, Ray Fix <rayfix@gmail.com> wrote:

Hi Philippe,

Thank you for your response!

On Dec 9, 2016, at 8:44 AM, Philippe Hausler <phausler@apple.com <mailto:phausler@apple.com>> wrote:

So letting the bytes escape from the closure could potentially be in that territory of “bad things may happen”. If you know for certain that the Data will not be mutated and it’s underlying storage will last for the duration of the usage then you should be able to get away with it.

So far it is looking like I am getting away with it. I believe the conditions that you state hold.

The bytes pointer is an inner pointer return from the encapsulating object backing the struct Data. Since that backing object can potentially be deallocated via mutation or the buffer may be reallocated (which very often can cause the base address of the pointer to change) we decided it would be safer to use a closure to ensure the lifespan of the bytes over the access to the bytes.

Perhaps it might be best to understand more of what you are trying to do to offer a better solution.

I am trying to use the compression_stream C API in the Compression framework and wrap it so it is a bit more Swifty. :]

A full gist of the playground is here: Compression.swift · GitHub

See line 91.

Appreciate the hints.

Ray

Keeping the closure open until the method ends and returning the result was eye opening. Thanks again!

I thought about making the destination buffer a Data as well but I kind of feel like it is better the way it is as it keeps down the number of scopes. It kind of starts feeling like a pyramid of doom.

The example you sent back doesn’t actually work for a subtle reason. I think what is happening is compression_stream_init resets all of the pointers and sizes back to zero. It feels strange to init, call compression_stream_init and then re-assign which is why I did things the way I did.

Since Swift doesn’t require break statements, I usually omit them. Watch out when I get back into the C/C++ code!

Good call on adding fatalError to default. I wish compression_status was a finite enum so I could get a compiler error instead.

The error information coming back from compression_stream module is rather lacking is which is why I am just returning nil instead of throwing. Also, that, and this is for a tutorial so I am trying to keep the number of concepts to a minimum (within reason anyway).

Anyway, I updated the gist here:

With great thanks,
Ray

···

On Dec 9, 2016, at 10:00 AM, Philippe Hausler <phausler@apple.com> wrote:

What you have is probably ok in common usage; but there is a chance that it would not properly do a copy on write change as you expect if the Data was somehow obtained from a multi-threaded environment. Here is a slight refactor (with a few extra fixes and comments for you that might save you some time later)

import Foundation
import Compression

public enum CompressionAlgorithm {
    case lz4 // speed is critical
    case lz4a // space is critical
    case zlib // reasonable speed and space
    case lzfse // better speed and space
}

private enum CompressionOperation {
    case compression
    case decompression
}

private func perform(_ operation: CompressionOperation,
                     on input: Data,
                     using algorithm: CompressionAlgorithm,
                     workingBufferSize: Int = 2000) -> Data? {
    
    // set the algorithm
    let streamAlgorithm: compression_algorithm
    switch algorithm {
    case .lz4: streamAlgorithm = COMPRESSION_LZ4
    case .lz4a: streamAlgorithm = COMPRESSION_LZMA
    case .zlib: streamAlgorithm = COMPRESSION_ZLIB
    case .lzfse: streamAlgorithm = COMPRESSION_LZFSE
    }
    
    // set the stream operation, and flags
    let streamOperation: compression_stream_operation
    let flags: Int32
    switch operation {
    case .compression:
        streamOperation = COMPRESSION_STREAM_ENCODE
        flags = Int32(COMPRESSION_STREAM_FINALIZE.rawValue)
    case .decompression:
        streamOperation = COMPRESSION_STREAM_DECODE
        flags = 0
    }

    let dstSize = workingBufferSize
    let dstPointer = UnsafeMutablePointer<UInt8>.allocate(capacity: dstSize) // you could technically use a Data for this too by accessing the mutable bytes ;)
    defer {
        dstPointer.deallocate(capacity: dstSize)
    }
    
    let inputCount = input.count
    return input.withUnsafeBytes { (bytes: UnsafePointer<UInt8>) in // the closure here can return a Data so the entire compression operation can be encapsulated in the lambda
        var output = Data()
        var stream = compression_stream(dst_ptr: dstPointer, dst_size: dstSize, src_ptr: bytes, src_size: inputCount, state: nil)
        var status = compression_stream_init(&stream, streamOperation, streamAlgorithm)
        guard status != COMPRESSION_STATUS_ERROR else {
            return nil // perhaps instead of making this nullable a throw?
        }
        defer {
            compression_stream_destroy(&stream)
        }
        
        while status == COMPRESSION_STATUS_OK {
            // process the stream
            status = compression_stream_process(&stream, flags)
            
            // collect bytes from the stream and reset
            switch status {
            case COMPRESSION_STATUS_OK:
                output.append(dstPointer, count: dstSize)
                stream.dst_ptr = dstPointer
                stream.dst_size = dstSize
                break // I am certain you probably meant to break here?
            case COMPRESSION_STATUS_ERROR:
                return nil // perhaps instead of making this nullable a throw?
            case COMPRESSION_STATUS_END:
                output.append(dstPointer, count: stream.dst_ptr - dstPointer)
                break // here too, this should break right?
            default:
                break // this probably should be a fatal error since it is an unexpected result.
            }
        }
        return output
    }
}

// Compressed keeps the compressed data and the algorithm
// together as one unit, so you never forget how the data was
// compressed.
public struct Compressed {
    public let data: Data // this was what was saving your state of the pointer
    public let algorithm: CompressionAlgorithm
    
    public init(data: Data, algorithm: CompressionAlgorithm) {
        self.data = data
        self.algorithm = algorithm
    }
    
    // Compress the input with the specified algorith. Returns nil if it fails.
    public static func compress(input: Data,
                                with algorithm: CompressionAlgorithm) -> Compressed? {
        guard let data = perform(.compression, on: input, using: algorithm) else {
            return nil
        }
        return Compressed(data: data, algorithm: algorithm)
    }
    
    // Factory method to return uncompressed data. Returns nil if it cannot be decompressed.
    public func makeDecompressed() -> Data? {
        return perform(.decompression, on: data, using: algorithm)
    }
}

// For discoverability, add a compressed method to Data
extension Data {
    // Factory method to make compressed data or nil if it fails.
    public func makeCompressed(with algorithm: CompressionAlgorithm) -> Compressed? {
        return Compressed.compress(input: self, with: algorithm)
    }
}

// Example usage:
let input = Data(bytes: Array(repeating: UInt8(123), count: 10000))

let compressed = input.makeCompressed(with: .lzfse)
compressed?.data.count // in most cases much less than orginal input count
let restoredInput = compressed?.makeDecompressed()
input == restoredInput // true

On Dec 9, 2016, at 9:04 AM, Ray Fix <rayfix@gmail.com <mailto:rayfix@gmail.com>> wrote:

Hi Philippe,

Thank you for your response!

On Dec 9, 2016, at 8:44 AM, Philippe Hausler <phausler@apple.com <mailto:phausler@apple.com>> wrote:

So letting the bytes escape from the closure could potentially be in that territory of “bad things may happen”. If you know for certain that the Data will not be mutated and it’s underlying storage will last for the duration of the usage then you should be able to get away with it.

So far it is looking like I am getting away with it. I believe the conditions that you state hold.

The bytes pointer is an inner pointer return from the encapsulating object backing the struct Data. Since that backing object can potentially be deallocated via mutation or the buffer may be reallocated (which very often can cause the base address of the pointer to change) we decided it would be safer to use a closure to ensure the lifespan of the bytes over the access to the bytes.

Perhaps it might be best to understand more of what you are trying to do to offer a better solution.

I am trying to use the compression_stream C API in the Compression framework and wrap it so it is a bit more Swifty. :]

A full gist of the playground is here: Compression.swift · GitHub

See line 91.

Appreciate the hints.

Ray