Set size of byte array


(KS Sreeram) #1

Hello

I’m trying to initialize a byte-array efficiently with minimal copying with the following steps:

1. Create an empty byte array.
2. Reserve sufficient capacity in the array.
3. Use mutable pointers into the array to fill in the data.
4. The actual size that was filled is known only after it is filled in.
5. I would like to set the size of the array to the actual size.

I couldn’t find any methods for doing the last step. Is there a way to do this?

Thanks in advance!

KS Sreeram


(Brent Royal-Gordon) #2

I don't believe this is possible with an Array; the only way to work with uninitialized memory is with an UnsafeMutablePointer.

The simplest solution is probably to load the data into an allocated chunk of memory with an UnsafeMutablePointer and then create an NSData object to keep track of its lifetime. The `init(bytesNoCopy:length:deallocator:)` initializer can be used to make sure it deallocates the memory correctly.

If you don't *need* to use mutable pointers to fill the array, however, a custom SequenceType might be a better option. For instance, you could write a type like this which returns the data a byte at a time:

  struct MyDataSource: SequenceType, GeneratorType {
    ...
    
    init(...) {
      ...
    }
    
    var initialCapacity: Int {
      ...guess at the capacity...
    }
    
    mutating func next() -> UInt8? {
      ...determine and return the next byte, or nil if you've reached the end...
    }
  }

Then you can write something like this:

  var source = MyDataSource(...)
  var array: [UInt8] = []
  array.reserveCapacity(source.initialCapacity)
  array.appendContentsOf(source)

And, et voila, `array` is full of your bytes. From what I can tell, if the capacity is already there, the standard library effectively uses a loop as tight as any you could hope for:

    let base = buffer.firstElementAddress

    while (nextItem != nil) && count < capacity {
      (base + count).initialize(to: nextItem!)
      count += 1
      nextItem = stream.next()
    }
    buffer.count = count

So I suspect a custom SequenceType will perform much better than you would first guess.

Hope this helps,

···

On Aug 4, 2016, at 11:42 PM, KS Sreeram via swift-users <swift-users@swift.org> wrote:

I’m trying to initialize a byte-array efficiently with minimal copying with the following steps:

1. Create an empty byte array.
2. Reserve sufficient capacity in the array.
3. Use mutable pointers into the array to fill in the data.
4. The actual size that was filled is known only after it is filled in.
5. I would like to set the size of the array to the actual size.

I couldn’t find any methods for doing the last step. Is there a way to do this?

--
Brent Royal-Gordon
Architechies


(KS Sreeram) #3

I’m by no means a Swift expert (not even a user at the moment), but I don't
see a way to do this either. The best that comes to mind is initializing a
ContiguousArray with the sufficient capacity as size, using
withUnsafeBufferPointer to overwrite elements, and then use replaceAll() with
an empty collection to remove the excess size.

The only initializer that could help with this is: init(count:repeatedValue:). Unfortunately, this means the buffer is written to twice - once in the initializer, and a second time to actually fill in the data. It’s not efficient.

I'd think that just reserving enough capacity and then appending the elements
one-by-one will be (at least?) as efficient: The byte is always copied anyway,
and in this case you wouldn’t have to default-construct the capacity beforehand.

This too isn’t efficient because each append call will incur an the cost of checking for sufficient capacity and also incrementing the size. The additional penalty could dwarf the cost of actually copying a byte! I don’t think the compiler is capable of removing this inefficiency.

Cheers
KS Sreeram

···

On 05-Aug-2016, at 1:19 PM, Daniel Vollmer <lists@maven.de> wrote:


(KS Sreeram) #4

I don't believe this is possible with an Array; the only way to work with uninitialized memory is with an UnsafeMutablePointer.

The simplest solution is probably to load the data into an allocated chunk of memory with an UnsafeMutablePointer and then create an NSData object to keep track of its lifetime. The `init(bytesNoCopy:length:deallocator:)` initializer can be used to make sure it deallocates the memory correctly.

Thanks for the tip. This does mean I have to give up the niceties of arrays post initialization.

If you don't *need* to use mutable pointers to fill the array, however, a custom SequenceType might be a better option. For instance, you could write a type like this which returns the data a byte at a time:

  struct MyDataSource: SequenceType, GeneratorType {

Then you can write something like this:

  var source = MyDataSource(...)
  var array: [UInt8] = []
  array.reserveCapacity(source.initialCapacity)
  array.appendContentsOf(source)

And, et voila, `array` is full of your bytes. From what I can tell, if the capacity is already there, the standard library effectively uses a loop as tight as any you could hope for:

   let base = buffer.firstElementAddress

   while (nextItem != nil) && count < capacity {
     (base + count).initialize(to: nextItem!)
     count += 1
     nextItem = stream.next()
   }
   buffer.count = count

So I suspect a custom SequenceType will perform much better than you would first guess.

The code I’m working on essentially does serialization. It’s not really amenable to the inversion of control required for generators.

Best
KS Sreeram

···

On 05-Aug-2016, at 4:02 PM, Brent Royal-Gordon <brent@architechies.com> wrote:


(Dave Abrahams) #5

Create a Sequence that represents the elements you're going to append,
e.g.:

  var a = [1, 2]
  a.reserve(256)
  a += sequence(first: 3, next: {$0 < 1000 ? ($0 + 3) * 2 : nil})

There are lots of ways to make Sequences, but the overloaded sequence()
functions are probably the easiest.

HTH,

···

on Fri Aug 05 2016, KS Sreeram <swift-users-AT-swift.org> wrote:

On 05-Aug-2016, at 1:19 PM, Daniel Vollmer <lists@maven.de> wrote:

I’m by no means a Swift expert (not even a user at the moment), but I don't
see a way to do this either. The best that comes to mind is initializing a
ContiguousArray with the sufficient capacity as size, using
withUnsafeBufferPointer to overwrite elements, and then use replaceAll() with
an empty collection to remove the excess size.

The only initializer that could help with this is:
init(count:repeatedValue:). Unfortunately, this means the buffer is
written to twice - once in the initializer, and a second time to
actually fill in the data. It’s not efficient.

I'd think that just reserving enough capacity and then appending the elements
one-by-one will be (at least?) as efficient: The byte is always copied anyway,
and in this case you wouldn’t have to default-construct the capacity beforehand.

This too isn’t efficient because each append call will incur an the
cost of checking for sufficient capacity and also incrementing the
size. The additional penalty could dwarf the cost of actually copying
a byte! I don’t think the compiler is capable of removing this
inefficiency.

--
-Dave


(Joe Groff) #6

cc-ing Arnold, who owns this part of the optimizer. IIRC we are able to hoist the capacity check in 'append' loops in some cases.

-Joe

···

On Aug 5, 2016, at 9:42 AM, KS Sreeram via swift-users <swift-users@swift.org> wrote:

On 05-Aug-2016, at 1:19 PM, Daniel Vollmer <lists@maven.de> wrote:

I’m by no means a Swift expert (not even a user at the moment), but I don't
see a way to do this either. The best that comes to mind is initializing a
ContiguousArray with the sufficient capacity as size, using
withUnsafeBufferPointer to overwrite elements, and then use replaceAll() with
an empty collection to remove the excess size.

The only initializer that could help with this is: init(count:repeatedValue:). Unfortunately, this means the buffer is written to twice - once in the initializer, and a second time to actually fill in the data. It’s not efficient.

I'd think that just reserving enough capacity and then appending the elements
one-by-one will be (at least?) as efficient: The byte is always copied anyway,
and in this case you wouldn’t have to default-construct the capacity beforehand.

This too isn’t efficient because each append call will incur an the cost of checking for sufficient capacity and also incrementing the size. The additional penalty could dwarf the cost of actually copying a byte! I don’t think the compiler is capable of removing this inefficiency.


(Arnold Schwaighofer) #7

The optimizer is not currently doing an optimization based on semantic knowledge of reserveCapacity(), the number of appends called, and capacity() calls.

Without that the optimizer just sees a load of the capacity field which is (conditionally) written to by the append call and therefore not loop invariant.

···

On Aug 9, 2016, at 12:07 PM, Joe Groff <jgroff@apple.com> wrote:

On Aug 5, 2016, at 9:42 AM, KS Sreeram via swift-users <swift-users@swift.org> wrote:

On 05-Aug-2016, at 1:19 PM, Daniel Vollmer <lists@maven.de> wrote:

I’m by no means a Swift expert (not even a user at the moment), but I don't
see a way to do this either. The best that comes to mind is initializing a
ContiguousArray with the sufficient capacity as size, using
withUnsafeBufferPointer to overwrite elements, and then use replaceAll() with
an empty collection to remove the excess size.

The only initializer that could help with this is: init(count:repeatedValue:). Unfortunately, this means the buffer is written to twice - once in the initializer, and a second time to actually fill in the data. It’s not efficient.

I'd think that just reserving enough capacity and then appending the elements
one-by-one will be (at least?) as efficient: The byte is always copied anyway,
and in this case you wouldn’t have to default-construct the capacity beforehand.

This too isn’t efficient because each append call will incur an the cost of checking for sufficient capacity and also incrementing the size. The additional penalty could dwarf the cost of actually copying a byte! I don’t think the compiler is capable of removing this inefficiency.

cc-ing Arnold, who owns this part of the optimizer. IIRC we are able to hoist the capacity check in 'append' loops in some cases.