I like the idea of Foundation's Data type a lot (the equivalent of a
RawPointer for managed data storage), but every time I try to use it, it ends up making everything really slow.
The benchmark I used is available here. Make sure to compile it with
-O. Most of the printouts are there to keep the optimizer from eliminating work; they can generally be ignored and were removed from the output posted here.
First, simply allocating a
Data takes 2.5 times as long as it does for an array.
Here's a comparison of the time it takes to allocate 2^20 collections ([UInt8] vs.
Data), each holding a 32-byte payload:
```
> time ./DataSpeedTest alloc array 20
        0.23 real         0.19 user         0.03 sys
> time ./DataSpeedTest alloc data 20
        0.62 real         0.54 user         0.07 sys
```
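In rough form, the alloc test boils down to a loop like this (a simplified sketch, not the actual benchmark source; the real program parses its arguments and times each variant separately):

```swift
import Foundation

// Simplified sketch of the alloc test: create 2^20 collections of
// 32 bytes each, reading one byte back from each so the optimizer
// can't eliminate the allocations.
let iterations = 1 << 20
let payload = [UInt8](repeating: 0x61, count: 32)

var sink: UInt8 = 0
for _ in 0..<iterations {
    let d = Data(payload)   // the array variant uses [UInt8](payload)
    sink &+= d[0]
}
print(sink)
```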
If you look at the memory allocations made using Instruments, you'll see that a
[UInt8] storing 32 bytes allocates 8 bytes on the stack and 64 bytes on the heap, while a
Data allocates 24 bytes on the stack, plus both a 96-byte
Foundation._DataStorage and a 48-byte payload on the heap.
Personally, I find this the most reasonable of these issues, since
Data does support custom deallocators and the like, but it does mean that if you want a container to store an object that's just a few bytes of raw data,
Data is not the container for you.
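To illustrate the flexibility that extra header presumably pays for, Data can adopt memory you allocated yourself and free it with a deallocator of your choosing, which [UInt8] can't do. A minimal sketch using malloc/free:

```swift
import Foundation

// Data can take ownership of externally allocated memory; no copy
// is made. Here we hand it a malloc'd buffer filled with 'a's.
let count = 1 << 10
let raw = malloc(count)!
memset(raw, 0x61, count)

// .free tells Data to call free() on the pointer once the last
// reference to this Data goes away.
let adopted = Data(bytesNoCopy: raw, count: count, deallocator: .free)
print(adopted.count, adopted.first!)  // prints "1024 97"
```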
The rest of the tests are performed on a collection holding a 2^26-byte (64 MB) payload of repeated ASCII
'a's. If you want to run with a different size, supply the log2 of the count you want as the final argument to the program.
Simply looping over a
Data is much slower than looping over an array (the test runs a for loop that counts and sums the collection's contents):
```
> time ./DataSpeedTest for array
        0.08 real         0.05 user         0.02 sys
> time ./DataSpeedTest for data
        0.51 real         0.48 user         0.03 sys
```
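The loop test is essentially the following (a sketch with a reduced payload size; the real benchmark runs the two paths as separate, non-generic loops):

```swift
import Foundation

// Count and sum the collection's contents with a plain for loop.
// Written generically here so the same code runs against both
// Data and [UInt8].
func countAndSum<C: Collection>(_ bytes: C) -> (Int, UInt64)
    where C.Element == UInt8
{
    var count = 0
    var sum: UInt64 = 0
    for byte in bytes {
        count += 1
        sum &+= UInt64(byte)
    }
    return (count, sum)
}

let payload = Data(repeating: 0x61, count: 1 << 16)
print(countAndSum(payload))          // Data path
print(countAndSum([UInt8](payload))) // Array path
```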
This carries over to generic functions like reduce:
```
> time ./DataSpeedTest reduce array
        0.08 real         0.05 user         0.03 sys
> time ./DataSpeedTest reduce data
        0.48 real         0.44 user         0.03 sys
```
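The reduce test can be sketched as a byte sum through the generic Sequence.reduce, which goes through the same per-element access path as the for loop:

```swift
import Foundation

// Sum the bytes via the generic reduce (sketch; smaller payload).
let payload = Data(repeating: 0x61, count: 1 << 16)
let sum = payload.reduce(UInt64(0)) { $0 &+ UInt64($1) }
print(sum)  // 97 * 65536 = 6356992
```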
The same slowdown shows up when building a String from the collection's contents:

```
> time ./DataSpeedTest string array
        0.26 real         0.20 user         0.05 sys
> time ./DataSpeedTest string data
        1.05 real         0.99 user         0.05 sys
```
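In sketch form, the string test decodes the collection's bytes into a String (assuming String(decoding:as:) with UTF-8; the actual benchmark may differ):

```swift
import Foundation

// Build a String from the collection's bytes, decoding as UTF-8.
// (Assumption: the string test uses String(decoding:as:).)
let payload = Data(repeating: 0x61, count: 16)
let s = String(decoding: payload, as: UTF8.self)
print(s)  // prints "aaaaaaaaaaaaaaaa"
```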
Finally, the worst issue I found was with slicing
Data, which for some reason causes a memory allocation. The test code loops over the collection by repeatedly shrinking a slice and reading from the beginning, something I've found useful for algorithms that want to take variably-sized pieces off of a collection.
```
> time ./DataSpeedTest slice array
        0.10 real         0.06 user         0.03 sys
> time ./DataSpeedTest slice data
       10.62 real        10.56 user         0.04 sys
```
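The pattern in question looks like this (a sketch; the real benchmark takes variably-sized pieces rather than single bytes):

```swift
import Foundation

// Walk a collection by repeatedly reading from the front of a
// slice and then shrinking the slice.
func drain<C: Collection>(_ bytes: C) -> UInt64 where C.Element == UInt8 {
    var slice = bytes[...]
    var sum: UInt64 = 0
    while let first = slice.first {
        sum &+= UInt64(first)
        slice = slice.dropFirst()  // on Data, this slicing apparently allocates
    }
    return sum
}

let payload = Data(repeating: 0x61, count: 1 << 10)
print(drain(payload))          // Data: slow path
print(drain([UInt8](payload))) // Array: fast path
```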
Am I using
Data wrong or misunderstanding what it's meant for? Is it only supposed to be used for large allocations that you access using