Parsing binary data


(Tino) #1

Hi there!

Some weeks ago I wrote a parser for a binary format, but its performance was disastrous, and I knew I could easily outperform that first approach by a large margin with Objective-C.
Now I'm about to write a different parser, which, of course ;-), I'd prefer to code in Swift.
Working with raw bytes most likely won't ever be an area where Swift shines, but I guess there are ways to do it without compromising speed… so, are there any established best practices and things to avoid? Or is it less hassle to go back to C for low-level stuff?

Tino


(Dmitri Gribenko) #2

It should be simple: use Array for your data. Avoid creating
intermediate arrays that hold copies of parts of your data when
possible. It is OK to slice the array and create ArraySlices. This
should give you speed close to C.
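
A minimal sketch of the slicing advice above, assuming a current Swift toolchain (Swift 4 or later). The helper name `readUInt32BE` and the sample bytes are made up for illustration; they do not come from any real format. Slicing an Array yields an ArraySlice that shares storage with the original, so no bytes are copied:

```swift
// Hypothetical helper: decode a big-endian UInt32 from the first four bytes of a slice.
// Returns nil if the slice holds fewer than four bytes.
func readUInt32BE(_ bytes: ArraySlice<UInt8>) -> UInt32? {
    guard bytes.count >= 4 else { return nil }
    var value: UInt32 = 0
    // prefix(4) is itself an ArraySlice, so this loop does not copy any data.
    for byte in bytes.prefix(4) {
        value = value << 8 | UInt32(byte)
    }
    return value
}

// Made-up sample data: a 4-byte length followed by a 4-byte marker.
let data: [UInt8] = [0x00, 0x00, 0x01, 0x00, 0xDE, 0xAD, 0xBE, 0xEF]

let header = data[0..<4]   // ArraySlice<UInt8> sharing storage with `data`
let body = data[4...]      // slice of the remaining bytes

let length = readUInt32BE(header)
let marker = readUInt32BE(body)
```

One thing to watch when indexing slices directly: an ArraySlice keeps the indices of its base array, so `body.startIndex` is 4 here, not 0.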

If you are not satisfied with performance, feel free to post your code
here with some commentary, and someone might look at it and see if
there is any performance advice we could give you.

Dmitri

···

On Fri, Jul 8, 2016 at 12:15 AM, Tino Heth via swift-users <swift-users@swift.org> wrote:

Hi there!

Some weeks ago I wrote a parser for a binary format, but its performance was disastrous, and I knew I could easily outperform that first approach by a large margin with Objective-C.
Now I'm about to write a different parser, which, of course ;-), I'd prefer to code in Swift.
Working with raw bytes most likely won't ever be an area where Swift shines, but I guess there are ways to do it without compromising speed… so, are there any established best practices and things to avoid? Or is it less hassle to go back to C for low-level stuff?

--
main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
(j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr@gmail.com>*/


(Tino) #3

It should be simple: use Array for your data.

Doing data-processing in big blocks often improves performance, and when I checked the effect of a very small buffer, it was quite significant.
But arrays are not an option for me, because my data isn't structured that way:
It's a stream of different elements (timestamp, type, and a payload that depends on the type), so I have to evaluate it byte by byte…
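
For illustration, a stream like this can still be walked with an offset into a single [UInt8] buffer. The sketch below is only that: the record layout (4-byte big-endian timestamp, 1-byte type, payload whose length depends on the type), the type codes, and all names are invented assumptions, not the actual format discussed here.

```swift
// Hypothetical record layout, invented for this sketch:
//   4 bytes  timestamp (big-endian UInt32)
//   1 byte   type
//   N bytes  payload, where N depends on the type
struct Record {
    let timestamp: UInt32
    let type: UInt8
    let payload: ArraySlice<UInt8>   // view into the input buffer, not a copy
}

// Hypothetical mapping from type code to payload length.
func payloadLength(forType type: UInt8) -> Int? {
    switch type {
    case 0x01: return 2
    case 0x02: return 4
    default: return nil              // unknown type
    }
}

func parseRecords(_ bytes: [UInt8]) -> [Record] {
    var records: [Record] = []
    var offset = 0
    // Each record needs at least a 5-byte header (timestamp + type).
    while bytes.count - offset >= 5 {
        var timestamp: UInt32 = 0
        for byte in bytes[offset..<offset + 4] {
            timestamp = timestamp << 8 | UInt32(byte)
        }
        let type = bytes[offset + 4]
        let payloadStart = offset + 5
        // Stop at an unknown type or a truncated payload.
        guard let length = payloadLength(forType: type),
              bytes.count - payloadStart >= length else { break }
        records.append(Record(timestamp: timestamp,
                              type: type,
                              payload: bytes[payloadStart..<payloadStart + length]))
        offset = payloadStart + length
    }
    return records
}

// Two made-up records: type 0x01 with 2 payload bytes, type 0x02 with 4.
let stream: [UInt8] = [
    0x00, 0x00, 0x00, 0x0A, 0x01, 0xAA, 0xBB,
    0x00, 0x00, 0x00, 0x0B, 0x02, 0x01, 0x02, 0x03, 0x04,
]
let records = parseRecords(stream)
```

The payloads are stored as slices of the input, so the only allocation that grows with the input is the `records` array itself.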

But it seems I'm lucky, because the next format fits well into arrays :-).
I guess I'll give it a try, and I'll share my insights here if they are worth it (most likely, I won't write a comparison implementation in C if Swift is fast enough…)

Thanks,
Tino


(Jens Alfke) #4

This is the sort of thing a good stream API should be able to handle efficiently, i.e. methods to read/write various binary data types like integers. Or alternatively, an API to encode/decode those types into an ArraySlice (like Go’s encoding/binary package.)

Hopefully these can be added to the Swift standard library. In the interim, something like that Go package could be implemented with some small C functions callable from Swift.
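
As a rough idea of what such a reader could look like, here is a sketch in plain Swift rather than C, assuming Swift 4 or later (it relies on the `FixedWidthInteger` protocol). `ByteOrder` and `ByteReader` are hypothetical names made up for this example; nothing like them exists in the standard library.

```swift
// Hypothetical byte-order selector, loosely modelled on the idea behind
// Go's encoding/binary package.
enum ByteOrder {
    case bigEndian
    case littleEndian
}

// Hypothetical cursor over a byte buffer that decodes fixed-width integers.
struct ByteReader {
    private let bytes: [UInt8]
    private(set) var position = 0

    init(_ bytes: [UInt8]) {
        self.bytes = bytes
    }

    var remaining: Int {
        return bytes.count - position
    }

    // Reads one integer of type T, or returns nil (without advancing)
    // if fewer than MemoryLayout<T>.size bytes are left.
    mutating func read<T: FixedWidthInteger>(_ type: T.Type, order: ByteOrder) -> T? {
        let size = MemoryLayout<T>.size
        guard remaining >= size else { return nil }
        let chunk = bytes[position..<position + size]
        var value: T = 0
        switch order {
        case .bigEndian:
            // Most significant byte comes first.
            for byte in chunk {
                value = (value << 8) | T(truncatingIfNeeded: byte)
            }
        case .littleEndian:
            // Least significant byte comes first, so walk the chunk backwards.
            for byte in chunk.reversed() {
                value = (value << 8) | T(truncatingIfNeeded: byte)
            }
        }
        position += size
        return value
    }
}

var reader = ByteReader([0x12, 0x34, 0x78, 0x56, 0x34, 0x12, 0xFF])
let a = reader.read(UInt16.self, order: .bigEndian)
let b = reader.read(UInt32.self, order: .littleEndian)
let c = reader.read(Int8.self, order: .bigEndian)
let d = reader.read(UInt16.self, order: .bigEndian)   // no bytes left
```

Assembling values with shifts avoids any alignment or host-endianness concerns, at the cost of a small loop per value.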

—Jens

···

On Jul 8, 2016, at 2:12 AM, Tino Heth via swift-users <swift-users@swift.org> wrote:

But arrays are not an option for me, because my data isn't structured that way:
It's a stream of different elements (timestamp, type, and a payload that depends on the type), so I have to evaluate it byte by byte…