How to handle endianness?

Endianness determines how a sequence of bytes is interpreted as a number, and it applies to both integers and floating-point values. So whenever you go from a numeric value -> bytes, or bytes -> numeric value, you should pick one endianness and use it consistently.
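For example, something like this (a rough sketch in Rust, assuming that's what you're working in; the `to_le_bytes`/`from_le_bytes` family in the standard library does the conversion with an explicit byte order):

```rust
// Converting between a u32 and bytes with an explicit byte order,
// so the result is identical on every machine.
fn main() {
    let value: u32 = 0xDEADBEEF;

    // Numeric value -> bytes, always little-endian regardless of the host.
    let bytes = value.to_le_bytes();
    assert_eq!(bytes, [0xEF, 0xBE, 0xAD, 0xDE]);

    // Bytes -> numeric value, using the same declared endianness.
    let roundtrip = u32::from_le_bytes(bytes);
    assert_eq!(roundtrip, value);

    // to_be_bytes / from_be_bytes are the big-endian equivalents;
    // to_ne_bytes exposes whatever the native order happens to be.
    let be_bytes = value.to_be_bytes();
    assert_eq!(be_bytes, [0xDE, 0xAD, 0xBE, 0xEF]);
}
```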

Note that even if you write an integer using a hex or binary literal, such as 0xDEADBEEF, that is a numeric literal and not a bytes-in-memory literal. Printing and parsing also work at the numeric level, and so on.
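A quick illustration of that point (again a Rust sketch; the literal, formatting, and parsing are all byte-order independent, and only asking for the raw bytes exposes the machine's layout):

```rust
// Literals, printing, and parsing all work at the numeric level;
// only the in-memory/serialised bytes depend on endianness.
fn main() {
    let x: u32 = 0xDEADBEEF; // a number, not a byte layout
    assert_eq!(format!("{:X}", x), "DEADBEEF"); // printing is byte-order independent

    let parsed = u32::from_str_radix("DEADBEEF", 16).unwrap();
    assert_eq!(parsed, x); // so is parsing

    // Only when you ask for the raw bytes does endianness appear:
    // on a little-endian host this is [EF, BE, AD, DE],
    // on a big-endian host it would be [DE, AD, BE, EF].
    let native = x.to_ne_bytes();
    println!("native byte order: {:02X?}", native);
}
```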

Yes, and the operating system. Bi-endian machines will sometimes have big- and little-endian variants of the OS.

Testing is a big problem, though. If you look at Debian's list of ports, you'll notice that even for bi-endian machines, only the little-endian variants are supported or actively maintained (e.g. MIPS, MIPS64, PPC64, RISC-V), and the big-endian variants are all discontinued. The notable holdout is IBM's s390x, used in mainframes that most people don't have access to.

You can basically ignore big-endian machines. Writing serialised data with a defined endianness is nice, and about as much effort as you need to put into it; don't bother actually writing the code to handle running on a big-endian machine.
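In practice that just means picking one byte order at the format boundary. A sketch of what I mean, with made-up `Record`/`encode`/`decode` names (the point is that the format declares little-endian once and there are no `cfg(target_endian)` branches anywhere):

```rust
use std::convert::TryInto;

// A hypothetical on-disk record whose fields are always stored little-endian.
struct Record {
    id: u64,
    len: u32,
}

fn encode(r: &Record) -> Vec<u8> {
    let mut out = Vec::with_capacity(12);
    out.extend_from_slice(&r.id.to_le_bytes()); // defined as LE by the format
    out.extend_from_slice(&r.len.to_le_bytes());
    out
}

fn decode(buf: &[u8]) -> Option<Record> {
    Some(Record {
        id: u64::from_le_bytes(buf.get(0..8)?.try_into().ok()?),
        len: u32::from_le_bytes(buf.get(8..12)?.try_into().ok()?),
    })
}

fn main() {
    let r = Record { id: 42, len: 7 };
    let bytes = encode(&r);
    let back = decode(&bytes).unwrap();
    assert_eq!((back.id, back.len), (42, 7));
}
```

The same code produces the same bytes on any host, which is all "defined endianness" really asks for.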

I had a similar issue with a library I was working on, and I decided it wasn't even worth worrying about. In the very unlikely case that somebody releases a new big-endian machine and wants to port your database to it, they can figure out the issues at that time. They'll already be busy porting half the universe to their system, and the chances are your untestable code won't work the first time anyway.

Personally, if portability is the goal, I would rather invest my time supporting other OSes such as Windows, not obscure hardware platforms. That's a platform people actually use and one you can actually test on.

Hope that helps!
