C++ FAQ Celebrating Twenty-One Years of the C++ FAQ!!!
Section 36:
[36.6] How exactly do I read/write simple types in non-human-readable ("binary") format?

Before you read this, make sure to evaluate all the tradeoffs between human-readable and non-human-readable formats. The tradeoffs are non-trivial, so you should resist a knee-jerk reaction to do it the way you did it on the last project — one size does not fit all.

After you have made an eyes-open decision to use non-human-readable ("binary") format, you should remember these keys:

  • Make sure you open the input and output streams using std::ios::binary. Do this even if you are on a Unix system since it's easy to do, it documents your intent, and it's one less non-portability to locate and change down the road.
  • You probably want to use iostream's read() and write() member functions instead of its >> and << operators. read() and write() copy raw bytes, which is what you want in binary mode; >> and << convert to and from formatted text, so they are better for text mode.
  • If the binary data might get read by a different computer than the one that wrote it, be very careful about endian issues (little-endian vs. big-endian) and sizeof issues. The easiest way to handle this is to anoint one of those two byte orders as the official "network" format, and to create a header file that contains machine dependencies (I usually call it machine.h). That header should define inline functions like readNetworkInt(std::istream& istr) to read a "network int," and so forth for reading and writing all the primitive types. You can define the format for these pretty much any way you want. E.g., you might define a "network int" as exactly 32 bits in little-endian format. In any case, the functions in machine.h will do any necessary endian conversions, sizeof conversions, etc. You'll either end up with a different machine.h on each machine architecture, or you'll end up with a lot of #ifdefs in your machine.h, but either way, all this ugliness will be buried in a single header, and all the rest of your code will be clean(er). Note: the floating point differences are the most subtle and tricky to handle. It can be done, but you'll have to be careful with things like NaN, overflow and underflow, the number of bits in the mantissa or exponent, etc.
  • When space-cost is an issue, such as when you are storing the serialized form in a small memory device or sending it over a slow link, you can compress the stream and/or you can do some manual tricks. The simplest is to store small numbers in a smaller number of bytes. For example, to store an unsigned integer in a stream that has 8-bit bytes, you can hijack the 8th (high-order) bit of each byte to indicate whether or not there is another byte. That means you get 7 meaningful bits/byte, so 0...127 fit in 1 byte, 128...16383 fit in 2 bytes, etc. A 32-bit value below 2^21 (about two million) takes three bytes or fewer, a value below 2^28 (about 268 million) takes at most four, and anything larger takes five. So when small values dominate, this will use less space than storing every four-byte unsigned number in four 8-bit bytes. There are lots of other variations on this theme, e.g., storing a sorted array of numbers as the differences between successive numbers, storing extremely small values in unary format, etc.
  • String data is tricky because you have to unambiguously know when the string's body stops. You can't unambiguously terminate all strings with a '\0' if some string might contain that character; recall that std::string can store '\0'. The easiest solution is to write the integer length just before the string data. Make sure the integer length is written in "network format" to avoid sizeof and endian problems (see the solutions in earlier bullets).
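
To make the "network format" and length-prefixed string ideas concrete, here is a minimal sketch of what a piece of machine.h might look like. Everything in it is an assumption made for illustration: the function names (writeNetworkUInt(), readNetworkUInt(), writeNetworkString(), readNetworkString(), roundTripSelfCheck()) are hypothetical, and the format is arbitrarily fixed as an unsigned 32-bit value stored in little-endian order. It builds the bytes with shifts rather than copying memory, so this particular version happens not to need any #ifdefs; it does not attempt signed or floating point types.

```cpp
// machine.h (sketch only). Assumed format for this example: a "network uint"
// is exactly 32 bits, stored least significant byte first (little endian).
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Write a 32-bit value as four bytes, low-order byte first. Shifting the
// value (instead of copying its memory) gives the same bytes on any host.
inline void writeNetworkUInt(std::ostream& ostr, std::uint32_t value)
{
    char bytes[4];
    for (int i = 0; i < 4; ++i)
        bytes[i] = static_cast<char>((value >> (8 * i)) & 0xFFu);
    ostr.write(bytes, 4);
}

// Read four bytes written by writeNetworkUInt(). Returns false and leaves
// 'value' untouched if fewer than four bytes are available.
inline bool readNetworkUInt(std::istream& istr, std::uint32_t& value)
{
    char bytes[4];
    if (!istr.read(bytes, 4))
        return false;
    std::uint32_t result = 0;
    for (int i = 0; i < 4; ++i)
        result |= static_cast<std::uint32_t>(static_cast<unsigned char>(bytes[i])) << (8 * i);
    value = result;
    return true;
}

// Write the length in network format, then the raw body. The body may
// contain '\0' because the reader relies on the length, not a terminator.
inline void writeNetworkString(std::ostream& ostr, const std::string& s)
{
    if (s.size() > 0xFFFFFFFFu) {          // too long for a 32-bit length
        ostr.setstate(std::ios::failbit);
        return;
    }
    writeNetworkUInt(ostr, static_cast<std::uint32_t>(s.size()));
    ostr.write(s.data(), static_cast<std::streamsize>(s.size()));
}

// Read a string written by writeNetworkString(). Real code would probably
// also reject absurdly large lengths before allocating.
inline bool readNetworkString(std::istream& istr, std::string& s)
{
    std::uint32_t length = 0;
    if (!readNetworkUInt(istr, length))
        return false;
    std::string body(length, '\0');
    if (length != 0 && !istr.read(&body[0], static_cast<std::streamsize>(length)))
        return false;
    s.swap(body);
    return true;
}

// Usage sketch: round-trip a number and a string through an in-memory
// stream opened in binary mode.
inline void roundTripSelfCheck()
{
    std::stringstream buffer(std::ios::in | std::ios::out | std::ios::binary);
    const std::string original("a\0b", 3);  // body contains an embedded '\0'
    writeNetworkUInt(buffer, 0x01020304u);
    writeNetworkString(buffer, original);

    std::uint32_t number = 0;
    std::string text;
    const bool gotNumber = readNetworkUInt(buffer, number);
    const bool gotText = readNetworkString(buffer, text);
    assert(gotNumber && number == 0x01020304u);
    assert(gotText && text == original);
}
```

With a real file you would open the streams the same way, e.g., std::ofstream out("data.bin", std::ios::binary), where the file name is just a placeholder. Note that the code never uses >> or <<; only read() and write() touch the stream.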

Please remember that these are primitives that you will need to use in the other FAQs in this section.
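
As an illustration of the space-saving trick described above (7 meaningful bits per byte, with the high-order bit saying whether another byte follows), here is one possible sketch. The names writeVarUInt(), readVarUInt(), encodedSize() and varUIntSelfCheck() are hypothetical, and the choice to emit the low-order 7-bit group first is an assumption of this example; the scheme is similar to what is often called a base-128 "varint." It handles 32-bit unsigned values only.

```cpp
// Sketch of a variable-length unsigned integer: 7 data bits per byte,
// low-order group first; the high bit of a byte is 1 if another byte follows.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

inline void writeVarUInt(std::ostream& ostr, std::uint32_t value)
{
    while (value >= 0x80u) {
        // Emit the low 7 bits with the "more bytes follow" flag set.
        ostr.put(static_cast<char>((value & 0x7Fu) | 0x80u));
        value >>= 7;
    }
    ostr.put(static_cast<char>(value));     // last byte: flag bit is 0
}

// Returns false on end of input, or (after setting failbit) if the encoding
// is longer than five bytes or would not fit in 32 bits.
inline bool readVarUInt(std::istream& istr, std::uint32_t& value)
{
    std::uint32_t result = 0;
    for (int shift = 0; shift < 35; shift += 7) {
        const int c = istr.get();
        if (c == std::istream::traits_type::eof())
            return false;
        const std::uint32_t group = static_cast<std::uint32_t>(c) & 0x7Fu;
        if (shift == 28 && group > 0x0Fu) { // fifth byte may carry only 4 bits
            istr.setstate(std::ios::failbit);
            return false;
        }
        result |= group << shift;
        if ((c & 0x80) == 0) {              // flag clear: this was the last byte
            value = result;
            return true;
        }
    }
    istr.setstate(std::ios::failbit);       // more than five bytes
    return false;
}

// Helper for experimenting: how many bytes does a value occupy?
inline std::size_t encodedSize(std::uint32_t value)
{
    std::ostringstream out(std::ios::out | std::ios::binary);
    writeVarUInt(out, value);
    return out.str().size();
}

// Usage sketch: the size boundaries fall at 2^7 and 2^14.
inline void varUIntSelfCheck()
{
    assert(encodedSize(127u) == 1u);
    assert(encodedSize(128u) == 2u);
    assert(encodedSize(16383u) == 2u);
    assert(encodedSize(16384u) == 3u);
}
```

The trade-off is visible at the top of the range: any value of 2^28 or more costs five bytes instead of four, so this only pays off when most of your numbers are small.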