C++ FAQ Celebrating Twenty-One Years of the C++ FAQ!!!
Section 36:
[36.3] How do I decide whether to serialize to human-readable ("text") or non-human-readable ("binary") format?


There is no "right" answer to this question; it really depends on your goals. Here are a few of the pros/cons of human-readable ("text") format vs. non-human-readable ("binary") format:

  • Text format is easier to "desk check." That means you won't have to write extra tools to debug the input and output; you can open the serialized output with a text editor to see if it looks right.
  • Binary format typically uses fewer CPU cycles. However, that is relevant only if your application is CPU bound and you intend to do serialization and/or unserialization in an inner loop/bottleneck. Remember the rule of thumb: roughly 90% of the CPU time is spent in 10% of the code, which means there won't be any practical performance benefit unless your "CPU meter" is pegged at 100% and your serialization and/or unserialization code is consuming a healthy portion of that 100%.
  • Text format lets you ignore programming issues like sizeof and little-endian vs. big-endian.
  • Binary format lets you ignore separators between adjacent values, since many values have fixed lengths and the reader always knows where one value ends and the next begins.
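  • Text format can produce smaller results when most numbers are small (e.g., the value 7 takes one digit plus a separator as text, but four bytes as a 32-bit binary integer), or when the binary output would have to be textually encoded anyway, e.g., with uuencode or Base64, which makes it bigger.
  • Binary format can produce smaller results when most numbers are large or when you don't need to textually encode the binary results.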

You might think of others to add as well. The important thing to remember is that one size does not fit all, so make a careful decision here.
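
To make the size, separator, and byte-order points above concrete, here is a minimal sketch that serializes the same 32-bit values both ways. The function names (to_text, to_binary, from_binary) are hypothetical and chosen only for this illustration, as are the newline separator and the choice of big-endian byte order; your format might reasonably pick differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Text: decimal digits plus one separator per value.
// The separator is needed because the values have varying lengths.
std::string to_text(const std::vector<std::uint32_t>& values) {
    std::string out;
    for (std::uint32_t v : values) {
        out += std::to_string(v);
        out += '\n';
    }
    return out;
}

// Binary: exactly 4 bytes per value, most significant byte first.
// Writing the bytes one at a time with shifts (rather than copying the
// in-memory representation) keeps the output independent of the host's
// byte order.
std::string to_binary(const std::vector<std::uint32_t>& values) {
    std::string out;
    for (std::uint32_t v : values) {
        for (int shift = 24; shift >= 0; shift -= 8)
            out += static_cast<char>((v >> shift) & 0xFF);
    }
    return out;
}

// Reading binary back needs no separators: every value is 4 bytes long.
std::vector<std::uint32_t> from_binary(const std::string& bytes) {
    assert(bytes.size() % 4 == 0);  // precondition: whole 4-byte values only
    std::vector<std::uint32_t> values;
    for (std::size_t i = 0; i < bytes.size(); i += 4) {
        std::uint32_t v = 0;
        for (std::size_t j = 0; j < 4; ++j)
            v = (v << 8) | static_cast<unsigned char>(bytes[i + j]);
        values.push_back(v);
    }
    return values;
}
```

With these helpers, the four small values 1, 2, 3, 42 take 9 bytes as text and 16 bytes as binary, whereas the two large values 4000000000 and 3123456789 take 22 bytes as text and only 8 bytes as binary. Which case dominates in your data is exactly the kind of thing to measure before you decide.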

One more thing: no matter which you choose, you might want to start each file/stream with a "magic" tag and a version number. The magic tag lets the reader confirm that the data really is yours, and the version number indicates the format rules. That way if you decide to make a radical change in the format, you hopefully will still be able to read the output produced by the old software.
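
Here is one way the magic tag and version number might look in code. This is only a sketch under stated assumptions: the tag "DEMO", the version numbers, and the function names (write_header, read_header, self_check) are all made up for this example, and storing the version in a single byte is just a convenient way to sidestep byte-order questions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>

// Hypothetical 4-byte tag and version number, chosen only for illustration.
const char kMagic[4] = {'D', 'E', 'M', 'O'};
const std::uint8_t kCurrentVersion = 2;

// Writes the tag followed by a one-byte version number.
void write_header(std::ostream& out) {
    out.write(kMagic, sizeof kMagic);
    out.put(static_cast<char>(kCurrentVersion));
}

// Returns the format version found in the stream. Throws if the tag is
// missing or wrong, or if the version is one this code does not know about.
std::uint8_t read_header(std::istream& in) {
    char tag[sizeof kMagic];
    if (!in.read(tag, sizeof tag) ||
        !std::equal(tag, tag + sizeof tag, kMagic))
        throw std::runtime_error("not a DEMO stream");

    const int version = in.get();  // end-of-file yields a negative value
    if (version < 1 || version > kCurrentVersion)
        throw std::runtime_error("unsupported format version");
    return static_cast<std::uint8_t>(version);
}

// Usage sketch: a header written by this code reads back as the current version.
void self_check() {
    std::stringstream buffer;
    write_header(buffer);
    assert(read_header(buffer) == kCurrentVersion);
}
```

The caller would then branch on the value returned by read_header to pick the parsing rules for that version, which is what lets newer software keep reading output produced by older software.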