[36.8] How do I serialize objects that are part of an inheritance hierarchy and that don't contain pointers to other objects?
Suppose you want to serialize a "shape" object, where Shape is an
abstract class with derived classes Rectangle, Ellipse,
Line, Text, etc. You would declare a
pure virtual function
serialize(std::ostream&) const within class Shape, and make
sure the first thing done by each override is to write out the class's
identity. For example, Ellipse::serialize(std::ostream&) const would
write out the identifier Ellipse (perhaps
as a simple string, but there are several
alternatives discussed below).
Things get a little trickier when unserializing the object. You typically
start with a static member function in the base class such as
Shape::unserialize(std::istream& istr). This is declared to return a
Shape* or perhaps a smart pointer such as
Shape::Ptr. It reads the
class-name identifier, then uses some sort of creational pattern to
create the object. For example, you might have a table that maps from the
class name to an object of the class, then use the Virtual
Constructor Idiom to create the object.
Here's a concrete example: Add a pure virtual method create(std::istream&)
const within base class Shape, and define each override as a one-liner that
uses new to allocate an object of the appropriate derived class. E.g.,
Ellipse::create(std::istream& istr) const would be { return new
Ellipse(istr); }. Add a static std::map<std::string,Shape*> object that maps
from the class name to a representative (AKA prototype) object of the
appropriate class; e.g., "Ellipse" would map to a new Ellipse(). Function
Shape::unserialize(std::istream& istr) would read the class name, throw an
exception if it's not in the map (if (theMap.count(className) == 0) throw
...something...), then look up the associated Shape* and call its create()
method: return theMap[className]->create(istr).
The map is typically populated during static initialization. For example, if
file Ellipse.cpp contains the code for derived class Ellipse,
it would also contain a static object whose ctor adds that class to
the map: theMap["Ellipse"] = new Ellipse().
Notes and caveats:
- It adds a little flexibility if Shape::unserialize() passes the
class name to the create() method. In particular, that would let a
derived class be used with two or more names, each with its own "network
format." For example, derived class Ellipse could be used for both
"Ellipse" and "Circle", which might be useful to save space in
the output stream (a circle needs only one radius) or for other reasons.
- It's usually easiest to handle errors during unserialization by throwing
an exception. You can return NULL if you want, but since a ctor cannot return
NULL, you will need to move the code that reads the input stream out of the
derived classes' ctors and into the corresponding create() methods, and the
result is often more complicated code.
- You must be careful to avoid the static
initialization order fiasco with the map used by
Shape::unserialize(). This normally means using the
Construct On First Use Idiom for
the map itself.
- For the map used by Shape::unserialize(), I personally prefer the
Named Constructor Idiom over the
Virtual Constructor Idiom — it simplifies a few
steps. Details: I usually define a typedef within Shape such
as typedef Shape* (*Factory)(std::istream&). This means
Shape::Factory is a "pointer to a function that takes a
std::istream& and returns a Shape*." I then define the map as
std::map<std::string,Factory>. Finally I populate that map using
lines like theMap["Ellipse"] = Ellipse::create (where
Ellipse::create(std::istream&) is now a
static member function of class
Ellipse, that is, the Named Constructor
Idiom). You'd change the return value in function
Shape::unserialize(std::istream& istr) from
theMap[className]->create(istr) to
theMap[className](istr).
- If you might need to serialize a NULL pointer, it's usually easy: since
you already write out a class identifier, you can just as easily write out a
pseudo class identifier like "NULL". You might need an extra if statement in
Shape::unserialize(), but if you chose my preference from the previous bullet,
you can eliminate that special case (and generally keep your code clean) by
defining a static member function
Shape* Shape::nullFactory(std::istream&) { return NULL; }.
You add that function to the map like any other: theMap["NULL"] =
Shape::nullFactory;.
- You can make the serialized form smaller and a little faster if you
tokenize the class-name identifiers. For example, write a class name only the
first time it is seen, and for subsequent uses write only a corresponding
integer index. A mapping such as std::map<std::string,unsigned>
unique makes this easy: if a class name is already in the map, write
unique[className]; otherwise set a variable unsigned n =
unique.size(), write n, write the class name, and set
unique[className] = n. (Note: be sure to copy the size into a
separate variable. Do not say unique[className] =
unique.size()! You have been warned! Reason: the compiler might evaluate
unique[className] before unique.size(), and if so, operator[] will already
have inserted className, making the size one larger than intended. C++17 does
guarantee that the right-hand side of an assignment is evaluated first, but
the separate variable is correct under every standard.) When unserializing,
use std::vector<std::string> unique, read the number n, and if
n == unique.size(), read a name and add it to the vector. Either way the name
will be unique[n]. You can also pre-populate the first N slots in these tables
with the N most common names, so that streams won't need to contain any of
those strings.