I was evaluating boost::serialization today. Based on the design goals mentioned in the library’s introduction, I felt like boost::serialization would suit my needs.
An interesting point is this :
8. Orthogonal specification of class serialization and archive format. That is, any file format should be able to store serialization of any arbitrary set of C++ data structures without having to alter the serialization of any class.
At first, I interpreted it as an intent to decouple the serialization code (which knows about the object’s internals) from the archive format.
Consider this:
```cpp
struct ClassA {
    ...
    template <typename Archive>
    void serialize(Archive & ar, unsigned int version)  // archive is taken by reference
    {
        ar & member1_;
        ar & member2_;
        ...
        ar & membern_;
    }
    ...
};
```
All is fine, ClassA does not know the specifics of type Archive. Any object implementing the Archive concept will do just fine.
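To make that concrete, here is a minimal sketch that does not use Boost at all. ToyTextWriter, FieldCounter, Point and the two helper functions are names I made up for illustration; the only assumption is that an "archive" is anything providing a templated operator&.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical archive #1: writes every field as text, separated by spaces.
class ToyTextWriter {
public:
    template <typename T>
    ToyTextWriter & operator&(const T & value)
    {
        out_ << value << ' ';
        assert(out_.good());  // the in-memory stream should stay writable
        return *this;
    }
    std::string str() const { return out_.str(); }
private:
    std::ostringstream out_;
};

// Hypothetical archive #2: ignores the values and only counts the fields.
class FieldCounter {
public:
    template <typename T>
    FieldCounter & operator&(const T &)
    {
        ++count_;
        return *this;
    }
    int count() const { return count_; }
private:
    int count_ = 0;
};

// The class only talks to "some Archive"; it knows nothing about either format.
struct Point {
    int x = 3;
    int y = 4;

    template <typename Archive>
    void serialize(Archive & ar, unsigned int /*version*/)
    {
        ar & x;
        ar & y;
    }
};

// Small helpers that run the same serialize() against both archives.
inline std::string write_point(Point p)
{
    ToyTextWriter writer;
    p.serialize(writer, 0u);
    return writer.str();
}

inline int count_point_fields(Point p)
{
    FieldCounter counter;
    p.serialize(counter, 0u);
    return counter.count();
}
```

The same serialize() body serves both archives, which is the decoupling the design goal promises as long as the body is visible where it is used.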
Yet, I have a problem with this. Inline code in headers has a tendency to irritate me. It hides the structure of the classes you implement so I often use the PImpl idiom.
I want to (or must in the case of PImpl) move the implementation to ClassA.cpp.
But … can I ?
serialize() is a template method. Can I forward declare a template method ? Well, yes.
In ClassA.hpp :
```cpp
struct ClassA {
    ...
    template <typename Archive>
    void serialize(Archive & ar, unsigned int version);  // declaration only
    ...
};
```
In ClassA.cpp:
```cpp
#include "ClassA.hpp"

// Out-of-class definition: note the ClassA:: qualifier.
template <typename Archive>
void ClassA::serialize(Archive & ar, unsigned int version)
{
    ar & member1_;
    ar & member2_;
    ...
    ar & membern_;
}
```
In main.cpp, try to serialize an instance of ClassA :
```cpp
#include "ClassA.hpp"
#include <boost/archive/text_oarchive.hpp>
#include <boost/archive/text_iarchive.hpp>
#include <fstream>

int main(int argc, char ** argv)
{
    ClassA a;
    std::ofstream ofs("output.txt");
    boost::archive::text_oarchive oa(ofs);
    oa << a;
    return 0;
}
```
Compile and link :
```
g++ -c main.cpp -o main.o
g++ -c ClassA.cpp -o ClassA.o
g++ main.o ClassA.o -lboost_serialization -o program
main.o: In function `void boost::serialization::access::serialize<boost::archive::text_oarchive, ClassA>(boost::archive::text_oarchive&, ClassA&, unsigned int)':
main.cpp:(.text._ZN5boost13serialization6access9serializeINS_7archive13text_oarchiveE6ClassAEEvRT_RT0_j[_ZN5boost13serialization6access9serializeINS_7archive13text_oarchiveE6ClassAEEvRT_RT0_j]+0x25): undefined reference to `void ClassA::serialize<boost::archive::text_oarchive>(boost::archive::text_oarchive&, unsigned int)'
collect2: error: ld returned 1 exit status
make: *** [program] Error 1
```
Does this compile ? Yes. Does this link ? … No!
What’s the linker trying to tell me ? Undefined reference to ClassA::serialize(…). But I defined it, no ?
Well, yes and no. You wrote the code but it is a template function. So it needs to be instantiated at compile-time. When main.cpp is compiled, it sees the forward-declaration of ClassA::serialize() and assumes that the linker will find the implementation of void ClassA::serialize<boost::archive::text_oarchive>(boost::archive::text_oarchive&, unsigned int) somewhere.
But it does not, because ClassA.cpp, while defining the template function, never instantiates it. ClassA::serialize() is parsed but never compiled into actual machine code.
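A small sketch of that "parsed but never compiled" behaviour, independent of Boost. The function names are made up for illustration. The first template calls a member that no type in the file has, yet the file compiles, because the compiler only checks the syntax of a template body until something instantiates it.

```cpp
#include <cassert>

// Never instantiated in this file. The call on `t` depends on T, so the
// compiler only checks its syntax here and emits no machine code for it.
template <typename T>
int call_missing_member(T & t)
{
    return t.member_that_no_type_here_has();
}

// Instantiated implicitly, because demo() below uses it with T = int.
template <typename T>
T twice(T value)
{
    return value + value;
}

inline int demo()
{
    int result = twice(21);  // this use is what triggers twice<int>
    assert(result == 42);
    return result;
}
```

ClassA.cpp is in the position of call_missing_member(): the definition is there, but nothing in that file asks for a concrete Archive, so no ClassA::serialize<...> symbol ends up in ClassA.o.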
You can check for yourself. Look at the file size :
```
-rw-rw-r-- 1 timoch timoch    940 Apr 12 12:59 ClassA.o
-rw-rw-r-- 1 timoch timoch 143088 Apr 12 12:59 main.o
```
940 bytes ? That’s not a lot.
You can force the instantiation of ClassA::serialize() in ClassA.cpp. Add the following at the end:
```cpp
#include <boost/archive/text_oarchive.hpp>

// Explicit instantiation for the text output archive.
template void ClassA::serialize(boost::archive::text_oarchive &, unsigned int);
```
It works, but is it good ? Not to me.
I need to include a header defining the implementation of the specific format I serialize to. For the loading process to work, I also need to include text_iarchive.hpp and add a matching explicit instantiation for it. Tomorrow, when my object needs to be serialized to another format as part of another use case, I will need to modify its implementation file to include the specifics of that other format. I will need to do this for each and every class to be serialized … not something I would enjoy.
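Here is a single-file sketch of where this leads, with comments marking which part would live in which file. Record, ToyTextOut and ToyCsvOut are hypothetical stand-ins (they are not Boost types); the point is only the shape of the implementation file, which grows by one explicit instantiation per class and per format.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// --- would live in Record.hpp: declaration only, no format in sight ---
struct Record {
    int id = 1;
    int size = 2;

    template <typename Archive>
    void serialize(Archive & ar, unsigned int version);
};

// --- hypothetical formats, standing in for the archive headers ---
struct ToyTextOut {
    std::ostringstream os;
    template <typename T>
    ToyTextOut & operator&(const T & v) { os << v << ' '; return *this; }
};

struct ToyCsvOut {
    std::ostringstream os;
    template <typename T>
    ToyCsvOut & operator&(const T & v) { os << v << ','; return *this; }
};

// --- would live in Record.cpp, which now has to know every format ---
template <typename Archive>
void Record::serialize(Archive & ar, unsigned int /*version*/)
{
    ar & id;
    ar & size;
}

// One explicit instantiation per supported format.
template void Record::serialize<ToyTextOut>(ToyTextOut &, unsigned int);
template void Record::serialize<ToyCsvOut>(ToyCsvOut &, unsigned int);

// --- would live in client code that only sees Record.hpp ---
inline std::string to_text(Record r)
{
    ToyTextOut archive;
    r.serialize(archive, 0u);
    assert(!archive.os.fail());
    return archive.os.str();
}

inline std::string to_csv(Record r)
{
    ToyCsvOut archive;
    r.serialize(archive, 0u);
    assert(!archive.os.fail());
    return archive.os.str();
}
```

Adding a third format means touching the implementation file of every serializable class again, which is exactly the coupling I wanted to avoid.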
Conclusion
Templates provide huge flexibility. Here they are used to let the & operator serve as both the saving and the loading operator. However, this comes at the expense of forcing the client application to put the saving/loading implementation in the same translation unit as the definitions of the target format. As soon as serialize() moves out of the header, this completely voids the efforts put toward decoupling the serialized objects from the format they serialize to.
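For readers who have not seen the trick, here is a minimal sketch of the "same operator" idea, again without Boost. ToySaver, ToyLoader, Settings and round_trip() are names I made up; real archives do far more (versioning, tracking, type information), so take this only as an illustration of how one serialize() body can both save and load.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical saving archive: operator& writes the value.
class ToySaver {
public:
    template <typename T>
    ToySaver & operator&(const T & value)
    {
        buffer_ << value << ' ';
        return *this;
    }
    std::string str() const { return buffer_.str(); }
private:
    std::ostringstream buffer_;
};

// Hypothetical loading archive: the same operator& reads into the value.
class ToyLoader {
public:
    explicit ToyLoader(const std::string & data) : buffer_(data) {}

    template <typename T>
    ToyLoader & operator&(T & value)
    {
        buffer_ >> value;
        assert(!buffer_.fail());  // the data must contain a value for each field
        return *this;
    }
private:
    std::istringstream buffer_;
};

struct Settings {
    int width = 0;
    int height = 0;

    // Written once, used for both directions.
    template <typename Archive>
    void serialize(Archive & ar, unsigned int /*version*/)
    {
        ar & width;
        ar & height;
    }
};

// Saves the object to a string, then loads a fresh object back from it.
inline Settings round_trip(Settings original)
{
    ToySaver saver;
    original.serialize(saver, 0u);

    Settings copy;
    ToyLoader loader(saver.str());
    copy.serialize(loader, 0u);
    return copy;
}
```

The direction is decided entirely by the type of the archive, which is why serialize() has to be a template and why its body has to be visible wherever a new archive type is used.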
There are ways to achieve the same flexibility of ‘same operator’ saving and loading while preserving decoupling with the serialization format. I will come back to that in a later post.