Simple Java serialization and versioning

Here is a set of rules I find useful to follow when serializing Java classes that may be persisted and later require versioning.

I enforce all the following rules in test code that uses reflection to walk recursively through all members of serialized objects, and checks for the required members and modifiers.
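
A check along these lines might look like the following sketch. It is not the actual test code: the names SerializationRules, GoodExample, and BadExample are hypothetical, and this version inspects a single class rather than walking recursively through its members. It also does not check the value of serialVersionUID.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists the rules that a single class violates.
final class SerializationRules {

  static List<String> violations(Class<?> c) {
    List<String> problems = new ArrayList<String>();
    boolean hasUid = false;
    for (Field f : c.getDeclaredFields()) {
      int m = f.getModifiers();
      if (f.isSynthetic()) {
        continue; // compiler-generated, such as this$0
      } else if (f.getName().equals("serialVersionUID")) {
        hasUid = Modifier.isPrivate(m) && Modifier.isStatic(m)
            && Modifier.isFinal(m) && f.getType() == long.class;
      } else if (!Modifier.isStatic(m) && !Modifier.isTransient(m)) {
        problems.add("field " + f.getName() + " is not transient");
      }
    }
    if (!hasUid) {
      problems.add("no private static final long serialVersionUID");
    }
    requireMethod(c, "writeObject", ObjectOutputStream.class, problems);
    requireMethod(c, "readObject", ObjectInputStream.class, problems);
    return problems;
  }

  // Serialization only calls these methods if they are private,
  // non-static, and return void.
  private static void requireMethod(Class<?> c, String name, Class<?> arg,
                                    List<String> problems) {
    try {
      Method method = c.getDeclaredMethod(name, arg);
      int m = method.getModifiers();
      if (!Modifier.isPrivate(m) || Modifier.isStatic(m)
          || method.getReturnType() != void.class) {
        problems.add(name + " must be a private void instance method");
      }
    } catch (NoSuchMethodException e) {
      problems.add("no " + name + "(" + arg.getSimpleName() + ")");
    }
  }
}

// Follows the rules, so no violations are reported.
final class GoodExample implements Serializable {
  private static final long serialVersionUID = 1L;
  private transient int count;

  private void writeObject(ObjectOutputStream out) throws IOException {
    out.writeInt(count);
  }

  private void readObject(ObjectInputStream in)
      throws IOException, ClassNotFoundException {
    count = in.readInt();
  }
}

// Relies on default serialization, so every rule is reported.
final class BadExample implements Serializable {
  private int count;
}
```

A unit test can then assert that violations(...) returns an empty list for each class it cares about.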

Define serialVersionUID with a value of 1. Increment to a new value only when you have permanently broken the backward compatibility of previously persisted versions. Our goal is never to break compatibility.
private static final long serialVersionUID = 1L;

Implement writeObject and readObject for every class in your hierarchy, with these exact signatures:

  private void writeObject(java.io.ObjectOutputStream out) 
    throws java.io.IOException

  private void readObject(java.io.ObjectInputStream in)
    throws java.io.IOException, ClassNotFoundException

Declare all member variables transient to ensure that you explicitly serialize everything.
Do not call ObjectOutputStream#defaultWriteObject or ObjectInputStream#defaultReadObject from writeObject or readObject. We will not use any default serialization.
Serialize a version as an integer, and increment it whenever the serialized content changes in a way you can still support. (For example, an int is replaced by a long, or a new member is added that can be given a good default value.)
Optionally, serialize a release as an integer shared by other objects that are serialized together. This can be useful for adjusting to global structural changes. Use a version for changes within a single class, and a release number for changes that involve multiple classes. Maintain the release integer externally so that it can be shared. Serializable objects might depend on each other and need to change as a group. It is also possible your serialization depends on a third party. I try very hard to avoid this situation. I might use a change in release just to log a warning.
Put all serialized member variables into a Map with keys identical to the names of the member variables, and serialize the Map as a single object. This makes it much easier to detect the addition, removal, and modification of members without remembering the order they were previously written.
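
The optional release number could be recorded next to the version, as in this sketch. The names Release, Station, and the "RELEASE" key are hypothetical, and keeping the number in a constant is only one assumed way to maintain it externally. A mismatch here does nothing more than log a warning.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Hypothetical holder for the release number shared by all classes
// that are serialized together.
final class Release {
  static final int CURRENT = 3;
  private Release() {}
}

final class Station implements Serializable {
  private static final long serialVersionUID = 1L;
  private static final int VERSION = 1; // changes within this class only

  private transient String name;

  Station(String name) { this.name = name; }

  String getName() { return name; }

  private void writeObject(ObjectOutputStream out) throws IOException {
    Map<String, Object> map = new HashMap<String, Object>();
    map.put("name", name);
    map.put("VERSION", VERSION);
    map.put("RELEASE", Release.CURRENT); // shared across classes
    out.writeObject(map);
  }

  private void readObject(ObjectInputStream in)
      throws IOException, ClassNotFoundException {
    @SuppressWarnings("unchecked")
    Map<String, Object> map = (Map<String, Object>) in.readObject();

    int release = (Integer) map.get("RELEASE");
    if (release != Release.CURRENT) {
      System.err.println("Warning: reading release " + release
          + ", current release is " + Release.CURRENT);
    }
    name = (String) map.get("name");
  }
}
```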

Here is a sample implementation:

public class Foo implements java.io.Serializable {
  private static final long serialVersionUID = 1L; // try never to change
  private static final int VERSION = 2;

  private transient double[] data = {3.14159, 2.71828};

  // ...

  private void writeObject(java.io.ObjectOutputStream out)
    throws java.io.IOException {
    java.util.Map<String, Object> map = new java.util.HashMap<String, Object>();
    map.put("data", data);
    map.put("VERSION", VERSION);
    out.writeObject(map);
  }

  private void readObject(java.io.ObjectInputStream in)
    throws java.io.IOException, ClassNotFoundException {

    @SuppressWarnings("unchecked") java.util.Map<String, Object> map = 
      (java.util.Map<String, Object>) in.readObject();

    int version = (Integer) map.get("VERSION");
    if (version == VERSION) {
      data = (double[]) map.get("data");
    } else if (version > VERSION) {
      throw new java.io.IOException("Cannot deserialize data from version "+version+" of this code, "+
                                    "which is newer than current version "+VERSION);
    } else { // convert older version, previously serialized as floats
      float[] olderData = (float[]) map.get("data");
      data = new double[olderData.length];
      for (int i=0; i<data.length; ++i) {data[i] = olderData[i];}
    }
  }
}
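
To exercise a class like Foo, a round trip through a byte array is usually enough. The helper below is a minimal sketch; the name RoundTrip.copy is hypothetical, and it works for any Serializable object.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical test helper: serializes an object to memory and reads it back.
final class RoundTrip {
  @SuppressWarnings("unchecked")
  static <T extends Serializable> T copy(T original)
      throws IOException, ClassNotFoundException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(original); // invokes the private writeObject, if any
    }
    try (ObjectInputStream in = new ObjectInputStream(
             new ByteArrayInputStream(bytes.toByteArray()))) {
      return (T) in.readObject(); // invokes the private readObject, if any
    }
  }
}
```

Saving the bytes written by an older version of a class as a test resource, and reading them with the current code, covers the conversion branch as well.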

Do not serialize anonymous inner classes, which compile to names like Foo$1. The compiler assigns these names, so they are not under your control.
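
As an illustration, here is a sketch that contrasts the two choices. The names Naming, Task, and NamedTask are hypothetical.

```java
import java.io.Serializable;

final class Naming {
  interface Task extends Serializable {
    String run();
  }

  // Avoid: the compiler chooses this class name (Naming$1 with javac), and
  // the number can shift when other anonymous classes are added or reordered.
  static final Task ANONYMOUS = new Task() {
    public String run() { return "anonymous"; }
  };

  // Prefer: a static nested class whose name, Naming$NamedTask, you choose.
  // (writeObject and readObject are omitted here only for brevity.)
  static final class NamedTask implements Task {
    private static final long serialVersionUID = 1L;
    public String run() { return "named"; }
  }
}
```

Printing getClass().getName() for each instance shows the name that would be written to the stream.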

There are a few special cases.

You cannot follow these rules if you extend a non-serializable class with useful state. That would force you to use default serialization, which makes it much more difficult to manage versioning. Avoid extending such classes, and wrap them as members instead. Reconstruct them from the data originally provided to their constructors.
You may be tempted to use ObjectOutputStream.PutField and ObjectInputStream.GetField instead of a Map. Unfortunately, you can only add actual non-transient fields to the PutField buffer. You lose the ability to convert or simplify those fields for backward-compatibility. You cannot pass along a version number that is not a field. GetField also cannot list the available fields. These buffers do not even add much type safety because the get and put methods are overloaded by type in an unusual, error-prone way.
I would not bother to use a Map for very small objects. If you look at the implementation of HashMap#writeObject you will see that it does not add much overhead for a moderately sized object. It serializes the keys and their values and a few additional integers.
These rules are not necessary for Enums, which enforce their own rules. The release notes say the following: "The rules for serializing an enum instance differ from those for serializing an 'ordinary' serializable object: the serialized form of an enum instance consists only of its enum constant name, along with information identifying its base enum type. Deserialization behavior differs as well--the class information is used to find the appropriate enum class, and the Enum.valueOf method is called with that class and the received constant name in order to obtain the enum constant to return."
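
For the first special case above, wrapping a non-serializable class as a member might look like the following sketch. Scaler stands in for a hypothetical third-party class that is not Serializable, and ScaledSignal is a hypothetical wrapper. Only the constructor argument is serialized, and the Scaler is rebuilt from it on read.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Hypothetical third-party class: useful state, but not Serializable.
final class Scaler {
  private final double factor;
  Scaler(double factor) { this.factor = factor; }
  double apply(double x) { return factor * x; }
}

// Wraps a Scaler as a member instead of extending it.
final class ScaledSignal implements Serializable {
  private static final long serialVersionUID = 1L;
  private static final int VERSION = 1;

  private transient double factor; // constructor argument, kept for rebuilding
  private transient Scaler scaler; // never written to the stream

  ScaledSignal(double factor) {
    this.factor = factor;
    this.scaler = new Scaler(factor);
  }

  double apply(double x) { return scaler.apply(x); }

  private void writeObject(ObjectOutputStream out) throws IOException {
    Map<String, Object> map = new HashMap<String, Object>();
    map.put("factor", factor);
    map.put("VERSION", VERSION);
    out.writeObject(map);
  }

  private void readObject(ObjectInputStream in)
      throws IOException, ClassNotFoundException {
    @SuppressWarnings("unchecked")
    Map<String, Object> map = (Map<String, Object>) in.readObject();

    int version = (Integer) map.get("VERSION");
    if (version > VERSION) {
      throw new IOException("Cannot deserialize newer version " + version);
    }
    factor = (Double) map.get("factor");
    scaler = new Scaler(factor); // reconstruct from the original argument
  }
}
```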

Bill Harlan, December 2009
