Effective Java: Consider Using a Custom Serialized Form (Item 75)

程序员文章站 2022-05-23 18:05:20

Why use a custom serialized form?

Here is an example taken directly from the book:

public final class StringList implements Serializable {
    private int size = 0;
    private Entry head = null;

    private static class Entry implements Serializable {
        String data;
        Entry next;
        Entry previous;
    }
    // ... remainder omitted
}

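As we know, serializing an object also serializes every object reachable from it.
So if I serialize a StringList, the default serialized form walks the whole doubly linked list: every Entry is written together with its next and previous links. Writing entry "B" means writing its neighbours "A" and "C", which in turn refer back to "B".
This does not loop forever, because ObjectOutputStream remembers the objects it has already written and emits a back-reference the second time it meets one. The default form still has real costs here: it ties the serialized form to the internal linked-list representation, it spends space and time writing links that could be rebuilt from the data, and the recursive traversal of a long list can overflow the stack.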

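What we actually need is simple: the data of each Entry written once, in order, with no recursive walk over the links.
This is where the transient keyword comes in: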

public final class StringList implements Serializable {

    private static final long serialVersionUID = 1L;
    // transient fields are skipped by the default serialization
    private transient int size = 0;
    private transient Entry head = null;

    private static class Entry {
        String data;
        Entry next;
        Entry previous;
    }

    public final void add(String s) { ... }

    /**
     * Serialize this {@code StringList} instance
     * 
     * @serialData The size of the list (the number of strings it contains)
     * is emitted ({@code int}), followed by all of its elements (each a 
     * {@code String}), in the proper sequence.
     */
    private void writeObject(ObjectOutputStream s) throws IOException {
        s.defaultWriteObject();
        s.writeInt(size);
        for(Entry e = head; e != null; e = e.next ) {
            s.writeObject(e.data);
        }
    }

    private void readObject(ObjectInputStream s) 
        throws IOException, ClassNotFoundException {
        s.defaultReadObject();
        int num = s.readInt();
        for(int i=0; i < num; i++) {
            add((String)s.readObject());
        }
    }
    // ... remainder omitted
}

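Fields marked transient are not serialized automatically, which stops the default mechanism from doing the wrong thing. writeObject then writes the state held in those transient fields by hand, in whatever form you consider correct, and readObject rebuilds the fields from that form.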
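The listing above elides the body of add, so it cannot be run as shown. The following is a minimal, self-contained sketch of the same idea. The class name StringListSketch, the tail field, the body of add, and the toList and roundTrip helpers are assumptions added for illustration; they are not from the book.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

final class StringListSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    // All fields are transient: the links are rebuilt on deserialization.
    private transient int size = 0;
    private transient Entry head = null;
    private transient Entry tail = null; // assumption: added so that add() is O(1)

    // Entry no longer needs to be Serializable; it is never written directly.
    private static class Entry {
        String data;
        Entry next;
        Entry previous;
    }

    // Appends s to the end of the list (hypothetical body; the original elides it).
    public void add(String s) {
        Entry e = new Entry();
        e.data = s;
        if (head == null) {
            head = e;
        } else {
            tail.next = e;
            e.previous = tail;
        }
        tail = e;
        size++;
    }

    // Hypothetical helper: copies the strings out in list order.
    public List<String> toList() {
        List<String> out = new ArrayList<>(size);
        for (Entry e = head; e != null; e = e.next) {
            out.add(e.data);
        }
        return out;
    }

    // Writes the logical form: the element count, then each string in order.
    private void writeObject(ObjectOutputStream s) throws IOException {
        s.defaultWriteObject();
        s.writeInt(size);
        for (Entry e = head; e != null; e = e.next) {
            s.writeObject(e.data);
        }
    }

    // Transient fields start at 0/null here, so add() rebuilds size and the links.
    private void readObject(ObjectInputStream s) throws IOException, ClassNotFoundException {
        s.defaultReadObject();
        int numElements = s.readInt();
        for (int i = 0; i < numElements; i++) {
            add((String) s.readObject());
        }
    }

    // Hypothetical helper: serializes to memory and reads the copy back.
    static StringListSketch roundTrip(StringListSketch list) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(list);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (StringListSketch) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        StringListSketch list = new StringListSketch();
        list.add("a");
        list.add("b");
        list.add("c");
        System.out.println(roundTrip(list).toList()); // prints [a, b, c]
    }
}
```

Because the stream contains only the count and the strings, the linked-list representation could later be swapped for something else without changing the serialized form.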

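Note: although every field of StringList is transient, the first thing writeObject and readObject do is still to call defaultWriteObject and defaultReadObject, which handle the non-transient fields. This greatly improves flexibility: what if a non-transient field is added in a later release? With these calls in place, the serialized form stays compatible in both directions.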

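To sum up, the purpose of a custom serialized form is to serialize what you consider correct, namely the object's logical state rather than whatever its fields happen to hold. Classic examples in the JDK are ArrayList and HashMap.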

Examples of custom serialization

ArrayList

Let's look at the key fields:

    private static final Object[] EMPTY_ELEMENTDATA = {};

    /**
     * Shared empty array instance used for default sized empty instances. We
     * distinguish this from EMPTY_ELEMENTDATA to know how much to inflate when
     * first element is added.
     */
    private static final Object[] DEFAULTCAPACITY_EMPTY_ELEMENTDATA = {};

    /**
     * The array buffer into which the elements of the ArrayList are stored.
     * The capacity of the ArrayList is the length of this array buffer. Any
     * empty ArrayList with elementData == DEFAULTCAPACITY_EMPTY_ELEMENTDATA
     * will be expanded to DEFAULT_CAPACITY when the first element is added.
     */
    transient Object[] elementData; // non-private to simplify nested class access

    /**
     * The size of the ArrayList (the number of elements it contains).
     *
     * @serial
     */
    private int size;

Why are the empty arrays defined here?
The answer is in: Effective Java: Return Empty Arrays or Collections, Not Nulls (Item 43)

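Next, why is elementData declared transient?
Shouldn't all of its array elements be serialized?
The reason is that ArrayList is really a dynamic array: when the backing array fills up it grows automatically, so its capacity is usually larger than the number of elements. If the capacity is 100 but elementData holds only 1 real object, the rest are null, and default serialization would write out 99 null slots, which is clearly inappropriate.
So let's look at its writeObject method: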

    private void writeObject(java.io.ObjectOutputStream s)
        throws java.io.IOException{
        // Write out element count, and any hidden stuff
        int expectedModCount = modCount;
        s.defaultWriteObject();

        // Write out size as capacity for behavioural compatibility with clone()
        s.writeInt(size);

        // Write out all elements in the proper order.
        for (int i=0; i<size; i++) {
            s.writeObject(elementData[i]);
        }
        //prevent concurrent modification
        if (modCount != expectedModCount) {
            throw new ConcurrentModificationException();
        }
    }
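One way to observe the effect of this writeObject is to serialize two lists that hold the same single element but were created with different capacities. The sketch below assumes a JDK whose ArrayList.writeObject matches the listing above, that is, one that writes size rather than the array length. The class ArrayListCapacityDemo and its methods are hypothetical names used only for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;

final class ArrayListCapacityDemo {

    // Serializes any object into an in-memory byte array.
    static byte[] serialize(Object o) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Builds two one-element lists with different capacities and
    // compares their serialized bytes.
    static boolean sameSerializedForm(int capacityA, int capacityB) {
        ArrayList<String> a = new ArrayList<>(capacityA);
        ArrayList<String> b = new ArrayList<>(capacityB);
        a.add("only element");
        b.add("only element");
        return Arrays.equals(serialize(a), serialize(b));
    }

    public static void main(String[] args) {
        // Under the assumption above, the unused slots never reach the stream,
        // so capacity 1 and capacity 100 produce the same bytes.
        System.out.println(sameSerializedForm(1, 100));
    }
}
```

If elementData were not transient, the second list would carry 99 extra null slots in its serialized form.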

HashMap

    transient Node<K,V>[] table;

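Reason 1

A HashMap decides which bucket table[i] to read from or write to based on the key's hashCode(). Unless the key's class overrides it, that is Object.hashCode(), a native method whose result may differ from one JVM, or even one run, to the next.

For example, put an entry into a HashMap whose key is an object that relies on Object.hashCode(). In one Java program the key's hash might place it in table[1]; in a program running in another JVM the same key might hash to table[2]. String keys are not affected, because String.hashCode() is computed from the characters by a documented formula.

So serialization ignores the physical layout: it only delivers the keys and values, and deserialization recomputes each key's bucket when it puts the entries back.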
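A small sketch can make this concrete. It uses a key class that does not override hashCode(), so the deserialized key is a new object whose identity hash generally differs from the original's, even inside a single JVM. The names HashMapRehashDemo, Key, roundTrip and lookupWorksAfterRoundTrip are hypothetical and exist only for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;

final class HashMapRehashDemo {

    // Hypothetical key type: no hashCode()/equals() override,
    // so it uses the identity-based versions from Object.
    static final class Key implements Serializable {
        private static final long serialVersionUID = 1L;
        final String label;

        Key(String label) {
            this.label = label;
        }
    }

    // Serializes the map to memory and reads a copy back.
    @SuppressWarnings("unchecked")
    static HashMap<Key, String> roundTrip(HashMap<Key, String> map) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(map);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (HashMap<Key, String>) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Looks up the restored key in the restored map.
    static boolean lookupWorksAfterRoundTrip() {
        HashMap<Key, String> original = new HashMap<>();
        original.put(new Key("k"), "value");

        HashMap<Key, String> copy = roundTrip(original);
        Key restoredKey = copy.keySet().iterator().next();

        // The lookup succeeds because the entry was re-inserted under the
        // restored key's own hash, not under the bucket index it had before.
        return "value".equals(copy.get(restoredKey));
    }

    public static void main(String[] args) {
        System.out.println(lookupWorksAfterRoundTrip()); // prints true
    }
}
```

Had the table array been written and restored slot by slot, the entry would sit in the bucket chosen by the old hash, and a lookup with the restored key could miss it.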

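Reason 2

In both HashMap's table and ArrayList's elementData, the number of stored values is smaller than the array length, because the arrays grow ahead of demand, and the gap becomes more noticeable as more elements are added. With default serialization the slots that hold no element would be stored as well, which wastes a lot of space for nothing.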