未完 Java Collections | 容器博客分类： Java Foundation java容器collection

程序员文章站 2024-03-09 13:28:01

...

Sources：
http://docs.oracle.com/javase/tutorial/collections/TOC.html

数据结构：
http://wuaner.iteye.com/blog/553007
为什么重写equals()必须重写hashcode()：
http://wuaner.iteye.com/blog/588299

Java 常用容器类：
Collection
          List - 此接口用来定义元素有顺序、可以重复的容器类
                    ArrayList - 算法实现：可变长数组（Resizable array）
                    LinkedList - 算法实现：链表（Linked List）
                    Vector - 算法实现：可变长数组（Resizable array）
                              Stack - 算法实现：可变长数组（Resizable array）
          Set - 此接口用来定义元素不可以重复(依据equals)的容器类
                    HashSet - 算法实现：散列表（Hash Table）。内部基于HashMap
                              LinkedHashSet - 算法实现：散列表（Hash Table） + 链表（Linked List）
                    SortedSet - 此接口用来定义元素不可以重复、有顺序的容器类
                              TreeSet - 算法实现：红黑树（Red-black Tree）。内部基于TreeMap
          Queue
                    LinkedList - 算法实现：链表（Linked List）
Map - key obj->value obj的映射关系接口，作为key的对象不能重复(依据equals；有一个例外，IdentityHashMap，详见下面)
          Hashtable - 算法实现：散列表（Hash Table）
                    Properties - 算法实现：散列表（Hash Table）
          HashMap - 算法实现：散列表（Hash Table）
                    LinkedHashMap - 算法实现：散列表（Hash Table） + 链表（Linked List）
          WeakHashMap - 算法实现：散列表（Hash Table）
          SortedMap
                    TreeMap - 算法实现：红黑树（Red-black Tree）
          ConcurrentMap
                              ConcurrentHashMap - 算法实现：散列表（Hash Table）

底层算法实现的不同导致了容器类各自适用的使用场景
基于数组的，优点自然是查找快速，缺点是插入和删除慢；
基于链表，优点自然是插入和删除快，查找慢；
基于二叉查找树的，
基于散列表的，优点是常数级的快速查找，插入和删除快，缺点是。

用途存在交集的相关容器类的区别，这就涉及到相关Java代码中的具体实现策略如是否synchrolized、null处理等：
Hashtable & HashMap：
Hashtable的很多操作方法都加了synchronized，使其是线程同步的、线程安全的，无可避免的存在线程并发访问时的性能问题；而HashMap没有一个synchronized，导致其是非线程安全的；
Hashtable不允许null作为key 或 value；HashMap允许(通过固定null key的index为0，和若存在null key则只替换原以null为key的Entry的value，来保证只会有一个key为null的Entry)
当size()超过阈值threshold时：Hashtable：newCapacity = oldCapacity * 2 + 1，加1是为了一定程度上保证新的capacity也是质数(因为Hashtable使用了除留取余法作为散列函数) ；HashMap：newCapacity = oldCapacity * 2，乘2是为了保证capacity永远是2的幂。

关于 Iterator 和 Iterable：
http://www2.hawaii.edu/~esb/2011spring.ics211/feb03.html

http://www.iteye.com/topic/308883
http://www.iteye.com/topic/22192

典型分析 - 散列表之 HashMap：

引用

对象的hashcode()方法返回的hash code对应散列表概念里的“关键字”(当然，你也可以将equels和hashcode方法所参照的对象状态字段理解为关键字，而将hashcode()理解为散列表概念中"散列函数"的一部分。)

HashMap使用的散列函数：

        int hash = hash(key.hashCode());
        int i = indexFor(hash, table.length);

分别看：hash(hash code)是为了使散列函数更加的均匀(Uniform)

    /**
     * Applies a supplemental hash function to a given hashCode, which
     * defends against poor quality hash functions.  This is critical
     * because HashMap uses power-of-two length hash tables, that
     * otherwise encounter collisions for hashCodes that do not differ
     * in lower bits. Note: Null keys always map to hash 0, thus index 0.
     */
    static int hash(int h) {
        // This function ensures that hashCodes that differ only by
        // constant multiples at each bit position have a bounded
        // number of collisions (approximately 8 at default load factor).
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

避免混乱，下面统称hash(hash code)的返回值为rehash code
另外:buckets array = hash buckets,指的就是HashMap的实例变量table (transient Entry[] table;)

因为HashMap的buckets array长度为2的幂，所以可以使用按位与的方式来算对象在hash buckets中的address/index。相较除留取余法(取余操作是昂贵的)，这是一种更加高效的方式：

    /**
     * Returns index for hash code h.
     */
    static int indexFor(int h, int length) {
        return h & (length-1);
    }

HashMap使用 Separate chaining(链地址法)解决冲突：

    static class Entry<K,V> implements Map.Entry<K,V> {
        final K key;
        V value;
        Entry<K,V> next;
        final int hash;
        ...
    }

    public V put(K key, V value) {
        ...
        int i = indexFor(hash, table.length);
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }
        modCount++;
        addEntry(hash, key, value, i);
        return null;
    }

这里需要注意的是，为了性能需要，HashMap做了优化，在Entry中缓存了对象rehash code值，并在插入put、获取get等操作时使用该值作为判断重复的一部分。这样做的直接后果就是：如果你通过重写equals方法定义了自己认可的“逻辑对象相等”，就必须重写hashcode方法，以保证互相equels相等的对象有相同的hash code。

HashMap允许key为null，其对null key的处理是：将key为null的Entry永远放在hash buckets中index为0的位置；如果已存在key为null的Entry，则用新的value替换原Entry的value，从而保证了null key只会有一个

    public V put(K key, V value) {
        if (key == null)
            return putForNullKey(value);
    ...
    }

    /**
     * Offloaded version of put for null keys
     */
    private V putForNullKey(V value) {
        for (Entry<K,V> e = table[0]; e != null; e = e.next) {
            if (e.key == null) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }
        modCount++;
        addEntry(0, null, value, 0);
        return null;
    }

capacity(即hash buckets的长度)乘以load factor(装载因子)为HashMap的threshold（阈值）。当size()超过这一阈值时，HashMap的Hash buckets会扩容为原来的两倍（扩容为原来两倍的原因很简单：保证hash buckets的长度永远是2的幂）：

    /**
     * Adds a new entry with the specified key, value and hash code to
     * the specified bucket.  It is the responsibility of this
     * method to resize the table if appropriate.
     *
     * Subclass overrides this to alter the behavior of put method.
     */
    void addEntry(int hash, K key, V value, int bucketIndex) {
	Entry<K,V> e = table[bucketIndex];
        table[bucketIndex] = new Entry<K,V>(hash, key, value, e);
        if (size++ >= threshold)
            resize(2 * table.length);
    }

    /**
     * Rehashes the contents of this map into a new array with a
     * larger capacity.  This method is called automatically when the
     * number of keys in this map reaches its threshold.
     *
     * If current capacity is MAXIMUM_CAPACITY, this method does not
     * resize the map, but sets threshold to Integer.MAX_VALUE.
     * This has the effect of preventing future calls.
     *
     * @param newCapacity the new capacity, MUST be a power of two;
     *        must be greater than current capacity unless current
     *        capacity is MAXIMUM_CAPACITY (in which case value
     *        is irrelevant).
     */
    void resize(int newCapacity) {
        Entry[] oldTable = table;
        int oldCapacity = oldTable.length;
        if (oldCapacity == MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return;
        }

        Entry[] newTable = new Entry[newCapacity];
        transfer(newTable);
        table = newTable;
        threshold = (int)(newCapacity * loadFactor);
    }

    /**
     * Transfers all entries from current table to newTable.
     */
    void transfer(Entry[] newTable) {
        Entry[] src = table;
        int newCapacity = newTable.length;
        for (int j = 0; j < src.length; j++) {
            Entry<K,V> e = src[j];
            if (e != null) {
                src[j] = null;
                do {
                    Entry<K,V> next = e.next;
                    int i = indexFor(e.hash, newCapacity);
                    e.next = newTable[i];
                    newTable[i] = e;
                    e = next;
                } while (e != null);
            }
        }
    }

重要在对transfer()方法的分析：
1. 该方法本质上就是把就的table里面的东西放到newtable里面。
2. 因为hash buckets扩大了一倍，所以对象的rehash code在新hash buckets里的位置index肯定不同了，需要重新计算。

Srcs:
How HashMap works in Java
http://javarevisited.blogspot.com/2011/02/how-hashmap-works-in-java.html
Java theory and practice: Hashing it out
http://www.ibm.com/developerworks/java/library/j-jtp05273/index.html
Understanding strange Java hash function
http://*.com/questions/9335169/understanding-strange-java-hash-function
说一说java的hashcode6--HashMap的resize和transfer
http://www.hetaoblog.com/myblogs/post/%E8%AF%B4%E4%B8%80%E8%AF%B4java%E7%9A%84hashcode6-hashmap%E7%9A%84resize%E5%92%8Ctransfer.jhtml
About the implementation of Java HashMap
http://*.com/questions/10001672/about-the-implementation-of-java-hashmap?rq=1
What hashing function does Java use to implement Hashtable class？
http://*.com/questions/9364134/what-hashing-function-does-java-use-to-implement-hashtable-class

典型分析 - 红黑树之 TreeMap：

引用

Srcs：
Red-Black Tree Tutorial：
http://forrestyu.net/art/red-black-tree-tutorial/

关于IdentityHashMap：
This class is not a general-purpose Map implementation! While this class implements the Map interface, it intentionally violates Map's general contract, which mandates the use of the equals method when comparing objects. This class is designed for use only in the rare cases wherein reference-equality semantics are required.
IdentityHashMap是Map中的另类，它故意违反了Map接口的通用约定，没有使用equals作为key对象重复的判断依据，而是使用 == 作为key对象重复的判断依据。在罕见的、需要 == 相等来作为元素重复的判断标准的情况下，你可以使用它。

Java集合的Stack、Queue、Map的遍历
http://lavasoft.blog.51cto.com/62575/181781

两个实用工具类 Arrays & Collections

引用

Arrays
This class contains various methods for manipulating arrays (such as sorting and searching). This class also contains a static factory that allows arrays to be viewed as lists.
Collections
This class consists exclusively of static methods that operate on or return collections. It contains polymorphic algorithms that operate on collections, "wrappers", which return a new collection backed by a specified collection, and a few other odds and ends（Odds and ends：零碎东西）.

未完 Java Collections | 容器博客分类： Java Foundation java容器collection