hashmap分析

程序员文章站 2022-03-03 23:07:49
...
 
/**
hashmap底层维护着一个entry数组,每个数组索引指向的是一个entry链表.entry是一个key和value对,
entry内部还保存着一个next属性,用于指向下一个entry,如果entry后面没有其他的entry,则next=null
hashmap根据要存储entry中key的hashCode值,调用内部的hash()算法,返回一个hash值,然后跟
entry数组容量(2的倍数)capacity-1做按位与运算,来获得数组上的索引.这样的操作也可以称为碰撞检测.
执行add entry时,如果发现某个索引位置上已经存在有entry了,
则让新的entry的next指向当前数组索引指向的entry, 同时让当前数组索引指向新的entry.

关于hash值和capacity-1做按位与运算
因为entry数组的capacity大小都是2的倍数,所以用capacity-1和任何数做按位与操作后的大小都不会大于
capacity(>=0, <=capacity-1)
比如capacity=16,对应的二进制为10000
capacity-1:    1111
hash value:  110000 
------------------------- 
             000000 (最小0)

capacity-1:    1111
hash value:  111111 
------------------------- 
             001111 (最大15)
任何数跟16-1做按位与后的结果,都不可能大于16,最大只能为16-1,最小为0
刚好在entry数组的索引范围之内.

hashmap源码分析
------------------------------------------------------------
public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    //初始化capacity不能超过MAXIMUM_CAPACITY
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    //负载因子不能小于0或者为非数字
    //负载因子用于计算threshold,即当前entry数组最大能存储的entry数
    //如果超过threshold,就需要执行rehash操作(将原来的entry数组扩容2倍,得到一个新的entry数组,
    //遍历原entry数组,取出每个entry的hash值,
    //跟新数组的capacity-1做按位与操作得到新数组索引位置,存入entry)
    //负载因子越大,占用的存储空间越小,但是查询时间会相应的增加(碰撞检测的次数增加了),
    //因此效率会相应下降,负载因为越小,占用的存储空间会变大,查询速度也会相应的变快.
    //hashmap里面默认的负载因子大小为0.75
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);

    // Find a power of 2 >= initialCapacity
    int capacity = 1;
    //得到一个大于等于初始capacity的最小2的倍数的数
    while (capacity < initialCapacity)
        capacity <<= 1; //每次左移一位,即capacity*2

    this.loadFactor = loadFactor;
    //threshold为数组capacity*负载因子
    threshold = (int)(capacity * loadFactor);
    //生成一个大小为capacity的entry数组
    table = new Entry[capacity];
    init();
}

public HashMap(int initialCapacity) {
    //传入自定义的initial capacity和默认的load factor
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}

public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR;
    threshold = (int)(DEFAULT_INITIAL_CAPACITY * DEFAULT_LOAD_FACTOR);
    //默认的数组capacity为16
    table = new Entry[DEFAULT_INITIAL_CAPACITY];
    init();
}
--------------------------------------------
public V put(K key, V value) {
    if (key == null)
    	//key为null通过putForNullKey存入entry
        return putForNullKey(value);
    //调用hash算法用传入key的hashCode生成一个hash值
    int hash = hash(key.hashCode());
    //用hash值跟capacity-1做按位与,得到数组的索引位置
    int i = indexFor(hash, table.length);
    //遍历该索引上的entry链表
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        //如果发现相同的key,用新的value替换掉原来的,将原来的value返回
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }

    modCount++;
    //没找到key一样的entry,就新创建一个entry
    addEntry(hash, key, value, i);
    return null;
}

//hashmap里hash算法实现
static int hash(int h) {
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

//hash值和entry数组capacity-1做按位与运算,返回运算结果
static int indexFor(int h, int length) {
    return h & (length-1);
}

void addEntry(int hash, K key, V value, int bucketIndex) {
    //获得当前数组索引上的entry
    Entry<K,V> e = table[bucketIndex];
    //创建一个新的entry,next指向数组索引指向的entry,然后数组索引指向新的entry
    table[bucketIndex] = new Entry<K,V>(hash, key, value, e);
    //如果当前已存入的entry数超过最大值threshold,执行rehash操作
    if (size++ >= threshold)
        resize(2 * table.length);
}
---------------------------------------------------
//操作跟put类似
public V get(Object key) {
    if (key == null)
        return getForNullKey();
    int hash = hash(key.hashCode());
    for (Entry<K,V> e = table[indexFor(hash, table.length)];
         e != null;
         e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
            return e.value;
    }
    return null;
}

----------------------------------------------------------------------
多线程环境下调用hashmap的get方法导致死循环
假设当前hashmap的capacity=2,hash()算法为key mod capacity
依次向hashmap存入{key=5,value=B},{key=3,value=A},{key=7,value=C}
因为key=3,5,7和capacity=2进行模运算结果都为1, 这三个entry都会出现在数组索引为1的地方,
且key=5的entry出现在链表的最后一个位置
hashmap底层数据结构如下图所示
     _ _
    |_0_|      _ _        _ _        _ _
    |_1_| --> | 7 |  --> | 3 |  --> | 5 | --> null
      ↓       |_C_|      |_A_|      |_B_| 
     entry数组                         ↓     
     (索引指向一个entry链表)           表示一个entry
                                    (每个entry都有一个next属性,
                                    指向下一个entry,
                                    最后一个entry的next为null)

假设向hashmap存入新的ertry时,已存入的entry数量(size)>=entry数组最大能存储的数量(threshold),
需要进行rehash操作,将entry数组扩容为原来大小的两倍,得到一个新的entry数组,遍历原数组,
取出每个索引值上的entry,如果entry不为空,拿到当前entry的hash值,跟新entry数组capacity-1做hash操作,
得到当前entry在新数组中的索引值,将当前entry的next设置为新entry数组所在索引指向的对象,
让新entry数组的所在索引指向当前entry对象(即[7,C]的next指向null, 
新entry数组索引为3的位置指向[7,C]),让当前entry的next作为当前entry,继续上面的操作,
直到遍历完原entry数组,最后将新的entry数组赋给原entry数组引用.

下图为rehash后的结果
     _ _       _ _
    |_0_|     | 5 | --> null
    |_1_| --> |_B_|
    |_2_|      _ _       _ _
    |_3_| --> | 3 | --> | 7 | --> null
              |_A_|     |_C_|

rehash操作源码
void resize(int newCapacity) {
    Entry[] oldTable = table;
    int oldCapacity = oldTable.length;
    if (oldCapacity == MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return;
    }

    //生成一个新的entry数组,
    //resize(2 * table.length)
    //newCapacity为原entry数组长度的2倍 
    Entry[] newTable = new Entry[newCapacity];
    transfer(newTable);
    table = newTable;
    threshold = (int)(newCapacity * loadFactor);
}        		   				 

void transfer(Entry[] newTable) {
    Entry[] src = table;
    int newCapacity = newTable.length;
    for (int j = 0; j < src.length; j++) {
        //获得entry数组中索引j指向的entry
        Entry<K,V> e = src[j];
        if (e != null) {
            src[j] = null;
            do {
                //保留当前entry的next 
                Entry<K,V> next = e.next; ①
                //取出当前entry的hash和newCapcacity-1做hash操作,
                //得到当前entry在新entry数组中的索引
                int i = indexFor(e.hash, newCapacity);
                //将当前entry的next设置为新entry数组所在索引指向的对象
                e.next = newTable[i];
                //新entry数组所在索引指向当前entry对象
                newTable[i] = e;
                //当前entry的next作为当前对象
                e = next;
            } while (e != null);
        }
    }
}

在单线程环境下,rehash不会有问题,而在多线程环境下,rehash就会有问题了
假设有两个线程A和B同时进入transfer方法执行rehash操作,线程A执行到①就被挂起了,
线程B一直到执行结束,线程A才重新获得调度,继续往下执行

当线程A被挂起的时候,e=[7,C], next=[3,A]
     _ _
    |_0_|      _ _        _ _        _ _
    |_1_| --> | 7 |  --> | 3 |  --> | 5 | --> null
              |_C_|      |_A_|      |_B_|

线程B执行完rehash后,entry table的结构为
     _ _       _ _
    |_0_|     | 5 | --> null
    |_1_| --> |_B_|
    |_2_|      _ _       _ _
    |_3_| --> | 3 | --> | 7 | --> null
              |_A_|     |_C_|


此时线程A是基于新的entry链表来rehash
    1) e=[7,C], next=[3,A] 挂起之前的值
     _ _ 
    |_0_|
    |_1_|
    |_2_|      _ _
    |_3_| --> | 7 | --> null
              |_C_| 

    2) e=[3,A], next=[7,C] 线程B执行rehash后[3,A]的next为[7,C]
     _ _ 
    |_0_|
    |_1_|
    |_2_|      _ _       _ _
    |_3_| --> | 3 | --> | 7 | --> null
              |_A_|     |_C_|             

    3) e=[7,C], next=null 线程B执行rehash后[7,C]的next为null
     _ _ 
    |_0_|
    |_1_|
    |_2_|      _ _       _ _
    |_3_| --> | 7 | --> | 3 | --> null
              |_C_| <-- |_A_|     

    此时[3,A]和[7,C]形成了循环引用
    此时再调用get(5)就会导致死循环出现                                  
*/
hashmap分析

C#实现的封装CURD到SqlHelper类用法简单分析

Python图算法实例分析

Java算法之最长公共子序列问题(LCS)实例分析

MySQL InnoDB和MyISAM数据引擎的差别分析

Android开发中播放声音的两种方法分析

实例分析IOS实现自动打包

Android4.4开发之电池低电量告警提示原理与实现方法分析

JS小数转换为整数的方法分析

jQuery+ajax的资源回收处理机制分析

Android实现类似IOS右滑返回的效果(原因分析及解决办法)