HashMap源码与相关面试题

程序员文章站 2022-06-22 08:19:33

一、哈希表哈希表是一种可以快速定位得数据结构。哈希表可以做到平均查找、插入、删除时间是，当然这是指不发生碰撞得情况。而哈希表最大得缺陷就是哈希值得碰撞( )。碰撞：就是指桶有多个元素了。常见解决哈希碰撞得方法就是在桶后面加个链表这里就引入第一个问题：为什么的底层设计要采用哈希表的这种 ......

一、哈希表

哈希表是一种可以快速定位得数据结构。哈希表可以做到平均查找、插入、删除时间是o(1)，当然这是指不发生hash碰撞得情况。而哈希表最大得缺陷就是哈希值得碰撞(collision)。

hash碰撞：就是指hash桶有多个元素了。常见解决哈希碰撞得方法就是在hash桶后面加个链表

这里就引入第一个问题：为什么map的底层设计要采用哈希表的这种数据结构？

hashmap设计时，要求其key不能重复。所以每次往hashmap设置值时，需要对hashmap现在容器所有key进行筛选，以保证不会对hashmap设置同样得key。

如果以普通if、else来判断得话，这样查找效率会非常得低。如果里面元素个数达到几百得话，程序得运行效率就会非常低。而这种情况，很符合哈希表得解决场景。

哈希表得一种实现方式就是数组+链表。这也是java7hashmap底层哈希表的实现方式。

二、`jdk1.7`的`hashmap`

1、相关概念

（1）初始容量

/**
 * the default initial capacity - must be a power of two.
 */
static final int default_initial_capacity = 1 << 4; // aka 16

（2）负载因素

/**
 * the load factor used when none specified in constructor.
 */
static final float default_load_factor = 0.75f;

（3）阙值

/**
 * the next size value at which to resize (capacity * load factor).
 * @serial
 */
// if table == empty_table then this is the initial capacity at which the
// table will be created when inflated.
int threshold;

（4）hash种子

/**
 * a randomizing value associated with this instance that is applied to
 * hash code of keys to make hash collisions harder to find. if 0 then
 * alternative hashing is disabled.
 */
// 标志位，是jdk为了解决hashmap安全隐患做的补丁
transient int hashseed = 0;

2、源码分析

（1）put

public v put(k key, v value) {
	/* hashmap默认是延迟初始化，只有在用的时候，才会开辟空间 */
	// 如果hash表为空，则开始扩容hash表
	if (table == empty_table) {
		inflatetable(threshold);
	}
	// 对key为null的相关操作
	if (key == null)
		return putfornullkey(value);
	// 计算key的hash值
	int hash = hash(key);
	// 计算key的hash值对应哈希桶的下标
	int i = indexfor(hash, table.length);
	// 对哈希桶内的链表进行更新操作：可以看到如果是第一次插入，返回的就是null
	for (entry<k,v> e = table[i]; e != null; e = e.next) {
		object k;
		if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
			v oldvalue = e.value;
			e.value = value;
			e.recordaccess(this);
			return oldvalue;
		}
	}
}

（2）inflatetable：扩充哈希表

private void inflatetable(int tosize) {
	// find a power of 2 >= tosize
    /* 这里会发现即使你输入hashmap容量不是2的n次方，这里也会强制变成2的n次方 */
    // 计算哈希表桶的个数
	int capacity = rounduptopowerof2(tosize);

	threshold = (int) math.min(capacity * loadfactor, maximum_capacity + 1);
	table = new entry[capacity];
    // 按需初始化hash种子，也是解决hashmap安全隐患做的补丁
	inithashseedasneeded(capacity);
}

（3）indexfor：计算key在哈希桶的下标

static int indexfor(int h, int length) {
	// assert integer.bitcount(length) == 1 : "length must be a non-zero power of 2";
	return h & (length-1);
}

这个方法就是为什么hashmap的初始值要设置2的n次方的原因。它可以快速得到key要放在hash桶的下标。

（4）扩容方法

当hashmap的size大于threshold时，就会扩容。而threshold = 容量 * 阈值

低效
线程不安全

void resize(int newcapacity) {
	entry[] oldtable = table;
	int oldcapacity = oldtable.length;
	if (oldcapacity == maximum_capacity) {
		threshold = integer.max_value;
		return;
	}

	entry[] newtable = new entry[newcapacity];
    // 将旧哈希表的元素转移到新的哈希表；这里是死锁的原因
	transfer(newtable, inithashseedasneeded(newcapacity));
	table = newtable;
	threshold = (int)math.min(newcapacity * loadfactor, maximum_capacity + 1);
}

3、`jdk1.7`的`hashmap`中的缺点

（1）死锁问题

死锁问题是因为用户将hashmap应用到多线程的环境下。详情参考：

死锁问题是由于hashmap在扩容时，里面元素的顺序可能没有按照之前顺序排列，从而导致环形链表，造成死锁情况。

（2）潜在的安全隐患

tomcat底层接收参数是使用hashtable接收，假设黑客使用特意组合的参数，使用所有的参数都挂载在同一个哈希桶中。这就会使得哈希表退化成链表。这其实就是由于哈希碰撞的导致。

如果黑客使用精心设计的参数（这些参数可以根据jdk公开的哈希算法测试出来），通过http请求，使用tomcat内部产生大量链表，消耗服务器的cpu，就会产生dos（denial of service）。

因此：

tomcat也对这种情况进行讨论。tomcat邮件

jdk1.7也对这种情况做了处理。例如：在inithashseedasneeded方法中，不使用jdk中string公开的哈希算法，而是使用其他的哈希算法来避免上述出现的问题。

final boolean inithashseedasneeded(int capacity) {
	// 判断是否需要使用hash算法
	boolean currentalthashing = hashseed != 0;
	boolean usealthashing = sun.misc.vm.isbooted() &&
		(capacity >= holder.alternative_hashing_threshold);
	boolean switching = currentalthashing ^ usealthashing;
	if (switching) {
		// 初始化hash种子
		hashseed = usealthashing
			? sun.misc.hashing.randomhashseed(this)
			: 0;
	}
	return switching;
}

三、`jdk1.8`的`hashmap`源码

jdk1.8中对hashmap进行了很大优化：

使用数组+链表/红黑树的数据结构，解决了底层哈希表的安全隐患；
在扩容时，通过对插入顺序的改进优化了死锁问题；

1、新增概念

（1）树化的阙值

    /**
     * the bin count threshold for using a tree rather than list for a
     * bin.  bins are converted to trees when adding an element to a
     * bin with at least this many nodes. the value must be greater
     * than 2 and should be at least 8 to mesh with assumptions in
     * tree removal about conversion back to plain bins upon
     * shrinkage.
     */
    static final int treeify_threshold = 8;

（2）退树化的阈值

    /**
     * the bin count threshold for untreeifying a (split) bin during a
     * resize operation. should be less than treeify_threshold, and at
     * most 6 to mesh with shrinkage detection under removal.
     */
    static final int untreeify_threshold = 6;

2、源码阅读

（1）putval()

final v putval(int hash, k key, v value, boolean onlyifabsent, boolean evict) {
        node<k,v>[] tab; node<k,v> p; int n, i;
        // 判断哈希表是否为空，为空就创建
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        // 如果是第一个节点，就往这个哈希桶添加一个元素
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newnode(hash, key, value, null);
        else {
            // 如果不是第一个节点
            node<k,v> e; k k;
            // 
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            // 判断节点是不是树节点
            else if (p instanceof treenode)
                e = ((treenode<k,v>)p).puttreeval(this, tab, hash, key, value);
            else {
                // 该节点就是链表
                for (int bincount = 0; ; ++bincount) {
                    if ((e = p.next) == null) {
                        p.next = newnode(hash, key, value, null);
                        // 如果达到可以红黑树化的量，就将链表转换为树
                        if (bincount >= treeify_threshold - 1) // -1 for 1st
                            treeifybin(tab, hash);
                        break;
                    }
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            if (e != null) { // existing mapping for key
                v oldvalue = e.value;
                if (!onlyifabsent || oldvalue == null)
                    e.value = value;
                afternodeaccess(e);
                return oldvalue;
            }
        }
        ++modcount;
        if (++size > threshold)
            resize();
        afternodeinsertion(evict);
        return null;
    }

（2）resize：扩容方法

     final node<k,v>[] resize() {
        node<k,v>[] oldtab = table;
        int oldcap = (oldtab == null) ? 0 : oldtab.length;
        int oldthr = threshold;
        int newcap, newthr = 0;
        if (oldcap > 0) {
            if (oldcap >= maximum_capacity) {
                threshold = integer.max_value;
                return oldtab;
            }
            else if ((newcap = oldcap << 1) < maximum_capacity &&
                     oldcap >= default_initial_capacity)
                newthr = oldthr << 1; // double threshold
        }
        else if (oldthr > 0) // initial capacity was placed in threshold
            newcap = oldthr;
        else {               // zero initial threshold signifies using defaults
            newcap = default_initial_capacity;
            newthr = (int)(default_load_factor * default_initial_capacity);
        }
        if (newthr == 0) {
            float ft = (float)newcap * loadfactor;
            newthr = (newcap < maximum_capacity && ft < (float)maximum_capacity ?
                      (int)ft : integer.max_value);
        }
        threshold = newthr;
        @suppresswarnings({"rawtypes","unchecked"})
        node<k,v>[] newtab = (node<k,v>[])new node[newcap];
        table = newtab;
        if (oldtab != null) {
            for (int j = 0; j < oldcap; ++j) {
                node<k,v> e;
                if ((e = oldtab[j]) != null) {
                    oldtab[j] = null;
                    if (e.next == null)
                        newtab[e.hash & (newcap - 1)] = e;
                    else if (e instanceof treenode)
                        ((treenode<k,v>)e).split(this, newtab, j, oldcap);
                    else { 
                        // preserve order：保持顺序，这里仅仅是降低死锁的概率，并没有完全解决死锁问题
                        // 低位链表
                        node<k,v> lohead = null, lotail = null;
                        // 高位链表
                        node<k,v> hihead = null, hitail = null;
                        node<k,v> next;
                        do {
                            next = e.next;
                            if ((e.hash & oldcap) == 0) {
                                if (lotail == null)
                                    lohead = e;
                                else
                                    lotail.next = e;
                                lotail = e;
                            }
                            else {
                                if (hitail == null)
                                    hihead = e;
                                else
                                    hitail.next = e;
                                hitail = e;
                            }
                        } while ((e = next) != null);
                        if (lotail != null) {
                            lotail.next = null;
                            newtab[j] = lohead;
                        }
                        if (hitail != null) {
                            hitail.next = null;
                            newtab[j + oldcap] = hihead;
                        }
                    }
                }
            }
        }
        return newtab;
    }

四、相关面试题分析

hashmap默认容量
hashmap如何扩容
hashmap为什么是线程不安全的？

为什么`hashmap`的容量必须是`2`的`n`次方？

1、即使你在构造函数时，不传2的n次方，在后来的初始化方法中，也会强制变成2的n次方

1.7:put => inflatetable
    
1.8:hashmap => tablesizefor

2、让元素能够快速定位哈希桶；让hash表元素分布均匀，减少哈希碰撞的几率

（1）快速定位哈希桶

在哈希表实现中，有一个很重要的问题：如何将hash值映射到几个哈希桶里呢？对应java来说，就是int范围的数如何映射到哈希桶里呢？

很常见的解决方案是：取模，即int % n。但是这种方案有俩个缺点：

负数取模仍然是负数，这里要确保hash值一定是正数；
相比较hashmap的实现，这种方案性能稍微差一点；

1.7:indexfor(int h, int length){
    return h & (length - 1);// 记住这行代码，java8是直接用这段代码的
}

（2）为什么能让hash表元素分布均匀，减少哈希碰撞的几率？

leng - 1 等价于2n - 1，其值格式是：11111...，这样它与hash计算时，得到哈希桶下标，完全取决于hash，只要确保hash算法比较均衡，那么哈希表元素就会比较均衡了。

3、方便扩容

在扩容之后，所有的元素需要重新计算其所在哈希桶的位置。只要保证哈希桶容量是2的幂，其原先的元素要么保持不变，要么被移动到旧index+旧桶大小的位置上。

`indexfor`方法除了`hashmap`的实现方式，还有其他方式吗？跟`hashmap`的实现方式相比，有哪些不足？

我们知道java中对哈希值的实现是根据每个对象的hashcode方法，而这个方法会可能返回42亿个可能性。而我们可以通过indexfor方法将这些42亿个可能性引入到16个哈希桶中。

除了hashmap内部的实现方式，还可以使用取模方法来达到效果。但要注意取模中负号的可能性！！！

    static int indexfor(int hash) {
        if (hash < 0) {
            hash = -hash;
        }
        return hash % 16;
    }

这种方式想比较hashmap原生的实现方式比较，就是低效。不仅仅是在这里计算时低效，而且还有可能会造成严重的哈希碰撞。

`jdk1.8`后，为什么要以8作为转化树的阙值？为什么要选择0.75作为负载因子？

算法，在空间和时间的折中问题

五、相关应用

我在项目遇到要依次匹配method然后引入对应的处理类。相关代码如下：

public class apprestfactory {
    private static final string method_wo_ = "";//方法名
    /**
     * 工作中心人员的查询
     */
    private static final string method_workcenter_contact = "getworkcenterinfobycontact";
    /**
     * 工作中心人员的查询
     */
    private static final string method_todoinfolist = "gettodoinfolist";

    /**
     * 获取逻辑处理实例
     *
     * @param method
     * @return
     */
    public static apprestservice createservice(string method) {
        if (method_position.equalsignorecase(method)){
            return springutils.getbean(locationserviceimpl.class);
        }
        if (method_getwoinfolist.equalsignorecase(method)){
            return springutils.getbean(workorderserviceimpl.class);
        }
        return null;
    }
}

这里就可能存在if、else查找效率问题。假设最后一个if才是要找method的话，那么程序的运行效率就会非常低。这还是未完成的情况，假设后期这个method被补充到上百个，那效率就会及其低下。这里其实就可以使用hashmap底层的哈希表的数据结构来优化。代码如下：

public class apprestfactory {
    /**
     * 存储app端方法method得映射关系
     * todo:api数量确定后，固定hashmap得长度，防止hashmao扩容时，发生错误
     */
    private static final map<string, class> mapper = new hashmap<>(100);

    // 初始化method与其工作类得映射关系
    static {
        // 添加退回物料的信息
        mapper.put("addmaterialsreturninfo", womaterialserviceimpl.class);
        //
        mapper.put("getbomaosproductslist", bomserviceimpl.class);
    }


    /**
     * 获取逻辑处理实例
     *
     * @param method
     * @return
     */
    public static apprestservice createservice(string method) {
        return (apprestservice) springutils.getbean(mapper.get(method));
    }
}

上一篇：发现地球上最美的动物

下一篇： python如何遍历指定路径下所有文件（按按照时间区间检索）