关于hashmap的散列程度分析
背景
先从查找算法说起,我们知道常用的查找算法有顺序查找(时间复杂度为O(n)),二分查找(时间复杂度为O(log2n)),二叉查找树查找(O(longN)),有基于Hash表的算法时间复杂度只有O(1)
但是基于hash表的算法要额外维护一个hash表,这也是用空间换时间的例子吧.
什么是HashMap
hashMap可以拆分为hash和map,hash是一个函数,或者说算法,用来计算hashCode;
可以定义为:hashCode = Hash(key)
map用来存放hashCode和key,以及value,根据key取出某个值,只需要调用函数,计算出hashCode即可拿到value,所以时间复杂度为O(1);
构造散列表常用的方法
1.数字分析法
2.平方取中法
3.折叠法
4.除留余数法
处理冲突的方法:
1.开放地址法
1.线性探测法
2.二次探测法
3.伪随机探测
2.链地址法(Java集合框架中hashmap处理冲突采用的方法)
Java集合框架中Hashmap的分析
在此不拿源码作为分析,从源码中截取比较有意思的几个片段作为分析
hashMap中为什么要求table的长度为2的n次幂?
在源码中有这样说:
/**
* Returns a power of two size for the given target capacity.
*/
static final int tableSizeFor(int cap) {
int n = cap - 1;
n |= n >>> 1;
n |= n >>> 2;
n |= n >>> 4;
n |= n >>> 8;
n |= n >>> 16;
return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}
这是保证初始化的hashMap的长度为2的次幂;
而在hashmap中通过定位某一个元素,是根据它的key和key的hashcode找出value的
final Node<K,V> getNode(int hash, Object key) {
Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
if ((tab = table) != null && (n = tab.length) > 0 &&
(first = tab[(n - 1) & hash]) != null) {//注意这句
if (first.hash == hash && // always check first node
((k = first.key) == key || (key != null && key.equals(k))))
return first;
if ((e = first.next) != null) {
if (first instanceof TreeNode)
return ((TreeNode<K,V>)first).getTreeNode(hash, key);
do {
if (e.hash == hash &&
((k = e.key) == key || (key != null && key.equals(k))))
return e;
} while ((e = e.next) != null);
}
}
return null;
}
hashMap中找table中的下标值就是用 (n - 1) & hash来确定,如果n不是2的幂-1,那么在插入的元素中,不能将其均匀的散列
测试代码:
package com.tangbaobao.baidu;
import org.junit.jupiter.api.Test;
import java.util.HashMap;
import java.util.Map;
/**
* 哈希冲突计算
*
* @author 唐学俊
* @create 2018/04/04
**/
public class TestHashCode {
private final int DEFAULT_CAPACITY = 64;
/**
* 用于统计重复
*/
Map<Integer, Integer> map = new HashMap<>();
@Test
public void fun1() {
String[] strings = {"hello", "world", "word", "nihao", "test", "main", "static","fdf","fdsfwrewr","reqq","hhh",
"fsoopt","4424hfhg","53grtyr","5re53trt"
};
for (int i = 0; i < strings.length; i++) {
this.getIndex(strings[i]);
}
map.forEach((x, y) -> System.out.println("hash_index:" + Integer.toBinaryString(x) + " " + "count:" + y));
System.out.println(map.size());
}
/**
* 输出
* @param string
*/
public void getIndex(String string) {
//1.获得hashCode
int hashCode = string.hashCode();
System.out.println("hashCode:" +Integer.toBinaryString(hashCode));
//2.获得length
int length_code = DEFAULT_CAPACITY - 1;
System.out.println("length :" + Integer.toBinaryString(length_code));
//扰动函数
int hash = hashCode^(hashCode>>>16);
System.out.println("扰动函数"+Integer.toBinaryString(hash));
int index =hash& length_code;
//3.获得index
System.out.println("index :" + Integer.toBinaryString(index));
int count = 1;
if (map.containsKey(index)) {
Integer integer = map.get(index);
count = ++integer;
}
map.put(index, count);
System.out.println("--------------------------------");
}
}
测试用例:
将随机字符串插入散列表,统计其散列程度:
hashMap的大小:64,32,16
加载因子用的是0.75
hashMap大小:16,调用用扰动函数
hashCode:101111010010001100011010010
length :1111
扰动函数101111010010001110100111011
index :1011
hashCode:110110000010001101110010010
length :1111
扰动函数110110000010001110101010011
index :11
hashCode:1101111100011100001010
length :1111
扰动函数1101111100011100111101
index :1101
hashCode:110001111110110011011111011
length :1111
扰动函数110001111110110000011000100
index :100
hashCode:1101100100010010010010
length :1111
扰动函数1101100100010010100100
index :100
hashCode:1100110000010110111001
length :1111
扰动函数1100110000010110001010
index :1010
hashCode:11001010110011011100111001101110
length :1111
扰动函数11001010110011010000010010100011
index :11
hashCode:11000101101101000
length :1111
扰动函数11000101101101001
index :1001
hashCode:11111011101101111100001010100
length :1111
扰动函数11111011101101110011100100010
index :10
hashCode:1101010101101110010011
length :1111
扰动函数1101010101101110100110
index :110
hashCode:11001001101101000
length :1111
扰动函数11001001101101001
index :1001
hashCode:10110100100101110000001000110001
length :1111
扰动函数10110100100101111011011010100110
index :110
hashCode:110000111000010000111000111111
length :1111
扰动函数110000111000010011111011011110
index :1110
hashCode:1010000100100100111010111100100
length :1111
扰动函数1010000100100100010010101110110
index :110
hashCode:10111111100000011111001000110000
length :1111
扰动函数10111111100000010100110110110001
index :1
hash_index:1 count:1
hash_index:10 count:1
hash_index:11 count:2
hash_index:100 count:2
hash_index:110 count:3
hash_index:1001 count:2
hash_index:1010 count:1
hash_index:1011 count:1
hash_index:1101 count:1
hash_index:1110 count:1
10
hashMap大小:16,不调用用扰动函数
hashCode:101111010010001100011010010
length :1111
扰动函数101111010010001110100111011
index :10
hashCode:110110000010001101110010010
length :1111
扰动函数110110000010001110101010011
index :10
hashCode:1101111100011100001010
length :1111
扰动函数1101111100011100111101
index :1010
hashCode:110001111110110011011111011
length :1111
扰动函数110001111110110000011000100
index :1011
hashCode:1101100100010010010010
length :1111
扰动函数1101100100010010100100
index :10
hashCode:1100110000010110111001
length :1111
扰动函数1100110000010110001010
index :1001
hashCode:11001010110011011100111001101110
length :1111
扰动函数11001010110011010000010010100011
index :1110
hashCode:11000101101101000
length :1111
扰动函数11000101101101001
index :1000
hashCode:11111011101101111100001010100
length :1111
扰动函数11111011101101110011100100010
index :100
hashCode:1101010101101110010011
length :1111
扰动函数1101010101101110100110
index :11
hashCode:11001001101101000
length :1111
扰动函数11001001101101001
index :1000
hashCode:10110100100101110000001000110001
length :1111
扰动函数10110100100101111011011010100110
index :1
hashCode:110000111000010000111000111111
length :1111
扰动函数110000111000010011111011011110
index :1111
hashCode:1010000100100100111010111100100
length :1111
扰动函数1010000100100100010010101110110
index :100
hashCode:10111111100000011111001000110000
length :1111
扰动函数10111111100000010100110110110001
index :0
hash_index:0 count:1
hash_index:1 count:1
hash_index:10 count:3
hash_index:11 count:1
hash_index:100 count:2
hash_index:1000 count:2
hash_index:1001 count:1
hash_index:1010 count:1
hash_index:1011 count:1
hash_index:1110 count:1
hash_index:1111 count:1
11
hashMap大小:64,调用用扰动函数
hashCode:101111010010001100011010010
length :111111
扰动函数101111010010001110100111011
index :111011
hashCode:110110000010001101110010010
length :111111
扰动函数110110000010001110101010011
index :10011
hashCode:1101111100011100001010
length :111111
扰动函数1101111100011100111101
index :111101
hashCode:110001111110110011011111011
length :111111
扰动函数110001111110110000011000100
index :100
hashCode:1101100100010010010010
length :111111
扰动函数1101100100010010100100
index :100100
hashCode:1100110000010110111001
length :111111
扰动函数1100110000010110001010
index :1010
hashCode:11001010110011011100111001101110
length :111111
扰动函数11001010110011010000010010100011
index :100011
hashCode:11000101101101000
length :111111
扰动函数11000101101101001
index :101001
hashCode:11111011101101111100001010100
length :111111
扰动函数11111011101101110011100100010
index :100010
hashCode:1101010101101110010011
length :111111
扰动函数1101010101101110100110
index :100110
hashCode:11001001101101000
length :111111
扰动函数11001001101101001
index :101001
hashCode:10110100100101110000001000110001
length :111111
扰动函数10110100100101111011011010100110
index :100110
hashCode:110000111000010000111000111111
length :111111
扰动函数110000111000010011111011011110
index :11110
hashCode:1010000100100100111010111100100
length :111111
扰动函数1010000100100100010010101110110
index :110110
hashCode:10111111100000011111001000110000
length :111111
扰动函数10111111100000010100110110110001
index :110001
hash_index:100010 count:1
hash_index:100011 count:1
hash_index:100 count:1
hash_index:100100 count:1
hash_index:100110 count:2
hash_index:101001 count:2
hash_index:1010 count:1
hash_index:110001 count:1
hash_index:10011 count:1
hash_index:110110 count:1
hash_index:111011 count:1
hash_index:111101 count:1
hash_index:11110 count:1
13
hashMap大小:64,不调用用扰动函数
hashCode:101111010010001100011010010
length :111111
扰动函数101111010010001110100111011
index :10010
hashCode:110110000010001101110010010
length :111111
扰动函数110110000010001110101010011
index :10010
hashCode:1101111100011100001010
length :111111
扰动函数1101111100011100111101
index :1010
hashCode:110001111110110011011111011
length :111111
扰动函数110001111110110000011000100
index :111011
hashCode:1101100100010010010010
length :111111
扰动函数1101100100010010100100
index :10010
hashCode:1100110000010110111001
length :111111
扰动函数1100110000010110001010
index :111001
hashCode:11001010110011011100111001101110
length :111111
扰动函数11001010110011010000010010100011
index :101110
hashCode:11000101101101000
length :111111
扰动函数11000101101101001
index :101000
hashCode:11111011101101111100001010100
length :111111
扰动函数11111011101101110011100100010
index :10100
hashCode:1101010101101110010011
length :111111
扰动函数1101010101101110100110
index :10011
hashCode:11001001101101000
length :111111
扰动函数11001001101101001
index :101000
hashCode:10110100100101110000001000110001
length :111111
扰动函数10110100100101111011011010100110
index :110001
hashCode:110000111000010000111000111111
length :111111
扰动函数110000111000010011111011011110
index :111111
hashCode:1010000100100100111010111100100
length :111111
扰动函数1010000100100100010010101110110
index :100100
hashCode:10111111100000011111001000110000
length :111111
扰动函数10111111100000010100110110110001
index :110000
hash_index:110000 count:1
hash_index:110001 count:1
hash_index:10010 count:3
hash_index:10011 count:1
hash_index:10100 count:1
hash_index:100100 count:1
hash_index:101000 count:2
hash_index:111001 count:1
hash_index:1010 count:1
hash_index:111011 count:1
hash_index:101110 count:1
hash_index:111111 count:1
12
结论
加上扰动函数之后,散列更加均匀
上一篇: PTA 带头节点的双向循环链表操作
下一篇: C++循环双链表带头节点