欢迎您访问程序员文章站本站旨在为大家提供分享程序员计算机编程知识!
您现在的位置是: 首页

关于hashmap的散列程度分析

程序员文章站 2022-03-16 08:50:01
...

背景

先从查找算法说起,我们知道常用的查找算法有顺序查找(时间复杂度为O(n)),二分查找(时间复杂度为O(log2n)),二叉查找树查找(O(longN)),有基于Hash表的算法时间复杂度只有O(1)
但是基于hash表的算法要额外维护一个hash表,这也是用空间换时间的例子吧.

什么是HashMap

hashMap可以拆分为hash和map,hash是一个函数,或者说算法,用来计算hashCode;
可以定义为:hashCode = Hash(key)
map用来存放hashCode和key,以及value,根据key取出某个值,只需要调用函数,计算出hashCode即可拿到value,所以时间复杂度为O(1);

构造散列表常用的方法

1.数字分析法
2.平方取中法
3.折叠法
4.除留余数法

处理冲突的方法:

1.开放地址法
1.线性探测法
2.二次探测法
3.伪随机探测
2.链地址法(Java集合框架中hashmap处理冲突采用的方法)

Java集合框架中Hashmap的分析

在此不拿源码作为分析,从源码中截取比较有意思的几个片段作为分析
hashMap中为什么要求table的长度为2的n次幂?
在源码中有这样说:

    /**
     * Returns a power of two size for the given target capacity.
     */
    static final int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

这是保证初始化的hashMap的长度为2的次幂;
而在hashmap中通过定位某一个元素,是根据它的key和key的hashcode找出value的

final Node<K,V> getNode(int hash, Object key) {
        Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
        if ((tab = table) != null && (n = tab.length) > 0 &&
            (first = tab[(n - 1) & hash]) != null) {//注意这句
            if (first.hash == hash && // always check first node
                ((k = first.key) == key || (key != null && key.equals(k))))
                return first;
            if ((e = first.next) != null) {
                if (first instanceof TreeNode)
                    return ((TreeNode<K,V>)first).getTreeNode(hash, key);
                do {
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        return e;
                } while ((e = e.next) != null);
            }
        }
        return null;
    }

hashMap中找table中的下标值就是用 (n - 1) & hash来确定,如果n不是2的幂-1,那么在插入的元素中,不能将其均匀的散列

测试代码:

package com.tangbaobao.baidu;

import org.junit.jupiter.api.Test;

import java.util.HashMap;
import java.util.Map;

/**
 * 哈希冲突计算
 *
 * @author 唐学俊
 * @create 2018/04/04
 **/

public class TestHashCode {
    private final int DEFAULT_CAPACITY = 64;

    /**
     * 用于统计重复
     */

    Map<Integer, Integer> map = new HashMap<>();

    @Test
    public void fun1() {
           String[] strings = {"hello", "world", "word", "nihao", "test", "main", "static","fdf","fdsfwrewr","reqq","hhh",
           "fsoopt","4424hfhg","53grtyr","5re53trt"
           };

        for (int i = 0; i < strings.length; i++) {
            this.getIndex(strings[i]);
        }
        map.forEach((x, y) -> System.out.println("hash_index:" + Integer.toBinaryString(x) + "   " + "count:" + y));
        System.out.println(map.size());
    }

    /**
     * 输出
     * @param string
     */
    public void getIndex(String string) {
        //1.获得hashCode
        int hashCode = string.hashCode();
        System.out.println("hashCode:" +Integer.toBinaryString(hashCode));

        //2.获得length
        int length_code = DEFAULT_CAPACITY - 1;
        System.out.println("length  :" + Integer.toBinaryString(length_code));
        //扰动函数
        int hash = hashCode^(hashCode>>>16);
        System.out.println("扰动函数"+Integer.toBinaryString(hash));
        int index =hash& length_code;
        //3.获得index
        System.out.println("index   :" + Integer.toBinaryString(index));
        int count = 1;
        if (map.containsKey(index)) {
            Integer integer = map.get(index);
            count = ++integer;
        }
        map.put(index, count);
        System.out.println("--------------------------------");
    }
}

测试用例:
将随机字符串插入散列表,统计其散列程度:
hashMap的大小:64,32,16
加载因子用的是0.75

hashMap大小:16,调用用扰动函数

hashCode:101111010010001100011010010
length :1111
扰动函数101111010010001110100111011

index :1011

hashCode:110110000010001101110010010
length :1111
扰动函数110110000010001110101010011

index :11

hashCode:1101111100011100001010
length :1111
扰动函数1101111100011100111101

index :1101

hashCode:110001111110110011011111011
length :1111
扰动函数110001111110110000011000100

index :100

hashCode:1101100100010010010010
length :1111
扰动函数1101100100010010100100

index :100

hashCode:1100110000010110111001
length :1111
扰动函数1100110000010110001010

index :1010

hashCode:11001010110011011100111001101110
length :1111
扰动函数11001010110011010000010010100011

index :11

hashCode:11000101101101000
length :1111
扰动函数11000101101101001

index :1001

hashCode:11111011101101111100001010100
length :1111
扰动函数11111011101101110011100100010

index :10

hashCode:1101010101101110010011
length :1111
扰动函数1101010101101110100110

index :110

hashCode:11001001101101000
length :1111
扰动函数11001001101101001

index :1001

hashCode:10110100100101110000001000110001
length :1111
扰动函数10110100100101111011011010100110

index :110

hashCode:110000111000010000111000111111
length :1111
扰动函数110000111000010011111011011110

index :1110

hashCode:1010000100100100111010111100100
length :1111
扰动函数1010000100100100010010101110110

index :110

hashCode:10111111100000011111001000110000
length :1111
扰动函数10111111100000010100110110110001

index :1

hash_index:1 count:1
hash_index:10 count:1
hash_index:11 count:2
hash_index:100 count:2
hash_index:110 count:3
hash_index:1001 count:2
hash_index:1010 count:1
hash_index:1011 count:1
hash_index:1101 count:1
hash_index:1110 count:1
10

hashMap大小:16,不调用用扰动函数

hashCode:101111010010001100011010010
length :1111
扰动函数101111010010001110100111011

index :10

hashCode:110110000010001101110010010
length :1111
扰动函数110110000010001110101010011

index :10

hashCode:1101111100011100001010
length :1111
扰动函数1101111100011100111101

index :1010

hashCode:110001111110110011011111011
length :1111
扰动函数110001111110110000011000100

index :1011

hashCode:1101100100010010010010
length :1111
扰动函数1101100100010010100100

index :10

hashCode:1100110000010110111001
length :1111
扰动函数1100110000010110001010

index :1001

hashCode:11001010110011011100111001101110
length :1111
扰动函数11001010110011010000010010100011

index :1110

hashCode:11000101101101000
length :1111
扰动函数11000101101101001

index :1000

hashCode:11111011101101111100001010100
length :1111
扰动函数11111011101101110011100100010

index :100

hashCode:1101010101101110010011
length :1111
扰动函数1101010101101110100110

index :11

hashCode:11001001101101000
length :1111
扰动函数11001001101101001

index :1000

hashCode:10110100100101110000001000110001
length :1111
扰动函数10110100100101111011011010100110

index :1

hashCode:110000111000010000111000111111
length :1111
扰动函数110000111000010011111011011110

index :1111

hashCode:1010000100100100111010111100100
length :1111
扰动函数1010000100100100010010101110110

index :100

hashCode:10111111100000011111001000110000
length :1111
扰动函数10111111100000010100110110110001

index :0

hash_index:0 count:1
hash_index:1 count:1
hash_index:10 count:3
hash_index:11 count:1
hash_index:100 count:2
hash_index:1000 count:2
hash_index:1001 count:1
hash_index:1010 count:1
hash_index:1011 count:1
hash_index:1110 count:1
hash_index:1111 count:1
11

hashMap大小:64,调用用扰动函数

hashCode:101111010010001100011010010
length :111111
扰动函数101111010010001110100111011

index :111011

hashCode:110110000010001101110010010
length :111111
扰动函数110110000010001110101010011

index :10011

hashCode:1101111100011100001010
length :111111
扰动函数1101111100011100111101

index :111101

hashCode:110001111110110011011111011
length :111111
扰动函数110001111110110000011000100

index :100

hashCode:1101100100010010010010
length :111111
扰动函数1101100100010010100100

index :100100

hashCode:1100110000010110111001
length :111111
扰动函数1100110000010110001010

index :1010

hashCode:11001010110011011100111001101110
length :111111
扰动函数11001010110011010000010010100011

index :100011

hashCode:11000101101101000
length :111111
扰动函数11000101101101001

index :101001

hashCode:11111011101101111100001010100
length :111111
扰动函数11111011101101110011100100010

index :100010

hashCode:1101010101101110010011
length :111111
扰动函数1101010101101110100110

index :100110

hashCode:11001001101101000
length :111111
扰动函数11001001101101001

index :101001

hashCode:10110100100101110000001000110001
length :111111
扰动函数10110100100101111011011010100110

index :100110

hashCode:110000111000010000111000111111
length :111111
扰动函数110000111000010011111011011110

index :11110

hashCode:1010000100100100111010111100100
length :111111
扰动函数1010000100100100010010101110110

index :110110

hashCode:10111111100000011111001000110000
length :111111
扰动函数10111111100000010100110110110001

index :110001

hash_index:100010 count:1
hash_index:100011 count:1
hash_index:100 count:1
hash_index:100100 count:1
hash_index:100110 count:2
hash_index:101001 count:2
hash_index:1010 count:1
hash_index:110001 count:1
hash_index:10011 count:1
hash_index:110110 count:1
hash_index:111011 count:1
hash_index:111101 count:1
hash_index:11110 count:1
13

hashMap大小:64,不调用用扰动函数

hashCode:101111010010001100011010010
length :111111
扰动函数101111010010001110100111011

index :10010

hashCode:110110000010001101110010010
length :111111
扰动函数110110000010001110101010011

index :10010

hashCode:1101111100011100001010
length :111111
扰动函数1101111100011100111101

index :1010

hashCode:110001111110110011011111011
length :111111
扰动函数110001111110110000011000100

index :111011

hashCode:1101100100010010010010
length :111111
扰动函数1101100100010010100100

index :10010

hashCode:1100110000010110111001
length :111111
扰动函数1100110000010110001010

index :111001

hashCode:11001010110011011100111001101110
length :111111
扰动函数11001010110011010000010010100011

index :101110

hashCode:11000101101101000
length :111111
扰动函数11000101101101001

index :101000

hashCode:11111011101101111100001010100
length :111111
扰动函数11111011101101110011100100010

index :10100

hashCode:1101010101101110010011
length :111111
扰动函数1101010101101110100110

index :10011

hashCode:11001001101101000
length :111111
扰动函数11001001101101001

index :101000

hashCode:10110100100101110000001000110001
length :111111
扰动函数10110100100101111011011010100110

index :110001

hashCode:110000111000010000111000111111
length :111111
扰动函数110000111000010011111011011110

index :111111

hashCode:1010000100100100111010111100100
length :111111
扰动函数1010000100100100010010101110110

index :100100

hashCode:10111111100000011111001000110000
length :111111
扰动函数10111111100000010100110110110001

index :110000

hash_index:110000 count:1
hash_index:110001 count:1
hash_index:10010 count:3
hash_index:10011 count:1
hash_index:10100 count:1
hash_index:100100 count:1
hash_index:101000 count:2
hash_index:111001 count:1
hash_index:1010 count:1
hash_index:111011 count:1
hash_index:101110 count:1
hash_index:111111 count:1
12

结论

加上扰动函数之后,散列更加均匀

相关标签: hashMap