HashSet集合保证元素唯一性原理分析

程序员文章站 2024-03-22 17:42:22

...

在学习HashSet集合时注意到该集合是元素不可重复的，经过自己尝试也确定该特点存在，所以一步步翻源码找了一下该特点的实现源码，于此记录。

首先上示例代码：

存储字符串并遍历

import java.util.HashSet;

public class HashSetDemo {
    public static void main(String[] args) {
        // 创建集合对象
        HashSet<String> hashSet = new HashSet<>();

        // 添加元素
        hashSet.add("hello");
        hashSet.add("world");
        hashSet.add("java");在TeamViewer被封禁
        
        hashSet.add("world");

        // 遍历集合
        for (String s : hashSet) {
            System.out.println(s);
        }

    }
}

在上面的程序中，添加了两次“world”元素，但是最终遍历集合仅输出了3个元素：

world
java
hello

这就说明第二次重复添加时并没有进行实际添加操作，俗称加了个寂寞~

正是这一操作确保了集合的元素唯一性，对此进行源码分析：

// 创建集合对象
        HashSet<String> hashSet = new HashSet<>();

        // 添加元素
        hashSet.add("hello");
        hashSet.add("world");
        hashSet.add("java");

添加元素的.add();方法是调用hashSet对象的，而hashSet对象是HashSet这个集合的，所以要去HashSet的源码中去找.add();方法在HashSet.class中的源码（选中HashSet后Ctrl+B跟进）：

    public boolean add(E e) {
        return this.map.put(e, PRESENT) == null;
    }

.add();方法的内部调用了.put()方法，同样的方法找到了.put()方法在HashMap.class中的源码：

    public V put(K key, V value) {
        return this.putVal(hash(key), key, value, false, true);
    }

而.put()方法的参数调用了一个.hash()方法，再次跟进找到了它在HashMap.class中的源码：

    static final int hash(Object key) {
        int h;
        return key == null ? 0 : (h = key.hashCode()) ^ h >>> 16;
    }

拿到.put()方法需要的int类型参数后，调用了.putVal()方法，再跟进：

final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
        HashMap.Node[] tab;
        int n;
        if ((tab = this.table) == null || (n = tab.length) == 0) {
            n = (tab = this.resize()).length;
        }

        Object p;
        int i;
        if ((p = tab[i = n - 1 & hash]) == null) {
            tab[i] = this.newNode(hash, key, value, (HashMap.Node)null);
        } else {
            Object e;
            Object k;
            if (((HashMap.Node)p).hash == hash && ((k = ((HashMap.Node)p).key) == key || key != null && key.equals(k))) {
                e = p;
            } else if (p instanceof HashMap.TreeNode) {
                e = ((HashMap.TreeNode)p).putTreeVal(this, tab, hash, key, value);
            } else {
                int binCount = 0;

                while(true) {
                    if ((e = ((HashMap.Node)p).next) == null) {
                        ((HashMap.Node)p).next = this.newNode(hash, key, value, (HashMap.Node)null);
                        if (binCount >= 7) {
                            this.treeifyBin(tab, hash);
                        }
                        break;
                    }

                    if (((HashMap.Node)e).hash == hash && ((k = ((HashMap.Node)e).key) == key || key != null && key.equals(k))) {
                        break;
                    }

                    p = e;
                    ++binCount;
                }
            }

            if (e != null) {
                V oldValue = ((HashMap.Node)e).value;
                if (!onlyIfAbsent || oldValue == null) {
                    ((HashMap.Node)e).value = value;
                }

                this.afterNodeAccess((HashMap.Node)e);
                return oldValue;
            }
        }

        ++this.modCount;
        if (++this.size > this.threshold) {
            this.resize();
        }

        this.afterNodeInsertion(evict);
        return null;
    }

以上便是找源码的流程，下面进行分析：

第二次使用.add()方法添加“world”元素失败，因此直接看第二次添加的流程：

首先是以“world”字符串为参数调用在HashSet.java中的public boolean add(E e)方法，然后因为.add()方法调用了HashMap.java中的.put()方法，所以球又被踢给了public V put(K key, V value)
在.put()方法中，它再度将球踢给了HashMap.java中的.putVal()方法，再看参数，它还以“world”字符串为参数调用了static final int hash(Object key)，hash()方法最终返回的是该参数的哈希值。
这回，.putVal()方法的参数齐了，所以我们来一步一步看这个方法的流程。
1. 第一行HashMap.Node[] tab;声明了一个元素为结点的数组，这是哈希表结构的一种实现。
2. 再往后是一个if语句，条件为：(tab = this.table) == null || (n = tab.length) == 0，该条件用于判断哈希表是否初始化，没有初始化的话就执行内部的初始化语句初始化哈希表。
3. 接下来又是一个if语句，这个语句的条件为：(p = tab[i = n - 1 & hash]) == null，这个条件用于根据对象的哈希值计算对象的存储位置，从而判断该位置有没有元素，如果为null就执行语句存储元素
4. 但是如果该位置有元素呢，所以下面还有一个else语句，else也有一个很长的if判断：((HashMap.Node)p).hash == hash && ((k = ((HashMap.Node)p).key) == key || key != null && key.equals(k))，它是由逻辑与&&连起来的，所以我们把它分成俩部分看：
  - ((HashMap.Node)p).hash == hash：比较待存入元素和已存在元素的哈希值，由于逻辑与的特性，一假全假，所以如果哈希值不同就会继续向下执行这个if语句的else语句，将元素添加到集合中; 如果相同则继续进行下面的判断。
  - ((k = ((HashMap.Node)p).key) == key || key != null && key.equals(k))：判断对象是否相同，以及是否为null，重点是后面的.equal()方法。如果这个判断得到false，就也会继续向下执行这个if语句的else语句，将元素添加到集合中; 如果得到true，说明元素已存在，即重复，不进行存储。

或许结合流程图看分析会更清晰？

对此进行一个小结：
它是根据hashCode()和equals()最终确定元素是否重复的

HashSet 集合存储元素：
要保证元素唯一性，需要重写hashCode()和equals()

OK,溜了溜了，下班咯～

相关标签： Java初试到搁浅，再从搁浅到完善编程语言 java

上一篇：重写hashCode()方法的简略方案

下一篇：转载：对几个通用的Java hashCode重写方案的一些思考和探讨

HashSet集合保证元素唯一性原理分析

HashSet集合保证元素唯一性原理分析

HashSet集合保证元素唯一性的源码分析

HashSet集合保证元素唯一性的源码分析

HashSet如何保证元素唯一性