HashMap Source Code Reading Notes
Contents
The code in this article is from Java 1.8 unless otherwise noted.
- HashMap fields
- Construction and index calculation
- HashMap put flow
- HashMap remove flow
- HashMap get flow
- Other common methods
- Red-black tree theory
- Brief overview of other data structures
- The red-black tree implementation inside HashMap
- The circular linked list formed during JDK 1.7 resizing
Background
jdk1.7
- HashMap stores data in an array of linked lists; not thread-safe; lists use head insertion.
- Hashtable stores data in an array of linked lists; thread-safe; lists use head insertion.
jdk1.8
- HashMap stores data in an array of linked lists and red-black trees; lists use tail insertion. When an insertion makes a list's length reach 8, the table is resized instead if its capacity is below 64; otherwise the list is converted to a red-black tree. If a later resize leaves a tree bin with 6 or fewer nodes, it is converted back to a list.
- Hashtable still stores data in an array of linked lists; thread-safe; lists use head insertion.
Reading the source
- When reading source code I usually start with the fields and their comments, then the constructors, and then trace into the commonly used public methods (if your English is good, read the class-level javadoc first).
- The code below is posted with its full javadoc wherever possible; if you can, read the original comments directly - you will get even more out of them.
HashMap fields
Below is an overview of HashMap's fields (JDK 1.8).
- Default initial capacity: 16

```java
/**
 * The default initial capacity - MUST be a power of two.
 */
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
```
- Maximum table length: 1 << 30 == 1073741824 (the largest positive Java int is (1 << 31) - 1)

```java
/**
 * The maximum capacity, used if a higher value is implicitly specified
 * by either of the constructors with arguments.
 * MUST be a power of two <= 1<<30.
 */
static final int MAXIMUM_CAPACITY = 1 << 30;
```
- Default load factor: 0.75; capacity * load factor = resize threshold

```java
/**
 * The load factor used when none specified in constructor.
 */
static final float DEFAULT_LOAD_FACTOR = 0.75f;
```
- Treeify threshold for a bin: 8

```java
/**
 * The bin count threshold for using a tree rather than list for a
 * bin.  Bins are converted to trees when adding an element to a
 * bin with at least this many nodes. The value must be greater
 * than 2 and should be at least 8 to mesh with assumptions in
 * tree removal about conversion back to plain bins upon
 * shrinkage.
 */
static final int TREEIFY_THRESHOLD = 8;
```
- Untreeify threshold: 6

```java
/**
 * The bin count threshold for untreeifying a (split) bin during a
 * resize operation. Should be less than TREEIFY_THRESHOLD, and at
 * most 6 to mesh with shrinkage detection under removal.
 */
static final int UNTREEIFY_THRESHOLD = 6;
```
- Minimum table capacity for treeification: 64; if the table capacity is below 64, a bin reaching length 8 triggers a resize instead.

```java
/**
 * The smallest table capacity for which bins may be treeified.
 * (Otherwise the table is resized if too many nodes in a bin.)
 * Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
 * between resizing and treeification thresholds.
 */
static final int MIN_TREEIFY_CAPACITY = 64;
```
- The bucket array

```java
/**
 * The table, initialized on first use, and resized as
 * necessary. When allocated, length is always a power of two.
 * (We also tolerate length zero in some operations to allow
 * bootstrapping mechanics that are currently not needed.)
 */
transient Node<K,V>[] table;
```
- entrySet: cached view used for iteration

```java
/**
 * Holds cached entrySet(). Note that AbstractMap fields are used
 * for keySet() and values().
 */
transient Set<Map.Entry<K,V>> entrySet;
```
- size

```java
/**
 * The number of key-value mappings contained in this map.
 */
transient int size;
```
- modCount: counts the structural modifications made to the HashMap. If the map's structure (its set of keys) is modified while an iterator over it is running, other than through the iterator itself, the iterator throws ConcurrentModificationException (fail-fast).

```java
/**
 * The number of times this HashMap has been structurally modified.
 * Structural modifications are those that change the number of mappings in
 * the HashMap or otherwise modify its internal structure (e.g.,
 * rehash).  This field is used to make iterators on Collection-views of
 * the HashMap fail-fast.  (See ConcurrentModificationException).
 */
transient int modCount;
```
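As a quick illustration of the fail-fast behavior, the sketch below (my own example, not HashMap source) structurally modifies a map while iterating over it; the iterator detects the modCount mismatch on its next step:

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

public class FailFastDemo {
    // Returns true if the iterator failed fast after a structural modification.
    static boolean demo() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        try {
            for (String key : map.keySet()) {
                map.put("c", 3); // structural modification: modCount++
            }
        } catch (ConcurrentModificationException e) {
            return true; // next() saw modCount != expectedModCount
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "true"
    }
}
```

Note that replacing the value of an existing key is not a structural modification and would not trip the iterator.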
- threshold: the resize threshold

```java
/**
 * The next size value at which to resize (capacity * load factor).
 *
 * @serial
 */
// (The javadoc description is true upon serialization.
// Additionally, if the table array has not been allocated, this
// field holds the initial array capacity, or zero signifying
// DEFAULT_INITIAL_CAPACITY.)
int threshold;
```
- loadFactor: the load factor

```java
/**
 * The load factor for the hash table.
 *
 * @serial
 */
final float loadFactor;
```
- The Node inner class (a linked-list node; maintains a singly linked list)

```java
/**
 * Basic hash bin node, used for most entries.  (See below for
 * TreeNode subclass, and in LinkedHashMap for its Entry subclass.)
 */
static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;

    Node(int hash, K key, V value, Node<K,V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }

    public final K getKey()        { return key; }
    public final V getValue()      { return value; }
    public final String toString() { return key + "=" + value; }

    public final int hashCode() {
        return Objects.hashCode(key) ^ Objects.hashCode(value);
    }

    public final V setValue(V newValue) {
        V oldValue = value;
        value = newValue;
        return oldValue;
    }

    public final boolean equals(Object o) {
        if (o == this)
            return true;
        if (o instanceof Map.Entry) {
            Map.Entry<?,?> e = (Map.Entry<?,?>)o;
            if (Objects.equals(key, e.getKey()) &&
                Objects.equals(value, e.getValue()))
                return true;
        }
        return false;
    }
}
```
- The TreeNode inner class, which extends LinkedHashMap.Entry (which in turn extends HashMap.Node)

```java
// TreeNode is large, so only its fields and constructor are shown here. With
// this structure a TreeNode maintains not only the red-black tree links but
// also a doubly linked list: throughout insertion and deletion, HashMap's
// tree nodes keep both the parent/child tree relations and the prev/next
// list order between nodes.
/**
 * Entry for Tree bins. Extends LinkedHashMap.Entry (which in turn
 * extends Node) so can be used as extension of either regular or
 * linked node.
 */
static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> {
    TreeNode<K,V> parent;  // red-black tree links
    TreeNode<K,V> left;
    TreeNode<K,V> right;
    TreeNode<K,V> prev;    // needed to unlink next upon deletion
    boolean red;

    TreeNode(int hash, K key, V val, Node<K,V> next) {
        super(hash, key, val, next);
    }
}
```
Constructors

- Constructor overview
- No-arg constructor (prefer the initial-capacity constructor):

```java
/**
 * Constructs an empty <tt>HashMap</tt> with the default initial capacity
 * (16) and the default load factor (0.75).
 */
public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}
```
- Initial-capacity constructor. Recommended sizing formula: initialCapacity = goalSize / loadFactor + 1

```java
/**
 * Constructs an empty <tt>HashMap</tt> with the specified initial
 * capacity and the default load factor (0.75).
 *
 * @param initialCapacity the initial capacity.
 * @throws IllegalArgumentException if the initial capacity is negative.
 */
public HashMap(int initialCapacity) {
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}
```
- Map-copy constructor: equivalent to calling putAll

```java
/**
 * Constructs a new <tt>HashMap</tt> with the same mappings as the
 * specified <tt>Map</tt>.  The <tt>HashMap</tt> is created with
 * default load factor (0.75) and an initial capacity sufficient to
 * hold the mappings in the specified <tt>Map</tt>.
 *
 * @param   m the map whose mappings are to be placed in this map
 * @throws  NullPointerException if the specified map is null
 */
public HashMap(Map<? extends K, ? extends V> m) {
    this.loadFactor = DEFAULT_LOAD_FACTOR;
    putMapEntries(m, false);
}
```

Here is putMapEntries. It is essentially a loop of put calls, but it is worth showing because it demonstrates the recommended initial-capacity formula:

```java
/**
 * Implements Map.putAll and Map constructor.
 *
 * @param m the map
 * @param evict false when initially constructing this map, else
 * true (relayed to method afterNodeInsertion).
 */
final void putMapEntries(Map<? extends K, ? extends V> m, boolean evict) {
    int s = m.size();
    if (s > 0) {
        if (table == null) { // pre-size
            // the recommended capacity formula is the next line - look here!
            float ft = ((float)s / loadFactor) + 1.0F;
            int t = ((ft < (float)MAXIMUM_CAPACITY) ?
                     (int)ft : MAXIMUM_CAPACITY);
            if (t > threshold)
                threshold = tableSizeFor(t);
        }
        else if (s > threshold)
            resize();
        for (Map.Entry<? extends K, ? extends V> e : m.entrySet()) {
            K key = e.getKey();
            V value = e.getValue();
            putVal(hash(key), key, value, false, evict);
        }
    }
}
```
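Following the same formula, a map expected to hold n entries can be pre-sized so that no resize happens while filling it. A small sketch (the helper name and variable names are mine):

```java
import java.util.HashMap;
import java.util.Map;

public class PreSizeDemo {
    // Pre-size a HashMap for expectedSize entries, using the same
    // formula putMapEntries uses: expectedSize / loadFactor + 1.
    static <K, V> Map<K, V> withExpectedSize(int expectedSize) {
        int initialCapacity = (int) (expectedSize / 0.75f) + 1;
        return new HashMap<>(initialCapacity);
    }

    public static void main(String[] args) {
        // Passing 100 directly would give a table of 128 with threshold 96,
        // forcing one resize while filling; the formula gives capacity 134,
        // which tableSizeFor rounds up to a table large enough to hold all
        // 100 entries without resizing.
        Map<Integer, String> map = withExpectedSize(100);
        for (int i = 0; i < 100; i++) {
            map.put(i, "v" + i);
        }
        System.out.println(map.size()); // 100
    }
}
```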
- Capacity + load-factor constructor: unless you have a special requirement, keep the default load factor

```java
/**
 * Constructs an empty <tt>HashMap</tt> with the specified initial
 * capacity and load factor.
 *
 * @param  initialCapacity the initial capacity
 * @param  loadFactor      the load factor
 * @throws IllegalArgumentException if the initial capacity is negative
 *         or the load factor is nonpositive
 */
public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);
    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity);
}
```
This constructor's logic is simple: it clamps 0 <= initialCapacity <= 1 << 30.
The key piece is the tableSizeFor(int cap) method.
This method is why the table length is always the smallest power of two greater than or equal to the requested initialCapacity (note: this is the array length, not the number of elements that can be inserted before a resize).

```java
/**
 * Returns a power of two size for the given target capacity.
 */
static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}
```

The successive `n |= n >>> k` steps OR n with its own unsigned right shifts, smearing the highest 1 bit downward until every bit below it is also 1. The initial `cap - 1` handles the case where cap is already a power of two; without it the result would be 2^(n+1) instead of 2^n. Remember that integers take part in these operations in two's-complement form. Tracing cap = 0, 1, 17, and 1 << 30 gives n = -1, 0, 16, and (1 << 30) - 1 respectively:

```
n == -1 (cap == 0):
  11111111111111111111111111111111   (OR-ing -1 with anything leaves -1)  -> n < 0, return 1
n == 0 (cap == 1):
  00000000000000000000000000000000   (unchanged by all five steps)        -> return 0 + 1 = 1
n == 16 (cap == 17):
  00000000000000000000000000010000
  n |= n >>> 1  ->  00000000000000000000000000011000
  n |= n >>> 2  ->  00000000000000000000000000011110
  n |= n >>> 4  ->  00000000000000000000000000011111
  (further shifts change nothing)                                         -> return 31 + 1 = 32
n == (1<<30)-1 (cap == 1<<30):
  00111111111111111111111111111111   (already all ones below the top bit) -> return 1 << 30
```

So for any cap in the allowed range, the result is the smallest power of two greater than or equal to cap.
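The traces above can be verified with a standalone copy of the method (MAXIMUM_CAPACITY inlined):

```java
public class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Copy of HashMap.tableSizeFor: smallest power of two >= cap.
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(0));       // 1
        System.out.println(tableSizeFor(16));      // 16 (already a power of two)
        System.out.println(tableSizeFor(17));      // 32
        System.out.println(tableSizeFor(1 << 30)); // 1073741824 (clamped at MAXIMUM_CAPACITY)
    }
}
```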
Index calculation

- To map an element into a hash table we normally only need h % n, where n is the array length and h the hash value. HashMap instead computes the index as (n - 1) & hash. A simple example:

```
Default initial length 16, hash value 21
----------------------------------------
(n - 1) & hash
  15: ...00001111
  21: ...00010101
   5: ...00000101
----------------------------------------
h % n
  21 % 16 = 5
```

- Behind this example is the identity h % 2^n == h & (2^n - 1) for non-negative h, which is exactly why HashMap requires power-of-two table lengths. As for why the modulus was replaced with a bitwise operation: it is commonly said that the AND executes faster than a division, and that since the hash function returns an int in [-2147483648, 2147483647], a modulo-based index would first have to force the value non-negative. I believe both of those conclusions are correct.
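Both the identity and the sign issue can be checked directly (a small sketch of my own):

```java
public class MaskVsModDemo {
    public static void main(String[] args) {
        int n = 16; // table length, a power of two
        for (int h = 0; h < 1000; h++) {
            // For non-negative h and power-of-two n, the two forms agree.
            if ((h & (n - 1)) != (h % n)) throw new AssertionError();
        }
        // Negative hashes are why pre-1.4 code masked with 0x7FFFFFFF first:
        // Java's % keeps the sign of the dividend.
        int h = -21;
        System.out.println(h % n);       // -5 : unusable as an array index
        System.out.println(h & (n - 1)); // 11 : always lands in [0, n-1]
    }
}
```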
- These conclusions, and the optimizations behind them, are the outcome of repeated refactoring by excellent Java developers as the language evolved and the hash function improved. Before writing this section I read the index-calculation code of HashMap and Hashtable in Java 2/3/4/5/6/7/8/12. The screenshot below is from J2SE 1.3 (1.2 is almost identical).
- In Java 1.3, HashMap's constructors were identical to Hashtable's. In fact their index calculation, resizing, and even overall class structure were nearly the same, apart from thread safety and a few minor feature differences.
In Java 1.3 both HashMap and Hashtable computed the index by forcing the hash non-negative and taking the modulus: (hash & 0x7FFFFFFF) % tab.length
The default array size in both was 11, and resizing grew it to 2n + 1.
For why prime numbers were used as hash table lengths, see the link below:
https://blog.csdn.net/zhishengqianjun/article/details/79087525#23-检验
- HashMap switched from modulo to bit operations in 1.4, and starting with 1.4 Doug Lea appears in the @author tag.
```java
// 1.4: computing a power-of-two capacity
// Find a power of 2 >= initialCapacity
int capacity = 1;
// Once initialCapacity exceeds 2^5 == 32, this loop iterates more times
// than the five shift-or steps of the 1.8 implementation.
while (capacity < initialCapacity)
    capacity <<= 1;

// 1.4's handling of the hash value: to guard against high collision rates
// from poor hash functions, the hash goes through several extra bit
// operations after it is obtained.
// I never fully understood why exactly this form was chosen (it mixes the
// features of the different bit segments; to me the post-1.5 supplemental
// hash would have sufficed - the purpose of the += and ~ steps is unclear to me).
static int hash(Object x) {
    int h = x.hashCode();

    h += ~(h << 9);
    h ^= (h >>> 14);
    h += (h << 4);
    h ^= (h >>> 10);
    return h;
}
```
- JDK 1.5 introduced a new supplemental hash algorithm but, for compatibility, kept the old one as the default; the new one could be enabled via a JVM flag.

```java
/**
 * Whether to prefer the old supplemental hash function, for
 * compatibility with broken applications that rely on the
 * internal hashing order.
 *
 * Set to true only by hotspot when invoked via
 * -XX:+UseNewHashFunction or -XX:+AggressiveOpts
 */
private static final boolean useNewHash;
static { useNewHash = false; }

private static int oldHash(int h) {
    h += ~(h << 9);
    h ^= (h >>> 14);
    h += (h << 4);
    h ^= (h >>> 10);
    return h;
}

// The new method feels easier to understand: it mixes the features of all
// 32 bits of the hash into the low bits (a defensive measure).
private static int newHash(int h) {
    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}
```
- JDK 1.6 adopted the new supplemental hash unconditionally.

```java
/**
 * Applies a supplemental hash function to a given hashCode, which
 * defends against poor quality hash functions.  This is critical
 * because HashMap uses power-of-two length hash tables, that
 * otherwise encounter collisions for hashCodes that do not differ
 * in lower bits. Note: Null keys always map to hash 0, thus index 0.
 */
static int hash(int h) {
    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}
```
- JDK 1.7's power-of-two computation and hash processing:

```java
// power-of-two computation
private static int roundUpToPowerOf2(int number) {
    // assert number >= 0 : "number must be non-negative";
    return number >= MAXIMUM_CAPACITY
            ? MAXIMUM_CAPACITY
            : (number > 1) ? Integer.highestOneBit((number - 1) << 1) : 1;
}

public static int highestOneBit(int i) {
    // HD, Figure 3-1
    i |= (i >> 1);
    i |= (i >> 2);
    i |= (i >> 4);
    i |= (i >> 8);
    i |= (i >> 16);
    return i - (i >>> 1);
}

// hash processing
/**
 * Retrieve object hash code and applies a supplemental hash function to the
 * result hash, which defends against poor quality hash functions.  This is
 * critical because HashMap uses power-of-two length hash tables, that
 * otherwise encounter collisions for hashCodes that do not differ
 * in lower bits. Note: Null keys always map to hash 0, thus index 0.
 */
final int hash(Object k) {
    int h = hashSeed;
    // when hashSeed is non-zero and the key is a String, an alternative
    // hash algorithm is used
    if (0 != h && k instanceof String) {
        return sun.misc.Hashing.stringHash32((String) k);
    }

    h ^= k.hashCode();

    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}
```
- By JDK 1.8 the power-of-two computation had evolved into the version analyzed above, and the hash processing was reduced to a single XOR of the high 16 bits into the low 16. In most usage only the low bits of the hash take part in index calculation, so their distribution determines the collision rate; the XOR propagates the influence of the high bits downward. The heavy multi-step mixing of earlier versions is gone: with red-black trees, lookups stay fast even when a bin's chain grows long.

```java
/**
 * Computes key.hashCode() and spreads (XORs) higher bits of hash
 * to lower.  Because the table uses power-of-two masking, sets of
 * hashes that vary only in bits above the current mask will
 * always collide. (Among known examples are sets of Float keys
 * holding consecutive whole numbers in small tables.)  So we
 * apply a transform that spreads the impact of higher bits
 * downward. There is a tradeoff between speed, utility, and
 * quality of bit-spreading. Because many common sets of hashes
 * are already reasonably distributed (so don't benefit from
 * spreading), and because we use trees to handle large sets of
 * collisions in bins, we just XOR some shifted bits in the
 * cheapest possible way to reduce systematic lossage, as well as
 * to incorporate impact of the highest bits that would otherwise
 * never be used in index calculations because of table bounds.
 */
static final int hash(Object key) {
    int h;
    // this is also why a null key is allowed: it always maps to hash 0
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
```
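The effect of the spread is easy to see with two hash codes that differ only above bit 15 (a sketch using a copy of the 1.8 hash and Integer keys, whose hashCode() is the value itself):

```java
public class HashSpreadDemo {
    // Copy of HashMap.hash (1.8): XOR the high 16 bits into the low 16.
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int n = 16;
        int a = 0x10000 | 5; // differ only in bits above the mask...
        int b = 0x20000 | 5;
        // ...so without spreading they collide in the same bucket:
        System.out.println(a & (n - 1)); // 5
        System.out.println(b & (n - 1)); // 5
        // After spreading, the high bits influence the index:
        System.out.println(hash(a) & (n - 1)); // 4
        System.out.println(hash(b) & (n - 1)); // 7
    }
}
```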
- In JDK 12 the power-of-two computation changed again; the hash processing did not. On why pre-1.8 versions mixed the hash with so many extra bit operations, and whether switching the index calculation from modulo to bit masking is really much faster: I found an article containing an answer from Joshua Bloch, one of the class's authors, based on a fairly old code version. Roughly: once the table length is a power of two, the map becomes very sensitive to the quality of the hash function, so the extra mixing of the hash result is a defensive measure; the article also lists some basic benchmark numbers for modulo vs. masking.
See "HashMap requires a better hashCode() - JDK 1.4 Part II". Another article, "HashMap Interview Questions", mentions that integer modulo in 1.4 was much slower than in earlier releases (a 1.4 bug), and that solving this problem is part of what drove the change starting in 1.4.

```java
// ---- jdk12 ----
// numberOfLeadingZeros counts the zero bits above the highest one bit,
// then -1 (all ones: 11111111111111111111111111111111) is shifted right
// unsigned by that count.
/**
 * Returns a power of two size for the given target capacity.
 */
static final int tableSizeFor(int cap) {
    int n = -1 >>> Integer.numberOfLeadingZeros(cap - 1);
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}

/**
 * Returns the number of zero bits preceding the highest-order
 * ("leftmost") one-bit in the two's complement binary representation
 * of the specified {@code int} value.  Returns 32 if the
 * specified value has no one-bits in its two's complement representation,
 * in other words if it is equal to zero.
 *
 * <p>Note that this method is closely related to the logarithm base 2.
 * For all positive {@code int} values x:
 * <ul>
 * <li>floor(log<sub>2</sub>(x)) = {@code 31 - numberOfLeadingZeros(x)}
 * <li>ceil(log<sub>2</sub>(x)) = {@code 32 - numberOfLeadingZeros(x - 1)}
 * </ul>
 *
 * @param i the value whose number of leading zeros is to be computed
 * @return the number of zero bits preceding the highest-order
 *     ("leftmost") one-bit in the two's complement binary representation
 *     of the specified {@code int} value, or 32 if the value
 *     is equal to zero.
 * @since 1.5
 */
@HotSpotIntrinsicCandidate
public static int numberOfLeadingZeros(int i) {
    // HD, Count leading 0's
    if (i <= 0)
        return i == 0 ? 32 : 0;
    int n = 31;
    if (i >= 1 << 16) { n -= 16; i >>>= 16; }
    if (i >= 1 <<  8) { n -=  8; i >>>=  8; }
    if (i >= 1 <<  4) { n -=  4; i >>>=  4; }
    if (i >= 1 <<  2) { n -=  2; i >>>=  2; }
    return n - (i >>> 1);
}
```
HashMap put flow

- The put() method below actually delegates to putVal:

```java
/**
 * Associates the specified value with the specified key in this map.
 * If the map previously contained a mapping for the key, the old
 * value is replaced.
 *
 * @param key key with which the specified value is to be associated
 * @param value value to be associated with the specified key
 * @return the previous value associated with <tt>key</tt>, or
 *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
 *         (A <tt>null</tt> return can also indicate that the map
 *         previously associated <tt>null</tt> with <tt>key</tt>.)
 */
public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}
// hash(key) was covered above in the 1.8 discussion
```
- putVal, with my comments added inline:

```java
/**
 * Implements Map.put and related methods.
 *
 * @param hash hash for key
 * @param key the key
 * @param value the value to put
 * @param onlyIfAbsent if true, don't change existing value
 * @param evict if false, the table is in creation mode.
 * @return previous value, or null if none
 */
// onlyIfAbsent: do not overwrite an existing mapping. evict is unused in
// HashMap itself; LinkedHashMap uses it (access-order / insertion-order control).
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        // The table is empty: resize. Here this mainly allocates the array;
        // the full resize flow is covered separately below.
        n = (tab = resize()).length;
    if ((p = tab[i = (n - 1) & hash]) == null)
        // The slot for this index is empty: create a new node directly.
        tab[i] = newNode(hash, key, value, null);
    else { // the slot is already occupied, i.e. a collision occurred
        Node<K,V> e; K k;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            // The head node already has the same hash and key; point e at it.
            e = p;
        else if (p instanceof TreeNode)
            // Tree bin: insert by the TreeNode rules (covered separately;
            // for now just note it returns the existing node, if any).
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else { // list bin
            for (int binCount = 0; ; ++binCount) { // walk the list
                if ((e = p.next) == null) {
                    // No node with the same hash and key found: append at the tail.
                    p.next = newNode(hash, key, value, null);
                    // Check whether the bin should be treeified after the insert.
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                // Found a node with the same key: stop; e points at it.
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        // e != null: a mapping for this key already exists;
        // onlyIfAbsent decides whether the old value is overwritten.
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            // No-op in HashMap; LinkedHashMap uses it for access ordering.
            afterNodeAccess(e);
            // Replacing a value does not change the key structure, so no
            // modCount bump and no resize is needed.
            return oldValue;
        }
    }
    ++modCount; // record the structural change (fail-fast)
    if (++size > threshold)
        resize(); // past the threshold: grow the table
    // No-op in HashMap; LinkedHashMap uses it for insertion ordering.
    afterNodeInsertion(evict);
    return null;
}
```
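The onlyIfAbsent flag is reachable from the public API through putIfAbsent, which calls putVal with onlyIfAbsent = true. A quick demonstration of the difference:

```java
import java.util.HashMap;

public class PutIfAbsentDemo {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();
        System.out.println(map.put("k", 1));         // null (no prior mapping)
        System.out.println(map.putIfAbsent("k", 2)); // 1 (existing value kept)
        System.out.println(map.get("k"));            // 1
        System.out.println(map.put("k", 3));         // 1 (old value returned, now overwritten)
        System.out.println(map.get("k"));            // 3
    }
}
```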
- treeifyBin

```java
/**
 * Replaces all linked nodes in bin at index for given hash unless
 * table is too small, in which case resizes instead.
 */
final void treeifyBin(Node<K,V>[] tab, int hash) {
    int n, index; Node<K,V> e;
    // If the table length is below MIN_TREEIFY_CAPACITY (64),
    // resize instead of treeifying.
    if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
        resize();
    else if ((e = tab[index = (n - 1) & hash]) != null) {
        TreeNode<K,V> hd = null, tl = null;
        // Convert the Node list into a TreeNode list.
        do {
            TreeNode<K,V> p = replacementTreeNode(e, null);
            if (tl == null)
                hd = p;
            else {
                p.prev = tl;
                tl.next = p;
            }
            tl = p;
        } while ((e = e.next) != null);
        if ((tab[index] = hd) != null)
            hd.treeify(tab); // build the red-black tree from the TreeNode list
    }
}
```
- treeify

```java
/**
 * Forms tree of the nodes linked from this node.
 * @return root of tree
 */
final void treeify(Node<K,V>[] tab) {
    TreeNode<K,V> root = null;
    // Walk the TreeNode list.
    for (TreeNode<K,V> x = this, next; x != null; x = next) {
        next = (TreeNode<K,V>)x.next;
        x.left = x.right = null;
        if (root == null) { // no root yet: the head node becomes the root
            x.parent = null;
            x.red = false;
            root = x;
        }
        else {
            // A root exists: follow binary-search-tree insertion rules,
            // comparing at each step to locate the insertion point.
            K k = x.key;
            int h = x.hash;
            Class<?> kc = null;
            for (TreeNode<K,V> p = root;;) {
                int dir, ph;
                K pk = p.key;
                if ((ph = p.hash) > h)
                    dir = -1;
                else if (ph < h)
                    dir = 1;
                // Hashes are equal: if the key implements Comparable,
                // compare again by that ordering.
                else if ((kc == null &&
                          (kc = comparableClassFor(k)) == null) ||
                         (dir = compareComparables(kc, k, pk)) == 0)
                    // Still tied: tieBreakOrder falls back to comparing the
                    // keys' class names and always returns a non-zero result.
                    dir = tieBreakOrder(k, pk);

                TreeNode<K,V> xp = p;
                if ((p = (dir <= 0) ? p.left : p.right) == null) { // leaf position found
                    x.parent = xp;
                    if (dir <= 0)
                        xp.left = x;
                    else
                        xp.right = x;
                    root = balanceInsertion(root, x); // restore balance after insertion
                    break;
                }
            }
        }
    }
    moveRootToFront(tab, root); // keep the root in the table slot and at the list head
}
```
- Resizing: resize()

```java
/**
 * Initializes or doubles table size.  If null, allocates in
 * accord with initial capacity target held in field threshold.
 * Otherwise, because we are using power-of-two expansion, the
 * elements from each bin must either stay at same index, or move
 * with a power of two offset in the new table.
 *
 * @return the table
 */
final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        // The table exists and is already at the maximum capacity: set the
        // threshold to Integer.MAX_VALUE and return the old table unchanged.
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // newCap = 2 * oldCap; if newCap < MAXIMUM_CAPACITY and
        // oldCap >= the default capacity, double the threshold as well.
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    // No elements yet, but the old threshold is positive: capacity = old
    // threshold (the first put after the initial-capacity constructor
    // takes this path).
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    // Neither capacity nor threshold has been initialized: use the defaults.
    else {               // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    // If the rules above left newThr unset (the capacity grew to the
    // maximum, or this is the first put after the initial-capacity
    // constructor), compute it from newCap * loadFactor here.
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    // Allocate the new array.
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    // Walk the old table, reinserting every element into the new one.
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            // Take the next non-null bin.
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null; // clear the old slot
                if (e.next == null)
                    // A single node: recompute its index and place it directly.
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    // A tree bin: delegate to the TreeNode lo/hi split.
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    // A list bin: split the nodes into a low and a high list.
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        /*
                         * Low list: if (hash & oldCap) == 0, the bit newly
                         * added to the index mask by the expansion does not
                         * change this node's index. E.g. old mask 00001111,
                         * old capacity 00010000, new mask 00011111.
                         */
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        // High list: the new mask bit is set, so the index
                        // changes, and it is always old index + old capacity.
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    // Non-empty low list: stays at the original index.
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    // Non-empty high list: goes to original index + old capacity.
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
```
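The lo/hi rule can be checked directly: for any hash, (hash & oldCap) == 0 means the index under the doubled mask is unchanged; otherwise it is the old index plus oldCap. A standalone sketch:

```java
public class ResizeSplitDemo {
    public static void main(String[] args) {
        int oldCap = 16, newCap = 32;
        for (int hash = 0; hash < 10_000; hash++) {
            int oldIdx = hash & (oldCap - 1);
            int newIdx = hash & (newCap - 1);
            if ((hash & oldCap) == 0) {
                // the extra mask bit is 0: the index is unchanged
                if (newIdx != oldIdx) throw new AssertionError();
            } else {
                // the extra mask bit is 1: the index moves by oldCap
                if (newIdx != oldIdx + oldCap) throw new AssertionError();
            }
        }
        System.out.println("lo/hi rule holds");
    }
}
```

This is why resize never has to recompute a full modulus: one bit test decides which of exactly two slots each node lands in.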
- ((TreeNode<K,V>)e).split(this, newTab, j, oldCap)

```java
/**
 * Splits nodes in a tree bin into lower and upper tree bins,
 * or untreeifies if now too small. Called only from resize;
 * see above discussion about split bits and indices.
 * (The logic mirrors the linked-list lo/hi split.)
 *
 * @param map the map
 * @param tab the table for recording bin heads
 * @param index the index of the table being split
 * @param bit the bit of hash to split on
 */
final void split(HashMap<K,V> map, Node<K,V>[] tab, int index, int bit) {
    TreeNode<K,V> b = this;
    // Relink into lo and hi lists, preserving order
    TreeNode<K,V> loHead = null, loTail = null;
    TreeNode<K,V> hiHead = null, hiTail = null;
    int lc = 0, hc = 0; // low/high counters: decide tree vs list afterwards
    for (TreeNode<K,V> e = b, next; e != null; e = next) {
        next = (TreeNode<K,V>)e.next;
        e.next = null;
        if ((e.hash & bit) == 0) {
            if ((e.prev = loTail) == null)
                loHead = e;
            else
                loTail.next = e;
            loTail = e;
            ++lc;
        }
        else {
            if ((e.prev = hiTail) == null)
                hiHead = e;
            else
                hiTail.next = e;
            hiTail = e;
            ++hc;
        }
    }

    if (loHead != null) {
        if (lc <= UNTREEIFY_THRESHOLD)
            // 6 or fewer nodes: convert the TreeNodes back to plain Nodes
            // and store the head.
            tab[index] = loHead.untreeify(map);
        else {
            tab[index] = loHead;
            if (hiHead != null) // (else is already treeified)
                loHead.treeify(tab); // rebuild the tree from the TreeNode chain
        }
    }
    if (hiHead != null) {
        if (hc <= UNTREEIFY_THRESHOLD)
            tab[index + bit] = hiHead.untreeify(map);
        else {
            tab[index + bit] = hiHead;
            if (loHead != null)
                hiHead.treeify(tab);
        }
    }
}
```
- untreeify

```java
/**
 * Returns a list of non-TreeNodes replacing those linked from
 * this node.
 */
// Converts a TreeNode chain back into a plain Node chain.
final Node<K,V> untreeify(HashMap<K,V> map) {
    Node<K,V> hd = null, tl = null;
    for (Node<K,V> q = this; q != null; q = q.next) {
        Node<K,V> p = map.replacementNode(q, null);
        if (tl == null)
            hd = p;
        else
            tl.next = p;
        tl = p;
    }
    return hd;
}
```
HashMap remove flow

- remove() actually delegates to removeNode:

```java
/**
 * Removes the mapping for the specified key from this map if present.
 *
 * @param  key key whose mapping is to be removed from the map
 * @return the previous value associated with <tt>key</tt>, or
 *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
 *         (A <tt>null</tt> return can also indicate that the map
 *         previously associated <tt>null</tt> with <tt>key</tt>.)
 */
public V remove(Object key) {
    Node<K,V> e;
    return (e = removeNode(hash(key), key, null, false, true)) == null ?
        null : e.value;
}
```
- removeNode

```java
/**
 * Implements Map.remove and related methods.
 *
 * @param hash hash for key
 * @param key the key
 * @param value the value to match if matchValue, else ignored
 * @param matchValue if true only remove if value is equal
 * @param movable if false do not move other nodes while removing
 * @return the node, or null if none
 */
final Node<K,V> removeNode(int hash, Object key, Object value,
                           boolean matchValue, boolean movable) {
    Node<K,V>[] tab; Node<K,V> p; int n, index;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (p = tab[index = (n - 1) & hash]) != null) { // locate the bin
        Node<K,V> node = null, e; K k; V v;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            node = p; // the first node is the one to remove
        else if ((e = p.next) != null) {
            if (p instanceof TreeNode)
                // tree bin: find the node the TreeNode way
                node = ((TreeNode<K,V>)p).getTreeNode(hash, key);
            else {
                // list bin: walk the list looking for the node
                do {
                    if (e.hash == hash &&
                        ((k = e.key) == key ||
                         (key != null && key.equals(k)))) {
                        node = e;
                        break;
                    }
                    p = e;
                } while ((e = e.next) != null);
            }
        }
        // Found the node, and the value matches (if matching was requested).
        if (node != null && (!matchValue || (v = node.value) == value ||
                             (value != null && value.equals(v)))) {
            if (node instanceof TreeNode)
                // tree node: remove it the TreeNode way
                ((TreeNode<K,V>)node).removeTreeNode(this, tab, movable);
            else if (node == p)
                // removing the first node: the slot points at its successor
                tab[index] = node.next;
            else
                // removing mid-list: the previous node skips over it
                p.next = node.next;
            ++modCount; // record the structural change
            --size;
            // removal callback (LinkedHashMap uses it to maintain its
            // doubly linked list after a node is removed)
            afterNodeRemoval(node);
            return node;
        }
    }
    return null;
}
```
HashMap get flow

- get()

```java
/**
 * Returns the value to which the specified key is mapped,
 * or {@code null} if this map contains no mapping for the key.
 *
 * <p>More formally, if this map contains a mapping from a key
 * {@code k} to a value {@code v} such that {@code (key==null ? k==null :
 * key.equals(k))}, then this method returns {@code v}; otherwise
 * it returns {@code null}.  (There can be at most one such mapping.)
 *
 * <p>A return value of {@code null} does not <i>necessarily</i>
 * indicate that the map contains no mapping for the key; it's also
 * possible that the map explicitly maps the key to {@code null}.
 * The {@link #containsKey containsKey} operation may be used to
 * distinguish these two cases.
 *
 * @see #put(Object, Object)
 */
public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}
```
- getNode

```java
/**
 * Implements Map.get and related methods.
 *
 * @param hash hash for key
 * @param key the key
 * @return the node, or null if none
 */
final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first; // the node in the slot is the one we want
        if ((e = first.next) != null) {
            if (first instanceof TreeNode)
                // tree bin: look up the TreeNode way
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            do {
                // plain Node: search the linked list
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}
```
Common methods
- size()
```java
/**
 * Returns the number of key-value mappings in this map.
 *
 * @return the number of key-value mappings in this map
 */
public int size() {
    return size;
}
```
- isEmpty()
```java
/**
 * Returns <tt>true</tt> if this map contains no key-value mappings.
 *
 * @return <tt>true</tt> if this map contains no key-value mappings
 */
public boolean isEmpty() {
    return size == 0;
}
```
- containsKey(Object key)
```java
/**
 * Returns <tt>true</tt> if this map contains a mapping for the
 * specified key.
 *
 * @param   key   The key whose presence in this map is to be tested
 * @return <tt>true</tt> if this map contains a mapping for the specified
 * key.
 */
public boolean containsKey(Object key) {
    return getNode(hash(key), key) != null;
}
```
- clear()
```java
/**
 * Removes all of the mappings from this map.
 * The map will be empty after this call returns.
 */
public void clear() {
    Node<K,V>[] tab;
    modCount++;
    if ((tab = table) != null && size > 0) {
        size = 0;
        for (int i = 0; i < tab.length; ++i)
            tab[i] = null;
    }
}
```
- containsValue(Object value): traverses elements via the list links (tree bins also maintain a doubly linked list)

```java
/**
 * Returns <tt>true</tt> if this map maps one or more keys to the
 * specified value.
 *
 * @param value value whose presence in this map is to be tested
 * @return <tt>true</tt> if this map maps one or more keys to the
 *         specified value
 */
public boolean containsValue(Object value) {
    Node<K,V>[] tab; V v;
    if ((tab = table) != null && size > 0) {
        for (int i = 0; i < tab.length; ++i) {
            for (Node<K,V> e = tab[i]; e != null; e = e.next) {
                if ((v = e.value) == value ||
                    (value != null && value.equals(v)))
                    return true;
            }
        }
    }
    return false;
}
```
- entrySet(): iteration is implemented via HashIterator

```java
/**
 * Returns a {@link Set} view of the mappings contained in this map.
 * The set is backed by the map, so changes to the map are
 * reflected in the set, and vice-versa.  If the map is modified
 * while an iteration over the set is in progress (except through
 * the iterator's own <tt>remove</tt> operation, or through the
 * <tt>setValue</tt> operation on a map entry returned by the
 * iterator) the results of the iteration are undefined.  The set
 * supports element removal, which removes the corresponding
 * mapping from the map, via the <tt>Iterator.remove</tt>,
 * <tt>Set.remove</tt>, <tt>removeAll</tt>, <tt>retainAll</tt> and
 * <tt>clear</tt> operations.  It does not support the
 * <tt>add</tt> or <tt>addAll</tt> operations.
 *
 * @return a set view of the mappings contained in this map
 */
public Set<Map.Entry<K,V>> entrySet() {
    Set<Map.Entry<K,V>> es;
    return (es = entrySet) == null ? (entrySet = new EntrySet()) : es;
}
```
- forEach(): traverses elements via the list links (tree bins also maintain a doubly linked list)

```java
public void forEach(BiConsumer<? super K, ? super V> action) {
    Node<K,V>[] tab;
    if (action == null)
        throw new NullPointerException();
    if (size > 0 && (tab = table) != null) {
        int mc = modCount;
        for (int i = 0; i < tab.length; ++i) {
            for (Node<K,V> e = tab[i]; e != null; e = e.next)
                action.accept(e.key, e.value);
        }
        if (modCount != mc)
            throw new ConcurrentModificationException();
    }
}
```
The red-black tree notes below are copied from: RED-BLACK TREE; insertion and deletion are illustrated with C example code.
- 红黑树性质
- 节点是红色或黑色
- 根节点是黑色
- 所有叶子都是黑色(叶子是NIL节点)
- 从每个叶子到根的所有路径上不能有两个连续的红色节点
- 从任一节点到其每个叶子的所有简单路径都包含相同数目的黑色节点
- 基础知识
- 自平衡二叉查找树
- 由于性质4、5的存在,保证了最长路径不大于最短路径的两倍
- 查找、删除、插入,都能保证O(log n)时间内完成
- 插入和删除破坏红黑树平衡性质后,通过 树旋转(不多于三次) 和 颜色变更(不多于O(log n)次数) 来恢复
- [logn+1,2logn+1],树高是O(logn)
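Properties 4 and 5 can be validated recursively. The sketch below is an illustrative, invented helper (not HashMap's TreeNode): it returns the black-height of a subtree, or -1 if a red node has a red child or two sides disagree on black count.

```java
// Minimal sketch: validate red-black properties 4 and 5.
public class RbCheck {
    static final boolean RED = true, BLACK = false;

    static class Node {
        boolean color;
        Node left, right;
        Node(boolean color, Node left, Node right) {
            this.color = color; this.left = left; this.right = right;
        }
    }

    // Returns the black-height of the subtree, or -1 on a violation.
    static int blackHeight(Node n) {
        if (n == null) return 1;               // NIL leaves count as black
        if (n.color == RED &&
            ((n.left != null && n.left.color == RED) ||
             (n.right != null && n.right.color == RED)))
            return -1;                         // property 4 violated
        int lh = blackHeight(n.left), rh = blackHeight(n.right);
        if (lh == -1 || rh == -1 || lh != rh)
            return -1;                         // property 5 violated
        return lh + (n.color == BLACK ? 1 : 0);
    }

    public static void main(String[] args) {
        // valid: black root with two red children
        Node ok = new Node(BLACK, new Node(RED, null, null), new Node(RED, null, null));
        // invalid: red node with a red child
        Node bad = new Node(BLACK, new Node(RED, new Node(RED, null, null), null), null);
        System.out.println(blackHeight(ok));   // 2
        System.out.println(blackHeight(bad));  // -1
    }
}
```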
-
Red-black tree insertion
A newly inserted node is red by default (a red insertion may leave all properties intact, while a black insertion always violates property 5; and two consecutive red nodes are, in most situations, easier to repair than a surplus black node)
In the discussion below, the node being inserted is N, its parent is P, its grandparent is G, and its uncle is U
-
Case 1: the new node's parent is null, i.e., the new node is the root; otherwise go to Case 2
- Recolor the node black
void insert_case1(node *n){ if(n->parent == NULL) n->color = BLACK; else insert_case2 (n); }
-
Case 2: the new node's parent P is black; otherwise go to Case 3
- Nothing to do; the red-black properties already hold
void insert_case2(node *n){ if(n->parent->color == BLACK) return; /* the tree is still valid */ else insert_case3 (n); }
In the remaining cases the new node's parent is red, so the new node must have a grandparent, and it must have an uncle (although the uncle may be a NIL node)
-
Case 3: the parent P is red and the uncle U is also red; otherwise go to Case 4
- Swap the colors of P and U with that of the grandparent G (the black count on each path is unchanged, but G's parent may now be red, producing two consecutive red nodes). Then treat G as the new N and re-run from Case 1
void insert_case3(node *n){ if(uncle(n) != NULL && uncle (n)->color == RED) { n->parent->color = BLACK; uncle (n)->color = BLACK; grandparent (n)->color = RED; insert_case1(grandparent(n)); } else insert_case4 (n); }
-
Case 4: the parent P is red, the uncle U is black (black or NIL), and P and N are on different sides of their respective parents
- If P is G's left child and N is P's right child, left-rotate P; then go to Case 5
- If P is G's right child and N is P's left child, right-rotate P; then go to Case 5
void insert_case4(node *n){ if(n == n->parent->right && n->parent == grandparent(n)->left) { rotate_left(n->parent); n = n->left; } else if(n == n->parent->left && n->parent == grandparent(n)->right) { rotate_right(n->parent); n = n->right; } insert_case5 (n); }
- (rotation diagram for the case where P is G's right child)
-
Case 5: the parent P is red, the uncle U is black (black or NIL), and P and N are on the same side of their parents
- If P and N are both left children, right-rotate the grandparent G and swap the colors of P and G
- If P and N are both right children, left-rotate the grandparent G and swap the colors of P and G
void insert_case5(node *n){ n->parent->color = BLACK; grandparent (n)->color = RED; if(n == n->parent->left && n->parent == grandparent(n)->left) { rotate_right(grandparent(n)); } else { /* Here, n == n->parent->right && n->parent == grandparent (n)->right */ rotate_left(grandparent(n)); } }
- (rotation diagram for the case where P is G's right child)
-
Red-black tree deletion
For a binary search tree, to delete a node with two non-leaf children we find either the maximum of its left subtree or the minimum of its right subtree, copy that value into the node being deleted, and then delete the node whose value was copied.
So the deletion analysis below only needs to cover nodes with at most one non-NIL child
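The "copy the successor's value" step described above can be sketched for a plain binary search tree (the Node class here is invented for illustration, not HashMap's TreeNode):

```java
public class BstDelete {
    static class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // The in-order successor of a node with two children is the leftmost
    // node of its right subtree. It has at most one (right) child, which is
    // exactly why two-child deletion reduces to the simple one-child cases.
    static Node minimum(Node n) {
        while (n.left != null) n = n.left;
        return n;
    }

    public static void main(String[] args) {
        Node root = new Node(10);
        root.left = new Node(5);
        root.right = new Node(20);
        root.right.left = new Node(15);
        root.right.left.right = new Node(17);
        System.out.println(minimum(root.right).key);  // 15
    }
}
```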
-
Case 1: the node to delete is red. It has at most one non-NIL child, any child it had would have to be black, and a single black child would make the black counts unequal (property 5), so both children must in fact be NIL
- Delete it directly (this is the only configuration possible for a red node)
-
Case 2: the node to delete is black and its child is red (with at most one non-NIL child, any non-NIL child must be red, otherwise the black counts on different paths would differ, violating property 5)
- Move the child into the deleted node's place and recolor it black
void delete_case_1_2(struct node *n) { struct node *child = (n->left != NULL) ? n->left : n->right; /* the at-most-one non-NIL child; NIL is assumed to be a shared black sentinel */ replace_node(n, child); /* assumed helper: splice child into n's place */ if(n->color == BLACK){ if(child->color == RED) child->color = BLACK; else delete_case3 (child); } free (n); }
Once those two cases are excluded, only one remains: the node to delete is black and both of its children are NIL
Call the node that takes the deleted node's place N, its parent P, its sibling S, and S's left and right children Sl and Sr
// get the replacement node's sibling
struct node * sibling(struct node *n)
{
if(n == n->parent->left)
return n->parent->right;
else
return n->parent->left;
}
-
Case 3: N is the root
- Nothing to do
void delete_case3(struct node *n) { if(n->parent != NULL) delete_case4 (n); }
In cases 4, 7 and 8 we assume N is its parent's left child; if N is the right child, mirror every left/right operation (the same symmetry as with same-side/different-side insertion)
-
Case 4: S is red, so P, Sl and Sr must all be black
- Left-rotate P and swap the colors of P and S; afterwards proceed with cases 6, 7 or 8
void delete_case4(struct node *n) { struct node *s = sibling (n); if(s->color == RED){ n->parent->color = RED; s->color = BLACK; if(n == n->parent->left) rotate_left(n->parent); else rotate_right(n->parent); } delete_case5 (n); }
-
Case 5: N's parent, S, and S's children are all black (Sr and Sl are NIL nodes)
- Repaint S red (a black node was deleted on N's side, and S goes from black to red, so every path through P is now one black node short; treat P as the new N and restart the rebalancing from Case 3)
void delete_case5(struct node *n) { struct node *s = sibling (n); if((n->parent->color == BLACK) && (s->color == BLACK) && (s->left->color == BLACK) && (s->right->color == BLACK)) { s->color = RED; delete_case3(n->parent); } else delete_case6 (n); }
-
Case 6: S and S's children are all black (Sr and Sl are NIL nodes), but N's parent is red
- Swap the colors of S and P; the black count on paths through S is unchanged, and the path through N regains the black node that was deleted, so the properties hold
void delete_case6(struct node *n) { struct node *s = sibling (n); if((n->parent->color == RED) && (s->color == BLACK)&& (s->left->color == BLACK) && (s->right->color == BLACK)) { s->color = RED; n->parent->color = BLACK; } else delete_case7 (n); }
-
Case 7: S is black, S's left child Sl is red, S's right child is black, and N is its parent's left child
- Right-rotate S and swap the colors of S and Sl (neither N nor P is affected, and the black count on the left side is unchanged); then go to Case 8
void delete_case7(struct node *n) { struct node *s = sibling (n); if(s->color == BLACK){ if((n == n->parent->left) && (s->right->color == BLACK) && (s->left->color == RED)) { s->color = RED; s->left->color = BLACK; rotate_right (s); } else if((n == n->parent->right) && (s->left->color == BLACK) && (s->right->color == RED)) { s->color = RED; s->right->color = BLACK; rotate_left (s); } } delete_case8 (n); }
-
Case 8: S is black, S's right child Sr is red, and N is its parent's left child
- Left-rotate P, swap the colors of S and P, and recolor Sr black
void delete_case8(struct node *n) { struct node *s = sibling (n); s->color = n->parent->color; n->parent->color = BLACK; if(n == n->parent->left){ s->right->color = BLACK; rotate_left(n->parent); } else { s->left->color = BLACK; rotate_right(n->parent); } }
- In this case we do not care what color P was (drawn blue in the diagram). Since S must be black before the rotation, after left-rotating P and swapping the colors of S and P, P is certainly black. After this step:
- Paths through N gain a black node (P), which restores the black node deleted earlier.
- Paths not through N that pass through N's new sibling Sl: they passed through S and P before and still do; S and P merely swapped colors, so no property is violated.
- Paths not through N that pass through Sr: they used to go through S and P; after the color swap Sr still goes through S, so each such path effectively lost the original black S; recoloring Sr from red to black compensates for it, so the property holds.
As the flow above shows, deletion performs at most three rotations. Like insertion, deletion can be written tail-recursively, and some languages (C compilers, for instance) optimize tail calls. Testing suggests the JVM does not perform tail-call optimization; reportedly the stance is that Java programmers should simply avoid deep recursion.
-
Other tree data structures
-
2-3 search tree (1970)
- Follows the B-tree rules; can be viewed as a precursor of the red-black tree
- An internal node (one with children) has either 2 children and 1 data element or 3 children and 2 data elements; leaves have no children and hold 1 or 2 data elements (a 2-3 tree with 1 million nodes is 12-20 levels high; with 1 billion nodes, 18-30 levels)
- In the worst case (all nodes are 2-nodes) search costs lg N
- In the best case (all nodes are 3-nodes) search costs log3 N ≈ 0.631 lg N
- 2-3 trees are fairly complex to implement directly (multiple node types to handle; several comparisons are needed to decide where to descend; splitting 4-nodes is fiddly)
- A red-black tree is a 2-3-4 tree realized on top of a binary search tree (2-3 and 2-3-4 trees are analogous: flatten the links from a node to its two red children and you get a 4-node; compare the diagrams of a red-black tree and a 2-3 search tree below)
-
b-tree (the 2-3 tree above can be read against the B-tree properties as a B-tree of order 3)
- Each node has at most m (the order of the tree) children
- Every node other than the root and the leaves has at least ⌈m/2⌉ (rounded up) children
- If the root is not a leaf, it has at least 2 children
- All NULL leaves sit at the same depth
- Every node other than the root holds n keys, where ⌈m/2⌉ - 1 <= n <= m - 1
-
b+tree
- Internal nodes store only index keys; leaves store the actual data (in MySQL the details differ by storage engine: MyISAM vs InnoDB)
- MyISAM uses non-clustered indexes (the data file and index file are separate); in InnoDB the data file is itself the index file: the leaves of the primary (clustered) index hold the full rows, while the leaves of secondary (non-clustered) indexes hold the corresponding primary key. The B+tree also links its leaves with sequential pointers, which speeds up range queries
- Why indexes are implemented with B-trees and B+trees:
- Workloads that need indexes generally have data (and indexes) far too large to fit in memory, so they live in files on disk. Index lookups therefore incur disk I/O, and the most important metric for judging an index data structure is the asymptotic number of disk I/O operations per lookup; in other words, the index should be organized to minimize disk accesses during a search.
- A traditional spinning disk must seek (locate the track) and rotate (locate the sector) before reading, which is vastly slower than memory access; sequential reads skip the seek and need only a little rotation, so they are much faster. Disks also prefetch, usually in whole multiples of a page. A page is the logical unit of memory management: hardware (disk blocks) and operating systems divide main memory and disk storage into contiguous equal-sized blocks, each called a page (commonly 4 KB), and main memory exchanges data with disk in page units. When a program touches data not in main memory, a page fault is raised; the system issues a disk read, and the disk reads one or more consecutive pages into memory starting at the data's position.
- A B-tree allocates a full page whenever it creates a node, so each node also physically occupies one page; since allocations are page-aligned, reading one node costs a single I/O, and one lookup costs a number of I/Os proportional to the tree height, O(h).
- In a B+tree, internal nodes store only keys, not data, so a node holds more routing entries and the tree is shallower than a B-tree (hence more efficient; B-tree nodes are usually uniform in size, while a B+tree's internal and leaf nodes differ). The leaves carry sequential-access pointers, so ordered range queries are very efficient; and because all data lives in the leaves, lookup cost is also more uniform than in a B-tree.
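The "shallower tree, fewer I/Os" argument can be made concrete with some back-of-the-envelope arithmetic. The numbers below are assumed for illustration only (a 16 KB page holding roughly 1170 key+pointer entries, an often-quoted InnoDB-style estimate), not measurements:

```java
public class BTreeHeight {
    // Smallest height h (leaf level included) such that fanout^h >= rows.
    static int height(long rows, long fanout) {
        int h = 0;
        long reach = 1;
        while (reach < rows) { reach *= fanout; h++; }
        return h;
    }

    public static void main(String[] args) {
        // Assumed: 16 KB page / (8-byte key + 6-byte pointer) ≈ 1170 entries per node.
        System.out.println(height(1_000_000L, 1170));      // 2  -> ~2 page reads
        System.out.println(height(1_000_000_000L, 1170));  // 3  -> ~3 page reads
    }
}
```

With a fanout in the thousands, even a billion rows need only about three levels, which is why lookup cost is dominated by a handful of page I/Os.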
-
AVL TREE (1962)
- For every node, the heights of its two subtrees differ by at most 1
- A node's balance factor is the height of its left subtree minus the height of its right subtree (sometimes defined the other way around). Nodes with balance factor 1, 0 or -1 are considered balanced; nodes with balance factor -2 or 2 are unbalanced
- Although an AVL tree is sometimes better in pure algorithmic terms, and its rebalancing after inserts and deletes is not as costly as folklore claims, in practice the red-black tree is what gets used everywhere; AVL trees see far fewer real-world deployments (the red-black tree has essentially become the default balanced binary search tree for mixed insert/delete/lookup workloads)
- My personal take, two points: AVL's stricter balance means that under mixed workloads inserts and deletes break the balance more often; and an AVL update touches more nodes (typical implementations store the subtree height inside each node), so red-black trees are usually friendlier to fine-grained locking (red-black recoloring can recurse upward, but that is the uncommon case)
- I once saw a Zhihu user post a comparison of their own AVL and red-black implementations, which left me puzzled: was AVL undervalued merely because nobody implemented it well for its best-case scenarios? That seems unlikely, given the caliber of the people who designed these structures, so I will simply leave the question open: [Why are red-black trees used so much more widely than AVL trees?]
- Where red-black trees are used
- Java: java.util.TreeMap, java.util.TreeSet, *HashMap
- C++ STL: map, multimap, multiset
- Linux kernel: completely fair scheduler, linux/rbtree.h
- nginx: timer management
- etc.
-
-
Red-black trees inside HashMap
-
Let's first look at the basic implementation of the rotation operations in Java
- rotateLeft
static <K,V> TreeNode<K,V> rotateLeft(TreeNode<K,V> root, TreeNode<K,V> p) { TreeNode<K,V> r, pp, rl; if (p != null && (r = p.right) != null) { // the pivot p and its right child r are both non-null if ((rl = p.right = r.left) != null) // r's left subtree becomes p's right subtree rl.parent = p; // if that subtree is non-empty, point its parent at p if ((pp = r.parent = p.parent) == null) // p was the root (root = r).red = false; // r (p's old right child) becomes the new root and is colored black else if (pp.left == p) // otherwise hook r into p's old slot under pp, on the correct side pp.left = r; else pp.right = r; r.left = p; // p becomes r's left child p.parent = r; // and r becomes p's parent } return root; }
- rotateRight (mirror image of rotateLeft, so no comments added)
static <K,V> TreeNode<K,V> rotateRight(TreeNode<K,V> root, TreeNode<K,V> p) { TreeNode<K,V> l, pp, lr; if (p != null && (l = p.left) != null) { if ((lr = p.left = l.right) != null) lr.parent = p; if ((pp = l.parent = p.parent) == null) (root = l).red = false; else if (pp.right == p) pp.right = l; else pp.left = l; l.right = p; p.parent = l; } return root; }
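To make the pointer moves concrete, here is a hedged, simplified re-implementation of rotateLeft on a minimal invented node type (no colors, no HashMap specifics), mirroring the comments above:

```java
public class RotateDemo {
    static class Node {
        int key;
        Node parent, left, right;
        Node(int key) { this.key = key; }
    }

    // Left-rotate p: its right child r takes p's place, p becomes r's
    // left child, and r's old left subtree becomes p's right subtree.
    static Node rotateLeft(Node root, Node p) {
        Node r = p.right;
        if (r == null) return root;
        if ((p.right = r.left) != null) p.right.parent = p;
        if ((r.parent = p.parent) == null) root = r;
        else if (p.parent.left == p) p.parent.left = r;
        else p.parent.right = r;
        r.left = p;
        p.parent = r;
        return root;
    }

    public static void main(String[] args) {
        Node a = new Node(1), b = new Node(2), c = new Node(3);
        a.right = b; b.parent = a;   // degenerate chain: 1 -> 2 -> 3
        b.right = c; c.parent = b;
        Node root = rotateLeft(a, a);
        System.out.println(root.key);        // 2
        System.out.println(root.left.key);   // 1
        System.out.println(root.right.key);  // 3
    }
}
```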
-
insert/delete/get
- putTreeVal
/** * Tree version of putVal. */ final TreeNode<K,V> putTreeVal(HashMap<K,V> map, Node<K,V>[] tab, int h, K k, V v) { Class<?> kc = null; boolean searched = false; TreeNode<K,V> root = (parent != null) ? root() : this; for (TreeNode<K,V> p = root;;) {// walk down from the root int dir, ph; K pk; if ((ph = p.hash) > h) dir = -1; else if (ph < h) dir = 1; else if ((pk = p.key) == k || (k != null && k.equals(pk))) // found an equal key, return the existing node return p; else if ((kc == null && (kc = comparableClassFor(k)) == null) || (dir = compareComparables(kc, k, pk)) == 0) {// hashes are equal: check whether the key class is Comparable if (!searched) { // not comparable (or compares equal): search both subtrees once, TreeNode-style TreeNode<K,V> q, ch; searched = true; // run the full-tree search only once (if it fails here it would keep failing, but we still have to descend to a leaf afterwards) if (((ch = p.left) != null && (q = ch.find(h, k, kc)) != null) || ((ch = p.right) != null && (q = ch.find(h, k, kc)) != null)) return q; } // nothing decided the direction yet: tieBreakOrder uses class names and identity hash codes to guarantee a non-zero answer dir = tieBreakOrder(k, pk); } TreeNode<K,V> xp = p; if ((p = (dir <= 0) ? p.left : p.right) == null) {// reached a null child: insert here Node<K,V> xpn = xp.next; TreeNode<K,V> x = map.newTreeNode(h, k, v, xpn);// new TreeNode, linked into both the tree and the doubly linked list if (dir <= 0) xp.left = x; else xp.right = x; xp.next = x; x.parent = x.prev = xp; if (xpn != null) ((TreeNode<K,V>)xpn).prev = x; moveRootToFront(tab, balanceInsertion(root, x));// rebalance after the insertion return null; } } }
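To see the tieBreakOrder fallback matter, here is a hedged sketch with a deliberately pathological key type, invented for illustration: constant hashCode and no Comparable. The map still behaves correctly after the bin treeifies, because insertion falls back to tieBreakOrder and lookup falls back to scanning both subtrees:

```java
import java.util.HashMap;

public class TieBreakDemo {
    // Worst-case key: every instance collides, and the class is not
    // Comparable, so putTreeVal must resort to tieBreakOrder for a
    // direction and find() must search both subtrees.
    static class Key {
        final int id;
        Key(int id) { this.id = id; }
        @Override public int hashCode() { return 42; }
        @Override public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).id == id;
        }
    }

    public static void main(String[] args) {
        HashMap<Key, Integer> map = new HashMap<>();
        for (int i = 0; i < 32; i++)        // well past TREEIFY_THRESHOLD
            map.put(new Key(i), i);
        System.out.println(map.get(new Key(7)));  // 7
        System.out.println(map.size());           // 32
    }
}
```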
- balanceInsertion
static <K,V> TreeNode<K,V> balanceInsertion(TreeNode<K,V> root, TreeNode<K,V> x) { // x has already been placed at a leaf position; per the analysis above, a new node defaults to red x.red = true; for (TreeNode<K,V> xp, xpp, xppl, xppr;;) { if ((xp = x.parent) == null) {// insert case 1: x is the root, color it black x.red = false; return x; } else if (!xp.red || (xpp = xp.parent) == null) // insert case 2: the parent is black, the properties hold return root; if (xp == (xppl = xpp.left)) { // the parent is the grandparent's left child if ((xppr = xpp.right) != null && xppr.red) { // insert case 3: the uncle (grandparent's right child) is red xppr.red = false; // recolor the uncle black xp.red = false; // recolor the parent black xpp.red = true; // recolor the grandparent red x = xpp; // rebalance from the grandparent (iteratively: this loop replaces the recursion) } else {// the uncle is black or null if (x == xp.right) { // insert case 4: parent on the grandparent's left, x on the parent's right (different sides) // left-rotate the parent /** * x is passed as (x = xp) so that after the rotation the variable names x/xp/xpp * still match their positions; the ternary below then re-reads xp and xpp accordingly */ root = rotateLeft(root, x = xp); xpp = (xp = x.parent) == null ? null : xp.parent; // re-fetch the grandparent } // insert case 5: parent on the grandparent's left, x on the parent's left (same side; case 4 rotates into this) if (xp != null) { // these null checks on xp and xpp look defensive; what exactly they guard against is unclear to me (concurrent misuse?) xp.red = false; // swap parent and grandparent colors, then right-rotate if (xpp != null) { xpp.red = true; root = rotateRight(root, xpp); } } } } else { // the parent is the grandparent's right child; fully symmetric, so no further comments if (xppl != null && xppl.red) { xppl.red = false; xp.red = false; xpp.red = true; x = xpp; } else { if (x == xp.left) { root = rotateRight(root, x = xp); xpp = (xp = x.parent) == null ? null : xp.parent; } if (xp != null) { xp.red = false; if (xpp != null) { xpp.red = true; root = rotateLeft(root, xpp); } } } } } }
-
removeTreeNode
/** * Removes the given node, that must be present before this call. * This is messier than typical red-black deletion code because we * cannot swap the contents of an interior node with a leaf * successor that is pinned by "next" pointers that are accessible * independently during traversal. So instead we swap the tree * linkages. If the current tree appears to have too few nodes, * the bin is converted back to a plain bin. (The test triggers * somewhere between 2 and 6 nodes, depending on tree structure). */ final void removeTreeNode(HashMap<K,V> map, Node<K,V>[] tab, boolean movable) { int n; if (tab == null || (n = tab.length) == 0) // empty table, nothing to remove return; int index = (n - 1) & hash; TreeNode<K,V> first = (TreeNode<K,V>)tab[index], root = first, rl; // first node in the bucket TreeNode<K,V> succ = (TreeNode<K,V>)next, pred = prev; // the removed node's predecessor and successor in the doubly linked list if (pred == null) tab[index] = first = succ; // no predecessor: the removed node led the bucket, so its successor takes the slot else pred.next = succ; // otherwise unlink it from the list: pred.next skips the removed node if (succ != null) succ.prev = pred; // fix the successor's prev pointer if (first == null) // the bucket held only the removed node return; if (root.parent != null) root = root.root(); // locate the tree root if (root == null || root.right == null || (rl = root.left) == null || rl.left == null) { tab[index] = first.untreeify(map); // too small return; // if the root, either of its children, or root.left.left is null, the tree is deemed too small: convert back to a list and return } // the code above maintains the doubly linked list around the removed node // the code below finds a replacement node and calls balanceDeletion to keep the tree balanced TreeNode<K,V> p = this, pl = left, pr = right, replacement; if (pl != null && pr != null) { // the node to delete has two non-null children TreeNode<K,V> s = pr, sl; while ((sl = s.left) != null) // find successor s = sl;// the leftmost node of the right subtree (its minimum) /** * Swap colors. Note this differs slightly from the theory section above: * there we copied the successor's value into the deleted node (colors untouched) * and then deleted the successor node (whose replacement may be NIL); * here the node and its successor are swapped completely, links and colors included */ boolean c = s.red; s.red = p.red; p.red = c; // swap colors TreeNode<K,V> sr = s.right; // the successor's right child TreeNode<K,V> pp = p.parent; // the removed node's parent if (s == pr) { // s is p's direct right child (so p's right child has no left child): swap p and s in place p.parent = s; s.right = p; } else { // s is deeper in the right subtree TreeNode<K,V> sp = s.parent; // the successor's parent if ((p.parent = sp) != null) { // p takes s's old place under s's former parent if (s == sp.left) sp.left = p; else sp.right = p; } if ((s.right = pr) != null) // p's old right subtree is hooked under s pr.parent = s; } p.left = null; // p now sits where s was, so its left child is null if ((p.right = sr) != null) // s's old right child becomes p's right child sr.parent = p; if ((s.left = pl) != null) // p's old left subtree is hooked under s pl.parent = s; if ((s.parent = pp) == null) // if s is now the root, record it root = s; else if (p == pp.left) // otherwise hook s under p's old parent pp.left = s; else pp.right = s; if (sr != null) // if the relocated p has a right child, that child is the replacement; otherwise the relocated p is replacement = sr; else replacement = p; } // the node to delete has at most one non-null child else if (pl != null) replacement = pl; else if (pr != null) replacement = pr; else replacement = p; if (replacement != p) { // the replacement differs from p: splice it in and detach p TreeNode<K,V> pp = replacement.parent = p.parent; if (pp == null) root = replacement; else if (p == pp.left) pp.left = replacement; else pp.right = replacement; p.left = p.right = p.parent = null; } // delete case 1: removing a red node cannot break the properties, so just take root; otherwise call balanceDeletion TreeNode<K,V> r = p.red ? root : balanceDeletion(root, replacement); // if p was its own replacement (a leaf), detach it from the tree now (not yet done above in this case) if (replacement == p) { // detach TreeNode<K,V> pp = p.parent; p.parent = null; if (pp != null) { if (p == pp.left) pp.left = null; else if (p == pp.right) pp.right = null; } } if (movable) moveRootToFront(tab, r); // keep the root in the table slot and at the head of the linked list }
-
balanceDeletion
static <K,V> TreeNode<K,V> balanceDeletion(TreeNode<K,V> root, TreeNode<K,V> x) { for (TreeNode<K,V> xp, xpl, xpr;;) { if (x == null || x == root) return root; else if ((xp = x.parent) == null) { x.red = false; return x; } // above: if the replacement is (or has become) the root, blacken it and return (the second branch seems sufficient on its own; the purpose of the first check is unclear to me) // delete case 2: a black node was removed and its replacement child is red: just recolor it black (the caller already did the unlinking) else if (x.red) { x.red = false; return root; } else if ((xpl = xp.left) == x) {// the replacement is its parent's left child if ((xpr = xp.right) != null && xpr.red) {// delete case 4: the right sibling exists and is red xpr.red = false; // swap the sibling's and parent's colors, then left-rotate the parent xp.red = true; root = rotateLeft(root, xp); // after the rotation xp is still x's parent (now red); re-read xpr as the new sibling (the old xpr's left child) xpr = (xp = x.parent) == null ? null : xp.right; } // degenerate cases 5/6: no sibling, so rebalance again from the parent (one black node short through x == one black node short through xp) if (xpr == null) x = xp; else {// inside this else, xpr must be black (a red sibling was already handled as case 4 above) TreeNode<K,V> sl = xpr.left, sr = xpr.right; if ((sr == null || !sr.red) && (sl == null || !sl.red)) { // delete cases 5/6: the sibling is black and both of its children are null or black xpr.red = true; // if this is case 5, xp is black: keep looping upward x = xp; // if this is case 6, xp is red: the next iteration recolors it black and balance is restored } else { if (sr == null || !sr.red) { if (sl != null) // delete case 7: xpr black, sl red, sr black sl.red = false;// swap the colors of xpr and sl and right-rotate xpr xpr.red = true; root = rotateRight(root, xpr); // the re-read xpr here is the pre-rotation sl (already recolored black) xpr = (xp = x.parent) == null ? null : xp.right; } if (xpr != null) {// delete case 8: xpr is black and its right child is red (guaranteed by case 7 above) xpr.red = (xp == null) ? false : xp.red; // xpr takes xp's color if ((sr = xpr.right) != null) sr.red = false; // sr (xpr's right child) is recolored black } if (xp != null) { xp.red = false; // xp is recolored black (i.e., it takes xpr's old color: the case-8 color swap) root = rotateLeft(root, xp); // left-rotate xp; balance restored } x = root; } } } // the replacement is its parent's right child: symmetric, no further comments needed else { // symmetric if (xpl != null && xpl.red) { xpl.red = false; xp.red = true; root = rotateRight(root, xp); xpl = (xp = x.parent) == null ? null : xp.left; } if (xpl == null) x = xp; else { TreeNode<K,V> sl = xpl.left, sr = xpl.right; if ((sl == null || !sl.red) && (sr == null || !sr.red)) { xpl.red = true; x = xp; } else { if (sl == null || !sl.red) { if (sr != null) sr.red = false; xpl.red = true; root = rotateLeft(root, xpl); xpl = (xp = x.parent) == null ? null : xp.left; } if (xpl != null) { xpl.red = (xp == null) ? false : xp.red; if ((sl = xpl.left) != null) sl.red = false; } if (xp != null) { xp.red = false; root = rotateRight(root, xp); } x = root; } } } } }
-
get
- TreeNode find: plain binary-search-tree lookup logic
/** * Finds the node starting at root p with the given hash and key. * The kc argument caches comparableClassFor(key) upon first use * comparing keys. */ final TreeNode<K,V> find(int h, Object k, Class<?> kc) { TreeNode<K,V> p = this; do { int ph, dir; K pk; TreeNode<K,V> pl = p.left, pr = p.right, q; if ((ph = p.hash) > h) p = pl; else if (ph < h) p = pr; else if ((pk = p.key) == k || (k != null && k.equals(pk))) return p; else if (pl == null) p = pr; else if (pr == null) p = pl; else if ((kc != null || (kc = comparableClassFor(k)) != null) && (dir = compareComparables(kc, k, pk)) != 0) p = (dir < 0) ? pl : pr; else if ((q = pr.find(h, k, kc)) != null) return q; else p = pl; } while (p != null); return null; }
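A quick way to exercise the tree path of find without a custom class is to use colliding Strings: "Aa" and "BB" have the same hashCode (2112), so every equal-length concatenation of them collides too, and String implements Comparable, so the treeified bin can order them by compareTo. A hedged demonstration (the helper is invented for illustration):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

public class CollidingStrings {
    // 32 distinct length-10 strings, all with the same hashCode, built
    // from the colliding pair "Aa"/"BB"; they force one bucket past
    // TREEIFY_THRESHOLD, after which find() walks the red-black tree.
    static List<String> collidingKeys() {
        List<String> keys = new ArrayList<>();
        String[] parts = {"Aa", "BB"};
        for (int i = 0; i < 32; i++) {
            StringBuilder sb = new StringBuilder();
            for (int b = 0; b < 5; b++) sb.append(parts[(i >> b) & 1]);
            keys.add(sb.toString());
        }
        return keys;
    }

    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();
        List<String> keys = collidingKeys();
        for (int i = 0; i < keys.size(); i++) map.put(keys.get(i), i);
        System.out.println(keys.get(0).hashCode() == keys.get(31).hashCode()); // true
        System.out.println(map.get(keys.get(7)));  // 7
    }
}
```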
-
A common interview topic: the circular linked list produced when JDK 1.7 resizes under concurrency
- The root cause is that JDK 1.7 used head insertion for its bucket lists (on the theory that the most recently inserted element is the most likely to be used soon), so a bucket's nodes come out in reverse order after a resize. With two threads resizing under a particular interleaving, the resized bucket can form a cycle. And it is not just get: any operation that walks into the cycle without finding its target node keeps following next forever, producing an infinite loop. (Detailed walkthroughs of how the cycle forms are easy to find online.)
- As analyzed earlier, JDK 1.8 switched to tail insertion, so the circular-list problem no longer occurs.
- HashMap is not thread-safe because none of its non-atomic operations carry any thread-safety strategy. The circular list was merely the most visible and damaging of its concurrency bugs; fixing it does not make HashMap thread-safe, and it still must not be used from multiple threads.
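The core mechanism behind the 1.7 bug is simply that head insertion reverses a list; the concurrent interleaving itself cannot be demonstrated deterministically, but the reversal can. A minimal sketch (invented Node type, single-threaded):

```java
public class HeadInsertReversal {
    static class Node {
        int val; Node next;
        Node(int val, Node next) { this.val = val; this.next = next; }
    }

    // JDK 1.7's transfer() moved each node to the head of its new bucket,
    // which reverses the bucket's order; this sketch shows the reversal
    // on a plain singly linked list.
    static Node reverseByHeadInsert(Node src) {
        Node newHead = null;
        while (src != null) {
            Node next = src.next;
            src.next = newHead;  // insert at the head of the new list
            newHead = src;
            src = next;
        }
        return newHead;
    }

    public static void main(String[] args) {
        Node list = new Node(1, new Node(2, new Node(3, null)));
        Node r = reverseByHeadInsert(list);
        System.out.println(r.val);            // 3
        System.out.println(r.next.val);       // 2
        System.out.println(r.next.next.val);  // 1
    }
}
```

When two threads each perform this move on the same nodes, one thread can re-link nodes the other has already reversed, which is how the next pointers end up forming a cycle.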
-