Arrays.sort Source Code Analysis

程序员文章站 2024-03-06 21:03:14

...

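At work I rarely read source code, so many problems never got studied in depth. To keep improving, you have to be able to read code written by excellent programmers. So I have set aside time to work through the JDK source, and I record what I read as blog posts.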

(The source code below is from JDK 1.8, which differs considerably from earlier versions.)

I. Overview of Arrays.sort

Sorts the specified range of the specified array of objects according to the order induced by the specified comparator. 
The range to be sorted extends from index fromIndex, inclusive, to index toIndex, exclusive. 
(If fromIndex==toIndex, the range to be sorted is empty.) 
All elements in the range must be mutually comparable by the specified comparator 
(that is, c.compare(e1, e2) must not throw a ClassCastException for any elements e1 and e2 in the range).

This sort is guaranteed to be stable: equal elements will not be reordered as a result of the sort.
Implementation note: This implementation is a stable, adaptive,
iterative mergesort that requires far fewer than n lg(n) comparisons when the input array is partially sorted,
while offering the performance of a traditional mergesort when the input array is randomly ordered.
If the input array is nearly sorted, the implementation requires approximately n comparisons.
Temporary storage requirements vary from a small constant for nearly sorted input arrays to n/2 object references for randomly ordered input arrays.
The implementation takes equal advantage of ascending and descending order in its input array,
and can take advantage of ascending and descending order in different parts of the same input array.
It is well-suited to merging two or more sorted arrays: simply concatenate the arrays and sort the resulting array.

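The documentation above makes the following main points:

1. All elements in the range must be mutually comparable by the given comparator. Calling c.compare(e1, e2) must never throw a ClassCastException. (When no comparator is given, the elements must implement Comparable instead.)

2. The sort is stable: equal elements are not reordered.

3. The implementation is an adaptive, iterative mergesort. On partially sorted input it needs far fewer than n lg(n) comparisons. On randomly ordered input it performs like a traditional mergesort.

4. When the input is nearly sorted, only about n comparisons are needed.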
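Point 2 (stability) is easy to observe with a small program. The sketch below is only an illustration. The class name StabilityDemo, the method sortByLength, and the sample words are all made up for this purpose. The program sorts words by length only, so words of equal length compare as equal, and a stable sort must keep them in their input order.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical demo class: sorts words by length only, so words of the
// same length compare as equal and must keep their input order.
class StabilityDemo {

    // Returns a sorted copy; the input array is left unchanged.
    static String[] sortByLength(String[] words) {
        String[] copy = Arrays.copyOf(words, words.length);
        Arrays.sort(copy, Comparator.comparingInt(String::length));
        return copy;
    }

    public static void main(String[] args) {
        String[] words = {"pear", "fig", "kiwi", "apple", "plum", "yam"};
        // Prints [fig, yam, pear, kiwi, plum, apple]:
        // fig stays before yam, and pear, kiwi, plum keep their input order.
        System.out.println(Arrays.toString(sortByLength(words)));
    }
}
```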

II. Source Code Walkthrough

2.1 The entry method sort

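public static <T> void sort(T[] a, int fromIndex, int toIndex, Comparator<? super T> c) {
    // No custom comparator: fall back to the natural-ordering sort
    if (c == null) {
        sort(a, fromIndex, toIndex);
    } else {
        rangeCheck(a.length, fromIndex, toIndex);
        // The JDK source says LegacyMergeSort is kept only for compatibility with
        // broken comparators and is to be removed in a future release. It is used only
        // when the system property java.util.Arrays.useLegacyMergeSort is true,
        // so this branch can be ignored
        if (LegacyMergeSort.userRequested)
            legacyMergeSort(a, fromIndex, toIndex, c);
        else
            // With a custom comparator, the sorting is done by TimSort
            TimSort.sort(a, fromIndex, toIndex, c, null, 0, 0);
    }
}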
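The two branches of the entry method can be seen from the outside through the public API. This is a sketch, and RangeSortDemo and sortRange are hypothetical names. Passing a null comparator falls back to natural ordering, while a real comparator is honoured. In both cases, only the half-open range [fromIndex, toIndex) is reordered.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical demo: sorts only a[from, to) of a copy of the input.
class RangeSortDemo {

    static Integer[] sortRange(Integer[] input, int from, int to, Comparator<Integer> c) {
        Integer[] a = Arrays.copyOf(input, input.length);
        // c == null takes the natural-ordering branch of Arrays.sort
        Arrays.sort(a, from, to, c);
        return a;
    }

    public static void main(String[] args) {
        Integer[] data = {9, 5, 7, 1, 3, 0};
        // Natural order on indices 1..4: prints [9, 1, 3, 5, 7, 0]
        System.out.println(Arrays.toString(sortRange(data, 1, 5, null)));
        // Reverse order on the same range: prints [9, 7, 5, 3, 1, 0]
        System.out.println(Arrays.toString(sortRange(data, 1, 5, Comparator.reverseOrder())));
    }
}
```

The elements at indices 0 and 5 (9 and 0) never move, which confirms that toIndex is exclusive.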

2.2 When no Comparator is given

public static void sort(Object[] a, int fromIndex, int toIndex) {
    // Check that the range lies within the array bounds
    rangeCheck(a.length, fromIndex, toIndex);
    if (LegacyMergeSort.userRequested)
        legacyMergeSort(a, fromIndex, toIndex);
    else
        ComparableTimSort.sort(a, fromIndex, toIndex, null, 0, 0);
}

// Validate the range; throw an exception if it is inverted or out of bounds
private static void rangeCheck(int arrayLength, int fromIndex, int toIndex) {
    if (fromIndex > toIndex) {
        throw new IllegalArgumentException(
                "fromIndex(" + fromIndex + ") > toIndex(" + toIndex + ")");
    }
    if (fromIndex < 0) {
        throw new ArrayIndexOutOfBoundsException(fromIndex);
    }
    if (toIndex > arrayLength) {
        throw new ArrayIndexOutOfBoundsException(toIndex);
    }
}
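The three checks in rangeCheck can be exercised through the public Arrays.sort(Object[], int, int). This is a sketch, and RangeCheckDemo and tryRange are hypothetical names. The method reports which exception, if any, each range produces. An empty range (fromIndex == toIndex) passes the check and simply returns.

```java
import java.util.Arrays;

// Hypothetical demo: reports the exception rangeCheck throws for a given range.
class RangeCheckDemo {

    // Returns the simple name of the exception thrown, or "ok" if none.
    static String tryRange(int length, int from, int to) {
        Integer[] a = new Integer[length];
        Arrays.fill(a, 0);
        try {
            Arrays.sort(a, from, to);
            return "ok";
        } catch (RuntimeException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryRange(5, 3, 2));  // fromIndex > toIndex: IllegalArgumentException
        System.out.println(tryRange(5, -1, 2)); // fromIndex < 0: ArrayIndexOutOfBoundsException
        System.out.println(tryRange(5, 0, 6));  // toIndex > length: ArrayIndexOutOfBoundsException
        System.out.println(tryRange(5, 2, 2));  // empty range is legal: ok
    }
}
```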

From sort we can see that, when no custom Comparator is given, the work is actually done by the static method ComparableTimSort.sort. Let's look at what that method does:

static void sort(Object[] a, int lo, int hi, Object[] work, int workBase, int workLen) {
    // Sanity-check the arguments (assertions only run when enabled with -ea)
    assert a != null && lo >= 0 && lo <= hi && hi <= a.length;

    // Number of elements in the range to sort
    int nRemaining  = hi - lo;

    // Fewer than 2 elements: nothing to sort, return immediately
    if (nRemaining < 2)
        return;  // Arrays of size 0 and 1 are always sorted

    // If array is small, do a "mini-TimSort" with no merges
    // MIN_MERGE = 32. The source comment says this value was determined empirically.
    // Decreasing it without also adjusting the run-stack size in the constructor
    // risks an ArrayIndexOutOfBoundsException
    if (nRemaining < MIN_MERGE) {
        // Length of the leading run; lo + initRunLen is the index of the
        // first element that is not yet in order
        int initRunLen = countRunAndMakeAscending(a, lo, hi);
        binarySort(a, lo, hi, lo + initRunLen);
        return;
    }

    /**
     * March over the array once, left to right, finding natural runs,
     * extending short natural runs to minRun elements, and merging runs
     * to maintain stack invariant.
     */
    // This object holds the stack of pending runs and performs the merges
    ComparableTimSort ts = new ComparableTimSort(a, work, workBase, workLen);

    // Minimum run length. Here nRemaining >= MIN_MERGE always holds,
    // so the result lies in the range 16 <= minRun <= 32
    int minRun = minRunLength(nRemaining);
    do {
        // Identify next run
        // Length of the next natural run (a descending run is reversed in place)
        int runLen = countRunAndMakeAscending(a, lo, hi);

        // If run is short, extend to min(minRun, nRemaining)
        // A natural run shorter than minRun is extended with binary insertion sort,
        // so every run pushed onto the stack (except possibly the last one)
        // has at least minRun elements
        if (runLen < minRun) {
            int force = nRemaining <= minRun ? nRemaining : minRun;
            // a[lo, lo + runLen) is already sorted, lo + runLen is the first
            // element to insert, and [lo, lo + force) is the range sorted here
            binarySort(a, lo, lo + force, lo + runLen);
            runLen = force;
        }
        // Push run onto pending-run stack, and maybe merge
        // mergeCollapse merges adjacent runs until the stack invariants hold again
        ts.pushRun(lo, runLen);
        ts.mergeCollapse();

        // Advance to find next run
        lo += runLen;
        nRemaining -= runLen;
    } while (nRemaining != 0);

    // Merge all remaining runs to complete sort
    assert lo == hi;
    ts.mergeForceCollapse();
    assert ts.stackSize == 1;
}
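sort calls minRunLength, which the article does not show. The sketch below is modelled on how the JDK 1.8 helper is documented to behave. If n < MIN_MERGE, it returns n. If n is an exact power of two, it returns MIN_MERGE/2. Otherwise it returns some k with MIN_MERGE/2 <= k <= MIN_MERGE such that n/k is close to, but strictly less than, a power of two. MinRunDemo is a hypothetical class name, and this is a standalone sketch, not the JDK code verbatim.

```java
// Hypothetical standalone sketch of minRun computation (MIN_MERGE = 32).
class MinRunDemo {
    static final int MIN_MERGE = 32;

    // Halve n until it drops below MIN_MERGE; add 1 if any 1 bit was shifted off.
    static int minRunLength(int n) {
        if (n < 0) throw new IllegalArgumentException("n must be >= 0");
        int r = 0; // becomes 1 if any 1 bits are shifted off
        while (n >= MIN_MERGE) {
            r |= (n & 1);
            n >>= 1;
        }
        return n + r;
    }

    public static void main(String[] args) {
        // Prints 31 -> 31, 32 -> 16, 33 -> 17, 64 -> 16, 65 -> 17, 1000 -> 32, 1024 -> 16
        for (int n : new int[] {31, 32, 33, 64, 65, 1000, 1024}) {
            System.out.println(n + " -> " + minRunLength(n));
        }
    }
}
```

Rounding up when a 1 bit is lost keeps the number of runs at or just below a power of two. This lets the later merges stay balanced.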

Next, let's see how the program determines the length of a run (an already-ordered stretch of the array):

/**
 * lo: index of the first element of the run
 * hi: end of the range to examine (exclusive), not a length
 */
private static int countRunAndMakeAscending(Object[] a, int lo, int hi) {
    assert lo < hi;
    int runHi = lo + 1;
    if (runHi == hi)
        return 1;

    // Find end of run, and reverse range if descending
    // Without a custom Comparator, elements are cast to Comparable and compared
    // with compareTo; this is why the elements must implement Comparable.
    // Comparison starts at lo + 1: if a[lo + 1] is smaller than a[lo], the run is
    // descending. It is extended while each element is strictly smaller than the
    // previous one, and then reversed into ascending order
    if (((Comparable) a[runHi++]).compareTo(a[lo]) < 0) { // Descending
        while (runHi < hi && ((Comparable) a[runHi]).compareTo(a[runHi - 1]) < 0)
            runHi++;
        // Reverse the descending run in place
        reverseRange(a, lo, runHi);
    } else {                              // Ascending
        while (runHi < hi && ((Comparable) a[runHi]).compareTo(a[runHi - 1]) >= 0)
            runHi++;
    }
    // Run length = index just past the end of the run - start index
    return runHi - lo;
}

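This method does two things:

1. Starting at lo, if the run is descending, it advances runHi to the end of the strictly descending run and reverses that run into ascending order. The run must be strictly descending: reversing a run that contains equal elements would swap them and break stability.

2. If the run is ascending (non-descending), it advances runHi to the end of the ascending part.

In both cases the return value, runHi - lo, is the length of the run.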
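Run detection can be tried out in isolation. The sketch below follows the logic of the method above but uses generics instead of raw Comparable casts. RunDemo is a hypothetical class name. The second example in main shows why equal neighbours do not start a descending run.

```java
import java.util.Arrays;

// Hypothetical standalone version of run detection, using generics.
class RunDemo {

    // Returns the length of the run starting at lo within a[lo, hi).
    // A strictly descending run is reversed in place so it becomes ascending.
    static <T extends Comparable<? super T>> int countRunAndMakeAscending(T[] a, int lo, int hi) {
        if (lo >= hi) throw new IllegalArgumentException("need lo < hi");
        int runHi = lo + 1;
        if (runHi == hi)
            return 1;
        if (a[runHi++].compareTo(a[lo]) < 0) {      // strictly descending
            while (runHi < hi && a[runHi].compareTo(a[runHi - 1]) < 0)
                runHi++;
            reverseRange(a, lo, runHi);
        } else {                                    // non-descending
            while (runHi < hi && a[runHi].compareTo(a[runHi - 1]) >= 0)
                runHi++;
        }
        return runHi - lo;
    }

    // Swaps elements from both ends of a[lo, hi) toward the middle.
    static void reverseRange(Object[] a, int lo, int hi) {
        hi--;
        while (lo < hi) {
            Object t = a[lo];
            a[lo++] = a[hi];
            a[hi--] = t;
        }
    }

    public static void main(String[] args) {
        Integer[] desc = {5, 4, 3, 1, 2};
        System.out.println(countRunAndMakeAscending(desc, 0, desc.length)); // 4
        System.out.println(Arrays.toString(desc));                         // [1, 3, 4, 5, 2]

        Integer[] ties = {3, 3, 1};
        // 3, 3 is non-descending, so the run stops before 1: prints 2
        System.out.println(countRunAndMakeAscending(ties, 0, ties.length));
    }
}
```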

Here is a quick look at how the reversal is implemented:

private static void reverseRange(Object[] a, int lo, int hi) {
    hi--;
    while (lo < hi) {
        Object t = a[lo];
        a[lo++] = a[hi];
        a[hi--] = t;
    }
}

As the source shows, it swaps elements pairwise from both ends of the range toward the middle.

Finally, here is the source of the method that does the actual sorting, a binary insertion sort:

/**
 *   a: the array to sort
 *   lo: index of the first element in the range to sort
 *   hi: end of the range (exclusive), not a length
 *   start: index of the first element not yet known to be in order;
 *          a[lo, start) is already sorted
 */
private static void binarySort(Object[] a, int lo, int hi, int start) {
    assert lo <= start && start <= hi;
    if (start == lo)
        start++;
    for ( ; start < hi; start++) {
        // The next element to insert into the sorted prefix
        Comparable pivot = (Comparable) a[start];

        // Set left (and right) to the index where a[start] (pivot) belongs
        // Left bound of the search
        int left = lo;
        // Right bound of the search (exclusive)
        int right = start;
        assert left <= right;
        /*
         * Invariants:
         *   pivot >= all in [lo, left).
         *   pivot <  all in [right, start).
         */
        // Binary search for the position of pivot within the sorted
        // prefix a[lo, start)
        while (left < right) {
            // Midpoint of [left, right). The unsigned shift >>> 1 still gives the
            // correct result even if left + right overflows int
            int mid = (left + right) >>> 1;
            // If pivot < a[mid], the insertion point lies in [left, mid);
            // otherwise (including equality) it lies in [mid + 1, right).
            // Because a[lo, start) is already sorted, each comparison halves
            // the search range
            if (pivot.compareTo(a[mid]) < 0)
                right = mid;
            else
                left = mid + 1;
        }
        assert left == right;

        /*
         * The invariants still hold: pivot >= all in [lo, left) and
         * pivot < all in [left, start), so pivot belongs at left.  Note
         * that if there are elements equal to pivot, left points to the
         * first slot after them -- that's why this sort is stable.
         * Slide elements over to make room for pivot.
         */
        int n = start - left;  // The number of elements to move
        // Switch is just an optimization for arraycopy in default case
        // (case 2 falls through to case 1 on purpose)
        switch (n) {
            case 2:  a[left + 2] = a[left + 1];
            case 1:  a[left + 1] = a[left];
                     break;
            default: System.arraycopy(a, left, a, left + 1, n);
        }
        a[left] = pivot;
    }
}
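The same binary insertion sort can be run on its own. The sketch below keeps the shape of binarySort above but uses generics and always calls System.arraycopy instead of the small switch optimization. BinarySortDemo is a hypothetical class name. a[lo, start) is assumed to be sorted already, as it is when the method is called from sort.

```java
import java.util.Arrays;

// Hypothetical standalone binary insertion sort in the shape of binarySort.
class BinarySortDemo {

    // Sorts a[lo, hi), assuming a[lo, start) is already sorted.
    static <T extends Comparable<? super T>> void binarySort(T[] a, int lo, int hi, int start) {
        if (start == lo)
            start++;
        for (; start < hi; start++) {
            T pivot = a[start];
            int left = lo, right = start;
            // Find the first index in [lo, start) whose element is > pivot.
            // Equal elements stay to the left of pivot, which keeps the sort stable.
            while (left < right) {
                int mid = (left + right) >>> 1;
                if (pivot.compareTo(a[mid]) < 0)
                    right = mid;
                else
                    left = mid + 1;
            }
            // Shift a[left, start) one slot right and put pivot into the gap.
            System.arraycopy(a, left, a, left + 1, start - left);
            a[left] = pivot;
        }
    }

    public static void main(String[] args) {
        Integer[] a = {1, 3, 5, 4, 2, 0};
        // The first three elements already form a sorted run, so start = 3.
        binarySort(a, 0, a.length, 3);
        System.out.println(Arrays.toString(a)); // [0, 1, 2, 3, 4, 5]
    }
}
```

Binary search cuts the number of comparisons per insertion to O(log n), but the element moves are still O(n) each. That is why the algorithm is only used on short ranges (below MIN_MERGE, or to extend runs up to minRun).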

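Summary:

1. When the range has fewer than 32 elements (MIN_MERGE), it is sorted directly. The leading run is found, and the remaining elements are inserted with binary insertion sort, without any merging.

2. When the range has 32 or more elements, it is split into runs of at least minRun elements, with short natural runs extended by binary insertion sort. Each run is pushed onto a stack and merged with its neighbours as needed. At the end, the remaining runs are merged into a single sorted run.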
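To make the two-phase structure of the summary concrete, here is a deliberately simplified sketch on int[]. It is not TimSort. It ignores natural runs, the run stack and its invariants, and galloping. Instead, it cuts the input into fixed chunks of minRun elements, sorts each chunk with binary insertion sort, and then merges adjacent chunks bottom-up. TwoPhaseSortDemo and all of its methods are hypothetical names.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical simplified sketch: NOT TimSort (no natural runs, no run stack, no galloping).
class TwoPhaseSortDemo {
    static final int MIN_MERGE = 32;

    static int minRunLength(int n) {
        int r = 0;
        while (n >= MIN_MERGE) {
            r |= (n & 1);
            n >>= 1;
        }
        return n + r;
    }

    // Stable binary insertion sort of a[lo, hi).
    static void insertionSort(int[] a, int lo, int hi) {
        for (int i = lo + 1; i < hi; i++) {
            int pivot = a[i];
            int left = lo, right = i;
            while (left < right) {
                int mid = (left + right) >>> 1;
                if (pivot < a[mid]) right = mid; else left = mid + 1;
            }
            System.arraycopy(a, left, a, left + 1, i - left);
            a[left] = pivot;
        }
    }

    // Merges sorted a[lo, mid) and a[mid, hi) using a temporary copy of the left half.
    static void merge(int[] a, int lo, int mid, int hi) {
        int[] left = Arrays.copyOfRange(a, lo, mid);
        int i = 0, j = mid, k = lo;
        while (i < left.length && j < hi) {
            // <= takes from the left half on ties, which keeps the merge stable
            a[k++] = (left[i] <= a[j]) ? left[i++] : a[j++];
        }
        while (i < left.length) a[k++] = left[i++];
        // Any remaining right-half elements are already in place.
    }

    static void sort(int[] a) {
        int n = a.length;
        if (n < 2) return;
        if (n < MIN_MERGE) {            // small input: insertion sort only, no merges
            insertionSort(a, 0, n);
            return;
        }
        int minRun = minRunLength(n);
        for (int lo = 0; lo < n; lo += minRun)              // phase 1: sort each chunk
            insertionSort(a, lo, Math.min(lo + minRun, n));
        for (int width = minRun; width < n; width *= 2)     // phase 2: merge neighbours
            for (int lo = 0; lo + width < n; lo += 2 * width)
                merge(a, lo, lo + width, Math.min(lo + 2 * width, n));
    }

    public static void main(String[] args) {
        int[] data = new Random(42).ints(1000, 0, 10_000).toArray();
        int[] expected = data.clone();
        Arrays.sort(expected);
        sort(data);
        System.out.println(Arrays.equals(data, expected)); // true
    }
}
```

The real ComparableTimSort gains its adaptiveness from the parts left out here. Reusing natural runs is what lets nearly sorted input get away with about n comparisons.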

The above is only my personal understanding; corrections are welcome!
