Java中 shuffle 算法的使用
fisher–yates shuffle 基本思想(knuth shuffle ):
to shuffle an array a of n elements (indices 0..n-1):
for i from n − 1 downto 1 do
j ← random integer with 0 ≤ j ≤ i
exchange a[j] and a[i]
jdk源代码如下:
/**
* moves every element of the list to a random new position in the list.
*
* @param list
* the list to shuffle
*
* @throws unsupportedoperationexception
* when replacing an element in the list is not supported
*/
public static void shuffle(list<?> list) {
shuffle(list, new random());
}
/**
* moves every element of the list to a random new position in the list
* using the specified random number generator.
*
* @param list
* the list to shuffle
* @param random
* the random number generator
*
* @throws unsupportedoperationexception
* when replacing an element in the list is not supported
*/
@suppresswarnings("unchecked")
public static void shuffle(list<?> list, random random) {
if (!(list instanceof randomaccess)) {
object[] array = list.toarray();
for (int i = array.length - 1; i > 0; i--) {
int index = random.nextint(i + 1);
if (index < 0) {
index = -index;
}
object temp = array[i];
array[i] = array[index];
array[index] = temp;
}
int i = 0;
listiterator<object> it = (listiterator<object>) list
.listiterator();
while (it.hasnext()) {
it.next();
it.set(array[i++]);
}
} else {
list<object> rawlist = (list<object>) list;
for (int i = rawlist.size() - 1; i > 0; i--) {
int index = random.nextint(i + 1);
if (index < 0) {
index = -index;
}
rawlist.set(index, rawlist.set(i, rawlist.get(index)));
}
}
}
测试代码,为了确保每种情况的初始化一样,使用了多个容器:
public class javashuffle {
public static int temp = 0;
public static long start;
public static long end;
public static void main(final string args[]) {
object changetemp;
list<integer> numlist = new arraylist<integer>();
list<integer> firstlist = new arraylist<integer>();
list<integer> secondlist = new arraylist<integer>();
list<integer> thirdlist = new arraylist<integer>();
list<integer> fourthlist = new arraylist<integer>();
for (int i = 1; i <= 100000; i++) {
numlist.add(i);
firstlist.add(i);
secondlist.add(i);
thirdlist.add(i);
fourthlist.add(i);
}
// first shuffle,use changetemp
getstarttime();
int randint = 0;
for (int i = 0, length = firstlist.size(); i < length; i++) {
randint = getrandom(i, firstlist.size());
changetemp = firstlist.get(i);
firstlist.set(i, firstlist.get(randint));
firstlist.set(randint, javashuffle.temp);
}
getendtime("first shuffle run time ");
// second shuffle,exchange list
getstarttime();
for (int i = 0, length = secondlist.size(); i < length; i++) {
randint = getrandom(i, secondlist.size());
secondlist.set(i, secondlist.set(randint, secondlist.get(i)));
}
getendtime("second shuffle run time");
// third shuffle, change generate random int
getstarttime();
object[] temparray = thirdlist.toarray();
random rand = new random();
int j = 0;
for (int i = temparray.length - 1; i > 0; i--) {
j = rand.nextint(i + 1);
thirdlist.set(i, thirdlist.set(j, thirdlist.get(i)));
}
getendtime("third shuffle run time ");
// fourth shuffle, simulate java shuffle
getstarttime();
random random = new random();
if (!(fourthlist instanceof randomaccess)) {
object[] array = fourthlist.toarray();
for (int i = array.length - 1; i > 0; i--) {
int index = random.nextint(i + 1);
if (index < 0) {
index = -index;
}
object temp = array[i];
array[i] = array[index];
array[index] = temp;
}
int i = 0;
listiterator<integer> it = (listiterator<integer>) fourthlist.listiterator();
while (it.hasnext()) {
it.next();
it.set((integer) array[i++]);
}
} else {
list<integer> rawlist = (list<integer>) fourthlist;
for (int i = rawlist.size() - 1; i > 0; i--) {
int index = random.nextint(i + 1);
if (index < 0) {
index = -index;
}
rawlist.set(index, rawlist.set(i, rawlist.get(index)));
}
}
getendtime("fourth shuffle run time");
// java shuffle
getstarttime();
collections.shuffle(numlist);
getendtime("java shuffle run time ");
}
public static void swap(int a, int b) {
javashuffle.temp = a;
a = b;
b = javashuffle.temp;
}
public static int getrandom(final int low, final int high) {
return (int) (math.random() * (high - low) + low);
}
public static void getstarttime() {
javashuffle.start = system.nanotime();
}
public static void getendtime(final string s) {
javashuffle.end = system.nanotime();
system.out.println(s + ": " + (javashuffle.end - javashuffle.start) + "ns");
}
}
如果数值较小,例如100000级别,则输出大概是:
first shuffle run time : 85029499ns
second shuffle run time: 80909474ns
third shuffle run time : 71543926ns
fourth shuffle run time: 76520595ns
java shuffle run time : 61027643ns
first shuffle run time : 82326239ns
second shuffle run time: 78575611ns
third shuffle run time : 95009632ns
fourth shuffle run time: 105946897ns
java shuffle run time : 90849302ns
first shuffle run time : 84539840ns
second shuffle run time: 85965575ns
third shuffle run time : 101814998ns
fourth shuffle run time: 113309672ns
java shuffle run time : 35089693ns
first shuffle run time : 87679863ns
second shuffle run time: 79991814ns
third shuffle run time : 73720515ns
fourth shuffle run time: 78353061ns
java shuffle run time : 64146465ns
first shuffle run time : 84314386ns
second shuffle run time: 80074803ns
third shuffle run time : 74001283ns
fourth shuffle run time: 79931321ns
java shuffle run time : 86427540ns
first shuffle run time : 84315523ns
second shuffle run time: 81468386ns
third shuffle run time : 75052284ns
fourth shuffle run time: 79461407ns
java shuffle run time : 66607729ns
多次运行结果可能都不一样,但是基本java自带 shuffle速度最快,第三种方式次之。而第一种方法耗时最久。
如果是10000000级别,大概如下:
first shuffle run time : 2115703288ns
second shuffle run time: 3114045871ns
third shuffle run time : 4664426798ns
fourth shuffle run time: 2962686695ns
java shuffle run time : 3246883026ns first shuffle run time : 2165398466ns
second shuffle run time: 3129558913ns
third shuffle run time : 4147859664ns
fourth shuffle run time: 2911849942ns
java shuffle run time : 4311703487ns first shuffle run time : 2227462247ns
second shuffle run time: 3279548770ns
third shuffle run time : 4704344954ns
fourth shuffle run time: 2942635980ns
java shuffle run time : 3933172427ns first shuffle run time : 2200158789ns
second shuffle run time: 3172666791ns
third shuffle run time : 4715631517ns
fourth shuffle run time: 2950817535ns
java shuffle run time : 3387417676ns first shuffle run time : 2201124449ns
second shuffle run time: 3203823874ns
third shuffle run time : 4179926278ns
fourth shuffle run time: 2913690411ns
java shuffle run time : 3571313813ns first shuffle run time : 2163053190ns
second shuffle run time: 3073889926ns
third shuffle run time : 4493831518ns
fourth shuffle run time: 2852713887ns
java shuffle run time : 3773602415ns
可以看出,第一种方法速度最快,而第四种最慢。java自带 shuffle速度也不理想。
在进行大数据处理的时候,如果使用java库效率较低时,可以考虑使用其他方式。