A look at ways to remove duplicate elements from an array in Java
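Question: say I have an array (with zero elements, mind you) and I want to make sure that no duplicate element can be added to it.
Faced with a question like this, I would probably dash off the following code, using an ArrayList as the array.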
private static void testListSet() {
    List<String> arrays = new ArrayList<String>() {
        @Override
        public boolean add(String e) {
            // Linear scan: reject e if an equal element is already present.
            for (String str : this) {
                if (str.equals(e)) {
                    System.out.println("add failed !!! duplicate element");
                    return false;
                }
            }
            // Report success only once, after the whole list has been checked.
            System.out.println("add succeeded !!!");
            return super.add(e);
        }
    };
    arrays.add("a"); arrays.add("b"); arrays.add("c"); arrays.add("b");
    for (String e : arrays)
        System.out.print(e);
}
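Here I ignore everything else and only check, at the moment an element is added (assuming, of course, that elements are only ever added through add), whether an equal element already exists. If it does not, the element is added; otherwise it is rejected. This is simple to write, but clumsy for a large array: adding one element to an array of 100,000 elements can take up to 100,000 equals calls. Treat this as the baseline.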
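One way to avoid the full scan on every add is to keep a HashSet next to the list and ask the set whether the element has been seen. The sketch below is only an illustration: `UniqueList`, `items`, and `seen` are hypothetical names that do not appear in the original code, and it assumes the element type has consistent `equals` and `hashCode`.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper (not from the original post): a list that rejects
// duplicates by consulting a HashSet instead of scanning every element.
class UniqueList<E> {
    private final List<E> items = new ArrayList<>(); // keeps insertion order
    private final Set<E> seen = new HashSet<>();     // answers "already present?" by hash lookup

    // Returns true if the element was added, false if it was a duplicate.
    public boolean add(E e) {
        if (!seen.add(e)) { // Set.add returns false when e is already in the set
            return false;
        }
        items.add(e);
        return true;
    }

    // Returns a copy so callers cannot bypass the duplicate check.
    public List<E> asList() {
        return new ArrayList<>(items);
    }

    public static void main(String[] args) {
        UniqueList<String> list = new UniqueList<>();
        for (String s : new String[] {"a", "b", "c", "b"}) {
            System.out.println(s + " added: " + list.add(s));
        }
        System.out.println(list.asList()); // [a, b, c]
    }
}
```

The trade-off is extra memory for the set in exchange for not comparing against every stored element on each add.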
Question: given an array that already contains some elements, how do I remove the duplicates from it?
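As you know, the element collections in Java fall broadly into two families: List and Set. A List keeps its elements in order and allows duplicates, while a Set generally makes no ordering promise and does not allow duplicates. So why not use that property of Set to drop the duplicates? After all, an algorithm that already ships with the platform should beat one I write on the spot.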
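As a small illustration of the List and Set difference, the sketch below removes duplicates by copying a list into a set. It uses LinkedHashSet rather than HashSet so that the first occurrence of each element keeps its position; that choice, and the names `ListVsSetDemo` and `dedupKeepOrder`, are assumptions made for this example and are not part of the original code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

class ListVsSetDemo {
    // Removes duplicates while keeping the first occurrence of each element in place.
    static <T> List<T> dedupKeepOrder(List<T> input) {
        return new ArrayList<>(new LinkedHashSet<>(input));
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<>(Arrays.asList("b", "a", "b", "c", "a"));
        System.out.println(list);                 // [b, a, b, c, a]
        System.out.println(dedupKeepOrder(list)); // [b, a, c]
    }
}
```

With a plain HashSet the duplicates are dropped just the same, but the iteration order of the result is not guaranteed to match the input.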
public static void removeDuplicate(List<People> list) {
    // Copying into a HashSet drops duplicates; the original order is not kept.
    HashSet<People> set = new HashSet<People>(list);
    list.clear();
    list.addAll(set);
}

private static People[] objData = new People[] {
    new People(0, "a"), new People(1, "b"), new People(0, "a"), new People(2, "a"), new People(3, "c"),
};
public class People {
    private int id;
    private String name;
    public People(int id, String name) {
        this.id = id;
        this.name = name;
    }
    @Override
    public String toString() {
        return ("id = " + id + " , name " + name);
    }
}
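The code above uses a custom People class. When I add equal objects (meaning objects that hold the same data) and then call removeDuplicate, it turns out the problem is not actually solved: the equal objects are still there. So how does HashSet decide whether two objects are the same? Opening the HashSet source shows that every insertion goes through the add method.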
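The behavior described above can be reproduced in isolation. In this minimal sketch, `PlainPeople` is a hypothetical stand-in for the People class as first written, that is, with no `equals` or `hashCode` override, so it inherits the reference-based versions from `Object`.

```java
import java.util.HashSet;
import java.util.Set;

class IdentityDemo {
    // Hypothetical stand-in for People without equals()/hashCode() overrides.
    static class PlainPeople {
        final int id;
        final String name;
        PlainPeople(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Adds two equal-looking objects (one of them twice) and returns the set size.
    static int distinctCount() {
        PlainPeople p1 = new PlainPeople(0, "a");
        PlainPeople p2 = new PlainPeople(0, "a");
        Set<PlainPeople> set = new HashSet<>();
        set.add(p1);
        set.add(p2); // same data, different reference: kept
        set.add(p1); // the very same reference: rejected
        return set.size();
    }

    public static void main(String[] args) {
        PlainPeople p1 = new PlainPeople(0, "a");
        PlainPeople p2 = new PlainPeople(0, "a");
        System.out.println(p1.equals(p2));   // false: Object.equals compares references
        System.out.println(distinctCount()); // 2
    }
}
```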
// HashSet.add:
@Override
public boolean add(E object) {
    return backingMap.put(object, this) == null;
}
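The backingMap here is the data that HashSet maintains. It uses a neat trick: each added object becomes a key in a HashMap, with the HashSet object itself as the value. This exploits the uniqueness of HashMap keys, so the data in a HashSet naturally never repeats. Whether something actually counts as a duplicate, though, comes down to how HashMap decides that two keys are the same.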
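The same trick can be sketched in a few lines on top of the public HashMap API. `MapBackedSet` and `PRESENT` below are hypothetical names for this illustration only; this is not the real HashSet source, and it stores a shared dummy object as the value where the code quoted above stores the set itself.

```java
import java.util.HashMap;
import java.util.Map;

class MapBackedSetDemo {
    // Hypothetical miniature set that illustrates the idea; not the real HashSet.
    static class MapBackedSet<E> {
        private static final Object PRESENT = new Object(); // dummy value shared by all keys
        private final Map<E, Object> backingMap = new HashMap<>();

        // Map.put returns the previous value, or null when the key was absent,
        // so "== null" means the element is new.
        boolean add(E e) {
            return backingMap.put(e, PRESENT) == null;
        }

        int size() {
            return backingMap.size();
        }
    }

    public static void main(String[] args) {
        MapBackedSet<String> set = new MapBackedSet<>();
        System.out.println(set.add("a")); // true
        System.out.println(set.add("b")); // true
        System.out.println(set.add("a")); // false
        System.out.println(set.size());   // 2
    }
}
```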
// HashMap.put:
@Override public V put(K key, V value) {
    if (key == null) {
        return putValueForNullKey(value);
    }
    int hash = secondaryHash(key.hashCode());
    HashMapEntry<K, V>[] tab = table;
    int index = hash & (tab.length - 1);
    for (HashMapEntry<K, V> e = tab[index]; e != null; e = e.next) {
        if (e.hash == hash && key.equals(e.key)) {
            preModify(e);
            V oldValue = e.value;
            e.value = value;
            return oldValue;
        }
    }
    // No entry for (non-null) key is present; create one
    modCount++;
    if (size++ > threshold) {
        tab = doubleCapacity();
        index = hash & (tab.length - 1);
    }
    addNewEntry(key, value, hash, index);
    return null;
}
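The order of the two checks in put (hash first, equals only when the hash matches) can be observed from outside by counting equals calls. The sketch below is an illustration under assumptions: `Key`, `countEqualsCalls`, and the `constantHash` switch are hypothetical names, and the exact counts depend on the HashMap implementation of the JDK in use, so none are quoted here.

```java
import java.util.HashSet;
import java.util.Set;

class HashThenEqualsDemo {
    static int equalsCalls = 0;

    // Hypothetical key type whose hashCode strategy can be switched for the demo.
    static class Key {
        final int id;
        final boolean constantHash;

        Key(int id, boolean constantHash) {
            this.id = id;
            this.constantHash = constantHash;
        }

        @Override
        public boolean equals(Object o) {
            equalsCalls++; // count every comparison the set asks for
            return o instanceof Key && ((Key) o).id == id;
        }

        @Override
        public int hashCode() {
            // A constant hash forces every key into the same bucket.
            return constantHash ? 42 : id;
        }
    }

    // Inserts n keys with distinct ids and returns how many equals() calls that took.
    static int countEqualsCalls(int n, boolean constantHash) {
        equalsCalls = 0;
        Set<Key> set = new HashSet<>();
        for (int i = 0; i < n; i++) {
            set.add(new Key(i, constantHash));
        }
        return equalsCalls;
    }

    public static void main(String[] args) {
        System.out.println("distinct hash codes: " + countEqualsCalls(1000, false));
        System.out.println("constant hash code:  " + countEqualsCalls(1000, true));
    }
}
```

When the hash codes are spread out, most insertions find an empty bucket and never reach equals; when they all collide, equals has to be called far more often.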
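In summary, the idea is this: walk the entries in the bucket that the key hashes to; for an entry whose hash matches (the raw hashCode is in fact processed once more first), call the key's equals method. If both conditions hold, the two keys are the same element. So if the array elements are of a custom type and you want to rely on the Set mechanism, you have to implement equals and hashCode yourself (I won't go into the HashMap algorithm in detail here; I only partly understand it myself):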
public class People {
    // Number of equals() calls. Assumed: the benchmark reads and resets
    // People.count, but the original listing omits the field.
    public static int count = 0;
    private int id;
    private String name;
    public People(int id, String name) {
        this.id = id;
        this.name = name;
    }
    @Override
    public String toString() {
        return ("id = " + id + " , name " + name);
    }
    public int getId() {
        return id;
    }
    public void setId(int id) {
        this.id = id;
    }
    public String getName() {
        return name;
    }
    public void setName(String name) {
        this.name = name;
    }
    @Override
    public boolean equals(Object obj) {
        count++;
        if (!(obj instanceof People))
            return false;
        People o = (People) obj;
        return id == o.getId() && name.equals(o.getName());
    }
    @Override
    public int hashCode() {
        // Equal objects have equal ids, so this is consistent with equals().
        return id;
    }
}
Now a call to removeDuplicate(list) no longer leaves two identical People in the list.
All right, let's test the performance of the two approaches:
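A variant worth considering is to build both methods from the same fields with `java.util.Objects`, which also tolerates a null name. The sketch below is an assumption-laden illustration: `Person` and `dedup` are hypothetical names and are not the author's People class or removeDuplicate method.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;

class NullSafePeopleDemo {
    // Hypothetical variant of People: equals() and hashCode() use the same two fields.
    static final class Person {
        private final int id;
        private final String name;

        Person(int id, String name) {
            this.id = id;
            this.name = name;
        }

        @Override
        public boolean equals(Object obj) {
            if (this == obj) return true;
            if (!(obj instanceof Person)) return false;
            Person o = (Person) obj;
            return id == o.id && Objects.equals(name, o.name); // tolerates null names
        }

        @Override
        public int hashCode() {
            return Objects.hash(id, name);
        }
    }

    // Returns the distinct elements of the input; order is not preserved.
    static List<Person> dedup(List<Person> in) {
        return new ArrayList<>(new HashSet<>(in));
    }

    public static void main(String[] args) {
        List<Person> list = Arrays.asList(
                new Person(0, "a"), new Person(1, "b"), new Person(0, "a"),
                new Person(2, "a"), new Person(3, null), new Person(3, null));
        System.out.println(dedup(list).size()); // 4
    }
}
```

Hashing on both fields spreads out objects that share an id but differ in name, at the cost of a slightly more expensive hashCode.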
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Random;
import java.util.Set;

public class RemoveDuplicate {
    public static void main(String[] args) {
        //testListSet();
        //removeDuplicateWithOrder(Arrays.asList(data));
        //ArrayList<People> list = new ArrayList<People>(Arrays.asList(objData));
        //removeDuplicate(list);
        People[] data = createObjectArray(10000);
        ArrayList<People> list = new ArrayList<People>(Arrays.asList(data));
        // Time the HashSet-based approach.
        long startTime1 = System.currentTimeMillis();
        System.out.println("set start time --> " + startTime1);
        removeDuplicate(list);
        long endTime1 = System.currentTimeMillis();
        System.out.println("set end time --> " + endTime1);
        System.out.println("set total time --> " + (endTime1 - startTime1));
        System.out.println("count : " + People.count);
        // Reset the equals() counter, then time the nested-loop approach.
        People.count = 0;
        long startTime = System.currentTimeMillis();
        System.out.println("efficient start time --> " + startTime);
        efficientRemoveDup(data);
        long endTime = System.currentTimeMillis();
        System.out.println("efficient end time --> " + endTime);
        System.out.println("efficient total time --> " + (endTime - startTime));
        System.out.println("count : " + People.count);
    }
    public static void removeDuplicate(List<People> list)
    {
        HashSet<People> set = new HashSet<People>(list);
        list.clear();
        list.addAll(set);
    }
    public static void removeDuplicateWithOrder(List<String> arlList)
    {
        Set<String> set = new HashSet<String>();
        List<String> newList = new ArrayList<String>();
        for (Iterator<String> iter = arlList.iterator(); iter.hasNext();) {
            String element = iter.next();
            // Set.add returns true only the first time an element is seen.
            if (set.add(element))
                newList.add(element);
        }
        arlList.clear();
        arlList.addAll(newList);
    }
    @SuppressWarnings("serial")
    private static void testListSet() {
        List<String> arrays = new ArrayList<String>() {
            @Override
            public boolean add(String e) {
                // Linear scan: reject e if an equal element is already present.
                for (String str : this) {
                    if (str.equals(e)) {
                        System.out.println("add failed !!! duplicate element");
                        return false;
                    }
                }
                // Report success only once, after the whole list has been checked.
                System.out.println("add succeeded !!!");
                return super.add(e);
            }
        };
        arrays.add("a"); arrays.add("b"); arrays.add("c"); arrays.add("b");
        for (String e : arrays)
            System.out.print(e);
    }
    private static void efficientRemoveDup(People[] peoples) {
        // number of equals() comparisons made by the inner loop
        int count = 0;
        // new temporary array to hold non-duplicate data
        People[] newArray = new People[peoples.length];
        // current index in the new array (also the number of non-dup elements)
        int currentIndex = 0;
        // loop through the original array...
        for (int i = 0; i < peoples.length; ++i) {
            // contains => true iff newArray contains peoples[i]
            boolean contains = false;
            // search the filled part of newArray for an element equal to peoples[i]
            for (int j = 0; j < currentIndex; ++j) {
                count++;
                // if the same element is found, don't add it to the new array
                if (peoples[i].equals(newArray[j])) {
                    contains = true;
                    break;
                }
            }
            // if we didn't find a duplicate, add the new element to the new array
            if (!contains) {
                // note: you may want to use a copy constructor, or a .clone()
                // here if the situation warrants more than a shallow copy
                newArray[currentIndex] = peoples[i];
                ++currentIndex;
            }
        }
        System.out.println("efficient medthod inner count : " + count);
    }
    private static People[] createObjectArray(int length) {
        int num = length;
        People[] data = new People[num];
        Random random = new Random();
        for (int i = 0; i < num; i++) {
            // random ids in [0, 10000), so duplicates are expected
            int id = random.nextInt(10000);
            System.out.print(id + " ");
            data[i] = new People(id, "i am a man");
        }
        return data;
    }
}
Test results:
set end time --> 1326443326724
set total time --> 26
count : 3653
efficient start time --> 1326443326729
efficient medthod inner count : 28463252
efficient end time --> 1326443327107
efficient total time --> 378
count : 28463252