A Discussion of Ways to Remove Duplicate Elements from an Array in Java

程序员文章站 2023-12-18 17:23:16


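Problem: say I have an array (initially with zero elements), and I want every element added to it to be unique.

Faced with a problem like this, I might quickly write the following code, using an ArrayList as the array.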

private static void testListSet() {
    List<String> arrays = new ArrayList<String>() {
        @Override
        public boolean add(String e) {
            // Linear scan: reject the element if an equal one is already stored.
            for (String str : this) {
                if (str.equals(e)) {
                    System.out.println("add failed !!!  duplicate element");
                    return false;
                }
            }
            // Report success once, after the whole list has been checked.
            System.out.println("add succeeded !!!");
            return super.add(e);
        }
    };

    arrays.add("a"); arrays.add("b"); arrays.add("c"); arrays.add("b");
    for (String e : arrays)
        System.out.print(e);
}

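Here I care about only one thing: checking, at the moment an element is added (assuming elements are only ever added through add), whether an equal element already exists. If it does not, the element is added; otherwise it is rejected. This is simple to write, but it becomes clumsy for a large array: to add one element to an array of 100000 elements, do we really have to call equals 100000 times? This is the baseline.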
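One way to avoid the linear scan, sketched here under the assumption that the element type has consistent equals and hashCode (true for String), is to keep a HashSet next to the list and let its add method report duplicates. The class name UniqueList and its methods are hypothetical, made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: a list that refuses duplicates using a hash lookup
// instead of scanning every stored element.
class UniqueList<T> {
    private final List<T> items = new ArrayList<T>();
    private final Set<T> seen = new HashSet<T>();

    // Returns true if the element was added, false if an equal one already exists.
    public boolean add(T e) {
        if (!seen.add(e)) { // Set.add returns false when e is already present
            return false;
        }
        items.add(e);
        return true;
    }

    // A copy of the stored elements, in insertion order.
    public List<T> asList() {
        return new ArrayList<T>(items);
    }
}

class UniqueListDemo {
    public static void main(String[] args) {
        UniqueList<String> list = new UniqueList<String>();
        list.add("a");
        list.add("b");
        list.add("c");
        boolean added = list.add("b"); // duplicate, rejected
        System.out.println(added + " " + list.asList()); // false [a, b, c]
    }
}
```

The standard library's LinkedHashSet offers the same combination of uniqueness and insertion order in a single class, if a List interface is not required.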

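Problem: given an array that already holds some elements, how do we remove the duplicates from it?

As we know, Java collections broadly fall into two families: List and Set. A List keeps its elements in order and allows duplicates, while a Set does not allow duplicates and, in the case of HashSet, makes no ordering guarantee. So we can use this property of Set to drop the duplicates. After all, an algorithm that already exists in the library is likely to be better than one written on the spot.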

The code is as follows:

public static void removeDuplicate(List<People> list) {
    // A HashSet keeps one element per group of equal elements; order is not preserved.
    HashSet<People> set = new HashSet<People>(list);
    list.clear();
    list.addAll(set);
}

private static People[] objData = new People[] {
    new People(0, "a"), new People(1, "b"), new People(0, "a"), new People(2, "a"), new People(3, "c"),
};

The code is as follows:

public class People {
    private int id;
    private String name;

    public People(int id, String name) {
        this.id = id;
        this.name = name;
    }

    @Override
    public String toString() {
        return ("id = " + id + " , name " + name);
    }
}

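The code above uses a custom People class. When I add equal objects (meaning objects that hold the same data), calling removeDuplicate does not actually solve the problem: the equal objects are still there. So how does HashSet decide whether two objects are the same? Opening the HashSet source shows that every insertion goes through the add method: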

The code is as follows:

@Override
public boolean add(E object) {
    return backingMap.put(object, this) == null;
}

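Here backingMap is the data that HashSet maintains. It uses a neat trick: each added object becomes a key in a HashMap, and the HashSet object itself is stored as the value. This exploits the uniqueness of HashMap keys, so a HashSet naturally holds no duplicates. Whether two items really count as duplicates therefore depends on how HashMap decides that two keys are the same.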
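The idea of backing a set with a map can be sketched in a few lines. MapBackedSet is a hypothetical toy class written for illustration, not the library source; it stores a shared dummy value rather than the set itself, which is an arbitrary choice made here.

```java
import java.util.HashMap;
import java.util.Map;

// Toy illustration (hypothetical class): a set whose elements are the keys of a HashMap.
class MapBackedSet<E> {
    private static final Object PRESENT = new Object(); // dummy value shared by every key
    private final Map<E, Object> backingMap = new HashMap<E, Object>();

    // Map.put returns the previous value, or null if the key was absent,
    // so "== null" means the element was not in the set before.
    public boolean add(E e) {
        return backingMap.put(e, PRESENT) == null;
    }

    public boolean contains(E e) {
        return backingMap.containsKey(e);
    }

    public int size() {
        return backingMap.size();
    }
}

class MapBackedSetDemo {
    public static void main(String[] args) {
        MapBackedSet<String> set = new MapBackedSet<String>();
        System.out.println(set.add("a")); // true
        System.out.println(set.add("a")); // false
        System.out.println(set.size());   // 1
    }
}
```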

HashMap's put method is as follows:

@Override public V put(K key, V value) {
    if (key == null) {
        return putValueForNullKey(value);
    }

    int hash = secondaryHash(key.hashCode());
    HashMapEntry<K, V>[] tab = table;
    int index = hash & (tab.length - 1);
    for (HashMapEntry<K, V> e = tab[index]; e != null; e = e.next) {
        if (e.hash == hash && key.equals(e.key)) {
            preModify(e);
            V oldValue = e.value;
            e.value = value;
            return oldValue;
        }
    }

    // No entry for (non-null) key is present; create one
    modCount++;
    if (size++ > threshold) {
        tab = doubleCapacity();
        index = hash & (tab.length - 1);
    }
    addNewEntry(key, value, hash, index);
    return null;
}

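In short, the idea is this: walk the entries in the bucket selected by the key's hash; if an entry's hash equals the key's hash (in fact the hashCode is processed once more first) and the key's equals method returns true, then the two keys are the same element. So if the elements of the array are of a custom type and we want to rely on the Set mechanism, we have to implement equals and hashCode ourselves (I will not go into the hashing algorithm in detail here; I only partly understand it):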

The code is as follows:

public class People {
    // Number of equals() calls; read by the benchmark as People.count.
    public static int count = 0;

    private int id;
    private String name;

    public People(int id, String name) {
        this.id = id;
        this.name = name;
    }

    @Override
    public String toString() {
        return ("id = " + id + " , name " + name);
    }

    public int getId() {
        return id;
    }

    public void setId(int id) {
        this.id = id;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    // Two People are equal when both id and name match.
    @Override
    public boolean equals(Object obj) {
        count++;
        if (!(obj instanceof People))
            return false;
        People o = (People) obj;
        return id == o.getId() && name.equals(o.getName());
    }

    // Equal objects have equal ids, so returning id satisfies the hashCode contract.
    @Override
    public int hashCode() {
        return id;
    }
}

Now calling removeDuplicate(list) no longer leaves two equal People objects in the list.
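A small check of this behaviour, as a sketch: PlainPerson and KeyedPerson are hypothetical stand-ins for the two versions of People. KeyedPerson builds equals and hashCode from both fields with java.util.Objects, which is a choice made here and differs from the hashCode above, which returns id alone.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for People without equals/hashCode: identity semantics.
class PlainPerson {
    final int id;
    final String name;
    PlainPerson(int id, String name) { this.id = id; this.name = name; }
}

// Hypothetical stand-in for People with value-based equals/hashCode.
class KeyedPerson {
    final int id;
    final String name;
    KeyedPerson(int id, String name) { this.id = id; this.name = name; }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof KeyedPerson)) return false;
        KeyedPerson o = (KeyedPerson) obj;
        return id == o.id && Objects.equals(name, o.name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(id, name); // equal objects always get equal hash codes
    }
}

class EqualsHashCodeDemo {
    // Three objects, two of them holding the same data, but no equals/hashCode.
    static int distinctPlain() {
        List<PlainPerson> list = Arrays.asList(
            new PlainPerson(0, "a"), new PlainPerson(1, "b"), new PlainPerson(0, "a"));
        return new HashSet<PlainPerson>(list).size();
    }

    // The same data, with value-based equals/hashCode.
    static int distinctKeyed() {
        List<KeyedPerson> list = Arrays.asList(
            new KeyedPerson(0, "a"), new KeyedPerson(1, "b"), new KeyedPerson(0, "a"));
        return new HashSet<KeyedPerson>(list).size();
    }

    public static void main(String[] args) {
        System.out.println(distinctPlain()); // 3: the set sees three different objects
        System.out.println(distinctKeyed()); // 2: the repeated (0, "a") collapses
    }
}
```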

OK, now let's test their performance:

The code is as follows:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Random;
import java.util.Set;

public class RemoveDuplicate {

    public static void main(String[] args) {
        //testListSet();
        //removeDuplicateWithOrder(Arrays.asList(data));
        //ArrayList<People> list = new ArrayList<People>(Arrays.asList(objData));

        //removeDuplicate(list);

        People[] data = createObjectArray(10000);
        ArrayList<People> list = new ArrayList<People>(Arrays.asList(data));

        // Time the HashSet-based approach.
        long startTime1 = System.currentTimeMillis();
        System.out.println("set start time --> " + startTime1);
        removeDuplicate(list);
        long endTime1 = System.currentTimeMillis();
        System.out.println("set end time --> " + endTime1);
        System.out.println("set total time --> " + (endTime1 - startTime1));
        System.out.println("count : " + People.count);
        People.count = 0;

        // Time the nested-loop approach on the same data.
        long startTime = System.currentTimeMillis();
        System.out.println("efficient start time --> " + startTime);
        efficientRemoveDup(data);
        long endTime = System.currentTimeMillis();
        System.out.println("efficient end time --> " + endTime);
        System.out.println("efficient total time --> " + (endTime - startTime));
        System.out.println("count : " + People.count);
    }

    public static void removeDuplicate(List<People> list) {
        HashSet<People> set = new HashSet<People>(list);
        list.clear();
        list.addAll(set);
    }

    // Removes duplicates while keeping the first occurrence of each element in order.
    public static void removeDuplicateWithOrder(List<String> arlList) {
        Set<String> set = new HashSet<String>();
        List<String> newList = new ArrayList<String>();
        for (Iterator<String> iter = arlList.iterator(); iter.hasNext();) {
            String element = iter.next();
            if (set.add(element))
                newList.add(element);
        }
        arlList.clear();
        arlList.addAll(newList);
    }

    @SuppressWarnings("serial")
    private static void testListSet() {
        List<String> arrays = new ArrayList<String>() {
            @Override
            public boolean add(String e) {
                for (String str : this) {
                    if (str.equals(e)) {
                        System.out.println("add failed !!! duplicate element");
                        return false;
                    }
                }
                // Report success once, after the whole list has been checked.
                System.out.println("add succeeded !!!");
                return super.add(e);
            }
        };

        arrays.add("a"); arrays.add("b"); arrays.add("c"); arrays.add("b");
        for (String e : arrays)
            System.out.print(e);
    }

    private static void efficientRemoveDup(People[] peoples) {
        int count = 0;
        // New temporary array to hold non-duplicate data
        People[] newArray = new People[peoples.length];
        // Current index in the new array (also the number of non-dup elements)
        int currentIndex = 0;

        // Loop through the original array...
        for (int i = 0; i < peoples.length; ++i) {
            // contains => true iff newArray contains peoples[i]
            boolean contains = false;

            // Search through the filled part of newArray to see if it
            // contains an element equal to peoples[i]
            for (int j = 0; j < currentIndex; ++j) {
                count++;
                // If the same element is found, don't add it to the new array
                if (peoples[i].equals(newArray[j])) {
                    contains = true;
                    break;
                }
            }

            // If we didn't find a duplicate, add the new element to the new array
            if (!contains) {
                // Note: you may want to use a copy constructor, or a .clone()
                // here if the situation warrants more than a shallow copy
                newArray[currentIndex] = peoples[i];
                ++currentIndex;
            }
        }

        System.out.println("efficient medthod inner count : " + count);
    }

    // Builds an array of People with random ids in [0, 10000), so duplicates are likely.
    private static People[] createObjectArray(int length) {
        int num = length;
        People[] data = new People[num];
        Random random = new Random();
        for (int i = 0; i < num; i++) {
            int id = random.nextInt(10000);
            System.out.print(id + " ");
            data[i] = new People(id, "i am a man");
        }
        return data;
    }
}

Test results:


set end time -->  1326443326724
set total time -->  26
count : 3653
efficient start time --> 1326443326729
efficient medthod inner  count : 28463252
efficient end time -->  1326443327107
efficient total time -->  378
count : 28463252
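Note that the set-based removeDuplicate does not keep the original order of the list. If order matters, a LinkedHashSet gives the same single-pass behaviour as removeDuplicateWithOrder. A minimal sketch, with hypothetical class and method names, assuming the element type implements equals and hashCode consistently:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

class OrderedDedup {
    // Returns a new list with duplicates removed, keeping first occurrences in order.
    static <T> List<T> dedupe(List<T> input) {
        // LinkedHashSet rejects duplicates and iterates in insertion order.
        return new ArrayList<T>(new LinkedHashSet<T>(input));
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("b", "a", "b", "c", "a");
        System.out.println(dedupe(data)); // [b, a, c]
    }
}
```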
