An In-Depth Look at the hashCode Method in Java

程序员文章站 2023-12-18 22:04:58


On hashCode, a widely quoted definition reads:

In the Java programming language, every class implicitly or explicitly provides a hashCode() method, which digests the data stored in an instance of the class into a single hash value (a 32-bit signed integer).

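A hash code is a 32-bit integer computed from the data stored in an object instance, and its purpose is to help identify that instance. It is somewhat like an MD5 digest, which can be computed for any file and is meant to be practically unique to that file's contents. Java's hashCode, however, does not really produce a unique value for every object, and collisions do occur: with only 2^32 possible int values, distinct objects must sometimes share a hash code.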

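Let's start with the Object class. Object is the direct or indirect superclass of every class in a Java program and sits at the top of the class hierarchy. It defines many familiar methods, including hashCode, the subject of this article: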

// Excerpt from java.lang.Object
public final native Class<?> getClass();
public native int hashCode();
public boolean equals(Object obj) {
  return (this == obj);
}
public String toString() {
  return getClass().getName() + "@" + Integer.toHexString(hashCode());
}

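Note the native modifier on hashCode: it means the method is not implemented in Java but by the JVM itself, in native code. It is often said to return the object's memory address. The Javadoc of older JDKs did say the value is typically derived from the object's internal address, but the language does not require this, and modern JVMs such as HotSpot generate it in other ways by default. What is guaranteed is that the value stays the same for a given object throughout one execution of the application.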
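To see the default behavior concretely, here is a minimal sketch, assuming a class that does not override hashCode() (the names IdentityHashDemo and Plain are made up for illustration). It relies only on documented JDK behavior: System.identityHashCode returns the same value the inherited hashCode() would, and Object.toString() prints that value in hexadecimal.

```java
// Sketch: the inherited Object.hashCode() is the identity hash code.
// IdentityHashDemo and Plain are hypothetical names.
class IdentityHashDemo {
  // Inherits hashCode(), equals() and toString() from Object.
  static class Plain {}

  public static void main(String[] args) {
    Plain p = new Plain();
    int h = p.hashCode();

    // Stable for the same object during one run.
    System.out.println(h == p.hashCode());                 // true
    // Same value the JVM reports as the identity hash code.
    System.out.println(h == System.identityHashCode(p));   // true
    // toString() is getClass().getName() + "@" + hex of hashCode().
    System.out.println(p.toString().endsWith("@" + Integer.toHexString(h))); // true
    // The printed value itself varies from run to run.
    System.out.println(p);
  }
}
```

Running it twice will usually print a different value on the last line, which is why the article's sample output below cannot be reproduced exactly.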

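Many Java classes override equals and hashCode. Why? Take the most common case, String: if I create two strings with the same characters, I expect them to compare as equal. Without overridden equals and hashCode they would not, because they are two different objects at different memory addresses and the inherited equals only checks identity (==). String does override both; its hashCode() looks like this: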

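// String.hashCode() from older JDKs (e.g. JDK 6), where a String kept its
// characters in value[offset .. offset + count - 1]; the result is cached in hash
public int hashCode() {
  int h = hash;
  if (h == 0) {
    int off = offset;
    char val[] = value;
    int len = count;

    for (int i = 0; i < len; i++) {
      h = 31*h + val[off++];
    }
    hash = h;
  }
  return h;
}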

This code implements the following formula:

s[0]*31^(n-1) + s[1]*31^(n-2) + … + s[n-1]

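where s[i] is the i-th character of the string (counting from 0), n is its length, and the arithmetic is done in int, so overflow simply wraps around. Why 31 and not some other number? Effective Java explains: 31 was chosen because it is an odd prime. If the multiplier were even and the multiplication overflowed, information would be lost, because multiplying by 2 is equivalent to shifting. The advantage of using a prime is less clear, but it is traditional. A nice property of 31 is that the multiplication can be replaced by a shift and a subtraction for better performance: 31 * i == (i << 5) - i. Modern VMs do this sort of optimization automatically.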
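The formula and the two Effective Java remarks can be checked directly. The sketch below assumes nothing beyond standard Java; hashWith is a hypothetical helper that evaluates the formula with a configurable multiplier m, so that 31 can be compared with an even multiplier such as 32.

```java
// Sketch: checks the String.hashCode formula and the remarks about 31.
// hashWith is a hypothetical helper, not a JDK method.
class MultiplierDemo {
  // s[0]*m^(n-1) + s[1]*m^(n-2) + ... + s[n-1], evaluated with Horner's rule
  // in int arithmetic (overflow wraps around modulo 2^32).
  static int hashWith(int m, String s) {
    int h = 0; // the empty string hashes to 0
    for (int i = 0; i < s.length(); i++) {
      h = m * h + s.charAt(i);
    }
    return h;
  }

  public static void main(String[] args) {
    // 1. With m = 31 the formula reproduces String.hashCode().
    for (String s : new String[] {"", "Aa", "hello", "a longer string whose hash overflows"}) {
      System.out.println(hashWith(31, s) == s.hashCode()); // true
    }

    // 2. With an even multiplier such as 32 (= 2^5), each step shifts earlier
    //    characters 5 bits left; a character followed by 7 more is multiplied
    //    by 2^35 and vanishes from the 32-bit result.
    System.out.println(hashWith(32, "Xbcdefgh") == hashWith(32, "Ybcdefgh")); // true: collision
    System.out.println(hashWith(31, "Xbcdefgh") == hashWith(31, "Ybcdefgh")); // false

    // 3. 31 * i == (i << 5) - i holds for every int, even when it overflows.
    for (int i : new int[] {0, 1, -1, 123456789, Integer.MAX_VALUE, Integer.MIN_VALUE}) {
      System.out.println(31 * i == (i << 5) - i); // true
    }
  }
}
```

Point 2 is the "information is lost" argument in miniature: with multiplier 32, any two strings of length 8 or more that differ only in their first character collide, whereas 31 is odd, so no character's contribution is ever shifted out entirely.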

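As you can see, String computes its hash code from its value (its characters), so strings with the same value always get the same hash code. That is easy to understand: equal values mean equals returns true, and objects that are equal according to equals must have equal hash codes. The reverse does not hold: two strings with the same hash code are not necessarily equal.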
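A quick way to observe the forward direction is with two genuinely separate String objects. The sketch uses new String(...) on purpose, since identical string literals are interned and would be one shared object; StringEqualityDemo is a hypothetical name.

```java
// Sketch: two distinct String objects holding the same characters.
// StringEqualityDemo is a hypothetical name.
class StringEqualityDemo {
  public static void main(String[] args) {
    // new String(...) always creates a new object, unlike interned literals.
    String s1 = new String("hello");
    String s2 = new String("hello");

    System.out.println(s1 == s2);                        // false: different objects
    System.out.println(s1.equals(s2));                   // true: String overrides equals
    System.out.println(s1.hashCode() == s2.hashCode());  // true: and hashCode, consistently
  }
}
```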

A good hash function tends to produce unequal hash codes for unequal objects.

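Ideally, a hash function distributes the unequal instances in a collection uniformly across all possible hash values. That ideal is very hard to reach, and String.hashCode does not reach it: the hash code is not random but follows the formula above, so we can construct strings with different values but the same hash code. For example, "Aa" and "BB" have the same hash code: 'A' is 65, 'a' is 97 and 'B' is 66, so 65*31 + 97 = 2112 and 66*31 + 66 = 2112.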

The following code shows this:

public class Main {
  public static void main(String[] args) {
    Main m = new Main();
    System.out.println(m);
    System.out.println(Integer.toHexString(m.hashCode()));
    String a = "Aa";
    String b = "BB";
    System.out.println(a.hashCode());
    System.out.println(b.hashCode());
  }
}

Output (the first two lines depend on the object's identity hash code and differ from run to run):

Main@2a139a55 
2a139a55 
2112 
2112
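Because the hash is a polynomial in the characters, a single collision multiplies. Appending a two-character block Y to a prefix X gives hash(XY) = hash(X) * 31^2 + hash(Y), so if two prefixes collide and two blocks collide, the concatenations collide too. A minimal sketch that exploits this, assuming nothing beyond String.hashCode itself (build is a hypothetical helper):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: turning one collision ("Aa" vs "BB") into many.
// build is a hypothetical helper name.
class CollisionDemo {
  // Returns every string made of `blocks` two-character blocks,
  // each block being either "Aa" or "BB" (2^blocks strings in total).
  static List<String> build(int blocks) {
    List<String> result = new ArrayList<>();
    result.add("");
    for (int b = 0; b < blocks; b++) {
      List<String> next = new ArrayList<>();
      for (String prefix : result) {
        next.add(prefix + "Aa");
        next.add(prefix + "BB");
      }
      result = next;
    }
    return result;
  }

  public static void main(String[] args) {
    // 8 different strings that all share one hash code.
    for (String s : build(3)) {
      System.out.println(s + " -> " + s.hashCode());
    }
  }
}
```

This is exactly why a hash-based collection cannot rely on hash codes alone and must fall back to equals whenever two hash codes match.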

Why, in general, do we also have to override hashCode whenever we override equals?

Let's look at an example. First, create a simple Employee class:

public class Employee
{
  private Integer id;
  private String firstName;
  private String lastName;
  private String department;

  public Integer getId() {
    return id;
  }
  public void setId(Integer id) {
    this.id = id;
  }
  public String getFirstName() {
    return firstName;
  }
  public void setFirstName(String firstName) {
    this.firstName = firstName;
  }
  public String getLastName() {
    return lastName;
  }
  public void setLastName(String lastName) {
    this.lastName = lastName;
  }
  public String getDepartment() {
    return department;
  }
  public void setDepartment(String department) {
    this.department = department;
  }
}

The Employee class above has only a few basic fields with getters and setters. Now consider a situation where you need to compare two employees:

public class EqualsTest {
  public static void main(String[] args) {
    Employee e1 = new Employee();
    Employee e2 = new Employee();

    e1.setId(100);
    e2.setId(100);
    //prints false in console
    System.out.println(e1.equals(e2));
  }
}

Unsurprisingly, the program above prints false. Yet the two objects actually represent the same employee, and real business logic would expect true.

To get that result, we need to override equals:

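@Override
public boolean equals(Object o) {
    if(o == null)
    {
      return false;
    }
    if (o == this)
    {
      return true;
    }
    if (getClass() != o.getClass())
    {
      return false;
    }
    Employee e = (Employee) o;
    // getId() returns Integer, so compare values (null-safe) rather than references with ==
    return java.util.Objects.equals(this.getId(), e.getId());
}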

Add this method to the Employee class and EqualsTest will print true.
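One detail of the id comparison is worth checking. getId() returns an Integer, which is a reference type, so comparing two ids with == compares object identity rather than numeric value. It happens to work for 100 because the Java Language Specification guarantees that boxing int values in the range -128 to 127 yields cached, shared objects; for larger ids it usually fails. A minimal sketch (IntegerCompareDemo and idsEqual are hypothetical names):

```java
import java.util.Objects;

// Sketch: comparing boxed Integer ids.
// IntegerCompareDemo and idsEqual are hypothetical names.
class IntegerCompareDemo {
  // Value comparison that also treats two null ids as equal.
  static boolean idsEqual(Integer a, Integer b) {
    return Objects.equals(a, b);
  }

  public static void main(String[] args) {
    Integer small1 = 100, small2 = 100;   // autoboxed via Integer.valueOf
    Integer big1 = 1000, big2 = 1000;

    // Boxing of values in -128..127 is guaranteed to reuse cached objects,
    // so == happens to give the right answer for id 100.
    System.out.println(small1 == small2);        // true
    // Outside that range the two boxes are usually distinct objects, so ==
    // typically prints false here (the result is not guaranteed either way).
    System.out.println(big1 == big2);
    System.out.println(idsEqual(big1, big2));    // true
  }
}
```

Objects.equals(a, b) returns true when both arguments are null and otherwise delegates to a.equals(b), which is why it is a safe default for comparing wrapper-typed fields.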

So are we done? No. Let's test it in a different way:

import java.util.HashSet;
import java.util.Set;

public class EqualsTest
{
  public static void main(String[] args)
  {
    Employee e1 = new Employee();
    Employee e2 = new Employee();
    e1.setId(100);
    e2.setId(100);
    //prints 'true'
    System.out.println(e1.equals(e2));
    Set<Employee> employees = new HashSet<Employee>();
    employees.add(e1);
    employees.add(e2);
    //prints two objects
    System.out.println(employees);
  }
}

This program prints two objects. If equals returns true for the two Employee objects, the set should store only one of them. So what went wrong?

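We forgot the second important method, hashCode(). As the JDK Javadoc notes, whenever equals() is overridden, hashCode() generally has to be overridden as well. HashSet first uses hashCode() to decide where an element belongs and only calls equals() on stored elements whose hash codes match. e1 and e2 still inherit Object's identity-based hashCode(), so their hash codes almost certainly differ and equals() is never consulted. Add the following method and the program behaves correctly: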

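@Override
public int hashCode()
{
  final int prime = 31;
  int result = 1;
  // Objects.hashCode returns 0 for a null id instead of failing on unboxing
  result = prime * result + java.util.Objects.hashCode(getId());
  return result;
}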
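For reference, here is a compact, runnable sketch of the finished example. It is trimmed to the id field (the name and department fields are omitted, and the class name EmployeeSetDemo is hypothetical), and it uses java.util.Objects: Objects.equals compares the ids by value and handles null, and Objects.hash(id) computes 31 * 1 + id.hashCode(), the same value as the hand-written hashCode above for a non-null id.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Sketch of the finished example, trimmed to the id field.
// EmployeeSetDemo is a hypothetical name.
class EmployeeSetDemo {
  static class Employee {
    private Integer id;

    public Integer getId() { return id; }
    public void setId(Integer id) { this.id = id; }

    @Override
    public boolean equals(Object o) {
      if (o == this) return true;
      if (o == null || getClass() != o.getClass()) return false;
      Employee e = (Employee) o;
      return Objects.equals(id, e.id);   // value comparison, null-safe
    }

    @Override
    public int hashCode() {
      return Objects.hash(id);           // same field as equals: 31 * 1 + id
    }
  }

  public static void main(String[] args) {
    Employee e1 = new Employee();
    Employee e2 = new Employee();
    e1.setId(100);
    e2.setId(100);

    Set<Employee> employees = new HashSet<>();
    employees.add(e1);
    employees.add(e2);  // equal to e1 with the same hash code, so not added
    System.out.println(employees.size());  // 1
  }
}
```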

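Things to remember

Use the same fields to compute both hashCode() and equals(). In our example, that is the employee id.
equals must be consistent: as long as the objects are not modified, equals should keep returning the same result.
Whenever a.equals(b), a.hashCode() must equal b.hashCode().
Always override the two methods together.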
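The consistency rule has a practical consequence worth seeing once: if a field that hashCode() depends on changes while the object is stored in a HashSet (or used as a HashMap key), the set can no longer find it. A minimal sketch, using a hypothetical trimmed Employee with only an id (MutableKeyDemo is also a made-up name):

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Sketch: mutating a field used by hashCode() after insertion into a HashSet.
// MutableKeyDemo and this trimmed Employee are hypothetical.
class MutableKeyDemo {
  static class Employee {
    private Integer id;

    Employee(Integer id) { this.id = id; }

    void setId(Integer id) { this.id = id; }

    @Override
    public boolean equals(Object o) {
      if (o == this) return true;
      if (o == null || getClass() != o.getClass()) return false;
      return Objects.equals(id, ((Employee) o).id);
    }

    @Override
    public int hashCode() {
      return Objects.hash(id);
    }
  }

  public static void main(String[] args) {
    Set<Employee> employees = new HashSet<>();
    Employee e = new Employee(100);
    employees.add(e);

    e.setId(200); // changes e's hash code while e is inside the set

    // The set searches with the new hash code and misses the entry
    // that was stored under the old one.
    System.out.println(employees.contains(e));                 // false
    // An employee with the old id hashes to the old location, but equals
    // now fails because the stored object has id 200.
    System.out.println(employees.contains(new Employee(100))); // false
    // The object is still in the set; lookups just can no longer reach it.
    System.out.println(employees.size());                      // 1
  }
}
```

The usual remedy is to avoid changing such fields while the object is in a hash-based collection, or to remove it, change it, and add it back.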

Summary

That concludes this look at the hashCode method in Java. I hope it helps; if you find any shortcomings, feel free to leave a comment.
