欢迎您访问程序员文章站本站旨在为大家提供分享程序员计算机编程知识!
您现在的位置是: 首页

Data Structures notes - Hashing

程序员文章站 2024-03-17 23:14:28
...

a. basic knowledge review:   The bold content is what I think is important

    Hashing, is the implementation of hash tables,It  is  a  technique used for performing insertions,deletions, and finds  in  O(1). but nothing is perfect,  any Tree operations that require any ordering information among the elements are not supported efficiently(such as  find the minmum elements and maximum elements )  and the printing of the entire table in sorted order in linear time are not supported. hashing technique including the following: see several methods of implementing the hash table and compare these methods analytically, the applications of hashing, compare hash tables with binary search trees.

     the ideal hash table data structure is merely an array of some fixed size containing the items, generally a search is performed on some part (data member) of item, this is called the "key".  Each key is mapped into some number in the range [0,tablesize-1] ,and  placed in the appraopriate. The mapping is called a "hash function" while ideally should be simple to compute and should ensure that any two distinct keys get different cells.  hash function that distributes the keys evenly among the cells. Deciding what to do when two keys hash to the same value(collision) and deciding on the table size is the remaining problems.

If the input keys are integers, it is often a good idea to ensure that the table size is prime. when the input keys are random integers, then this function is not noly ver simple to compute but also distributes the keys evenly.  in other situation, the keys are strings, one option is to add up the ASCII values of the characters in the string. like the following code:

int hash(const string & key, int tableSize)
{
   int hashVal=0; 
   for(char ch:key) hashVal+=ch; 
   return hashVal%tableSize;
}

   Another hash function assumes that key has at least three characters, the value 27 represents the number of letters in the English alphabet,plus the blank, and 729 is the square of 27, this function examines only the first three characters:

int hash(const string& key, int tableSize)
{
    return (key[0]+27*key[1]+729*key[2])%tableSize;
}

     but if these are random and the table size is 10007, as before, the we would expect a reasonably equitable distribution. unfortunately, this function is not appropriate if the hash table is reasonably large.

here is a good hash function:

unsigned int hash(const string &key, int tableSize)
{
   unsigned int hashVal=0;
    for(char ch : key)
        hashVal=37 * hashVal + ch;

    return hashVal % tableSize;
}

The hash function takes advantage of the fact that overflow is allowed and uses unsigned int to avoid introducing a negative number.