Define collision in hashing. What are the different methodologies to resolve collision? Explain briefly.

Purpose of hashing is to reduce searching time in an array. Without hashing searching time is O(n) whereas with hash function it takes only O(1) time. To reduce searching time Hashing technique uses Hash  function. For Example H(Key)= Key Mod 10

Where, Key is the element which we want to store in an array and 10 is the size of an array in which we want to store element. Hash function returns array index where we insert element into the array.

Lets suppose we want to insert element like 10,50,23, 45 into the array. So first we need to insert element 10 into an array.  So will apply function on this element for index position of array where we want to insert an element like H(10) = 10 Mod 10 = 10 that means element 10 will store at index 0 in an array. 

Now next element which we want ro insert in an array is 50 so apply hash function on this element also like below

H(50) = 50 Mod 10 = 0

Here again we are getting index 0 which is already filled so this situation is called in an array. So, What is collision, collision is a situation where we get same index for more than element in Hashing.

There are various method to resolve collision. In other words we can also say the process of finding an alternate location is called collision resolution. There are a number of collision resolution techniques , and the most popular are open addressing and chaining.

1) Direct chaining : It is an array of linked list application. Separate chaining is an example of direct chaining
Separate chainingIn separate chaining collision resolution, we insert linked list at array index and whatever item getting same index position we are inserted here one by one in linked list associated with respective index . Or we can say when two or more records hash to same location, these records are constituted into a singly linked list called a chain.
Ex:
Let h(x) = x mod 10
                20,100,22,3,6,56

2)Open  Addressing :  In open addressing all keys will be stored in the hash table or an array itself. So there is no need  of extra memory like chaining technique.  This procedure is based on probing. Here collision is resolved by probing technique. There are following technique to resolve collision by probing:

a)Linear probing: In linear probing interval between probes is fixed at 1. In Linear probing as name indicates , after collision we search the hash table or an array sequentially starting from the original hash location. If a location is occupied, we check the next location linearly with step 1. If necessary we wrap around from the last table location to the first table location. The function for the rehashing is the following;
rehash (key)=(n+1)%table size


Example:
h(x)=x mod 10 
20,100,22,3,6,56

Linear probing suffering problem with clustering. 

 In linear probing  ,table or an contains  contains groups of consecutively occupied locations and called as clustering. Clusters can get close to one another and merge into large cluster. Thus , one part of the table might be quite dense, even though another part has relatively few items. Clustering causes long probe searches and therefore decreases the overall efficiency. The next location to be probed is determined by the step size, where other step size other than one are possible. Clustering cannot be avoided by larger step sizes. Here θ(m) probe sequences are used.

b) Quadratic Probing : 


As name indicated instead  linearly probe the space we find space here quadratic means first find empty position at index 1 then at 2^, if again there is not empty place then will find space at position at 3^2
 and so on. It can eliminate Clustering problem.
In quadratic probing, we start from the original hash location i. If a location is occupied, we check the location i+1^2, i+2^2,i+3^2,i+4^2+ _ _ _
We wrap around from the last table location to the first table location if necessary. The function for the rehashing is as below:
rehash(key) =(n+k2)%  table size 
Example: Let us assume that the table size is 11(0..10)
Hash function: h(key) =key mod 11
31,19,2,13,25,24,21,9
31 mod 11 = 9
19 mod 11 = 8
2 mod 11= 2
13 mod 11 = 2 → 2+1^2=3
25 mod 11 = 3→3+1^2=4
25 mod 11 = 2 → 2+1^2,2+2^2=6
21 mod 11 = 10
9 mod 11 = 9→9+1^2=10
9+2^2 mod 11 =2
                        9+3^2 mod 11 = 7



Even though clustering is avoided by quadratic probing
Still there are chances of clustering . Both linear and quadratic probing use a probing sequence that is independent of search key. Here , θ(m) probe sequences are used.

c) Double hashing:

 In double hashing interval between probes is computed by another hash function. The increments for the probing sequence are computed by using a second hash function.
Double hashing reduces clustering in a better way.
The second hash function h2 should be h2(key) ≠ 0 and h2≠h1
We first probe the location h2(key).
If the location is occupied , we probe the location h1(key) + h2(key),
               h1(key)+2*h2(key),_ _ _
example: Table size is 11 (0..10)

Hash function: 
Assume h1(key)= key mod 11
             h2(key) = 7-(key mod 7)
       58,14,91,25
58 mod11 =3
14 mod 11 =3→3+7=10
91 mod 11 = 3→3+7,3+2*7 mod 11 =6
25 mod 11 = 3→3+3 ,3+2*3=9





Comments

Post a Comment