Friday, June 25, 2010

HashMap Internal


I always wanted to implement the logic behind the famous data structure of Java, HashMap and here it comes. Though it’s not the complete implementation considering all optimization scenarios, but it will highlight the basic algorithm behind the HashMap.

So let’s start, this implementation will use my LinkedList implementation (reason: for this implementation I thought to write everything on scratch apart from primitive data types. Sounds weird? May be ;-) ). You may refer my earlier post on LinkedList, as I’m going to use it.

HashMap is a key-value pair data structure, where you retrieve value by passing key. To optimize the search of key, a concept of Bucket (an Array of Java Objects) has been introduced. This is used, so that if search hits this Bucket, corresponding value element is instantly retrieved, otherwise it iterates sequentially through the Linked List associated with each Bucket element. So you can say if all HashMap elements get fit into the Bucket, retrieving elements back, will be super fast and this is the Best Case. But that also means you are keeping everything in an array which is not good in-terms of heap memory storage limitation. So we need to consider both the things Bucket and LinkedList. This is really interesting, so let me know, if you guys want to discuss more on this.

To make sure that all Key_Index (Hashcode of Key Object, calculated by calling Object’s hashCode method) becomes less than Bucket_Size, we will do the mod operation i.e.
Key_Index = key.hashCode() % Bucket_Size,
This returns integer value which is always less than Bucket_Size. If multiple key Objects return same Key_Index, then all matching keys will be stored in the associated LinkedList of the Bucket element. So if the bucket size is 10, following figure depicts the behavior of element entry into HashMap-

Bucket: [K0][K1][K2][K3][K4]......[K9]
        [V0][V0][V0][V0][V0]
        [V1]    [V1]
        [V2]
 K0, K1 etc. are elements stored in the bucket, where as V0, V1, etc. are elements stored in LinkedList of corresponding bucket elements. So, 3 elements stored in K0’s LinkedList, 1 element stored in K1’s LinkedList.

Structure of HashMap Node Object

import com.ds.link.LinkedList;

public class HashNode {

      /**
       * HashMap Key
       */
      Object key;
      /**
       * HashMap Value
       */
      Object value;
      /**
       * List of HashMap nodes which doesn't get fit into bucket
       */
      LinkedList list;
     
      /**
       * by default the list will be null
       */
      public void initList(){
            if(list == null)
                  list = new LinkedList();
      }
}

MyHashMap.java

First thing first, define the Bucket (an Object Array)-
final int bucket_size = 10;
Object[] bucket;
/**
  * @param bucket_size- custom bucket size
  */
public MyHashMap(int bucket_size){
      this.bucket_size = bucket_size;
      bucket = new Object[bucket_size];
}
     
/**
  * default bucket_size
  */
public MyHashMap(){
      bucket = new Object[bucket_size];
}

Add Element
/**
  * Add key, value pair in the HashMap
  * @param key
  * @param value
  */
public void add(Object key, Object value){
      int kCode = key.hashCode();
      int index = kCode % bucket_size;
      System.out.println("->"+ index);
           
      if(bucket[index] == null){
            System.out.println("First entry in Bucket: Key="+ key);
            HashNode node = new HashNode();
            node.key = key;
            node.value = value;
            bucket[index] = node;
      }
      else{
            // value already exists, so add this element into LinkedList
            HashNode node = (HashNode)bucket[index];
            System.out.println("Add into List of: "+ node.key +" Key="+ key);
            // initiate the list
            node.initList();
            HashNode newNode = new HashNode();
            newNode.key = key;
            newNode.value = value;
            // add the new element into list
            node.list.addToEnd(newNode);
      }
}

Get Element
/**
  * If the key is not present, it will return null
  * @param key
  * @return
  */
public Object get(Object key){
           
      int index = key.hashCode() % bucket_size;
      System.out.println("Searching for Bucket Index: "+ index);
      if(index >= bucket.length)
            new ArrayIndexOutOfBoundsException("Invalid Attempt: Index="+ index);
           
      HashNode node = (HashNode)bucket[index];
           
      if(node == null){
            // At this position, there is no node in the bucket
            return null;
      }
           
      if(node.key.hashCode() == key.hashCode()){
            // element is found in the bucket
            System.out.println("1st Level Match....");
            return node.value;
      }
           
      // unable to find in the element in the bucket
      // find in the associated list
      LinkedList list = node.list;
           
      if(list == null){
            System.out.println("No associated list");
            return null;
      }
           
      // pointer to the root node of the linkedlist
      Node first = list.getFirst();
           
      while(true){
            if(first == null)
                  break;
            HashNode hn = (HashNode)first.data;
            if(key.hashCode() == hn.key.hashCode()){
                  System.out.println("2nd Level Match Found...");
                  return hn.value;
            }
            first = first.pointer;
      }
           
      // still unable to find the element
      System.out.println("Failed to hit bulls eye!! No match found");
      return null;
}

12 comments:

  1. Hi Prasanta,

    Nice effort to understand the working of a HashMap.

    Just to add on to the above algorithm :

    Java HashMap supports null keys, and null doesn't have a hash code

    So to deal with it and prevent NPE, it just defaults to 0th index if key is null.
    That means whenever key is null..bucket 0 is selected and the node is added to the corresponding list.
    Similarly while fetching the value of null key , bucket 0 is searched for node whose key is null.

    ~Sachin

    ReplyDelete
  2. Very Nice Prasanta.
    I have 2 observations.
    bucket_size can not be final & in the add method
    node.initList() should be done following a null check .
    if(node.list == null){ node.initList() }

    ReplyDelete
  3. Wonderful, too good !

    ReplyDelete
  4. Thanks, it was helpful, Could you add details on Resizing and Race conditions too ?

    ReplyDelete
  5. Excellent explanation,I am still confuse how it handles null key and why we require to add this feature

    ReplyDelete
  6. Nice explanation. Keep sharing about collections and Threads. Thanks in advance

    ReplyDelete
  7. Awesome...!

    By
    Vasan Rajendran

    ReplyDelete
  8. Good Explanation and thanks!
    but i am confused in the following code of lines:
    if(node.key.hashCode() == key.hashCode()){
    // element is found in the bucket
    System.out.println("1st Level Match....");
    return node.value;
    }

    Here we are just matching the key hashcode and according to that we are returning value but If more that one keys returned the same hashcode then then what will be returned/

    For example if our key was AB and hascode return its length that is bucket is 2 now suppose if we search with key CD then again hashcode is 2 and according to above line of code it will return the value of associated with key AB.

    Please let me know

    ReplyDelete
  9. Nice One...
    But 1 Observation in 2nd Level Match in place of
    if(key.hashCode() == hn.key.hashCode()){ It should be
    if(key.equals(hn.key)){}

    Please correct me if I am wrong

    Thanks

    ReplyDelete
  10. nice explanation

    ReplyDelete
  11. your above mentioned linkedList link is not working.. kindly check and let us know the exact URL for your linkedList post

    ReplyDelete
  12. I have changed the URL. Pls check-
    http://prasanta-paul.blogspot.in/2010/06/linkedlist-in-java.html

    BR,
    Prasanta

    ReplyDelete