Hashtable is an implementation of a key-value pair data structure in
java. You can store and retrieve a ‘value’ using a ‘key’ and it is an identifier of the value stored. It is obvious that the ‘key’ should be unique.
java.util.Hashtable extends Dictionary and implements Map. Objects with non-null value
can be used as a key or value. Key of the Hashtable must implement
hashcode() and equals() methods. By the end of this article you will
find out the reason behind this condition.
Generally a Hashtable in java is created using the empty constructor Hashtable(). Which is a poor decision and an often repeated mistake. Hashtable has two other constructors
Hashtable( int initialCapacity)
|
and
Hashtable( int initialCapacity, float loadFactor)
|
. Initial capacity is number of buckets created at the time of Hashtable instantiation. Bucket is a logical space of storage for Hashtable.
Hashing and Hashtable
Before seeing java’s Hashtable in detail you should understand
hashing in general. Assume that, v is a value to be stored and k is the
key used for storage / retrieval, then h is a hash function where v is
stored at h(k) of table. To retrieve a value compute h(k) so that you
can directly get the position of v. So in a key-value pair table, you
need not sequentially scan through the keys to identify a value.
h(k) is the hashing function and it is used to find the location to
store the corresponding value v. h(k) cannot compute to a indefinite
space. Storage allocated for a Hashtable is limited within a program.
So, the hasing function h(k) should return a number within that
allocated spectrum (logical address space).
Hashing in Java
Java’s hashing uses uses hashCode() method from the key and value
objects to compute. Following is the core code from Hashtable where the
hashCode ‘h’ is computed. You can see that both key’s and value’s
hashCode() method is called.
h += e.key.hashCode() ^ e.value.hashCode();
|
It is better to have your hashCode() method in your custom objects. String has its own hashCode methode and it computes the hashcode value as below:
s[ 0 ]* 31 ^(n- 1 ) + s[ 1 ]* 31 ^(n- 2 ) + ... + s[n- 1 ]
|
If you don’t have a hashCode() method, then it is derived from Object class. Following is javadoc comment of hashCode() method from Object class:
* Returns a hash code value for the object. This method is
* supported for the benefit of hashtables such as those provided by
* <code>java.util.Hashtable</code>.
|
If you are going to write a custom hashCode(), then follow the following contract:
* The general contract of <code>hashCode</code> is:
* <ul>
* <li>Whenever it is invoked on the same object more than once during
* an execution of a Java application, the <tt>hashCode</tt> method
* must consistently return the same integer, provided no information
* used in <tt>equals</tt> comparisons on the object is modified.
|
The following is to improve performance of the Hashtable.
* <li>If two objects are equal according to the <tt>equals(Object)</tt>
* method, then calling the <code>hashCode</code> method on each of
* the two objects must produce the same integer result.
|
hashCode() guarantees distinct integers by using the internal address of the object.
Collision in Hashtable
When we try to restrict the hashing function’s output within the
allocated address spectrue limit, there is a possibility of a collision.
For two different keys k1 and k2, if we have h(k1) = h(k2), then this
is called collision in hashtable. What does this mean, our hashing
function directs us store two different values (keys are also different)
in the same location.
When we have a collision, there are multiple methodologies available
to resolve it. To name a few hashtable collision resolution technique,
‘separate chaining’, ‘open addressing’, ‘robin hood hashing’, ‘cuckoo
hashing’, etc. Java’s hashtable uses ‘separate chaining’ for collision
resolution in Hashtable.
Collision Resolution in java’s Hashtable
Java uses separate chaining for collision resolution. Recall a point
that Hashtable stores elements in buckets. In separate chaining, every
bucket will store a reference to a linked list. Now assume that you have
stored an element in bucket 1. That means, in bucket 1 you will have a
reference to a linked list and in that linked list you will have two
cells. In those two cells you will have key and its corresponding value.
Why do you want to store the key? Because when there is a collision
i.e., when two keys results in same hashcode and directs to the same
bucket (assume bucket 1) you want to store the second element also in
the same bucket. You add this second element to the already created
linked list as the adjacent element.
Now when you retrieve a value it will compute the hash code and
direct you to a bucket which has two elements. You scan those two
elements alone sequentially and compare the keys using their equals()
method. When the key mathches you get the respective value. Hope you
have got the reason behind the condition that your object must have
hashCode() and equals() method.
Java has a private static class Entry inside Hashtable. It is an
implementation of a list and you can see there, it stores both the key
and value.
Hashtable performance
To get better performance from your java Hashtable, you need to
1) use the initialCapacity and loadFactor arguments
2) use them wisely
while instantiating a Hashtable.
initialCapacitiy is the number of buckets to be created at the time
of Hashtable instantiation. The number of buckets and probability of
collision is inversly proportional. If you have more number of buckets
than needed then you have lesser possibility for a collision.
For example, if you are going to store 10 elements and if you are
going to have initialCapacity as 100 then you will have 100 buckets. You
are going to calculate hashCoe() only 10 times with a spectrum of 100
buckets. The possibility of a collision is very very less.
But if you are going to supply initialCapacity for the Hashtable as
10, then the possibility of collision is very large. loadFactor decides
when to automatically increase the size of the Hashtable. The default
size of initialCapacity is 11 and loadFactor is .75 That if the
Hashtable is 3/4 th full then the size of the Hashtable is increased.
New capacity in java Hashtable is calculated as follows:
int newCapacity = oldCapacity * 2 + 1 ;
|
If you give a lesser capacity and loadfactor and often it does the
rehash() which will cause you performance issues. Therefore for
efficient performance for Hashtable in java, give initialCapacity as 25%
extra than you need and loadFactor as 0.75 when you instantiate.
0 comments:
Post a Comment