Hash Table (Hash Map)

A hash table (hash map) is a data structure which implements an associative array abstract data type, a structure that can map keys to values

At high level Hash Table is a key-value lookup
- Key: unique integer that is used for indexing the values
- Value: data that are associated with keys
A hash table uses a hash function to compute an index into an array of buckets or slots, from which the desired value can be found

Hash Collisions

When the hash function generates the same index for multiple keys, there will be a conflict (what value to be stored in that index).

Ideally, the hash function will assign each key to a unique bucket, but most hash table designs employ an imperfect hash function, which might cause hash collisions where the hash function generates the same index for more than one key
Such collisions must be accommodated in some way
Hash table size directly affects on the number of collisions
The bigger the hash table size the less collisions you'll get

Hash collision resolved by:

Separate Chaining:
- Make each cell of hash table point to a linked list of records that have same hash function value
- Chaining is simple, but requires additional memory outside the table
Open Addressing: In open addressing, all elements are stored in the hash table itself. Each table entry contains either a record or NIL.
- When searching for an element, we examine the table slots one by one until the desired element is found or it is clear that the element is not in the table

Load Factor = items in table / size of table

At certain Load Factor (approx. 0.8) the average cache misses per lookup becomes exponentially large when using open addressing
For lower Load Factor open addressing is better than separate chaining

Different techniques used in open addressing are:

Linear Probing: collision is resolved by checking the next slot
- The problem with linear probing is that a cluster of adjacent slots is filled. When inserting a new element, the entire cluster must be traversed
- This adds to the time required to perform operations on the hash table
- h(k, i) = (h'(k) + i) mod m
- i = {0, 1, ...}
- h'(k): is a new hash function
Quadratic Probing: the spacing between the slots is increased (greater than one) by using the following relation
- h(k, i) = (h′(k) + c1i + c2i2) mod m
- c1 and c2 are positive auxiliary constants,
- i = {0, 1, ...}
Double hashing: If a collision occurs after applying a hash function h(k), then another hash function is calculated for finding the next slot
- h(k, i) = (h1(k) + ih2(k)) mod m

HashTable: This operation is used in order to create a new hash table
Delete: This operation is used in order to delete a particular key-value pair from the hash table
Get: This operation is used in order to search a key inside the hash table and return the value that is associated with that key
Put: This operation is used in order to insert a new key-value pair inside the hash table
DeleteHashTable: This operation is used in order to delete the hash table

Insert, Find and Delete: