Skip to content

Hash Function

A function that can be used to map data of arbitrary size to fixed-size values

  • The values returned by a hash function are called hash values, hash codes, digests, or simply hashes.

  • The values are usually used to index a fixed-size table called a hash table.

  • Use of a hash function to index a hash table is called hashing or scatter storage addressing.

Consistent Hashing

Consistent Hashing is a distributed hashing scheme that operates independently of the number of servers or objects in a distributed hash table.

  • It may be necessary or desirable to split a hash table into several parts, hosted by different servers.

  • This is needed to achieve horizontal scaling

To determine how to split the hash table, we can take hash modulo of the number of servers:

server = hash(key) % N

  • N is the size of the pool

Problems with this method:

  • Difficult to increase or decrease the pool size

  • Almost all the keys need to be remapped if a new server is added or removed

  • Hash space

  • Hash ring

Problems with Consistent Hashing:

  • Uneven distribution

    • Virtual nodes are used to overcome this issue
  • Amazon DynamoDB

  • Apache Cassandra

Rendezvous hashing

Rendezvous or highest random weight (HRW) hashing is an algorithm that allows clients to achieve distributed agreement on a set of k options out of a possible set of n options.

  • Rendezvous hashing is both much simpler and more general than consistent hashing, which becomes a special case (for k=1) of rendezvous hashing.

Cryptographic Hash Functions

Check out Hashing