Hash Table
- Uses a
hash function
to calculate the index (hash code) based on thekey
name - The values are stored within a large array, known as
array of buckets
orslots
- Ideally, the hash function assigns each
key
to a uniquebucket
(within the array of buckets). This is where thekey-value
pair is stored - During lookup, the key is hashed and the resulting hash indicates where the corresponding value is stored (within the array of buckets)
- Hash tables are
unordered
- It's a
key-value pair
, the key can also be an index - The association between a key and a value is often known as a
mapping
- The key has to be an
immutable type
(e.g., strings and numbers). You can't use a list as key (since it can be modified in place)
Load factor
The proportion between filled slots
in the array of buckets and the total available slots
$$load\ factor\ (\alpha) = \frac{n}{m}$$
where
- $n$ is the number of entries occupied in the hash table.
- $m$ is the number of buckets
When the load factor is reaching 1, it's necessary to resize the array of buckets and rehash everything
Collisions
- Since hash functions are
surjective functions
(sobrejetora), different keys can output the same hash, leading to the so calledcollision
- Separate chaining
- It's a collision resolution method
- Whenever a collision happens, the data in saved on top of the previous data, not overwriting it, but referencing the new data like a
linked list
- When there is a collision, the access to the element is then $O(n/k)$ instead of $O(1)$
- In order to access (lookup) an element which has suffered collision, each item in the linked list is search until the corresponding key is found
Tradeoffs
- Hash tables have a good
time complexity
($O(1)$) for all operation (insert, remove, lookup) when the memory (size of the array of buckets) is infinite and therefore there are no collisions - When a smaller array of buckets is used, there is a good
space complexity
at the cost more collisions, this way degrading the lookups $O(n)$
Ordered Hash Table
- Some hash table implementation do preserve the order of insertion so that it is possible to traverse it in the same order of insertion
Other applications
- Associative arrays
- Database indexing
- Caches
- Sets