Chapter 5: Hash Tables
input Key; Output Value
Python has hash tables; they’re called dictionaries.
DNS resolution :mapping a web address to an IP address
Using hash tables as a cache
You get the web page a lot faster
do less work
Caching is a common way to make things faster. All big websites use caching. And that data is cached in a hash
Ex:
cache ={} ​ def get_page(url): if cache.get(url): return cache[url] else: data = get_data_from_server(url) cache[url] = data return data
This is called a collision: two keys have been assigned the same slot.
Your hash function is really important. Your hash function mapped all the keys to a single slot. Ideally, your hash function would map keys evenly all over the hash.
If those linked lists get long, it slows down your hash table a lot. But they won’t get long if you use a good hash function!
To avoid collisions
Load factor
Calculation: Load factor = number of items in hash table / total number of slots
Load factor measures how many empty slots remain in your hash table.
Once the load factor starts to grow, you need to add more slots to your hash table. This is called resizing.
With a lower load factor, you’ll have fewer collisions, and your table will perform better.
A good hash function
what is a good hash function? look up the SHA function
To recap, hashes are good for
Modeling relationships from one thing to another thing
Filtering out duplicates
Caching/memorizing data instead of making your server do work
Last updated
Was this helpful?