Chapter 5: Hash Tables

input Key; Output Value

  • Python has hash tables; they’re called dictionaries.

  • DNS resolution :mapping a web address to an IP address

  • Using hash tables as a cache

    • You get the web page a lot faster

    • do less work

    • Caching is a common way to make things faster. All big websites use caching. And that data is cached in a hash

      • Ex:

        cache ={}
        ​
        def get_page(url):
          if cache.get(url):
            return cache[url]
          else:
            data = get_data_from_server(url)
            cache[url] = data
            return data
        • This is called a collision: two keys have been assigned the same slot.

          • Your hash function is really important. Your hash function mapped all the keys to a single slot. Ideally, your hash function would map keys evenly all over the hash.

          • If those linked lists get long, it slows down your hash table a lot. But they won’t get long if you use a good hash function!

        • To avoid collisions

          • Load factor

            • Calculation: Load factor = number of items in hash table / total number of slots

            • Load factor measures how many empty slots remain in your hash table.

            • Once the load factor starts to grow, you need to add more slots to your hash table. This is called resizing.

            • With a lower load factor, you’ll have fewer collisions, and your table will perform better.

          • A good hash function

            • what is a good hash function? look up the SHA function

        • To recap, hashes are good for

          • Modeling relationships from one thing to another thing

          • Filtering out duplicates

          • Caching/memorizing data instead of making your server do work

Last updated

Was this helpful?