Beyond using hash maps (every candidate can do HashMap.put()), they test whether you understand the internals: how hashing works, what happens on collisions, why O(1) is amortized not guaranteed, and when a hash map is the WRONG choice.
Hidden intent: When you reach for a hash map to solve a problem, can you justify that choice over alternatives (sorted array + binary search, trie, balanced BST)?
Mid-level: Explain chaining vs open addressing, know time complexities, use hash maps effectively in coding problems. Senior: Discuss load factor and rehashing, hash function quality, thread safety concerns (ConcurrentHashMap vs synchronized), and memory overhead compared to arrays.
A hash map stores key-value pairs using a hash function to compute an index into an array of buckets. The hash function converts a key into an integer, then modular arithmetic maps it to a bucket index. When two keys hash to the same bucket (collision), the map must resolve this — typically via chaining (linked list or tree per bucket) or open addressing (probing for the next empty slot).
Tradeoff: Chaining handles high collision rates gracefully but has pointer-chasing overhead and poor cache locality. Open addressing has better cache performance for low load factors but degrades sharply as the table fills (clustering). Java's HashMap switched from pure chaining to chaining-with-treeification in Java 8 — when a bucket exceeds 8 entries, the linked list converts to a red-black tree, changing worst-case lookup from O(n) to O(log n).
Real-world: Python's dict uses open addressing with perturbation-based probing, achieving excellent average performance but making worst-case analysis harder. Redis's hash tables use incremental rehashing — instead of blocking to resize, they maintain two tables during resizing and gradually migrate entries, keeping response times consistent. This is the answer to "how do production systems handle hash table growth?"
Failure mode: Hash maps with user-controlled keys are vulnerable to hash collision attacks (HashDoS). If an attacker crafts keys that all hash to the same bucket, O(1) degrades to O(n). This caused real CVEs in web frameworks processing POST parameters. Production mitigation: randomized hash seeds (SipHash), which is why Python randomizes hash seeds by default since 3.3.
When NOT to use: If you need ordered iteration, range queries, or memory efficiency for small datasets, a sorted array or BST is better. Hash maps have 2-3x memory overhead per entry compared to arrays due to pointers and load factor headroom.
Free assessment. No signup needed.
Start Free Assessment