dictionary - How Python dict stores key, value when collision occurs? -


this question has answer here:

how python stores dict key, values when collision occurs in hash table? whats hash algorithm used hash value here?

for "normal" python, this great writeup praveen gollakota explains well, here important bits:

  • python dictionaries implemented hash tables. hash tables consist of slots, , keys mapped slots via hashing function.
  • hash table implementations must allow hash collisions i.e. if 2 keys have same hash value, implementation of table must have strategy insert , retrieve key , value pairs unambiguously.
  • python dict uses open addressing resolve hash collisions (see dictobject.c:296-297).
  • in open addressing, hash collisions resolved probing (explained below) .
  • the hash table contiguous block of memory (like array, can o(1) lookup index).
  • each slot in hash table can store 1 , 1 entry. important.
  • each entry in table combination of 3 items - <hash, key, value>. implemented c struct (see dictobject.h:51-56).
  • when new dict initialized, starts 8 slots. (see dictobject.h:49)
  • when adding entries table, start slot, i based on hash of key. cpython uses initial i = hash(key) & mask, mask = pydictminsize - 1, that's not important. note initial slot, i, checked depends on hash of key.
  • if slot empty, entry added slot (by entry, mean, <hash|key|value>). if slot occupied!? because entry has same hash (hash collision!)
  • if slot occupied, cpython (and pypy) compares hash , key (by compare mean == comparison not is comparison) of entry in slot against key of current entry inserted (dictobject.c337,344-345). if both match, thinks entry exists, gives , moves on next entry inserted. if either hash or key don't match, starts probing.
  • probing means searches slots slot find empty slot. technically go 1 one, i+1, i+2, ... , use first available 1 (that's linear probing). reasons explained beautifully in comments (see dictobject.c:33-126), cpython uses random probing. in random probing, next slot picked in pseudo random order. entry added first empty slot. discussion, actual algorithm used pick next slot not important (see dictobject.c:33-126 algorithm probing). important slots probed until first empty slot found.
  • the same thing happens lookups, starts initial slot i (where i depends on hash of key). if hash , key both don't match entry in slot, starts probing, until finds slot match. if slots exhausted, reports fail.
  • to avoid slowing down lookups, dict resized when two-thirds full (see dictobject.h:64-65).

Comments

Popular posts from this blog

python - Subclassed QStyledItemDelegate ignores Stylesheet -

java - HttpClient 3.1 Connection pooling vs HttpClient 4.3.2 -

SQL: Divide the sum of values in one table with the count of rows in another -