New hash table #1186

Merged: zuiderkwast merged 3 commits into unstable from hashset on Dec 10, 2024

@zuiderkwast commented Oct 17, 2024

This PR implements a new hash table and uses it for keys, command lookup and more. There are multiple commits. Do not squash-merge.

  1. Hash table implementation and unit tests
  2. Use new hash table for command lookup
  3. Use new hash table for keys and expire

Hash table design

The hash table is cache-line optimized and implemented as outlined in #169, but uses chaining instead of probing, after an idea by Madelyn. The key-value entry is user-defined, which allows the user to embed the key and the value within a single allocation. The hash table supports incremental rehashing, scan, random key, etc. just like dict, but it is faster and uses less memory. Each bucket contains a few bits of metadata per entry. For details, see the comments in src/hashtable.{c,h}.

If a bucket is full, its last entry pointer can be replaced by a child-bucket pointer, which forms a bucket chain.

      Bucket          Bucket          Bucket               
-----+---------------+---------------+---------------+-----
 ... | x x x x x x p | x x x x x x x | x x x x x x x | ... 
-----+-------------|-+---------------+---------------+-----
                   |                                       
                   v  Child bucket                         
                 +---------------+                         
                 | x x x x x x p |                         
                 +-------------|-+                         
                               |                           
                               v  Child bucket             
                             +---------------+             
                             | x x x x x x x |             
                             +---------------+             
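
The chaining step in the diagram can be sketched in C as follows. This is only an illustration under assumptions: the `bucket` struct, the `SLOTS` constant (7, as drawn above) and the helper names are hypothetical and simplified, and the real layout and metadata are described in src/hashtable.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SLOTS 7 /* assumed: 7 pointer slots per bucket, as in the diagram */

/* Hypothetical, simplified bucket; not the layout used in src/hashtable.c. */
typedef struct bucket {
    uint8_t chained;   /* 1 if the last slot points to a child bucket */
    uint8_t used;      /* bit i set => slot i holds an entry */
    void *slot[SLOTS];
} bucket;

/* Slots available for entries: one fewer when the last slot is a child pointer. */
static int entry_slots(const bucket *b) {
    return b->chained ? SLOTS - 1 : SLOTS;
}

/* Inserts an entry somewhere in the chain starting at b.
 * Returns 0 on success, -1 if a child bucket could not be allocated. */
static int bucket_insert(bucket *b, void *entry) {
    for (;;) {
        int n = entry_slots(b);
        for (int i = 0; i < n; i++) {
            if (!(b->used & (1u << i))) {
                b->slot[i] = entry;
                b->used |= (uint8_t)(1u << i);
                return 0;
            }
        }
        if (!b->chained) {
            /* Full bucket: move the last entry to a new child bucket and
             * reuse the last slot as the child pointer. */
            bucket *child = calloc(1, sizeof(*child));
            if (child == NULL) return -1;
            child->slot[0] = b->slot[SLOTS - 1];
            child->used = 1;
            b->slot[SLOTS - 1] = child;
            b->used &= (uint8_t)~(1u << (SLOTS - 1));
            b->chained = 1;
        }
        b = b->slot[SLOTS - 1]; /* follow the chain */
    }
}

/* Number of buckets in the chain, including the top-level bucket. */
static int chain_length(const bucket *b) {
    int len = 0;
    while (b != NULL) {
        len++;
        b = b->chained ? b->slot[SLOTS - 1] : NULL;
    }
    return len;
}

/* Number of entries stored in the whole chain. */
static size_t bucket_count(const bucket *b) {
    size_t total = 0;
    while (b != NULL) {
        int n = entry_slots(b);
        for (int i = 0; i < n; i++)
            if (b->used & (1u << i)) total++;
        b = b->chained ? b->slot[SLOTS - 1] : NULL;
    }
    return total;
}
```

In this sketch a chained bucket holds six entries plus the child pointer, so the eighth insert into one bucket creates the first child and the fourteenth creates the second, matching the three-level chain drawn above.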

Command lookup

The 2nd commit replaces dict with hashtable for command lookup. This was implemented by @SoftlyRaining.

Keys and expire

The 3rd commit replaces dict with the new hash table in kvstore.c and all code that uses it, such as db.c.

The hashtable entry in this case is the robj struct. The key and optionally an expire timestamp are embedded in the robj struct, i.e. the key is embedded in the value. Therefore, we can call this a valkey object, val + key. This design saves roughly 20 bytes per key for short string keys.

 keys                                      expire
hashtable             robj                hashtable
 +---+         +-------------------+        +---+
 | 1 --------->| type, enc, lru    |<-------- 1 |
 | 2 |         | refcount, flags   |        | 2 |
 | 3 |         | ptr               |        | 3 |
 | . |         | [expire]          |        | . |
 | . |         | embedded key      |        | . |
 | . |         | [embedded value]  |        | . |
 +---+         +-------------------+        +---+

[.] = optional
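
A minimal sketch of the single-allocation layout in the diagram, assuming a simplified stand-in for robj: the `obj` struct, its fields and the helper names below are hypothetical and are not the definitions used in the PR.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for robj; not the real struct. */
typedef struct obj {
    unsigned type : 4;
    unsigned has_expire : 1; /* set if an expire timestamp is embedded */
    int refcount;
    void *ptr;               /* the value */
    unsigned char data[];    /* [expire] followed by the NUL-terminated key */
} obj;

/* Allocates header, optional expire and key in one block. */
static obj *obj_create(const char *key, void *value, int has_expire, int64_t expire) {
    size_t keylen = strlen(key) + 1;
    size_t extra = (has_expire ? sizeof(int64_t) : 0) + keylen;
    obj *o = malloc(sizeof(*o) + extra);
    if (o == NULL) return NULL;
    o->type = 0;
    o->has_expire = has_expire ? 1 : 0;
    o->refcount = 1;
    o->ptr = value;
    unsigned char *p = o->data;
    if (has_expire) {
        memcpy(p, &expire, sizeof(expire));
        p += sizeof(expire);
    }
    memcpy(p, key, keylen);
    return o;
}

/* The key lives after the optional expire field. */
static const char *obj_key(const obj *o) {
    return (const char *)o->data + (o->has_expire ? sizeof(int64_t) : 0);
}

/* Returns the embedded expire timestamp, or -1 if none is embedded. */
static int64_t obj_expire(const obj *o) {
    int64_t e = -1;
    if (o->has_expire) memcpy(&e, o->data, sizeof(e));
    return e;
}
```

Because the key and the expire time are read from the object itself, both the keys table and the expire table can store the same `obj *` as their entry, as the arrows in the diagram show.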

Some db.c functions like dbAdd, setKey and setExpire now reallocate the value object to embed the key and optional expire in it. setKey does not increment the reference counter, since it would require duplicating the object.
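
The reallocation can be illustrated with a small sketch. It assumes a hypothetical, simplified `obj` struct and helper names, not the real robj or db.c code, and it shows one reading of the ownership consequence: a function that may move the object has to hand back the new pointer, and a shared object cannot be moved in place.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified object; not the real robj. */
typedef struct obj {
    int refcount;
    int has_expire;
    unsigned char data[]; /* [int64 expire] followed by the NUL-terminated key */
} obj;

/* Creates an object with an embedded key and no expire. */
static obj *obj_with_key(const char *key) {
    size_t keylen = strlen(key) + 1;
    obj *o = malloc(sizeof(*o) + keylen);
    if (o == NULL) return NULL;
    o->refcount = 1;
    o->has_expire = 0;
    memcpy(o->data, key, keylen);
    return o;
}

static const char *obj_key(const obj *o) {
    return (const char *)o->data + (o->has_expire ? sizeof(int64_t) : 0);
}

/* Embeds an expire time, growing the allocation if needed.
 * Returns the pointer to use from now on; the old pointer may be invalid.
 * Returns NULL on allocation failure, in which case o is left unchanged. */
static obj *obj_set_expire(obj *o, int64_t when) {
    assert(o->refcount == 1); /* assumption: other holders would be left dangling */
    if (!o->has_expire) {
        size_t keylen = strlen(obj_key(o)) + 1;
        obj *n = realloc(o, sizeof(*n) + sizeof(int64_t) + keylen);
        if (n == NULL) return NULL;
        /* Shift the key to make room for the expire field in front of it. */
        memmove(n->data + sizeof(int64_t), n->data, keylen);
        n->has_expire = 1;
        o = n;
    }
    memcpy(o->data, &when, sizeof(when));
    return o;
}
```

A caller would write `o = obj_set_expire(o, when);` and then store the returned pointer in every table that referenced the old one.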

Fixes #991
Fixes #992


Labels

release-notes: This issue should get a line item in the release notes
run-extra-tests: Run extra tests on this PR (Runs all tests from daily except valgrind and RESP)

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Embed key and TTL in robj
Implement new hash table