JumpBackHash: Say Goodbye to the Modulo Operation to Distribute Keys Uniformly to Buckets

Otmar Ertl

2024-07-03

Abstract: The distribution of keys to a given number of buckets is a fundamental task in distributed data processing and storage. A simple, fast, and therefore popular approach is to map the hash values of keys to buckets based on the remainder after dividing by the number of buckets. Unfortunately, these mappings are not stable when the number of buckets changes, which can lead to severe spikes in system resource utilization, such as network or database requests. Consistent hash algorithms can minimize remappings, but are either significantly slower than the modulo-based approach, require floating-point arithmetic, or are based on a family of hash functions rarely available in standard libraries. This paper introduces JumpBackHash, which uses only integer arithmetic and a standard pseudorandom generator. Due to its speed and simple implementation, it can safely replace the modulo-based approach to improve assignment and system stability. A production-ready Java implementation of JumpBackHash has been released as part of the Hash4j open source library.

Data Structures and Algorithms; Databases; Distributed, Parallel, and Cluster Computing

What problem does this paper attempt to address?

The paper addresses how to distribute keys evenly across a given number of buckets in distributed data processing and storage while minimizing the number of keys that must be reassigned when the number of buckets changes. Specifically:

1. **Background**:
   - In a distributed system, a hash function is usually used to map keys to buckets.
   - The traditional modulo-based method (\(k \bmod n\)) is simple and fast, but when the number of buckets changes, almost all keys are reassigned to different buckets. This can cause spikes in system resource utilization, such as a surge in network or database requests.
2. **Deficiencies of existing solutions**:
   - Consistent hash algorithms can minimize the number of reassignments, but they are either significantly slower than the modulo-based method, require floating-point arithmetic, or rely on a family of hash functions rarely available in standard libraries.
3. **The method proposed in the paper**:
   - The paper introduces a new consistent hash algorithm, JumpBackHash, which uses only integer arithmetic and a standard pseudorandom generator.
   - The speed of JumpBackHash is comparable to that of the modulo-based method, but it reduces the reassignment of keys to a minimum, which improves assignment stability and system stability.
4. **Specific objectives**:
   - Propose a hash algorithm that is both efficient and simple, that minimizes the number of reassignments when the number of buckets changes, and that keeps the distribution of keys uniform.
   - Ensure that the algorithm can be implemented with a standard pseudorandom generator and without floating-point arithmetic.

With these properties, JumpBackHash can safely replace the modulo-based method in a distributed system and improve the stability of the system.
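
To make the remapping problem concrete, the following minimal sketch compares the modulo-based assignment with a consistent hash when the number of buckets grows from 10 to 11. It does not implement JumpBackHash. As a stand-in it uses the well-known jump consistent hash of Lamping and Veach, which is one of the existing consistent hash algorithms that need floating-point arithmetic. The helper names (`modulo_bucket`, `moved_fraction`), the seed, and the choice of 10,000 random 64-bit hash values are assumptions made for illustration only.

```python
import random

MASK64 = 0xFFFFFFFFFFFFFFFF


def modulo_bucket(hash_value: int, num_buckets: int) -> int:
    """Modulo-based assignment: remainder of the hash value."""
    return hash_value % num_buckets


def jump_consistent_hash(hash_value: int, num_buckets: int) -> int:
    """Jump consistent hash (Lamping and Veach). NOT JumpBackHash.

    Used here only as a stand-in consistent hash; note the
    floating-point division that JumpBackHash is designed to avoid.
    """
    key = hash_value & MASK64
    b, j = -1, 0
    while j < num_buckets:
        b = j
        # 64-bit linear congruential step acts as the pseudorandom generator.
        key = (key * 2862933555777941757 + 1) & MASK64
        j = int((b + 1) * (float(1 << 31) / float((key >> 33) + 1)))
    return b


def moved_fraction(assign, hash_values, old_n: int, new_n: int) -> float:
    """Fraction of hash values whose bucket changes from old_n to new_n buckets."""
    moved = sum(1 for h in hash_values if assign(h, old_n) != assign(h, new_n))
    return moved / len(hash_values)


# Hypothetical workload: 10,000 uniformly distributed 64-bit hash values.
rng = random.Random(42)
hash_values = [rng.getrandbits(64) for _ in range(10_000)]

modulo_moved = moved_fraction(modulo_bucket, hash_values, 10, 11)
jump_moved = moved_fraction(jump_consistent_hash, hash_values, 10, 11)

print(f"modulo:               {modulo_moved:.1%} of keys moved")
print(f"jump consistent hash: {jump_moved:.1%} of keys moved")
```

For uniformly distributed hash values, the modulo-based assignment is expected to move roughly \(n/(n+1)\) of the keys when going from \(n\) to \(n+1\) buckets, whereas a consistent hash moves only about \(1/(n+1)\) of them, and only into the new bucket.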

KVLB: an In-network Key-Value Load Balancer Using Multi-Valued Hash

Object Placement Algorithm Based on Jump Hash

Skip Hash: A Fast Ordered Map Via Software Transactional Memory

FlipHash: A Constant-Time Consistent Range-Hashing Algorithm

MementoHash: A Stateful, Minimal Memory, Best Performing Consistent Hash Algorithm

DHash: Enabling Dynamic and Efficient Hash Tables

Revisiting Consistent Hashing with Bounded Loads

Locally Uniform Hashing

Single Hash: Use One Hash Function to Build Faster Hash Based Data Structures

Fast and Powerful Hashing using Tabulation

PHOBIC: Perfect Hashing with Optimized Bucket Sizes and Interleaved Coding

Fast Consistent Hashing in Constant Time

Regular and almost universal hashing: an efficient implementation

DHash: Dynamic Hash Tables With Non-Blocking Regular Operations

ShockHash: Near Optimal-Space Minimal Perfect Hashing Beyond Brute-Force

A New Hashing Function: Statistical Behaviour and Algorithm

Rectangular Hash Table: Bloom Filter And Bitmap Assisted Hash Table With High Speed

Learned Monotone Minimal Perfect Hashing

Balanced Hashing

Dynamic-Sized Nonblocking Hash Tables