JumpBackHash: Say Goodbye to the Modulo Operation to Distribute Keys Uniformly to Buckets

Otmar Ertl
2024-07-03
Abstract: The distribution of keys to a given number of buckets is a fundamental task in distributed data processing and storage. A simple, fast, and therefore popular approach is to map the hash values of keys to buckets based on the remainder after dividing by the number of buckets. Unfortunately, these mappings are not stable when the number of buckets changes, which can lead to severe spikes in system resource utilization, such as network or database requests. Consistent hash algorithms can minimize remappings, but are either significantly slower than the modulo-based approach, require floating-point arithmetic, or are based on a family of hash functions rarely available in standard libraries. This paper introduces JumpBackHash, which uses only integer arithmetic and a standard pseudorandom generator. Due to its speed and simple implementation, it can safely replace the modulo-based approach to improve assignment and system stability. A production-ready Java implementation of JumpBackHash has been released as part of the Hash4j open source library.
Data Structures and Algorithms; Databases; Distributed, Parallel, and Cluster Computing
What problem does this paper attempt to address?
The paper addresses how, in distributed data processing and storage, to distribute keys evenly across a given number of buckets while minimizing the number of keys that must be reassigned when the number of buckets changes. Specifically:

1. **Background problem**:
   - In a distributed system, a hash function is usually used to map keys to specific buckets.
   - The traditional modulo-based method (\(k \bmod n\)) is simple and fast, but when the number of buckets changes, most keys are reassigned to different buckets. This can cause spikes in system resource utilization, such as a surge in network or database requests.
2. **Deficiencies of existing solutions**:
   - Consistent hash algorithms can minimize the number of reassignments, but they are either significantly slower than the modulo-based method, require floating-point arithmetic, or rely on a family of hash functions that is rarely available in standard libraries.
3. **The new method proposed in the paper**:
   - The paper introduces a new consistent hash algorithm, JumpBackHash, which uses only integer arithmetic and a standard pseudorandom generator.
   - According to the abstract, JumpBackHash is fast and simple enough to replace the modulo-based method, while reducing the reassignment of keys and thereby improving the stability of both the assignment and the system.
4. **Specific objectives**:
   - Propose a hash algorithm that is both efficient and simple, and that minimizes the number of reassignments when the number of buckets changes while keeping the distribution of keys uniform.
   - Ensure that the algorithm can be implemented with a standard pseudorandom generator and without floating-point arithmetic.

With these properties, JumpBackHash can safely replace the modulo-based method in a distributed system, improving assignment and system stability.
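To make the stability problem concrete, the following minimal sketch measures how many keys change buckets when the number of buckets grows from 10 to 11. It compares the modulo-based mapping with jump consistent hash (Lamping and Veach), an existing consistent hash algorithm that relies on floating-point arithmetic. It is not an implementation of JumpBackHash. The helper names (`modulo_bucket`, `moved_fraction`), the sample size, and the bucket counts are assumptions chosen for illustration.

```python
import random

MASK64 = (1 << 64) - 1  # emulate unsigned 64-bit wraparound


def modulo_bucket(key_hash: int, num_buckets: int) -> int:
    """Modulo-based assignment: remainder of the hash value."""
    return key_hash % num_buckets


def jump_consistent_hash(key_hash: int, num_buckets: int) -> int:
    """Jump consistent hash (Lamping and Veach); uses floating point.

    Shown only as a stand-in for a consistent hash algorithm.
    This is NOT JumpBackHash.
    """
    b, j = -1, 0
    key = key_hash & MASK64
    while j < num_buckets:
        b = j
        # 64-bit linear congruential step acting as the pseudorandom generator
        key = (key * 2862933555777941757 + 1) & MASK64
        j = int((b + 1) * (float(1 << 31) / float((key >> 33) + 1)))
    return b


def moved_fraction(assign, hashes, n_old: int, n_new: int) -> float:
    """Fraction of keys whose bucket differs between n_old and n_new buckets."""
    moved = sum(1 for h in hashes if assign(h, n_old) != assign(h, n_new))
    return moved / len(hashes)


# Hypothetical workload: 100,000 uniformly random 64-bit hash values.
rng = random.Random(42)
hashes = [rng.getrandbits(64) for _ in range(100_000)]

modulo_moved = moved_fraction(modulo_bucket, hashes, 10, 11)
jump_moved = moved_fraction(jump_consistent_hash, hashes, 10, 11)

print(f"modulo: {modulo_moved:.3f} of keys moved")
print(f"jump:   {jump_moved:.3f} of keys moved")
```

For uniformly distributed hash values, the modulo mapping should move about 10/11 of the keys, whereas a consistent hash algorithm moves only about 1/11, namely the keys that have to land in the newly added bucket to keep the distribution uniform.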