Optimizing Data Partition for Scaling out NoSQL Cluster

Xiangdong Huang, Jianmin Wang, Yu Zhong, Shaoxu Song, Philip S. Yu
DOI: https://doi.org/10.1002/cpe.3643
2015-01-01
Abstract: Data partition significantly impacts the performance of Not Only SQL (NoSQL) systems. Many peer-to-peer NoSQL systems now use consistent hashing to partition data automatically. These systems use virtual nodes and random data placement to divide the consistent hashing ring, which may lead to an imbalanced data partition and degrade overall system performance. The problem is especially prominent when scaling out heterogeneous clusters. This paper first proposes an imbalance coefficient of data distribution for a cluster that takes the capacity of each node into account. Based on the imbalance coefficient, we propose a dynamic programming algorithm to calculate the position of a newly added node in the consistent hashing ring, which expands the ring more evenly without reshuffling the entire dataset. Simulations and experiments on Cassandra with the Yahoo! Cloud Serving Benchmark (YCSB) show that our algorithm outperforms the state-of-the-art work. Copyright © 2015 John Wiley & Sons, Ltd.
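The idea of capacity-aware placement on a consistent hashing ring can be illustrated with a small sketch. Everything here is an assumption made for illustration: the abstract does not define the imbalance coefficient, so the sketch uses the largest ratio of a node's owned ring fraction to its capacity fraction (1.0 means perfectly balanced), and it replaces the paper's dynamic programming algorithm with a plain brute-force scan over candidate positions. The function names `ownership`, `imbalance`, and `best_position` are hypothetical, and the ring is modeled as the interval [0, 1).

```python
def ownership(tokens):
    """Return {node: fraction of the ring owned}.

    tokens is a sorted list of (position, node) pairs with positions in [0, 1).
    As in Cassandra-style rings, a token owns the arc from the previous
    token (exclusive) up to its own position (inclusive).
    """
    if len(tokens) == 1:
        return {tokens[0][1]: 1.0}
    shares = {}
    for i, (pos, node) in enumerate(tokens):
        prev = tokens[i - 1][0]          # i == 0 wraps around to the last token
        arc = (pos - prev) % 1.0         # arc length, handling the wrap-around
        shares[node] = shares.get(node, 0.0) + arc
    return shares


def imbalance(tokens, capacity):
    """Assumed imbalance measure (not the paper's definition):
    max over nodes of owned fraction / capacity fraction."""
    shares = ownership(tokens)
    total = sum(capacity.values())
    return max(shares.get(n, 0.0) / (capacity[n] / total) for n in capacity)


def best_position(tokens, capacity, new_node, new_capacity, steps=1000):
    """Brute-force stand-in for the paper's algorithm: try `steps` evenly
    spaced positions for one new token and keep the least imbalanced one."""
    caps = {**capacity, new_node: new_capacity}
    taken = {p for p, _ in tokens}
    best = None
    for k in range(steps):
        pos = k / steps
        if pos in taken:
            continue                     # skip positions already holding a token
        trial = sorted(tokens + [(pos, new_node)])
        score = imbalance(trial, caps)
        if best is None or score < best[0]:
            best = (score, pos)
    return best[1], best[0]


if __name__ == "__main__":
    # Node "a" owns 75% of the ring and "b" owns 25%, with equal capacities.
    ring = [(0.0, "a"), (0.25, "b")]
    caps = {"a": 1.0, "b": 1.0}
    print("before:", imbalance(ring, caps))
    print("after: ", best_position(ring, caps, "c", 1.0))
```

In this toy ring the scan places the new node in the middle of the oversized arc owned by "a", so only data from that arc moves and the rest of the ring is untouched. A single new token can only split one existing arc, which is one reason real systems use several virtual nodes per physical node.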