Optimizing Data Partition for NoSQL Cluster

Xiangdong Huang, Jianmin Wang, Yu Zhong, Philip S. Yu
DOI: https://doi.org/10.1109/UIC-ATC-ScalCom-CBDCom-IoP.2015.182
2015-01-01
Abstract: The balance of data partitioning significantly affects the performance of NoSQL systems. Most P2P NoSQL systems use consistent hashing to partition data automatically. Currently, these systems divide the consistent hashing ring using either randomly placed virtual nodes or manual configuration, which may cause load imbalance and degrade performance. The problem is especially pronounced in heterogeneous clusters. In this paper, we focus on the partition strategy for the consistent hashing ring and propose a quantified criterion for data partitioning. When initializing a cluster, we cast partitioning as an optimization problem and search for the most even partitioning result. Experiments on Cassandra and Voldemort show that these methods outperform the current implementations. Moreover, the algorithms are very efficient, even for heterogeneous clusters.
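
The imbalance described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm or its criterion: the ring size, the node names and weights, the `imbalance` measure (largest ratio of owned share to capacity-proportional share), and the proportional token placement are all assumptions chosen for illustration.

```python
import random

RING_SIZE = 2**32  # assumed ring size, for illustration only


def ownership(tokens):
    """Return the fraction of the ring owned by each node.

    `tokens` maps node name -> list of ring positions. Each token owns the
    arc from the previous token (exclusive) up to itself (inclusive).
    """
    points = sorted((t, node) for node, ts in tokens.items() for t in ts)
    share = {node: 0 for node in tokens}
    prev = points[-1][0] - RING_SIZE  # wrap around from the last token
    for pos, node in points:
        share[node] += pos - prev
        prev = pos
    return {node: s / RING_SIZE for node, s in share.items()}


def imbalance(shares, weights):
    """Hypothetical measure: worst ratio of owned share to ideal share.

    The ideal share of a node is its weight divided by the total weight,
    so 1.0 means every node owns exactly its capacity-proportional share.
    """
    total = sum(weights.values())
    return max(shares[n] / (weights[n] / total) for n in shares)


def random_tokens(weights, tokens_per_unit, rng):
    """Random virtual nodes: each node draws tokens in proportion to weight."""
    return {
        n: [rng.randrange(RING_SIZE) for _ in range(w * tokens_per_unit)]
        for n, w in weights.items()
    }


def proportional_tokens(weights):
    """One token per node, placed at cumulative-weight boundaries."""
    total = sum(weights.values())
    tokens, cumulative = {}, 0
    for n, w in weights.items():
        cumulative += w
        tokens[n] = [(cumulative * RING_SIZE // total) % RING_SIZE]
    return tokens


# Assumed heterogeneous cluster: node "c" has twice the capacity of "a" and "b".
weights = {"a": 1, "b": 1, "c": 2}
rng = random.Random(0)  # fixed seed so the run is repeatable

random_shares = ownership(random_tokens(weights, 8, rng))
even_shares = ownership(proportional_tokens(weights))

print("random virtual nodes:", imbalance(random_shares, weights))
print("proportional tokens: ", imbalance(even_shares, weights))
```

With these weights the proportional placement gives every node exactly its capacity-proportional share, while the random placement generally leaves at least one node above it. This only shows why token placement matters; the paper's own criterion and optimization procedure are not reproduced here.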