A Request Skew Aware Heterogeneous Distributed Storage System Based on Cassandra

Zhen Ye, Shanping Li
DOI: https://doi.org/10.1109/caman.2011.5778745
2011-01-01
Abstract: Many distributed storage systems have been proposed to provide high scalability and high availability for modern web applications. However, most of these systems are aware only of data skew, although request skew is also widespread and needs to be considered as well. In this paper, we present a request skew aware heterogeneous distributed storage system based on Cassandra, a well-known NoSQL database designed to manage very large-scale data without a single point of failure. We improve Cassandra in two ways: 1) minimizing the forwarded request load by dynamically shifting the node to which the client application connects to the one that can handle the maximum number of skewed requests; 2) taking each node's storage capacity into consideration when balancing data load among all nodes in the cluster. Our experimental results show that approach 1) reduces forwarded read requests by about 25% and forwarded write requests by about 15%, and that approach 2) noticeably improves the balance of storage utilization across nodes.
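The two mechanisms summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The following are all assumptions made for illustration: an MD5 token ring with one token per node, SimpleStrategy-style clockwise replica placement, the rule that a request counts as "forwarded" when the client's coordinator node is not one of the key's replicas, and every name (`capacity_aware_ring`, `replicas`, `best_coordinator`, `RING_SIZE`).

```python
import bisect
import hashlib
from collections import Counter

# Hypothetical token space; similar in spirit to Cassandra's MD5-based RandomPartitioner.
RING_SIZE = 2 ** 127


def token_of(key: str) -> int:
    """Map a key onto the ring with MD5 (an assumption of this sketch)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % RING_SIZE


def capacity_aware_ring(capacities: dict[str, int]) -> list[tuple[int, str]]:
    """Approach 2 (sketch): size each node's token range in proportion to its capacity.

    Each node owns the range (previous token, its token], so a node holding a
    larger share of total capacity is assigned a proportionally wider range.
    """
    total = sum(capacities.values())
    ring, cumulative = [], 0
    for node, cap in sorted(capacities.items()):
        cumulative += cap
        ring.append((RING_SIZE * cumulative // total % RING_SIZE, node))
    return sorted(ring)


def replicas(ring: list[tuple[int, str]], key: str, rf: int) -> list[str]:
    """Return up to rf distinct nodes, walking clockwise from the key's token."""
    tokens = [t for t, _ in ring]
    i = bisect.bisect_left(tokens, token_of(key)) % len(ring)  # first token >= key token
    wanted = min(rf, len({n for _, n in ring}))
    out: list[str] = []
    while len(out) < wanted:
        node = ring[i][1]
        if node not in out:
            out.append(node)
        i = (i + 1) % len(ring)
    return out


def best_coordinator(ring: list[tuple[int, str]], request_log: list[str], rf: int) -> tuple[str, int]:
    """Approach 1 (sketch): pick the node that is a replica for the most requests.

    A request is counted as forwarded when the coordinator is not one of its
    replicas, so the node covering the most (skewed) requests minimizes
    forwarding for this log. Returns (node, number of forwarded requests).
    """
    served = Counter()
    for key in request_log:
        for node in replicas(ring, key, rf):
            served[node] += 1
    node = max(served, key=lambda n: (served[n], n))  # ties broken by node name
    return node, len(request_log) - served[node]


if __name__ == "__main__":
    caps = {"n1": 100, "n2": 200, "n3": 300}  # hypothetical capacities, e.g. in GB
    ring = capacity_aware_ring(caps)
    # Skewed workload: one hot key dominates the request log.
    log = ["user:42"] * 80 + [f"user:{i}" for i in range(20)]
    node, forwarded = best_coordinator(ring, log, rf=1)
    print(f"connect client to {node}; {forwarded} of {len(log)} requests forwarded")
```

Choosing the coordinator from the observed request log, rather than from how many keys each node stores, is what lets the sketch react to request skew: a single hot key can decide which node should receive the client connection even when data is spread evenly.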
What problem does this paper attempt to address?