A Scheduling Strategy Based on Multi-Queues of Cassandra.

Haoping Li, Hui Li
DOI: https://doi.org/10.1109/bigdata.2017.8258228
2017-01-01
Abstract: In the era of big data, many tools and algorithms have been designed to deal with ever-growing data volumes. Because managing data in traditional relational databases causes scalability and performance problems, data management across multiple data centers has been proposed. Cassandra is a NoSQL database built to store huge volumes of data and to manage data across multiple data centers. Generally, Cassandra assigns data to different nodes using a consistent hashing algorithm, so its performance is excellent when most requests are random reads and writes. However, when popular data is read or written frequently and that data is distributed across different data centers, each operation incurs a communication delay that cannot be ignored. In this article, we propose a scheduling strategy based on multi-queues to reduce the communication delay when data is accessed across different data centers. To validate the effectiveness of this strategy, we implemented our approach on Cassandra; the evaluation results show that the average response time of data access across multiple data centers is reduced.
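The consistent-hashing placement mentioned in the abstract can be illustrated with a minimal sketch. This is a toy ring, not Cassandra's implementation and not the paper's scheduling strategy: the `HashRing` class, the `token` helper, the node names, the use of MD5, and the number of virtual nodes are all assumptions made for illustration (Cassandra's default partitioner uses Murmur3 rather than MD5).

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a string to a ring position (MD5 is an illustrative choice only)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Toy consistent-hash ring: a key belongs to the first node clockwise."""

    def __init__(self, nodes, vnodes=8):
        # Each node gets several virtual positions to smooth the distribution.
        self._ring = sorted(
            (token(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._tokens = [t for t, _ in self._ring]

    def node_for(self, key: str) -> str:
        # First ring position at or after the key's token, wrapping at the end.
        idx = bisect.bisect_left(self._tokens, token(key)) % len(self._ring)
        return self._ring[idx][1]


# Hypothetical node names spanning two data centers.
ring = HashRing(["dc1-node1", "dc1-node2", "dc2-node1"])
owner = ring.node_for("user:42")
```

Because placement depends only on the hash of the key, a frequently accessed key can land on a node in a remote data center regardless of where its requests originate, which is the source of the cross-data-center delay the abstract describes.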