Heterogeneous Replicas for Multi-dimensional Data Management

Jialin Qiao, Yuyuan Kang, Xiangdong Huang, Lei Rui, Tian Jiang, Jianmin Wang, Philip S. Yu
DOI: https://doi.org/10.1007/978-3-030-59410-7_2
2020-01-01
Abstract: Multi-dimensional data is widely used in scenarios such as cluster monitoring and user behavior analysis for web services. The data is usually managed by distributed databases with a replication strategy, which improves availability, fault tolerance, and I/O throughput. Normally, these replicas share the same physical layout on disk, which database administrators design according to the target workload. However, it is critical to derive an optimal layout that benefits as many queries as possible, because a layout that accommodates only some queries can negatively impact the others. To tackle this limitation, we propose heterogeneous replicas for multi-dimensional data, which provide higher query throughput without additional disk usage and without slowing down writes, while still ensuring high availability and load balance. The proposed replication method allows different replicas to be logically identical while having different physical data layouts on disk. We verified the efficiency of our method in a NoSQL system, Cassandra, with the TPC-H dataset and with a synthetically generated dataset. The results show that our method outperforms state-of-the-art solutions.
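To make the idea of logically identical replicas with different physical layouts concrete, the following is a minimal sketch and not the paper's algorithm. It assumes each replica keeps the same rows sorted by a different dimension order, and that a range query is cheapest on the replica whose leading sort dimension matches the queried dimension. The names `Replica`, `range_query`, and `route`, and the toy `(time, device, value)` rows, are hypothetical and chosen only for illustration.

```python
from bisect import bisect_left, bisect_right

# Toy dataset (assumption): rows are (time, device, value) tuples.
ROWS = [(t, d, t * 10 + d) for t in range(100) for d in range(10)]


class Replica:
    """Holds the full dataset, sorted by a replica-specific dimension order."""

    def __init__(self, rows, key_order):
        self.key_order = key_order  # e.g. (0, 1) = sort by time, then device
        self.rows = sorted(rows, key=lambda r: tuple(r[i] for i in key_order))
        # Values of the leading sort dimension, used for binary search.
        self.lead = [r[key_order[0]] for r in self.rows]

    def range_query(self, dim, lo, hi):
        """Return (matching rows, number of rows scanned) for lo <= r[dim] <= hi."""
        if dim == self.key_order[0]:
            # The layout matches the query: scan one contiguous slice.
            start = bisect_left(self.lead, lo)
            end = bisect_right(self.lead, hi)
            candidates = self.rows[start:end]
        else:
            # The layout does not help: fall back to a full scan.
            candidates = self.rows
        hits = [r for r in candidates if lo <= r[dim] <= hi]
        return hits, len(candidates)


def route(replicas, dim):
    """Pick a replica whose leading sort dimension is the queried one, if any."""
    for replica in replicas:
        if replica.key_order[0] == dim:
            return replica
    return replicas[0]


by_time = Replica(ROWS, (0, 1))    # layout favouring time-range queries
by_device = Replica(ROWS, (1, 0))  # layout favouring device-range queries
replicas = [by_time, by_device]

# A query on the device dimension is routed to the device-ordered replica.
chosen = route(replicas, dim=1)
hits, scanned = chosen.range_query(dim=1, lo=3, hi=3)
print(len(hits), scanned)
```

Both replicas store exactly the same rows, so either can answer any query and the storage cost equals that of ordinary replication; only the number of rows scanned differs by layout. A real system would also have to account for load balance and replica failure when routing, which this sketch omits.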