DAMOCRO: A Data Migration Framework Using Online Classification and Reordering

Zhongxin Hu, Kaiyu Li, Xingjian Mao, Jingfeng Pan, Yunfei Peng, Aijun An, Xiaohui Yu, Dariusz Jania
DOI: https://doi.org/10.1145/3627673.3680097
2024-01-01
Abstract: This paper introduces DAMOCRO, a data migration framework that uses online classification and tuple reordering to improve throughput and reduce the cost of data migration. The DAMOCRO workflow consists of four main steps. First, it classifies records into subgroups so as to maximize the similarity within each group. Next, it reorders the tuples within each group so that similar tuples are adjacent. Then, column-wise compression is applied to each group. Finally, the compressed data is transferred from the source machine to the target machine. The first two steps raise the compression ratio, which in turn increases throughput and reduces cost. Our evaluations on five real-world datasets and two benchmark datasets show that the online classification process in DAMOCRO improves throughput by more than 24% and reduces costs by over 19% compared with baselines. In addition, reordering based on functional dependencies yields a further cost reduction of 10% to 60% while also improving throughput.
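The four-step workflow described in the abstract can be illustrated with a minimal Python sketch built on several simplifying assumptions. It does not reproduce DAMOCRO's actual method. First, "classification" here just buckets records by a fixed column, standing in for DAMOCRO's online classifier. Second, "reordering" sorts each group on the determinant (left-hand side) of an assumed functional dependency such as city → country. Sorting on city places equal cities together, so the dependent country column also forms runs. Third, `zlib` stands in for whatever column codec is used. All function names (`classify`, `reorder`, `compress_columns`, `migrate`, and so on) are hypothetical, and the transfer step is represented only by the payload that would be sent.

```python
import zlib
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Record = Tuple[str, ...]
SEP = "\x1f"  # assumption: this separator never occurs inside a field value


def classify(records: Sequence[Record], key: Callable[[Record], str]) -> Dict[str, List[Record]]:
    """Step 1 (stand-in): bucket records by a hypothetical grouping key."""
    groups: Dict[str, List[Record]] = defaultdict(list)
    for rec in records:
        groups[key(rec)].append(rec)
    return dict(groups)


def reorder(group: List[Record], determinant_cols: Sequence[int]) -> List[Record]:
    """Step 2 (stand-in): sort on the left-hand side of an assumed functional
    dependency so that equal dependent values end up in adjacent runs."""
    return sorted(group, key=lambda rec: tuple(rec[c] for c in determinant_cols))


def compress_columns(group: List[Record]) -> List[bytes]:
    """Step 3: compress each column of a group independently with zlib."""
    return [
        zlib.compress(SEP.join(rec[c] for rec in group).encode("utf-8"))
        for c in range(len(group[0]))
    ]


def decompress_columns(blobs: List[bytes]) -> List[Record]:
    """Inverse of compress_columns, as the target machine would run it."""
    cols = [zlib.decompress(b).decode("utf-8").split(SEP) for b in blobs]
    return [tuple(row) for row in zip(*cols)]


def migrate(records: Sequence[Record], key: Callable[[Record], str],
            determinant_cols: Sequence[int]) -> Dict[str, List[bytes]]:
    """Steps 1-3; the returned payload is what step 4 would transfer."""
    return {
        label: compress_columns(reorder(group, determinant_cols))
        for label, group in classify(records, key).items()
    }


def payload_size(payload: Dict[str, List[bytes]]) -> int:
    """Total compressed bytes across all groups and columns."""
    return sum(len(b) for blobs in payload.values() for b in blobs)


def baseline_size(records: Sequence[Record]) -> int:
    """Row-wise compression of the records with no grouping or reordering."""
    text = "\n".join(SEP.join(rec) for rec in records)
    return len(zlib.compress(text.encode("utf-8")))


if __name__ == "__main__":
    cities = {"Toronto": "Canada", "Lyon": "France", "Osaka": "Japan", "Perth": "Australia"}
    names = list(cities)
    # Hypothetical rows (city, country, segment); city -> country holds by construction.
    rows = [
        (names[(i * 7) % 4], cities[names[(i * 7) % 4]], "retail" if i % 3 else "wholesale")
        for i in range(2000)
    ]
    payload = migrate(rows, key=lambda r: r[2], determinant_cols=[0])
    print("row-wise baseline bytes:", baseline_size(rows))
    print("grouped + reordered column-wise bytes:", payload_size(payload))
```

The two printed sizes depend entirely on the synthetic data, and they say nothing about the throughput or cost figures reported in the paper. The sketch is meant to show how the steps compose and that the pipeline is lossless: decompressing every group on the target side recovers the original set of records. The within-group order is not preserved, because step 2 deliberately changes it.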
What problem does this paper attempt to address?