Query Optimization and Rebalancing Methods based on CMD.

Shuo Wang, ZhaoGong Zhang, Qingyu Meng
DOI: https://doi.org/10.1145/3474944.3474949
2021-01-01
Abstract: As the amount of user data increases, the computing performance and I/O speed required for data processing and analysis keep rising. Distributed file systems have become the primary option for big data storage and querying. To suit data that is high-dimensional and sparse, this paper applies the distributed storage idea of CMD (coordinate modulo distribution) to store data in blocks. A distributed storage system can then be built from inexpensive storage devices, which alleviates the disk I/O read bottleneck for big data to a certain extent. We improve the range query function under the CMD storage method, and we use an optimized B+ tree index to solve the exact-match search problem for sparse data. Finally, to address the unbalanced distribution of data across sub-nodes, we propose a new data rebalancing method for the CMD storage method.
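To illustrate the CMD idea named in the abstract, the following is a minimal sketch, assuming the classic coordinate modulo rule in which a block with integer grid coordinates is placed on node (sum of coordinates) mod (number of nodes). The abstract does not state the exact variant, block partitioning, or rebalancing scheme used in the paper, so the function names `cmd_node` and `range_query_nodes` and all parameters here are hypothetical.

```python
from collections import Counter
from itertools import product


def cmd_node(coords, num_nodes):
    """Map a block's integer grid coordinates to a storage node.

    Assumption: the basic CMD rule, (sum of coordinates) mod num_nodes.
    """
    return sum(coords) % num_nodes


def range_query_nodes(lo, hi, num_nodes):
    """Count how many blocks of an inclusive range query fall on each node.

    lo and hi are the inclusive lower and upper block coordinates
    of the query box, one entry per dimension.
    """
    ranges = [range(a, b + 1) for a, b in zip(lo, hi)]
    counts = Counter()
    for block in product(*ranges):
        counts[cmd_node(block, num_nodes)] += 1
    return counts


if __name__ == "__main__":
    # A 4 x 4 range query over 4 nodes touches every node equally often,
    # so the block reads can proceed in parallel.
    print(range_query_nodes((0, 0), (3, 3), 4))
```

Under this assumed rule, neighbouring blocks along any single dimension land on different nodes, which is why a range query tends to spread its reads across the cluster. Skewed or sparse data can still leave some nodes holding more records than others, which is the imbalance the abstract's rebalancing method targets.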