LazySort: A customized sorting algorithm for non-volatile memory

Yang Liu, Yang Ou, Wenhan Chen, Zhiguang Chen, Nong Xiao
DOI: https://doi.org/10.1016/j.ins.2023.119137
IF: 8.1
2023-05-13
Information Sciences
Abstract: New non-volatile storage technologies are narrowing the access-latency gap between storage and memory. As data volumes continue to grow, the demand for such storage in high-performance computing and database management systems is rising significantly. External sorting is a basic algorithm of database management systems and is widely used, but it suffers from numerous I/O operations caused by frequent data exchange between memory and external storage. Non-volatile memory (NVM) is a promising solution. However, traditional external sorting algorithms are designed only for disk storage and cannot be used for NVM. To solve this issue, we propose LazySort, a novel external sorting algorithm that exploits the byte addressability of NVM and the ordered distribution of data to improve sorting efficiency. Specifically, LazySort first detects ordered data segments in memory and then records the storage location of each segment, together with its maximum and minimum values, in an index table named ITable. Next, to reduce memory usage and accelerate sorting, LazySort applies an optimization strategy, RunMerge, which merges non-intersecting data blocks according to the value ranges recorded in ITable. To verify the performance of LazySort, we built an experimental platform with an NVM-DRAM storage architecture that combines NVM and dynamic random access memory (DRAM) and conducted a series of experiments. The experimental results show that LazySort is more efficient than the traditional external sorting algorithm: it shortens the sorting time by 93.08% and reduces the number of NVM write operations by 49.50%. Moreover, the advantage of LazySort over traditional algorithms becomes more significant as the amount of data increases.
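The two steps named in the abstract, detecting ordered segments into ITable and merging non-intersecting segments with RunMerge, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The names Run, build_itable, run_merge, read_chain and lazy_sort_sketch are hypothetical, a Python list stands in for byte-addressable NVM, and the greedy chaining rule and the final heap-based merge are assumptions chosen only to make the idea concrete.

```python
import heapq
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class Run:
    """One ITable-style record (hypothetical layout): location plus value range."""
    start: int  # index of the first element of the ordered segment
    end: int    # index one past the last element
    lo: int     # minimum value in the segment
    hi: int     # maximum value in the segment


def build_itable(data: List[int]) -> List[Run]:
    """Scan the data once and record every maximal non-decreasing segment."""
    table: List[Run] = []
    start = 0
    for i in range(1, len(data) + 1):
        # Close the current segment at the end of the data or at a descent.
        if i == len(data) or data[i] < data[i - 1]:
            table.append(Run(start, i, data[start], data[i - 1]))
            start = i
    return table


def run_merge(table: List[Run]) -> List[List[Run]]:
    """Chain runs whose value ranges do not intersect (assumed greedy rule).

    A run is appended to the first chain whose last maximum is <= the run's
    minimum, so each chain is already sorted and only index entries move.
    """
    chains: List[List[Run]] = []
    for run in sorted(table, key=lambda r: r.lo):
        for chain in chains:
            if chain[-1].hi <= run.lo:
                chain.append(run)
                break
        else:
            chains.append([run])
    return chains


def read_chain(data: List[int], chain: List[Run]) -> Iterator[int]:
    """Read a chain's elements in place, relying on element-level addressing."""
    for run in chain:
        for i in range(run.start, run.end):
            yield data[i]


def lazy_sort_sketch(data: List[int]) -> List[int]:
    """Merge the chained runs; fewer chains means a narrower final merge."""
    chains = run_merge(build_itable(data))
    return list(heapq.merge(*(read_chain(data, c) for c in chains)))


if __name__ == "__main__":
    sample = [5, 6, 7, 1, 2, 3, 8, 9, 4]
    print(len(build_itable(sample)), "runs,", len(run_merge(build_itable(sample))), "chains")
    print(lazy_sort_sketch(sample))
```

In this sketch the input is never rewritten while runs are detected or chained; only the small index records are rearranged. That is one plausible reading of how an index over ordered segments could cut down on NVM writes, though the paper's actual mechanism may differ.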
computer science, information systems