Optimal Loop Tiling for Minimizing Write Operations on NVMs with Complete Memory Latency Hiding

Rui Xu, Edwin Hsing-Mean Sha, Qingfeng Zhuge, Yuhong Song, Jingzhi Lin
DOI: https://doi.org/10.1109/asp-dac52403.2022.9712532
2022-01-01
Abstract: Non-volatile memory (NVM) is expected to serve as the second-level memory (called remote memory) in future two-level memory hierarchies. However, NVM has limited write endurance, so it is vital to reduce the number of write operations on NVM. Meanwhile, in a two-level memory hierarchy, prefetching is widely used to fetch data before it is actually required, hiding the remote memory access latency. In general, large-scale nested loops are the performance bottleneck of a program because of the write operations on NVM caused by misses in the first-level memory (called local memory) and by data reuse. Loop tiling is the key compiler technique for grouping iterations so as to reduce communication with remote memory. In this paper, we propose a new loop tiling approach for minimizing write operations on NVMs and completely hiding the NVM access latency. Specifically, we introduce a series of theorems to guide loop tiling. Then, a legal tile shape and an optimal tile size selection strategy are proposed according to the data dependencies and the local memory capacity. Furthermore, we propose a pipeline scheduling policy to completely hide the remote memory latency. Extensive experiments show that the proposed techniques reduce write operations on NVMs by 95.1% on average and that NVM latency can be completely hidden.
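To illustrate why grouping iterations into tiles can reduce writes to remote memory, the following is a minimal toy sketch, not the paper's method. It assumes a hypothetical accumulation loop in which every iteration (k, i, j) updates element (i, j), and it models local memory as an LRU buffer of `capacity` elements whose evictions each cost one write to remote memory. The function names, the rectangular tile shape, and the LRU model are all assumptions made for illustration; the paper's tile shape and size selection follow from its own theorems and data-dependency analysis, which are not reproduced here.

```python
from collections import OrderedDict


def count_remote_writes(accesses, capacity):
    """Count write-backs to remote memory under a toy LRU local memory.

    Every access is an update, so every resident element is dirty and
    each eviction costs one remote write (assumed model).
    """
    local = OrderedDict()  # keys are the elements currently in local memory
    writes = 0
    for elem in accesses:
        if elem in local:
            local.move_to_end(elem)  # hit: refresh LRU position
        else:
            if len(local) >= capacity:
                local.popitem(last=False)  # evict least recently used
                writes += 1
            local[elem] = None
    return writes + len(local)  # flush what remains at the end


def untiled(n):
    """Hypothetical loop nest: each (k, i, j) iteration updates element (i, j)."""
    for k in range(n):
        for i in range(n):
            for j in range(n):
                yield (i, j)


def tiled(n, t):
    """Same iterations, grouped into t x t tiles of the (i, j) space."""
    for ii in range(0, n, t):
        for jj in range(0, n, t):
            for k in range(n):
                for i in range(ii, min(ii + t, n)):
                    for j in range(jj, min(jj + t, n)):
                        yield (i, j)


if __name__ == "__main__":
    n, t, capacity = 8, 4, 16  # tile of t*t elements fits in local memory
    print("untiled writes:", count_remote_writes(untiled(n), capacity))
    print("tiled writes:  ", count_remote_writes(tiled(n, t), capacity))
```

Under this toy model with n = 8, a 4 x 4 tile, and a 16-element local memory, the untiled order writes back on every update (512 writes), while the tiled order writes each element once (64 writes). The example ignores prefetching and latency hiding, which the paper handles with a separate pipeline scheduling policy.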