Tetris Write: Exploring More Write Parallelism Considering PCM Asymmetries
Zheng Li,Fang Wang,Dan Feng,Yu Hua,Wei Tong,Jingning Liu,Xiang Liu
DOI: https://doi.org/10.1109/ICPP.2016.25
2016-01-01
Abstract:The noises at the power lines limit the charge pump to provide large instantaneous current to PCM cells, which results in the number of bits can be written concurrently, i.e. the size of write unit, is restricted in PCM. When implementing PCM as the main memory, the inequality of cache line's size and write unit's size may result in many consecutive executed write units, which greatly decreases the system performance. Existing PCM write schemes, however, consider the worst power and time cases of written data, and ignore the actual current consumption. It is assumed that all data bits are changed and the electric current of each data unit is under fully utilized. The write performance is blocked due to pessimistic estimates, i.e. the current is often excessively supplied but is not used effectively, which leads to huge energy consumption. As a result, the write parallelism is limited and therefore restricts the overall system performance. To address this problem, this paper proposes a novel PCM write scheme named Tetris Write to explore more write parallelism and reduce the critical number of write units in PCM chip. The key idea behind Tetris Write is to monitor the number of '1' and '0' changed in each data unit, and schedule the order of data units' write-1 and write-0 execution considering not only the time and power asymmetries, but also the number asymmetry between RET and SET operations, to allow a larger number of concurrent bit-writes and make the best use of power supply. Tetris Write tries to schedule the dominating long term write-1s first and attempts to steal interspaces remained by write-1s to put the extraessential short write-0s. 4-core PARSEC benchmarks' results show that Tetris Write can get 65% read latency reduction, 40% write latency reduction, 46% running time reduction and 2X IPC improvement compared with the baseline on average. In addition, Tetris Write earns 26%, 15% and 10% more read latency reduction, 15%, 7% and 5% more write latency reduction, and outperforms 22%, 12% and 7% more running time reduction, compared with the state-of-the-art Flip-N-Write, 2-Stage-Write and Three-Stage-Write schemes, whose IPC improvements are 1.4X, 1.6X and 1.8X, respectively.