Realizing Extreme Endurance Through Fault-aware Wear Leveling and Improved Tolerance.

Jiangwei Zhang, Chong Wang, Zhenhua Zhu, Donald Kline, Alex K. Jones, Huazhong Yang, Yu Wang
DOI: https://doi.org/10.1109/hpca56546.2023.10071093
2023-01-01
Abstract: Phase-change memory (PCM) and resistive memory (RRAM) are promising alternatives to traditional memory technologies. However, both PCM and RRAM suffer from limited write endurance. Wear-leveling (WL) techniques are essential to extend the lifetime of these memories before they experience endurance faults. Beyond the additional usage afforded by WL, row sparing and focused error correction can extend the lifetime further after wear faults appear. Unfortunately, the need for extended WL techniques continues to become more pressing as scaling exacerbates process variation. Similarly, scaling poses challenges to traditional DRAM, such as more severe noise and crosstalk. In this paper, we propose novel fault-aware WL schemes that allocate write frequencies according to the strength of the rows and handle the imbalance of writes in columns. We use runtime detection schemes to identify weak rows and protect them prior to wear-out. In particular, row-level WL, aka RETROFIT, strategically uses the spare rows provided for redundancy to guard against early cell wear-out. RETROFIT is compatible with error-correcting codes (ECC) and with error correction schemes that guarantee mitigation of hard faults. Rather than discard retired rows, when a spare row completely replaces a retired row, we retarget the retired row to assist with column sparing. The retired row becomes a group of Page Protecting Pointers (PPPs), which utilize otherwise discarded error correction potential to further enhance the leveling ability of RETROFIT. To relieve column-level imbalance, we use idle error correction bits, before they are needed for correction, to reduce average bit flips. The evaluation demonstrates that RETROFIT and enhanced RETROFIT with the PPPs improve lifetime by as much as 0.64× and 5.4× in the average case, respectively, over the state-of-the-art row-level method while also reducing area overhead. In the worst-case scenario, these improvements further increase to 2.6× and 16.0×. Combined with the proposed column-level WL, enhanced RETROFIT realizes an overall 1.5× memory lifetime improvement over perfectly uniform wear leveling with equal storage overhead.
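The idea of allocating write frequencies according to row strength can be illustrated with a toy model. The sketch below is not the RETROFIT algorithm; it assumes a simplified setting in which each row has a known endurance (maximum number of writes), the memory's lifetime ends when the first row wears out, and spare rows, ECC, PPPs, and runtime weak-row detection are all ignored. The function names and the endurance values are hypothetical.

```python
def lifetime_uniform(endurance):
    """Total writes served when every row receives an equal share of writes."""
    # Assumption: the memory fails when its weakest row fails, so each row
    # can absorb only min(endurance) writes before the lifetime ends.
    return len(endurance) * min(endurance)


def lifetime_strength_aware(endurance):
    """Total writes served when each row's share is proportional to its endurance."""
    total = sum(endurance)
    shares = [e / total for e in endurance]
    # Row i wears out after endurance[i] / shares[i] total writes to the
    # memory; the lifetime ends at the first such wear-out.
    return min(e / s for e, s in zip(endurance, shares))


# Hypothetical per-row endurance values with strong process variation.
rows = [100, 400, 1000, 500]
uniform = lifetime_uniform(rows)
aware = lifetime_strength_aware(rows)
print(uniform, aware)
```

In this toy model, uniform leveling is bounded by the weakest row, while strength-proportional allocation lets all rows wear out at about the same time, so the total number of writes served approaches the sum of the row endurances.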