Characterizing the L1 Data Cache's Vulnerability to Transient Errors in Chip-Multiprocessors

Li Tang, Shuai Wang, Jie Hu, Xiaobo Sharon Hu
DOI: https://doi.org/10.1109/ISVLSI.2011.23
2011-01-01
Abstract: With continuous technology scaling, current and next-generation microprocessors are becoming more vulnerable to transient errors, such as soft errors induced by energetic particle strikes. While mainstream microprocessors are adopting multi-/many-core architectures that target high-performance parallel computing applications, the share of transistors and die area devoted to on-chip caches keeps increasing. Because cache memories are the major victims of soft errors, it is of paramount importance to characterize the vulnerability of on-chip caches in this context, especially under the interaction with cache coherence protocols, in order to devise potential reliability optimizations. In this work, we develop a lifetime model for the private L1 data cache in chip-multiprocessors (CMPs) that is based on cache activities and the states of cache lines. We then apply this lifetime model to characterize and predict the cache's vulnerability trend in CMPs. Our experimental evaluation shows that the cache's vulnerable phases due to remote accesses increase dramatically as the number of processor cores increases. Based on this vulnerable-phase analysis, we propose a protocol enhancement that prematurely invalidates cache lines in the modified (M) state, minimizing the vulnerability factor due to remote reads to modified cache lines.
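The abstract does not spell out the rules of the lifetime model. As a hedged illustration only, the Python sketch below applies a generic ACE-style interval classification (the approach commonly used for architectural vulnerability factor analysis of caches) to the event trace of a single line in one private L1 under a MESI-like protocol. The event names, the whole-line granularity, the assumption that a clean copy never services remote reads, and the `EARLY_INV` event standing in for the proposed premature invalidation are all assumptions made for this example, not the authors' model.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical event names for one cache line in one private L1 cache.
FILL = "fill"                  # line brought into this L1 (clean: E or S)
LOCAL_READ = "local_read"      # this core reads the line
LOCAL_WRITE = "local_write"    # this core overwrites the whole line (-> M)
REMOTE_READ = "remote_read"    # another core's read snoops this L1
REMOTE_WRITE = "remote_write"  # another core's write invalidates this copy
EVICT = "evict"                # replacement; dirty data is written back
EARLY_INV = "early_inv"        # hypothetical premature writeback + invalidate


@dataclass
class LineLifetime:
    vulnerable: int  # cycles in intervals whose data is later consumed
    resident: int    # cycles in closed intervals while the line was valid


def analyze(trace: List[Tuple[int, str]]) -> LineLifetime:
    """Split one line's lifetime into vulnerable / non-vulnerable intervals.

    Assumed ACE-style rules at whole-line granularity: an interval is
    vulnerable if it ends with the data being consumed (a local read, a
    remote request serviced from a dirty copy, or a dirty writeback); it is
    not vulnerable if it ends with an overwrite or with a clean copy being
    dropped. Any interval still open at the end of the trace is ignored.
    """
    valid = dirty = False
    open_since = vulnerable = resident = 0

    def close(t: int, is_vulnerable: bool) -> None:
        nonlocal open_since, vulnerable, resident
        span = t - open_since
        resident += span
        if is_vulnerable:
            vulnerable += span
        open_since = t

    for t, kind in trace:
        if not valid:
            if kind == FILL:
                valid, dirty, open_since = True, False, t
            continue  # other events do not touch an invalid L1 copy
        if kind == LOCAL_READ:
            close(t, True)
        elif kind == LOCAL_WRITE:
            close(t, False)  # old value overwritten without being read
            dirty = True     # E/S/M -> M
        elif kind == REMOTE_READ:
            if dirty:
                close(t, True)  # this L1 supplies the dirty data: M -> S
                dirty = False
            # Clean copy: assume the shared L2 services the read, so the
            # interval stays open until an event that actually uses the data.
        elif kind in (REMOTE_WRITE, EVICT, EARLY_INV):
            close(t, dirty)  # dirty data is forwarded or written back
            valid = dirty = False
        else:
            raise ValueError(f"unexpected event {kind!r} at t={t}")
    return LineLifetime(vulnerable, resident)


# Toy traces: a dirty line waits in M state for a remote read at cycle 100.
baseline = [(0, FILL), (10, LOCAL_WRITE), (20, LOCAL_READ),
            (100, REMOTE_READ), (150, EVICT)]
early = [(0, FILL), (10, LOCAL_WRITE), (20, LOCAL_READ),
         (30, EARLY_INV), (100, REMOTE_READ), (150, EVICT)]
print(analyze(baseline))  # LineLifetime(vulnerable=90, resident=150)
print(analyze(early))     # LineLifetime(vulnerable=20, resident=30)
```

In these toy traces the modified line sits idle from cycle 20 until the remote read at cycle 100, and that whole wait counts as vulnerable because the remote read consumes the data. Writing the line back and invalidating it at cycle 30 shrinks its vulnerable time from 90 to 20 cycles, with the remote read then served elsewhere (assumed here to be a protected L2). The cost, not modeled in this sketch, is an extra miss if the local core touches the line again after the early invalidation.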