Algorithmic Fault Detection for RRAM-based Matrix Operations

Mengyun Liu, Lixue Xia, Yu Wang, Krishnendu Chakrabarty
DOI: https://doi.org/10.1145/3386360
IF: 1.447
2020-01-01
ACM Transactions on Design Automation of Electronic Systems
Abstract: An RRAM-based computing system (RCS) provides an energy-efficient hardware implementation of vector-matrix multiplication for machine-learning hardware. However, it is vulnerable to faults due to the immature RRAM fabrication process. We propose an efficient fault tolerance method for RCS; the proposed method, referred to as extended-ABFT (X-ABFT), is inspired by algorithm-based fault tolerance (ABFT). We utilize row checksums and test-input vectors to extract signatures for fault detection and error correction. We present a solution to alleviate the overflow problem caused by the limited number of voltage levels for the test-input signals. Simulation results show that for a Hopfield classifier with faults in 5% of its RRAM cells, X-ABFT allows us to achieve nearly the same classification accuracy as in the fault-free case.
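The abstract does not spell out how X-ABFT works, so the following is only a minimal sketch of the classical ABFT idea it builds on: each row of the weight matrix gets an extra checksum cell, and test-input vectors yield a signature that is zero when the stored weights are consistent. All function names, the integer weights, and the one-hot test inputs are assumptions made for illustration; the sketch ignores analog effects and the limited-voltage-level overflow issue the paper addresses.

```python
# Illustrative sketch of classical row-checksum ABFT, not the paper's X-ABFT.
# All names below are hypothetical; weights are small integers for clarity.

def with_row_checksums(W):
    """Append to each row of W the sum of that row (one extra checksum column)."""
    return [row + [sum(row)] for row in W]

def vmm(x, W):
    """Vector-matrix product y = x W for a row vector x."""
    return [sum(x[i] * W[i][j] for i in range(len(W)))
            for j in range(len(W[0]))]

def signature(x, W_ext):
    """Checksum output minus the sum of the data outputs; 0 when consistent."""
    y = vmm(x, W_ext)
    return y[-1] - sum(y[:-1])

def faulty_rows(W_ext):
    """Apply one-hot test inputs and return the rows with a nonzero signature."""
    n = len(W_ext)
    return [r for r in range(n)
            if signature([1 if i == r else 0 for i in range(n)], W_ext) != 0]

W = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
W_ext = with_row_checksums(W)
clean = faulty_rows(W_ext)      # no fault injected yet

W_ext[1][2] = 0                 # assumed stuck-at fault in one cell
detected = faulty_rows(W_ext)   # rows flagged after the fault
```

In this sketch the signature magnitude equals the weight lost to the fault, which hints at how a signature could also drive error correction; locating the faulty column would need additional checksums that the sketch does not include.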