SDCInfer: Inference of Silent Data Corruption Causing Instructions

Junchi Ma, Yun Wang, Ling Zhou, Cheng Hu, Hui Wang
DOI: https://doi.org/10.1109/icsess.2015.7339043
2015-01-01
Abstract: As process technology scales, electronic devices become more susceptible to radiation-induced transient faults. Symptom-based detection techniques offer promising low-cost and effective solutions, but they can hardly catch faults that produce silent data corruptions (SDCs). Identifying and understanding the instructions that cause SDCs is crucial to the development of program-level detectors. This paper introduces SDCInfer, an approach that characterizes the propagation of faults resulting in SDCs and thereby determines potential SDC-causing instructions. By tracking the instruction traces of faults that lead to SDCs, SDCInfer employs a few heuristics to determine whether a particular instruction could affect the outcome of a program in the presence of a fault. We demonstrate SDCInfer on the Siemens benchmark suite, where the coverage of SDC-causing instructions increases by 145% compared with the original result provided by fault injection. Our validation shows that SDCInfer determines SDC-causing instructions with around 92% accuracy, averaged across all the applications studied.