A Fast Low-Level Error Detection Technique

Zhengyang He, Hui Xu, Guanpeng Li
DOI: https://doi.org/10.1109/dsn58291.2024.00023
2024-01-01
Abstract: As transistors continue to shrink, the soft error rate in computer systems is rising, posing a critical threat of severe failures. Error detection by duplicating instructions (EDDI) has been proposed as a prominent software-based technique for detecting soft errors. However, applying EDDI specifically at the assembly level has not been well explored in the literature. To this end, the paper introduces FERRUM, an assembly-level EDDI technique that improves on the original assembly-level EDDI by using SIMD and other compiler-level transformations. We evaluate FERRUM on both fault coverage and runtime performance against IR-level EDDI and the original assembly-level EDDI. The results show that FERRUM not only ensures 100% protection coverage at the assembly level but also outperforms the baseline techniques by more than 50% in runtime performance overhead.
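The duplicate-and-compare idea behind EDDI can be illustrated with a minimal sketch. This is not FERRUM and does not model SIMD or any assembly-level transformation; it only simulates, under simplifying assumptions, how computing a result twice and comparing the copies exposes a single bit flip. All names here (`flip_bit`, `checked_add`, `SoftErrorDetected`, `fault_bit`) are hypothetical and chosen for illustration.

```python
from typing import Optional


class SoftErrorDetected(Exception):
    """Raised when the primary and shadow results disagree."""


def flip_bit(value: int, bit: int) -> int:
    # Simulate a soft error by inverting one bit of the value.
    return value ^ (1 << bit)


def checked_add(a: int, b: int, fault_bit: Optional[int] = None) -> int:
    # Primary computation (the "original" instruction).
    primary = a + b

    # Optionally inject a simulated fault into the primary result only.
    if fault_bit is not None:
        primary = flip_bit(primary, fault_bit)

    # Shadow computation (the "duplicated" instruction).
    shadow = a + b

    # Checker: any mismatch between the two copies signals an error.
    if primary != shadow:
        raise SoftErrorDetected(f"mismatch: primary={primary}, shadow={shadow}")
    return primary


if __name__ == "__main__":
    print(checked_add(2, 3))  # fault-free run
    try:
        checked_add(2, 3, fault_bit=0)  # inject a flip in bit 0
    except SoftErrorDetected as err:
        print("detected:", err)
```

In this sketch the fault is injected into only one copy, which mirrors the usual single-fault assumption: a flip that corrupts one of the two computations is caught at the comparison point.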