Epipe: A low-cost fault-tolerance technique considering WCET constraints
Jianli Li, Jingling Xue, Xinwei Xie, Qing Wan, Qingping Tan, Lanfang Tan
DOI: https://doi.org/10.1016/j.sysarc.2013.06.003
IF: 5.836
2013-01-01
Journal of Systems Architecture
Abstract: Transient faults will soon become a critical reliability concern for processors used in mainstream computing. Because the mainstream commodity market accepts only low-cost solutions for transient-fault tolerance, traditional high-end solutions are ruled out by their prohibitive costs. This paper presents Epipe, a hybrid software/hardware solution that provides sufficient fault coverage with affordable overhead for mainstream commodity systems. Given a program, Epipe uses compile-time analysis to identify its vulnerable instructions (VIs), i.e., those that may cause silent data corruptions (SDCs), and selects a subset of VIs to protect subject to worst-case execution time (WCET) constraints on the fault-free execution. During program execution on a modified superscalar processor that incurs minimal hardware overhead, Epipe relies on selective instruction replication to handle the VI-induced SDCs and on an existing exception detector to tolerate the remaining faults, which manifest as system exceptions. Our experimental results show that Epipe provides sufficient fault coverage under some tight WCET constraints and increasingly higher coverage under more relaxed WCET constraints. As the WCET allowance increases from 5% to 15% and then to 25%, the average coverage increases from 70.8% to 80% and then to 86.6%. Unlike existing hybrid solutions, Epipe is the first to respect WCET constraints, which are an important concern for real-time systems.
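The selection step described in the abstract, choosing a subset of VIs to protect within a WCET allowance, can be illustrated with a small sketch. This is not the paper's algorithm: the abstract does not say how Epipe makes the selection, so the code below substitutes a simple greedy benefit-per-cost heuristic. The names `VI`, `sdc_weight`, `wcet_cost`, `select_vis`, and `coverage` are hypothetical, the input numbers are made up, and the sketch assumes that every protected instruction adds its replication cost to the worst-case path (a real WCET analysis would only charge instructions that lie on that path).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VI:
    """A vulnerable instruction (hypothetical model, not from the paper)."""
    name: str
    sdc_weight: float  # assumed share of SDCs attributed to this instruction
    wcet_cost: int     # assumed extra worst-case cycles if it is replicated


def select_vis(vis, baseline_wcet, allowance):
    """Greedily pick VIs to protect without exceeding the WCET allowance.

    allowance is a fraction of the fault-free WCET, e.g. 0.05 for 5%.
    """
    budget = baseline_wcet * allowance

    def ratio(v):
        # Benefit per extra cycle; zero-cost protection is always preferred.
        return v.sdc_weight / v.wcet_cost if v.wcet_cost else float("inf")

    chosen, spent = [], 0
    for vi in sorted(vis, key=ratio, reverse=True):
        if spent + vi.wcet_cost <= budget:
            chosen.append(vi)
            spent += vi.wcet_cost
    return chosen


def coverage(chosen, vis):
    """Fraction of the total SDC weight covered by the chosen VIs."""
    total = sum(v.sdc_weight for v in vis)
    return sum(v.sdc_weight for v in chosen) / total if total else 0.0


if __name__ == "__main__":
    # Made-up inputs for illustration only.
    vis = [VI("a", 0.5, 10), VI("b", 0.3, 30), VI("c", 0.2, 5)]
    for allowance in (0.05, 0.15, 0.25):
        picked = select_vis(vis, baseline_wcet=200, allowance=allowance)
        print(allowance, [v.name for v in picked], coverage(picked, vis))
```

Under these assumptions, relaxing the allowance can only keep or raise the achievable coverage, which matches the qualitative trend the abstract reports. A greedy pass is not guaranteed to be optimal; an exact formulation would treat the selection as a knapsack problem.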