CFCSS without Aliasing for SPARC Architecture

Wang Chao, Zhongchuan Fu, Hongsong Chen, Wei Ba, Li Bin, Chen Lin, Zexu Zhang, Wang Yuying, Cui Gang
DOI: https://doi.org/10.1109/CIT.2010.356
2010-01-01
Abstract: With the increasing popularity of COTS (commercial off-the-shelf) components and multi-core processors in space and aviation applications, software fault tolerance has become an attractive way to overcome their primary bottleneck, susceptibility to transient faults. CFCSS (Control Flow Checking by Software Signatures) is one of the most important pure-software fault tolerance techniques for mitigating control flow errors in harsh environments. Aliasing, its most prominent deficiency, is the research focus of this paper, and a novel algorithm, CFCSS without aliasing, is put forward. First, the cause of aliasing (the existence of branch-fan-in nodes in the program control flow graph) is investigated in depth, and the minimal flow graph structure that gives rise to aliasing, the “3-2 structure”, is extracted. The typical “3-2 structure” can be extended to a broader class of flow graphs, named the “n-(n-1) structure” in this paper, which cannot be handled by previous CFCSS algorithms. Second, based on a thorough analysis of the traditional CFCSS algorithm, a method of inserting an additional basic block into the program control flow graph is proposed, and the CFCSS without aliasing algorithm is designed around it. Because the algorithm does not depend on the shape of the program flow graph, it is more general, and in theory it can handle any flow graph structure, including the “n-(n-1) structure” and other typical flow graphs that traditional algorithms do not cover. Third, the compilation time of the algorithm is linear in the number of basic blocks of the program control flow graph. CFCSS without aliasing is implemented in GCC 4.2.1 for the SPARC architecture, with support for the delay slot. Through fault injection campaigns on representative integer-dominated benchmarks from MiBench and SPEC CINT2000, the correctness, fault detection capability, and overhead of the algorithm are investigated in detail.
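The aliasing problem described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's algorithm: it is a simplified model of the traditional CFCSS check (a runtime value G updated by XOR with a per-block difference d, plus an adjusting value D at branch-fan-in blocks), applied to a hypothetical graph with three predecessor blocks and two fan-in blocks. Whether that graph matches the paper's exact “3-2 structure” is an assumption, as are all block numbers, signatures, and function names. The second graph splits the shared edge with an extra block 6, which is only one plausible reading of "inserting an additional basic block"; the abstract does not give the actual insertion rule.

```python
def assign(edges, sig):
    """Compute differences d and adjusting values D for a simplified CFCSS model."""
    preds = {}
    for src, dst in edges:
        preds.setdefault(dst, set()).add(src)
    fan_in = {j: ps for j, ps in preds.items() if len(ps) > 1}

    # All predecessors of one fan-in block must leave the same adjusted value
    # in G, so predecessors linked through shared fan-in blocks get one base.
    base = {p: p for ps in fan_in.values() for p in ps}
    changed = True
    while changed:
        changed = False
        for ps in fan_in.values():
            low = min(base[p] for p in ps)
            for p in ps:
                if base[p] != low:
                    base[p], changed = low, True

    D = {p: sig[p] ^ sig[b] for p, b in base.items()}
    d = {}
    for j, ps in preds.items():
        ref = base[min(ps)] if j in fan_in else min(ps)
        d[j] = sig[ref] ^ sig[j]
    return d, D, set(fan_in)


def undetected_illegal_branches(edges, sig):
    """List illegal branches (src, dst) whose signature check still passes."""
    d, D, fan_in = assign(edges, sig)
    missed = []
    for src in sorted({s for s, _ in edges}):
        for dst in sorted(d):
            G = sig[src] ^ d[dst]          # update at the top of dst
            if dst in fan_in:
                G ^= D.get(src, 0)         # adjusting value set in src
            if (src, dst) not in edges and G == sig[dst]:
                missed.append((src, dst))
    return missed


# Arbitrary unique signatures for hypothetical blocks 1..6.
SIG = {1: 0b0001, 2: 0b0010, 3: 0b0011, 4: 0b0100, 5: 0b0101, 6: 0b0110}

# Block 2 is a predecessor of both fan-in blocks 4 and 5.
SHARED = {(1, 4), (2, 4), (2, 5), (3, 5)}
# Same graph with a hypothetical extra block 6 placed on the edge 2 -> 5.
SPLIT = {(1, 4), (2, 4), (2, 6), (6, 5), (3, 5)}

print(undetected_illegal_branches(SHARED, SIG))  # [(1, 5), (3, 4)]
print(undetected_illegal_branches(SPLIT, SIG))   # []
```

In this model the shared predecessor forces blocks 1, 2, and 3 to present the same adjusted value to both fan-in blocks, so the illegal branches 1 to 5 and 3 to 4 pass the check. Once block 6 separates the two fan-in groups, each group gets its own base signature and every illegal branch in the model is caught.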