A Case Study for Fault Tolerance Oriented Programming in Multi-core Architecture

Lu Yang, Zhanqi Cui, Xuandong Li
DOI: https://doi.org/10.1145/1370082.1370094
2008-01-01
Abstract: The multi-core architecture brings both new challenges and new means to ordinary software developers. Reliable software system design approaches can give high confidence that long-running online software systems run correctly, but these approaches inevitably cost some efficiency. We found that the multi-core architecture is a suitable platform for reliable software system design and can keep that cost acceptable, because it offers parallel performance and is now widely available. In this paper we use the multi-core architecture to support software fault tolerance, with the aim of making the combination of software fault tolerance and multi-core hardware a common design choice. Following the idea of software fault tolerance, for some key software units in a system we can develop N separate versions with equivalent functionality. Each version is developed independently by an isolated group to prevent identical faults among versions. All implemented versions run separately from the same initial conditions and inputs. The outputs of all redundant versions are submitted to a decision module, which selects a single result from the multiple results as the correct output. We present a case study showing that, on a multi-core architecture, the redundant versions of a key software unit can run in parallel on different cores to improve efficiency.
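
The scheme described in the abstract (N independently written versions, identical inputs, one decision module) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the key software unit is an integer square root, the three version functions and `run_n_versions` are hypothetical names, the decision module is a simple majority vote, and one version carries a deliberately seeded fault so that the masking effect is visible. A thread pool is used for brevity; in CPython, threads do not run CPU-bound code on several cores at once, so `ProcessPoolExecutor` from the same module would be the closer match to the multi-core setting.

```python
import math
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def isqrt_builtin(n):
    """Version 1 (hypothetical): delegate to the standard library."""
    return math.isqrt(n)


def isqrt_binary_search(n):
    """Version 2 (hypothetical): binary search for the largest r with r*r <= n."""
    lo, hi = 0, n
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid * mid <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


def isqrt_faulty(n):
    """Version 3 (hypothetical): seeded fault, off by one on perfect squares."""
    r = int(n ** 0.5)
    return r + 1 if r * r == n else r


def majority_vote(results):
    """Decision module (assumed policy): accept a value only if a strict
    majority of the versions produced it."""
    value, count = Counter(results).most_common(1)[0]
    if count * 2 <= len(results):
        raise RuntimeError("no majority among versions")
    return value


def run_n_versions(versions, n):
    """Run every version on the same input concurrently, then vote."""
    with ThreadPoolExecutor(max_workers=len(versions)) as pool:
        futures = [pool.submit(version, n) for version in versions]
        results = [future.result() for future in futures]
    return majority_vote(results)


if __name__ == "__main__":
    versions = [isqrt_builtin, isqrt_binary_search, isqrt_faulty]
    # The faulty version answers 5 for input 16; the vote still yields 4.
    print(run_n_versions(versions, 16))
```

With three versions, the vote masks a fault in any single version, which is why the seeded error on perfect squares does not reach the caller. Running the versions side by side means the added latency is roughly that of the slowest version plus the vote, rather than the sum of all versions.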