Software Fault Tolerance in Real-Time Systems: Identifying the Future Research Questions

Federico Reghenzani, Zhishan Guo, William Fornaciari
DOI: https://doi.org/10.1145/3589950
IF: 16.6
2023-03-30
ACM Computing Surveys
Abstract: Tolerating hardware faults in modern architectures is becoming a prominent problem due to the miniaturization of hardware components, their increasing complexity, and the need to reduce costs. Software-Implemented Hardware Fault Tolerance approaches have been developed to improve system dependability against hardware faults without resorting to custom hardware solutions. However, these approaches make it harder, from a scheduling standpoint, to satisfy the timing constraints of applications and activities. This paper surveys the current state of the art of fault tolerance approaches used in the context of real-time systems, identifying the main challenges and the cross-links between these two topics. We propose a joint scheduling-failure analysis model that highlights the formal interactions among software fault tolerance mechanisms and timing properties. This model allows us to present and discuss many open research questions, with the final aim of spurring future research activities.
computer science, theory & methods