Analysing software failure using runtime verification and LTL

Zahra Yazdanparast
2024-02-15
Abstract: A self-healing software system is an advanced computer program or system designed to detect, diagnose, and automatically recover from faults or errors without human intervention. These systems are typically employed in mission-critical applications where downtime can have significant financial or operational consequences. Failure detection is one of the important steps in a self-healing system. In this research, a method using runtime verification is proposed to diagnose four types of errors at the component level. A simulation on mRUBIS shows that the suggested method has the necessary efficiency in detecting the occurrence of failures.
Software Engineering
What problem does this paper attempt to address?
This paper addresses how to use runtime verification and Linear Temporal Logic (LTL) to analyze and diagnose software failures, especially in self-healing systems. Specifically, the research develops a method to detect four types of errors at the component level and to ensure that these errors can be identified efficiently.

### Research Background and Problem Description

Self-healing software systems are advanced computer programs or systems that can automatically detect, diagnose, and recover from failures or errors without human intervention. Such systems are usually applied in mission-critical environments, where downtime can have serious financial or operational consequences. Fault detection is therefore an important step in self-healing systems.

### Research Objectives

1. **Propose a new method**: Use runtime verification to express four types of component-level faults in Linear Temporal Logic (LTL) and detect them.
2. **Improve fault detection efficiency**: Ensure that the proposed fault detection method is efficient enough to respond to and handle faults quickly in practical applications.
3. **Verify the effectiveness of the method**: Verify the feasibility and effectiveness of the proposed method through experiments on the mRUBIS simulator.

### Four Fault Types

The following four fault types are defined in the paper:

- **CF1**: The component is in an unknown state.
  \[ \varphi_1 = G(isUnknown) \]
- **CF2**: The number of exceptions raised by the component exceeds the preset threshold.
  \[ \varphi_2 = G(isStarted \land lowException) \]
- **CF3**: The component is removed from the architecture.
  \[ \varphi_3 = G(isStartedComponent1 \land isStartedComponent2 \land connector) \]
- **CF4**: The connection between two components is broken.

### Method Overview

1. **Monitoring phase**: Collect event information and send it to the analysis module.
2. **Analysis phase**: Analyze the events using LTL formulas to determine the fault type.
3. **Generate Finite State Machine (FSM)**: Convert each LTL formula into the corresponding FSM, and detect the occurrence and type of faults with runtime code.
4. **Counter mechanism**: Keep a counter for each component that records the number of its failures. If the count exceeds the threshold, further analysis is required, because the fault may originate in a dependent component.

### Experimental Verification

To evaluate the proposed method, the researchers injected the four fault types into the mRUBIS simulator and processed them through the self-healing cycle. The experimental results show that the method can effectively detect and handle these faults, thereby improving the reliability and operating efficiency of the system.

### Conclusion

By combining runtime verification with LTL specifications, this research achieves efficient detection and diagnosis of software faults. The method improves the performance of self-healing systems and provides a reference for future adaptive and self-healing systems.

### Formula Summary

- **CF1**: \(\varphi_1 = G(isUnknown)\)
- **CF2**: \(\varphi_2 = G(isStarted \land lowException)\)
- **CF3**: \(\varphi_3 = G(isStartedComponent1 \land isStartedComponent2 \land connector)\)

With this method, the researchers provide an effective means of improving the fault detection ability of self-healing systems, thereby reducing downtime and enhancing system reliability.
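
### Illustrative Sketch

The analysis and counter steps in the Method Overview can be illustrated with a minimal Python sketch. It is not the paper's implementation. It assumes that a formula of the form G(p) is monitored by a two-state machine (an OK state and a trap state entered on the first event where p is false), and that each event is a dictionary of atomic propositions. The names `GloballyMonitor`, `Analyzer`, the `"escalate"` verdict, and the threshold value are hypothetical and chosen only for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Dict

# One event = truth values of the atomic propositions at one step (assumed encoding).
Event = Dict[str, bool]


@dataclass
class GloballyMonitor:
    """Two-state monitor for a safety formula G(p) (hypothetical helper)."""
    name: str                          # fault label reported on violation, e.g. "CF2"
    predicate: Callable[[Event], bool]  # the proposition p inside G(p)
    violated: bool = False              # False = OK state, True = trap state

    def step(self, event: Event) -> bool:
        """Consume one event and return True once G(p) has been violated."""
        if not self.violated and not self.predicate(event):
            self.violated = True        # p failed: enter the trap state
        return self.violated

    def reset(self) -> None:
        """Return to the OK state, e.g. after the fault has been reported."""
        self.violated = False


@dataclass
class Analyzer:
    """Per-component failure counters with an escalation threshold (assumed design)."""
    threshold: int
    failures: Counter = field(default_factory=Counter)

    def analyse(self, component: str, monitor: GloballyMonitor, event: Event) -> str:
        if not monitor.step(event):
            return "ok"
        monitor.reset()                 # keep watching after reporting the fault
        self.failures[component] += 1
        if self.failures[component] > self.threshold:
            # Too many failures: the cause may lie in a dependent component.
            return "escalate"
        return monitor.name


# Monitor for phi_2 = G(isStarted AND lowException), using the proposition names above.
cf2 = GloballyMonitor("CF2", lambda e: e["isStarted"] and e["lowException"])
analyzer = Analyzer(threshold=2)
trace = [
    {"isStarted": True, "lowException": True},
    {"isStarted": True, "lowException": False},
]
for event in trace:
    print(analyzer.analyse("component1", cf2, event))
```

The monitor reports a violation as soon as one event falsifies p, which is the earliest point at which G(p) can be refuted on a finite trace. Resetting the monitor after each report is a design choice of this sketch so that repeated failures can be counted.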