IRQ Coloring and the Subtle Art of Mitigating Interrupt-generated Interference

Diogo Costa, Luca Cuomo, Daniel Oliveira, Ida Maria Savino, Bruno Morelli, José Martins, Alessandro Biasci, Sandro Pinto
2023-08-02
Abstract: Integrating workloads with differing criticality levels presents a formidable challenge in achieving the stringent spatial and temporal isolation requirements imposed by safety-critical standards such as ISO26262. The shift towards high-performance multicore platforms has been posing increasing issues to the so-called mixed-criticality systems (MCS) due to the reciprocal interference created by consolidated subsystems vying for access to shared (microarchitectural) resources (e.g., caches, bus interconnect, memory controller). The research community has acknowledged all these challenges. Thus, several techniques, such as cache partitioning and memory throttling, have been proposed to mitigate such interference; however, these techniques have some drawbacks and limitations that impact performance, memory footprint, and availability. In this work, we look from a different perspective. Departing from the observation that safety-critical workloads are typically event- and thus interrupt-driven, we mask "colored" interrupts based on the quality of service (QoS) assessment, providing fine-grain control to mitigate interference on critical workloads without entirely suspending non-critical workloads. We propose the so-called IRQ coloring technique. We implement and evaluate the IRQ Coloring on a reference high-performance multicore platform, i.e., Xilinx ZCU102. Results demonstrate negligible performance overhead, i.e., <1% for a 100-microsecond period, and reasonable throughput guarantees for medium-critical workloads. We argue that the IRQ coloring technique presents predictability and intermediate guarantees advantages compared to state-of-the-art mechanisms.
Distributed, Parallel, and Cluster Computing; Performance; Systems and Control
What problem does this paper attempt to address?
The paper primarily addresses interference caused by interrupt handling in Mixed-Criticality Systems (MCS).

### Research Background and Challenges

- **Mixed-Criticality Systems**: These systems integrate workloads of different criticality levels (e.g., safety-critical and non-safety-critical). They must meet strict spatial and temporal isolation requirements to comply with safety standards such as ISO26262.
- **Challenges of Multi-core Platforms**: High-performance multi-core platforms intensify the competition for shared resources (such as caches, bus interconnects, and memory controllers) among different subsystems, leading to interference. This complicates the certification of mixed-criticality systems.
- **Interference Caused by Interrupts**: Safety-critical workloads are often event-driven, so interrupt handling continuously alters the main program flow, leading to more cache misses and concurrent accesses to main memory. This increases contention for shared resources and can result in unpredictability and delays.

### Main Contributions of the Paper

The paper proposes a technique called **IRQ Coloring**, which aims to mitigate the impact of interrupt-induced interference on critical workloads by selectively masking or delaying certain interrupts based on a Quality of Service (QoS) assessment, without completely pausing non-critical workloads.

- **Technical Principle**: Interrupts are classified according to their criticality levels and dynamically enabled or disabled based on the QoS of critical workloads, which gives control over the interference they generate.
- **Implementation and Evaluation**: The technique has been implemented and evaluated on the Xilinx ZCU102 platform. The results show that its performance overhead is minimal (less than 1%) and that it can provide reasonable throughput guarantees for medium-criticality tasks.
- **Experimental Evidence**: The paper provides specific evidence of interference caused by interrupt handling through experiments conducted on the MiBench automotive benchmark suite.
- **System Architecture**: It includes a design-time tool (DTT) and a runtime mechanism (RTM) for IRQ Coloring. The DTT is used to configure interrupt masking strategies, while the RTM collects hardware performance counter data at runtime and selectively disables interrupts based on this data.
- **Case Study**: The paper also presents a specific "toy" example that details how IRQ Coloring progressively disables and restores interrupts at different points in time and how this process affects the QoS of various virtual machines (VMs).

In summary, the paper aims to address the interference caused by interrupt handling in mixed-criticality systems through the IRQ Coloring technique and demonstrates the effectiveness of this method through empirical analysis.
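
The progressive disabling and restoring of colored interrupts described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the paper's runtime mechanism: the class name `IrqColoringController`, the color names, the single `qos_threshold`, and the one-color-per-period policy are all hypothetical, and the QoS values are made-up inputs standing in for measurements derived from hardware performance counters.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class IrqColoringController:
    """Toy model of QoS-driven interrupt masking (hypothetical, for illustration)."""

    # Interrupt colors ordered from least critical to most critical (assumed naming).
    colors: List[str]
    # Minimum acceptable QoS of the critical workload (assumed, unitless).
    qos_threshold: float
    # Colors currently masked, in the order they were masked.
    masked: List[str] = field(default_factory=list)

    def step(self, qos: float) -> List[str]:
        """Process one sampling period and return the colors masked afterwards."""
        # Assumption: the most critical color is never masked.
        maskable = self.colors[:-1]
        if qos < self.qos_threshold and len(self.masked) < len(maskable):
            # QoS is degraded: mask the next least-critical color.
            self.masked.append(maskable[len(self.masked)])
        elif qos >= self.qos_threshold and self.masked:
            # QoS has recovered: restore the most recently masked color.
            self.masked.pop()
        return list(self.masked)


# Made-up QoS samples for five consecutive sampling periods.
ctrl = IrqColoringController(colors=["green", "yellow", "red"], qos_threshold=0.9)
history = [ctrl.step(q) for q in (0.95, 0.80, 0.85, 0.92, 0.97)]
print(history)
```

With these inputs, the controller masks `green` and then `yellow` while the sampled QoS stays below the threshold, and restores them in reverse order once it recovers. A real implementation would mask interrupts at the interrupt controller and would likely use a richer policy than a single threshold.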