Abstract: In this paper, we propose REASON, a novel framework that enables the automatic discovery of both intra-level (i.e., within-network) and inter-level (i.e., across-network) causal relationships for root cause localization. REASON consists of Topological Causal Discovery and Individual Causal Discovery. The Topological Causal Discovery component aims to model the fault propagation in order to trace back to the root causes. To achieve this, we propose novel hierarchical graph neural networks to construct interdependent causal networks by modeling both intra-level and inter-level non-linear causal relations. Based on the learned interdependent causal networks, we then leverage random walks with restarts to model the network propagation of a system fault. The Individual Causal Discovery component focuses on capturing abrupt change patterns of a single system entity. This component examines the temporal patterns of each entity's metric data (i.e., time series), and estimates its likelihood of being a root cause based on the Extreme Value Theory. Combining the topological and individual causal scores, the top K system entities are identified as root causes. Extensive experiments on three real-world datasets with case studies demonstrate the effectiveness and superiority of the proposed framework.
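The random walk with restarts mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the transition probabilities, the edge orientation, or the restart value, so everything concrete here is an assumption: the four-node graph, the edge weights, `restart_prob=0.3`, and the choice to let the walker start at the symptomatic node and step from effects towards their candidate causes.

```python
def random_walk_with_restart(adj, start, restart_prob=0.3, tol=1e-10, max_iter=1000):
    """Return visiting probabilities for a walker that restarts at `start`.

    adj[i][j] is the (non-negative) weight of stepping from node i to node j.
    """
    n = len(adj)

    # Row-normalise weights into transition probabilities.
    # A node without outgoing edges keeps the walker in place (assumption).
    trans = []
    for i, row in enumerate(adj):
        total = sum(row)
        if total == 0:
            trans.append([1.0 if j == i else 0.0 for j in range(n)])
        else:
            trans.append([w / total for w in row])

    restart = [1.0 if i == start else 0.0 for i in range(n)]
    p = restart[:]
    for _ in range(max_iter):
        # One step: move along edges with prob. (1 - c), jump back to start with prob. c.
        nxt = [
            (1 - restart_prob) * sum(p[i] * trans[i][j] for i in range(n))
            + restart_prob * restart[j]
            for j in range(n)
        ]
        if max(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Hypothetical graph: 0 = symptomatic KPI, 1 = service A, 2 = service B, 3 = database.
# Edges point from an effect to its candidate causes (illustrative weights).
adj = [
    [0.0, 0.8, 0.2, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 0.0],
]
scores = random_walk_with_restart(adj, start=0)
```

In this toy graph the database accumulates the most probability mass because both services lead to it, which is the intuition behind using the visiting probabilities as topological scores.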

What problem does this paper attempt to address?

The problem this paper addresses is **root cause localization in complex systems**. Existing methods mainly construct a single, isolated causal network and ignore the interdependent structure of many real-world systems, in which multiple networks are connected through cross-network links. In such interdependent networks, fault effects can propagate across levels and between system entities, so methods that ignore these links produce sub-optimal root cause analysis results.

### Core problems of the paper

1. **Limitations of existing methods**:
   - Existing methods focus on constructing a single, isolated causal network and do not capture the complexity and interdependence of real-world systems.
   - Because the interdependent relationships between multiple networks are ignored, root cause localization of faults is inaccurate.
2. **Actual requirements**:
   - Faults in complex systems (such as microservice systems and industrial control systems) degrade user experience and cause economic losses, so efficient and accurate root cause analysis is required to restore services quickly and reduce those losses.

### Proposed solutions

To solve these problems, the paper proposes a new framework named **REASON** that automatically discovers causal relationships within and across networks and localizes the root cause of faults. The REASON framework includes two main components:

1. **Topological Causal Discovery (TCD)**:
   - It aims to model the fault propagation path in order to trace back to the root cause.
   - It uses hierarchical graph neural networks to construct interdependent causal networks and capture non-linear causal relationships within and across networks.
   - It uses random walks with restarts on the learned networks to model how a system fault propagates.
2. **Individual Causal Discovery (ICD)**:
   - It focuses on capturing the abrupt change patterns of individual system entities.
   - It analyzes the time-series data of each entity and estimates its likelihood of being the root cause based on Extreme Value Theory.

### Integration and output

Finally, REASON combines the topological causal score and the individual causal score and selects the top \( K \) system entities with the highest scores as the root causes.

### Summary

The goal of the paper is to accurately identify the root cause of system faults by learning the causal relationships of multi-level interconnected systems, thereby improving the stability and robustness of complex systems.
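The individual scoring and integration steps summarized above can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's method: the tail model is a peaks-over-threshold fit with an exponential tail (the generalized Pareto distribution with shape 0), the score is the negative log of the tail probability, and the weighted sum with `gamma` is only one possible way to combine scores, since the summary does not state the combination rule. All entity names and numbers are hypothetical.

```python
import math


def tail_probability(history, value, quantile=0.9):
    """Peaks-over-threshold estimate of P(X > value) with an exponential tail."""
    ordered = sorted(history)
    threshold = ordered[int(quantile * (len(ordered) - 1))]
    excesses = [x - threshold for x in history if x > threshold]
    if value <= threshold or not excesses:
        # Below the tail region (or no tail data): fall back to the empirical rate.
        return sum(1 for x in history if x > value) / len(history)
    scale = sum(excesses) / len(excesses)  # MLE of the exponential scale
    return (len(excesses) / len(history)) * math.exp(-(value - threshold) / scale)


def individual_score(history, value):
    """Higher score means the latest value is a more extreme deviation."""
    return -math.log(max(tail_probability(history, value), 1e-12))


def top_k_root_causes(topological, individual, k=2, gamma=0.5):
    """Rank entities by a weighted sum of normalised scores (assumed rule)."""
    def normalise(scores):
        total = sum(scores.values())
        return {name: (s / total if total else 0.0) for name, s in scores.items()}

    topo, indiv = normalise(topological), normalise(individual)
    combined = {
        name: gamma * topo[name] + (1 - gamma) * indiv.get(name, 0.0)
        for name in topo
    }
    return sorted(combined, key=combined.get, reverse=True)[:k]


# Hypothetical metric history shared by all entities, and their latest readings.
baseline = [10, 11, 9, 10, 12, 11, 10, 9, 11, 10,
            12, 13, 10, 11, 9, 10, 11, 12, 10, 11]
latest = {"database": 30, "service_a": 11, "service_b": 10}
individual = {name: individual_score(baseline, v) for name, v in latest.items()}

# Hypothetical topological scores, e.g. visiting probabilities from a random walk.
topological = {"database": 0.49, "service_a": 0.168, "service_b": 0.042}

ranking = top_k_root_causes(topological, individual, k=2)
```

With these illustrative inputs, the database ranks first because it is both the most reachable entity in the propagation graph and the only one whose latest reading lies far in the tail of its history.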

Hierarchical Graph Neural Networks for Causal Discovery and Root Cause Localization
