DTOR: Decision Tree Outlier Regressor to explain anomalies

Riccardo Crupi, Daniele Regoli, Alessandro Damiano Sabatino, Immacolata Marano, Massimiliano Brinis, Luca Albertazzi, Andrea Cirillo, Andrea Claudio Cosentini
2024-05-13
Abstract: Explaining why outliers occur, and the mechanism behind their occurrence, can be extremely important in a variety of domains. Malfunctions, frauds, and threats, in addition to being correctly identified, often need a valid explanation before effective countermeasures can be taken. The increasingly widespread use of sophisticated Machine Learning approaches to identify anomalies makes such explanations more challenging. We present the Decision Tree Outlier Regressor (DTOR), a technique for producing rule-based explanations for individual data points by estimating the anomaly scores generated by an anomaly detection model. This is accomplished by first fitting a Decision Tree Regressor, which estimates the anomaly score, and then extracting the tree path associated with the data point's score. Our results demonstrate the robustness of DTOR even on datasets with a large number of features. Additionally, in contrast to other rule-based approaches, the generated rules are consistently satisfied by the points to be explained. Furthermore, our evaluation metrics indicate performance comparable to Anchors in outlier explanation tasks, with reduced execution time.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses the interpretability problem in machine-learning-based anomaly detection. In internal audits in sectors such as banking, anomaly detection techniques are used to identify unusual data points such as faults, fraud, or threats. The results must then be explained to internal auditors, who may not have expertise in data analysis, and this is where interpretability becomes a challenge. Existing feature-importance methods such as SHAP may be of limited help for complex models or high-dimensional datasets. The paper introduces the Decision Tree Outlier Regressor (DTOR), which generates rule-based explanations by estimating the anomaly scores produced by an anomaly detection model. DTOR fits a decision tree regressor to estimate the score and extracts the tree path associated with the data point being explained. According to the abstract, the approach remains robust on datasets with a large number of features and, unlike other rule-based approaches, produces rules that are consistently satisfied by the point to be explained. Compared with Anchors, another rule-based explanation method, DTOR has a shorter execution time and comparable performance in outlier explanation tasks. The contribution of DTOR is a transparent, rule-based account of an anomaly detection model's decision for a given point, which lets non-data scientists understand why that point was flagged and supports risk assessment and decision-making in internal audits of banking systems.
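The two steps described above (fit a tree regressor on the detector's anomaly scores, then read off the path followed by the point to be explained) can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the choice of scikit-learn's `IsolationForest` as the detector, the synthetic data, the tree depth, and the helper names `extract_rule` and `rule_holds` are all assumptions made for this example, and any refinements used in the paper are omitted.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.tree import DecisionTreeRegressor


def extract_rule(surrogate, x, feature_names):
    """Hypothetical helper: list the split conditions on the path followed by x."""
    x = np.asarray(x, dtype=float).reshape(1, -1)
    tree = surrogate.tree_
    # For a single sample, the CSR indices are the visited node ids, root to leaf.
    path = surrogate.decision_path(x).indices
    rule = []
    for node, child in zip(path[:-1], path[1:]):
        # The branch taken tells us which side of the threshold x falls on.
        op = "<=" if child == tree.children_left[node] else ">"
        rule.append((feature_names[tree.feature[node]], op, float(tree.threshold[node])))
    return rule


def rule_holds(rule, x, feature_names):
    """Hypothetical helper: check that x satisfies every condition in the rule."""
    idx = {name: i for i, name in enumerate(feature_names)}
    x32 = np.asarray(x, dtype=np.float32)  # scikit-learn trees compare float32 inputs
    return all(
        (x32[idx[f]] <= t) if op == "<=" else (x32[idx[f]] > t)
        for f, op, t in rule
    )


# Synthetic data (assumption): Gaussian noise plus one planted outlier at row 0.
rng = np.random.default_rng(0)
feature_names = ["f0", "f1", "f2", "f3"]
X = rng.normal(size=(500, 4))
X[0] = [6.0, 0.0, 0.0, 0.0]

# Step 0: any anomaly detector that outputs a score; IsolationForest is assumed here.
detector = IsolationForest(random_state=0).fit(X)
scores = detector.score_samples(X)  # lower means more anomalous

# Step 1: a shallow regression tree estimates the detector's score.
surrogate = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, scores)

# Step 2: the path followed by the point of interest becomes its rule.
rule = extract_rule(surrogate, X[0], feature_names)
for name, op, threshold in rule:
    print(f"{name} {op} {threshold:.3f}")
print("estimated score:", surrogate.predict(X[:1])[0])
```

Because the rule is read directly from the path that the point itself follows, the point satisfies every condition by construction, which is the property the abstract highlights; `rule_holds(rule, X[0], feature_names)` makes that check explicit.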