Abstract

Background: Explainable artificial intelligence (XAI) can enhance trust in mental state classifications by explaining the reasoning behind the outputs of artificial intelligence (AI) models, especially for high-dimensional and highly correlated brain signals. Feature importance and counterfactual explanations are two common approaches to generating these explanations, but both have drawbacks. Feature importance methods, such as SHapley Additive exPlanations (SHAP), can be computationally expensive and sensitive to feature correlation. Counterfactual explanations, by contrast, explain only a single outcome rather than the entire model.

Methods: To overcome these limitations, we propose a new procedure for computing global feature importance by aggregating local counterfactual explanations. The approach is tailored to fMRI signals. It rests on the hypothesis that instances close to the decision boundary differ from their counterfactuals mainly in the features identified as most important for the downstream classification task. We call the proposed measure Boundary Crossing Solo Ratio (BoCSoR), since it quantifies how often a change in each feature in isolation leads to a change in the classification outcome, i.e., a crossing of the model's decision boundary.

Results and conclusions: Experiments on synthetic data and on real, publicly available fMRI data from the Human Connectome Project show that BoCSoR is more robust to feature correlation and less computationally expensive than state-of-the-art methods. It is also equally effective at explaining the behavior of any AI model for brain signals. These properties are crucial for medical decision support systems, where many different features are often extracted from the same physiological measures and a gold standard is absent.
When many features are extracted from the same physiological measures, computing feature importance can become computationally expensive. The features are also likely to be mutually correlated, which can make the results of state-of-the-art XAI methods unreliable.
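The abstract describes BoCSoR only in words. The sketch below is one possible reading of that description, not the authors' implementation, and it rests on several assumptions:

- A fitted classifier is exposed as a `predict` callable that returns class labels.
- One precomputed counterfactual is available per instance, from any generator. How the counterfactuals are obtained is not specified here.
- The function name `bocsor_sketch` and the normalization by the number of instances are hypothetical choices.

For each feature, the sketch replaces only that feature with its counterfactual value and counts how often the predicted class changes.

```python
import numpy as np


def bocsor_sketch(predict, X, X_cf):
    """Hypothetical BoCSoR-style global importance (illustrative reading only).

    predict: callable mapping an (n, d) array to n class labels (assumed API).
    X:       (n, d) instances, ideally close to the decision boundary.
    X_cf:    (n, d) one counterfactual per instance, from any generator (assumed).
    Returns a length-d array whose j-th entry is the fraction of instances whose
    predicted class changes when ONLY feature j takes its counterfactual value.
    """
    X = np.asarray(X, dtype=float)
    X_cf = np.asarray(X_cf, dtype=float)
    n, d = X.shape
    base = predict(X)  # original predictions
    crossings = np.zeros(d)
    for j in range(d):
        X_solo = X.copy()
        X_solo[:, j] = X_cf[:, j]  # change feature j in isolation
        crossings[j] = np.sum(predict(X_solo) != base)  # solo boundary crossings
    return crossings / n  # normalization by n is an assumption


# Toy usage: a classifier that depends only on feature 0.
def predict(Z):
    return (Z[:, 0] > 0).astype(int)


rng = np.random.default_rng(0)
n = 100
X = rng.normal(size=(n, 3))
# Place instances near the boundary: |feature 0| in [0.01, 0.1), random sign.
X[:, 0] = rng.uniform(0.01, 0.1, size=n) * rng.choice([-1.0, 1.0], size=n)
# Feature 1 is strongly correlated with feature 0.
X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=n)

# Hand-made counterfactuals: flip features 0 and 1, perturb feature 2.
X_cf = X.copy()
X_cf[:, 0] = -X[:, 0]
X_cf[:, 1] = -X[:, 1]
X_cf[:, 2] = X[:, 2] + 0.5

scores = bocsor_sketch(predict, X, X_cf)
print(scores)  # [1. 0. 0.]
```

In this toy example, feature 1 is nearly a copy of feature 0 and also changes in every counterfactual. It still scores 0, because changing it alone never crosses the decision boundary. Under these simplified assumptions, this illustrates why a solo-change ratio need not spread importance across correlated features. The sketch needs one batched `predict` call per feature.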

From local counterfactuals to global feature importance: efficient, robust, and model-agnostic explanations for brain connectivity networks