Abstract: Causal concept effect estimation is gaining increasing interest in the field of interpretable machine learning. This general approach explains the behaviors of machine learning models by estimating the causal effect of human-understandable concepts, which represent high-level knowledge more comprehensibly than raw inputs like tokens. However, existing causal concept effect explanation methods assume complete observation of all concepts involved within the dataset, which can fail in practice due to incomplete annotations or missing concept data. We theoretically demonstrate that unobserved concepts can bias the estimation of the causal effects of observed concepts. To address this limitation, we introduce the Missingness-aware Causal Concept Explainer (MCCE), a novel framework specifically designed to estimate causal concept effects when not all concepts are observable. Our framework learns to account for residual bias resulting from missing concepts and utilizes a linear predictor to model the relationships between these concepts and the outputs of black-box machine learning models. It can offer explanations on both local and global levels. We conduct validations using a real-world dataset, demonstrating that MCCE achieves promising performance compared to state-of-the-art explanation methods in causal concept effect estimation.

What problem does this paper attempt to address?

### Problems the paper attempts to solve

This paper addresses how to estimate the causal effects of concepts accurately when some of the concepts involved are unobserved. Specifically:

1. **Limitations of existing methods**: Existing causal concept effect explanation methods usually assume that every concept involved in the dataset is fully observed. In practice this assumption often fails because of incomplete annotations or missing concept data.
2. **Impact of unobserved concepts**: The authors show through theoretical analysis that unobserved concepts can bias the estimated causal effects of observed concepts. This bias can lead to inaccurate explanations of model behavior and undermine the reliability and transparency of decision-making.
3. **Proposed new framework**: To address this, the authors propose the **Missingness-aware Causal Concept Explainer (MCCE)**. The framework compensates for the information lost to unobserved concepts by constructing pseudo-concepts that are orthogonal to the observed concepts. MCCE provides both local and global explanations and can also serve as an interpretable prediction model.
4. **Verification and performance**: The authors validate MCCE on a real-world dataset. The results show promising performance in causal concept effect estimation, outperforming or at least matching existing explanation methods.

### Formula summary

- **Empirical Individual Causal Concept Effect (ICaCE)**:
  \[ \mathrm{ICaCE}_N(x_c, x_{c \to c'}) = N(x_{c \to c'}) - N(x_c) \]
  This measures how changing the value of a concept \(C\) from \(c\) to \(c'\) changes the prediction of the black-box model \(N\) on the input sample \(x\).
- **ICaCE-Error**:
  \[ \text{ICaCE-Error}_N(E) = \frac{1}{|D|} \sum_{x_c \in D} \mathrm{Dist}\left(\mathrm{ICaCE}_N(x_c, x_{c \to c'}),\; E(c, c' \mid x)\right) \]
  This is the average distance, over the samples in \(D\), between the actual \(\mathrm{ICaCE}\) and the effect estimated by the explanation method \(E\).
- **Linear explainer \(E^*\)**:
  \[ E^* = C_{\text{complete}}^T \beta^* = N(X) \]
  The paper assumes that a linear explainer \(E^*\) exists that perfectly explains the black-box output \(N(X)\), where \(\beta^*\) is the coefficient vector.
- **Construction of pseudo-concepts**:
  \[ C_{\text{pseud}} = (I - P)H \]
  where \(P = C_{\text{ob}}(C_{\text{ob}}^T C_{\text{ob}})^{-1} C_{\text{ob}}^T\) is the projection matrix onto the column space of \(C_{\text{ob}}\), which ensures that \(C_{\text{pseud}}\) is orthogonal to \(C_{\text{ob}}\).

### Conclusion

MCCE reduces the bias in causal effect estimation by introducing pseudo-concepts that compensate for the information lost to unobserved concepts. The experimental results indicate that MCCE gives more accurate and reliable causal concept effect estimates when some concepts are unobserved, which suggests it may be applicable beyond the evaluated setting.
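
The formulas summarized above can be illustrated with a small NumPy sketch. This is not the authors' implementation: it assumes `Dist` is the Euclidean distance, uses random toy data, and treats `H` as a hypothetical stand-in for whatever representation the pseudo-concepts are built from (the summary does not define it). The function names are likewise illustrative.

```python
import numpy as np


def icace(pred_original, pred_counterfactual):
    """Individual effect: N(x_{c -> c'}) - N(x_c), row by row."""
    return np.asarray(pred_counterfactual) - np.asarray(pred_original)


def icace_error(true_effects, estimated_effects):
    """Mean distance between actual and estimated effects.

    Assumption: Dist is the Euclidean (L2) distance.
    """
    diffs = np.asarray(true_effects) - np.asarray(estimated_effects)
    return float(np.mean(np.linalg.norm(diffs, axis=1)))


def pseudo_concepts(C_ob, H):
    """C_pseud = (I - P) H with P = C_ob (C_ob^T C_ob)^{-1} C_ob^T."""
    n = C_ob.shape[0]
    # Solve the normal equations rather than forming an explicit inverse.
    P = C_ob @ np.linalg.solve(C_ob.T @ C_ob, C_ob.T)
    return (np.eye(n) - P) @ H


rng = np.random.default_rng(0)
C_ob = rng.normal(size=(50, 3))  # toy data: 50 samples, 3 observed concepts
H = rng.normal(size=(50, 4))     # hypothetical representation (assumption)

C_pseud = pseudo_concepts(C_ob, H)
# Largest inner product between observed and pseudo-concept columns;
# it should vanish up to floating-point error.
max_inner = float(np.abs(C_ob.T @ C_pseud).max())

# Toy black-box outputs for two samples, before and after the intervention.
N_x = np.array([[0.7, 0.3], [0.2, 0.8]])
N_cf = np.array([[0.4, 0.6], [0.5, 0.5]])
true_eff = icace(N_x, N_cf)

err_perfect = icace_error(true_eff, true_eff)               # exact explainer
err_zero = icace_error(true_eff, np.zeros_like(true_eff))   # "no effect" explainer
```

Here `max_inner` checks the orthogonality claim numerically, and the two error values contrast an explainer that reproduces the effects exactly with one that always predicts no effect.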
