Abstract: Feature attribution-based explanation (FAE) methods, which indicate how much each input feature contributes to a model’s output for a given data point, are one of the most popular categories of explainable machine learning techniques. Although various metrics have been proposed to evaluate explanation quality, no single metric captures every aspect of an explanation, and different metrics can lead to different conclusions. Moreover, when generating explanations, existing FAE methods either consider no evaluation metric at all or consider only the faithfulness of the explanation; none considers multiple metrics simultaneously. To address this issue, we formulate the problem of building FAE-based explainable models as a multi-objective learning problem that considers multiple explanation quality metrics simultaneously. We first reveal conflicts between explanation quality metrics, including faithfulness, sensitivity, and complexity. We then define the multi-objective explanation problem and propose a multi-objective feature attribution explanation (MOFAE) framework to address it. Subsequently, we instantiate the framework by simultaneously considering the explanation’s faithfulness, sensitivity, and complexity. Experimental comparisons with six state-of-the-art FAE methods on eight datasets demonstrate that our method optimizes multiple conflicting metrics simultaneously and provides explanations with higher faithfulness, lower sensitivity, and lower complexity than the compared methods. The results also show that our method has better diversity, i.e., it provides a range of explanations that achieve different trade-offs between the conflicting explanation quality metrics. It can therefore provide tailored explanations to different stakeholders based on their specific requirements.
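To make the notion of trade-offs between conflicting metrics concrete, the following minimal sketch filters a set of candidate explanations down to the non-dominated (Pareto-optimal) ones. It is not the MOFAE algorithm, which the abstract does not specify. The function name `pareto_front` and the score values are hypothetical, and faithfulness is negated here so that all three objectives are minimized.

```python
import numpy as np

def pareto_front(scores):
    """Return the indices of non-dominated rows; every column is minimized."""
    scores = np.asarray(scores, dtype=float)
    keep = []
    for i, s in enumerate(scores):
        # Row s is dominated if some row is no worse in every objective
        # and strictly better in at least one.
        no_worse = np.all(scores <= s, axis=1)
        strictly_better = np.any(scores < s, axis=1)
        if not np.any(no_worse & strictly_better):
            keep.append(i)
    return keep

# Hypothetical scores for five candidate explanations (illustrative only).
# Columns: (-faithfulness, sensitivity, complexity), all to be minimized.
candidates = np.array([
    [-0.90, 0.30, 12.0],
    [-0.80, 0.10, 8.0],
    [-0.70, 0.40, 15.0],  # dominated by rows 0 and 1
    [-0.60, 0.05, 3.0],
    [-0.80, 0.20, 9.0],   # dominated by row 1
])

front = pareto_front(candidates)
print(front)  # → [0, 1, 3]
```

Each surviving candidate represents a different trade-off: row 0 is the most faithful, row 3 is the least sensitive and least complex, and row 1 sits between them. A stakeholder could pick from this set according to which metric matters most to them.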

Feature Attribution Explanation to Detect Harmful Dataset Shift

Automatic dataset shift identification to support root cause analysis of AI performance drift

Feature Attribution with Necessity and Sufficiency via Dual-stage Perturbation Test for Causal Explanation

LP-Explain: Local Pictorial Explanation for Outliers

Adversarial Learning for Feature Shift Detection and Correction

Change Detection for Local Explainability in Evolving Data Streams

Efficient and Multiply Robust Risk Estimation under General Forms of Dataset Shift

Explaining Algorithmic Fairness Through Fairness-Aware Causal Path Decomposition

A Unifying Causal Framework for Analyzing Dataset Shift-stable Learning Algorithms

Multi-objective Feature Attribution Explanation For Explainable Machine Learning

Sequential Harmful Shift Detection Without Labels

Detection and Evaluation of bias-inducing Features in Machine learning

Supervised Algorithmic Fairness in Distribution Shifts: A Survey

FIXC: A Method for Data Distribution Shifts Calibration via Feature Importance

Generalize or Detect? Towards Robust Semantic Segmentation Under Multiple Distribution Shifts

Feature Shift Detection: Localizing Which Features Have Shifted via Conditional Distribution Tests

An Explainable Feature Selection Approach for Fair Machine Learning

Discriminative Feature Attributions: Bridging Post Hoc Explainability and Inherent Interpretability

MetaShift: A Dataset of Datasets for Evaluating Contextual Distribution Shifts and Training Conflicts

On Formal Feature Attribution and Its Approximation

Explanation as a Watermark: Towards Harmless and Multi-bit Model Ownership Verification via Watermarking Feature Attribution