Abstract: We use concept-based interpretable models to mitigate shortcut learning. Existing methods lack interpretability. Beginning with a Blackbox (BB), we iteratively carve out a mixture of interpretable experts (MoIE) and a residual network. Each expert explains a subset of data using First Order Logic (FOL). While explaining a sample, the FOL from the biased-BB-derived MoIE detects the shortcut effectively. Finetuning the BB with Metadata Normalization (MDN) eliminates the shortcut. The FOLs from the finetuned-BB-derived MoIE verify the elimination of the shortcut. Our experiments show that MoIE does not hurt the accuracy of the original BB and eliminates shortcuts effectively.

### What problem does this paper attempt to solve?

This paper addresses **shortcut learning** in deep neural networks (DNNs). Specifically, the authors focus on real-world settings in which a deep neural network (referred to as the black-box model, Blackbox, or BB) bases its predictions on spurious features rather than on causally meaningful ones, which limits its ability to generalize.

#### Challenges of Shortcut Learning

- **Definition**: Shortcut learning occurs when a model, during training, exploits spurious features that are statistically associated with the class labels but have no causal connection to them. The model then performs poorly on test data, especially when those spurious features are absent.
- **Impact**: This is particularly dangerous in critical application areas such as medical diagnosis, where it can lead to inaccurate predictions and wrong decisions.

#### Deficiencies of Existing Methods

Although existing methods for mitigating shortcut learning are effective, they generally share three main problems:

1. **Difficulty in accurately locating shortcuts**: They cannot pinpoint which specific shortcuts the black-box model depends on.
2. **Opaque mechanism**: It is unclear how a specific shortcut is eliminated from the black-box model's representation.
3. **Difficulty in verification**: There is no reliable way to verify whether a shortcut has been successfully eliminated.

#### Contributions of the Paper

To address these problems, the paper proposes a new method. It iteratively carves a Mixture of Interpretable Experts (MoIE) out of the black-box model using concept-based interpretable models, and eliminates shortcuts with the Metadata Normalization (MDN) technique. The specific steps are:

1. **Detection phase**: Use the MoIE to extract First Order Logic (FOL) rules from the biased black-box model and identify the shortcuts.
2. **Elimination phase**: Treat the detected shortcuts as metadata and fine-tune the black-box model through MDN layers to remove their influence.
3. **Verification phase**: Re-extract a MoIE from the fine-tuned black-box model and generate new FOL rules to verify whether the shortcuts have been eliminated.

Through this iterative process, the paper mitigates shortcut learning while maintaining the accuracy of the original black-box model. Experimental results show that the method eliminates shortcuts effectively on multiple datasets and improves the robustness and generalization ability of the model.
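The elimination phase can be illustrated with a small sketch. The summary does not describe the internals of the MDN layer, so the code below is only an assumption-laden stand-in: it treats a detected shortcut as one scalar metadata variable and removes its linear effect from each feature column by ordinary least squares. The names `residualize` and `correlation` and the synthetic data are hypothetical and are not taken from the paper.

```python
import random
from statistics import fmean


def correlation(xs, ys):
    """Pearson correlation between two equally long sequences."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def residualize(features, metadata):
    """Remove the linear effect of one metadata variable from every feature column.

    Assumption: a simplified stand-in for metadata normalization, using a
    per-column least-squares slope on the mean-centered metadata.
    """
    m_mean = fmean(metadata)
    m_centered = [m - m_mean for m in metadata]
    m_var = sum(c * c for c in m_centered)
    if m_var == 0:  # constant metadata carries no signal to remove
        return [row[:] for row in features]
    n_cols = len(features[0])
    # Least-squares slope of each feature column on the centered metadata.
    betas = [
        sum(c * row[j] for c, row in zip(m_centered, features)) / m_var
        for j in range(n_cols)
    ]
    return [
        [row[j] - betas[j] * m_centered[i] for j in range(n_cols)]
        for i, row in enumerate(features)
    ]


# Synthetic example: column 0 leaks the shortcut, column 1 is pure noise.
rng = random.Random(0)
metadata = [rng.randint(0, 1) for _ in range(200)]
features = [[2.0 * m + rng.gauss(0, 0.5), rng.gauss(0, 1)] for m in metadata]

cleaned = residualize(features, metadata)
before_col = [row[0] for row in features]
after_col = [row[0] for row in cleaned]

print("correlation before:", round(correlation(before_col, metadata), 3))
print("correlation after: ", round(correlation(after_col, metadata), 3))
```

In this toy setting, the feature that encoded the shortcut is strongly correlated with the metadata before residualization and essentially uncorrelated afterwards. The verification phase described above plays an analogous role at the level of FOL rules: the shortcut concept should no longer appear in the explanations extracted from the fine-tuned model.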

Tackling Shortcut Learning in Deep Neural Networks: An Iterative Approach with Interpretable Models
