Abstract: Deep neural networks often rely on spurious correlations to make predictions, which hinders generalization beyond training environments. For instance, models that associate cats with bed backgrounds can fail to predict the existence of cats in other environments without beds. Mitigating spurious correlations is crucial in building trustworthy models. However, existing works lack the transparency needed to offer insights into the mitigation process. In this work, we propose an interpretable framework, Discover and Cure (DISC), to tackle the issue. With human-interpretable concepts, DISC iteratively 1) discovers unstable concepts across different environments as spurious attributes, then 2) intervenes on the training data using the discovered concepts to reduce spurious correlation. Across systematic experiments, DISC provides better generalization ability and interpretability than existing approaches. Specifically, it outperforms the state-of-the-art methods on an object recognition task and a skin-lesion classification task by 7.5% and 9.6%, respectively. Additionally, we offer theoretical analysis and guarantees to understand the benefits of models trained by DISC. Code and data are available at https://github.com/Wuyxin/DISC.

What problem does this paper attempt to address?

This paper addresses the reliance of deep neural networks on spurious correlations when making predictions. When a model learns spurious attributes that correlate with labels in the training environment, its generalization to other environments degrades. For example, a model that associates cats with bed backgrounds may fail to identify cats that are not on beds. Eliminating these spurious correlations is crucial for building reliable and trustworthy models, but existing methods lack transparency and offer little insight into the mitigation process. The paper therefore proposes a new framework, Discover and Cure (DISC), which works through the following steps:

1. **Discover spurious concepts**: DISC uses human-interpretable concepts to iteratively discover attributes that are unstable across environments (i.e., spurious attributes). Specifically, DISC computes the sensitivity of each concept in each environment and uses it to quantify the concept's instability. For example, in Figure 1(b), DISC identifies "wrinkles" and "bed" as highly spurious concepts.
2. **Intervene in training data**: Based on the discovered spurious concepts, DISC intervenes in the training data to reduce spurious correlations. It introduces images of the spurious concepts into selected categories so that the concepts stay evenly distributed across categories. For example, if "wrinkles" and "bed" are identified as spurious concepts related to the "cat" category, images of these concepts can be used to intervene in the data of the "dog" category, preventing the model from using them for prediction.
3. **Iterative optimization**: DISC repeats the two steps above during training, continuously discovering and correcting spurious correlations in the model and gradually improving its generalization ability and reliability.

In addition, the authors provide theoretical analysis and guarantees on the generalization ability of models trained by DISC, and verify its effectiveness through multiple experiments. The experimental results show that DISC outperforms the existing state-of-the-art methods by 7.5% on an object recognition task and by 9.6% on a skin-lesion classification task.

In summary, the main contributions of this paper are:

- Proposing a new, interpretable framework to discover spurious concepts and effectively mitigate spurious correlations.
- Demonstrating the effectiveness of the method through experiments on multiple datasets and revealing insights into how the model overcomes spurious correlations.
- Providing theoretical guarantees on the convergence and generalization ability of models trained by DISC.
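The discover and intervene steps described above can be illustrated with a small sketch. This is not the authors' implementation (see the linked repository for that). It assumes per-environment concept sensitivity scores are already available, uses the population standard deviation across environments as a stand-in instability measure, and models the intervention as a pixelwise convex blend of a training image with a concept image. The function names, the `alpha` parameter, and the example scores are all hypothetical.

```python
from statistics import pstdev


def concept_instability(sensitivity):
    """Map each concept to the spread of its per-environment sensitivity.

    Assumption: standard deviation across environments is used as a
    simple proxy for instability; the paper may define it differently.
    """
    return {concept: pstdev(scores) for concept, scores in sensitivity.items()}


def top_spurious(sensitivity, k=2):
    """Return the k concepts whose sensitivity varies most across environments."""
    scores = concept_instability(sensitivity)
    return sorted(scores, key=scores.get, reverse=True)[:k]


def intervene(image, concept_image, alpha=0.5):
    """Blend a concept image into a training image (hypothetical intervention).

    Both images are flat lists of pixel values of equal length;
    alpha is the weight given to the concept image.
    """
    return [(1 - alpha) * p + alpha * c for p, c in zip(image, concept_image)]


# Made-up sensitivity scores for the "cat" class in three environments.
sensitivity = {
    "bed": [0.9, 0.1, 0.5],
    "wrinkles": [0.8, 0.2, 0.4],
    "whiskers": [0.6, 0.6, 0.6],
}

# Concepts with unstable sensitivity are flagged as spurious.
spurious = top_spurious(sensitivity, k=2)

# Blend a (toy, two-pixel) "bed" concept image into a "dog" training image.
dog_with_bed = intervene([0.0, 1.0], [1.0, 0.0], alpha=0.25)
```

In this toy setup, "whiskers" has the same sensitivity in every environment and is therefore treated as stable, while "bed" and "wrinkles" are flagged. Applying the blend to images of other classes is one way to keep a flagged concept from being predictive of a single class.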

Discover and Cure: Concept-aware Mitigation of Spurious Correlation
