Discover and Cure: Concept-aware Mitigation of Spurious Correlation

Shirley Wu, Mert Yuksekgonul, Linjun Zhang, James Zou
2023-06-05
Abstract: Deep neural networks often rely on spurious correlations to make predictions, which hinders generalization beyond training environments. For instance, models that associate cats with bed backgrounds can fail to predict the existence of cats in other environments without beds. Mitigating spurious correlations is crucial in building trustworthy models. However, the existing works lack transparency to offer insights into the mitigation process. In this work, we propose an interpretable framework, Discover and Cure (DISC), to tackle the issue. With human-interpretable concepts, DISC iteratively 1) discovers unstable concepts across different environments as spurious attributes, then 2) intervenes on the training data using the discovered concepts to reduce spurious correlation. Across systematic experiments, DISC provides superior generalization ability and interpretability than the existing approaches. Specifically, it outperforms the state-of-the-art methods on an object recognition task and a skin-lesion classification task by 7.5% and 9.6%, respectively. Additionally, we offer theoretical analysis and guarantees to understand the benefits of models trained by DISC. Code and data are available at https://github.com/Wuyxin/DISC.
Machine Learning, Computer Vision and Pattern Recognition, Image and Video Processing
What problem does this paper attempt to address?
This paper addresses the tendency of deep neural networks to rely on spurious correlations when making predictions. When a model learns a spurious attribute that happens to correlate with the label in the training environment, its generalization to other environments degrades. For example, a model that associates cats with bed backgrounds may fail to recognize cats that are not on beds. Removing such correlations is crucial for building reliable and trustworthy models, yet existing methods lack transparency and offer little insight into the mitigation process.

The paper therefore proposes a new framework, Discover and Cure (DISC), which works through the following steps:

1. **Discover spurious concepts**: DISC uses human-interpretable concepts to iteratively discover attributes that are unstable across environments (i.e., spurious attributes). Specifically, DISC computes the sensitivity of each concept in each environment and uses it to quantify how unstable the concept is. For example, in Figure 1(b), DISC identifies "wrinkles" and "bed" as highly spurious concepts.
2. **Intervene in training data**: Based on the discovered spurious concepts, DISC intervenes on the training data to reduce spurious correlations. It introduces images of the spurious concepts into selected categories so that the concepts are distributed evenly across categories. For example, if "wrinkles" and "bed" are identified as spurious concepts tied to the "cat" category, images of these concepts can be used to intervene on the data of the "dog" category, which prevents the model from using them as a shortcut for prediction.
3. **Iterative optimization**: DISC repeats the two steps above throughout training, continually discovering and correcting spurious correlations in the model and thereby gradually improving its generalization and reliability.

In addition, the authors provide theoretical analysis and guarantees on the generalization of models trained with DISC, and they verify its effectiveness in multiple experiments. The results show that DISC outperforms existing state-of-the-art methods by 7.5% on an object recognition task and by 9.6% on a skin-lesion classification task.

In summary, the main contributions of this paper are:

- Proposing a new, interpretable framework that discovers spurious concepts and effectively mitigates spurious correlations.
- Demonstrating the effectiveness of the method through experiments on multiple datasets and revealing insights into how the model overcomes spurious correlations.
- Providing theoretical guarantees on the convergence and generalization of models trained with DISC.
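The discover and intervene steps described above can be illustrated with a small sketch. It is not the authors' implementation (that lives in the linked repository). It assumes that per-environment concept sensitivities are already available as plain numbers, and it uses the standard deviation across environments as a stand-in instability score, since the summary does not give the exact formula. The function names (`concept_instability`, `most_unstable`, `balancing_plan`) and all numbers are hypothetical.

```python
from statistics import pstdev


def concept_instability(sensitivity_by_env):
    """Spread of each concept's sensitivity across environments.

    sensitivity_by_env maps environment name -> {concept: sensitivity}.
    Assumption: population standard deviation stands in for the
    instability measure; a concept missing from an environment counts as 0.
    """
    concepts = set()
    for scores in sensitivity_by_env.values():
        concepts.update(scores)
    return {
        concept: pstdev(
            [scores.get(concept, 0.0) for scores in sensitivity_by_env.values()]
        )
        for concept in concepts
    }


def most_unstable(instability, k):
    """Return the k concepts with the largest instability (ties broken by name)."""
    return sorted(instability, key=lambda c: (-instability[c], c))[:k]


def balancing_plan(concept_counts_by_class, concept):
    """Extra images carrying `concept` that each class would need so that
    every class matches the class where the concept is most frequent."""
    target = max(
        counts.get(concept, 0) for counts in concept_counts_by_class.values()
    )
    return {
        cls: target - counts.get(concept, 0)
        for cls, counts in concept_counts_by_class.items()
    }


# Toy numbers, invented purely for illustration.
sensitivity = {
    "env_a": {"bed": 0.9, "wrinkles": 0.7, "whiskers": 0.5},
    "env_b": {"bed": 0.1, "wrinkles": 0.1, "whiskers": 0.5},
}
counts = {
    "cat": {"bed": 40, "wrinkles": 30},
    "dog": {"bed": 5, "wrinkles": 10},
}

instability = concept_instability(sensitivity)
spurious = most_unstable(instability, k=2)
plan = {concept: balancing_plan(counts, concept) for concept in spurious}

print(spurious)
print(plan)
```

On this toy input, "bed" and "wrinkles" vary across the two environments while "whiskers" does not, so the first two are flagged, and the plan assigns the additional concept images to the "dog" class. In an iterative setup, these two steps would be rerun as training progresses and the model's sensitivities change.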