Towards Real World Debiasing: A Fine-grained Analysis On Spurious Correlation

Zhibo Wang, Peng Kuang, Zhixuan Chu, Jingyi Wang, Kui Ren
2024-05-30
Abstract: Spurious correlations in training data significantly hinder the generalization capability of machine learning models when faced with distribution shifts in real-world scenarios. To tackle the problem, numerous debiasing approaches have been proposed and benchmarked on datasets intentionally designed with severe biases. However, two questions remain: 1. Do existing benchmarks really capture biases in the real world? 2. Can existing debiasing methods handle biases in the real world? To answer these questions, we revisit biased distributions in existing benchmarks and real-world datasets, and propose a fine-grained framework for analyzing dataset bias by disentangling it into the magnitude and prevalence of bias. We observe and theoretically demonstrate that existing benchmarks poorly represent real-world biases. We further introduce two novel biased distributions to bridge this gap, forming a nuanced evaluation framework for real-world debiasing. Building upon these results, we evaluate existing debiasing methods with our evaluation framework. Results show that existing methods are incapable of handling real-world biases. Through in-depth analysis, we propose a simple yet effective approach that can be easily applied to existing debiasing methods, named Debias in Destruction (DiD). Empirical results demonstrate the superiority of DiD, improving the performance of existing methods on all types of biases within the proposed evaluation framework.
Machine Learning, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses the limited generalization of machine learning models under real-world distribution shifts, a problem caused by spurious correlations in the training data. Specifically, it asks whether existing debiasing benchmarks truly reflect the biases found in real-world data, and whether current debiasing methods can handle such biases. To answer these questions, the authors propose a fine-grained framework for analyzing dataset bias that disentangles it into two components: magnitude and prevalence. Through observation and theoretical analysis, they show that existing benchmarks poorly represent real-world biases. They then introduce two new biased distributions to bridge this gap, forming a nuanced framework for evaluating debiasing under real-world conditions. Evaluating existing debiasing methods within this framework, they find that these methods fail to handle real-world biases. Finally, they propose a simple yet effective approach that can be easily applied to existing debiasing methods, called "Debias in Destruction" (DiD), which improves the performance of existing methods on all types of biases within the proposed evaluation framework.
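The abstract does not give formal definitions of bias magnitude and prevalence, so the sketch below uses hypothetical stand-ins purely to illustrate the intuition of separating the two. It assumes each sample has a class label and a single discrete bias attribute. "Magnitude" of an attribute value is taken here as max over y of P(y | attribute), meaning how strongly that attribute predicts one class. "Prevalence" is taken as the fraction of samples whose attribute has magnitude at or above a chosen threshold. These definitions, the threshold of 0.9, and the two toy datasets are illustrative assumptions, not the authors' formulation.

```python
from collections import Counter, defaultdict

def bias_magnitude(labels, attrs):
    """Hypothetical magnitude: for each attribute value a, max_y P(y | a)."""
    counts = defaultdict(Counter)
    for y, a in zip(labels, attrs):
        counts[a][y] += 1
    return {a: max(c.values()) / sum(c.values()) for a, c in counts.items()}

def bias_prevalence(labels, attrs, threshold=0.9):
    """Hypothetical prevalence: share of samples whose attribute is strongly biased."""
    mag = bias_magnitude(labels, attrs)
    strong = sum(1 for a in attrs if mag[a] >= threshold)
    return strong / len(attrs)

def benchmark_style():
    """Toy data in the style of synthetic benchmarks: every sample carries a strong bias."""
    labels, attrs = [], []
    for y in (0, 1):
        for i in range(100):
            labels.append(y)
            # 99 bias-aligned samples and 1 bias-conflicting sample per class
            attrs.append(f"color{y}" if i < 99 else f"color{1 - y}")
    return labels, attrs

def real_world_style():
    """Toy data where a strong spurious cue appears on only a few samples."""
    labels, attrs = [], []
    for y in (0, 1):
        for i in range(100):
            labels.append(y)
            # only 20 class-0 samples carry a hypothetical "watermark" cue
            attrs.append("watermark" if (y == 0 and i < 20) else "none")
    return labels, attrs

if __name__ == "__main__":
    for name, data in (("benchmark", benchmark_style()), ("real-world", real_world_style())):
        print(name, bias_magnitude(*data), bias_prevalence(*data))
```

Under these assumptions, both toy datasets contain a strongly predictive attribute (magnitude 0.99 for each color, 1.0 for the watermark), but the prevalence differs sharply: 1.0 in the benchmark-style data versus 0.1 in the real-world-style data. A single aggregate bias level would conflate these two cases, which is the kind of distinction a fine-grained magnitude/prevalence view is meant to expose.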