Better May Not Be Fairer: A Study on Subgroup Discrepancy in Image Classification

Ming-Chang Chiu, Pin-Yu Chen, Xuezhe Ma
2023-09-22
Abstract: In this paper, we provide 20,000 non-trivial human annotations on popular datasets as a first step toward bridging the gap in studying how natural semantic spurious features affect image classification, as prior works often study datasets mixing low-level features due to limitations in accessing realistic datasets. We investigate how natural background colors play a role as spurious features by annotating the test sets of CIFAR10 and CIFAR100 into subgroups based on the background color of each image. We name our datasets **CIFAR10-B** and **CIFAR100-B** and integrate them with CIFAR-Cs.
Computer Science
What problem does this paper attempt to address?
### Problems Addressed by the Paper

This paper aims to explore the issue of subgroup differences caused by background color as a spurious correlation feature in image classification and proposes a new data augmentation method to mitigate this problem.

#### Main Issues:

1. **Subgroup Differences**: Although deep neural networks have achieved human-level accuracy in image classification tasks, their performance varies significantly across different subgroups (e.g., background color). This phenomenon indicates that models rely on spurious correlation features such as background color.
2. **Effectiveness of Data Augmentation**: Existing data augmentation methods (such as AutoAug, Cutout, etc.) have not effectively addressed the issue of inconsistent performance across subgroups.

#### Solutions:

1. **Creating Annotated Datasets**: The paper provides two datasets, CIFAR10-B and CIFAR100-B, annotated with background colors to study the impact of natural semantic spurious correlation features on image classification.
2. **Proposing FlowAug**: A new data augmentation method called FlowAug is proposed, which uses a pre-trained generative flow model to decouple the global and local representations of images, thereby generating new images to reduce performance differences between subgroups.
3. **Introducing MacroStd Metric**: A general metric called MacroStd is introduced to quantify the sensitivity of models to spurious correlation features and to validate the effectiveness of FlowAug.

Through these methods, the paper aims to improve the robustness and consistency of models across different subgroups.
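
To make the idea of a subgroup-discrepancy metric concrete, the sketch below computes one plausible quantity of this kind. This summary does not give the formula for MacroStd, so the definition used here is an assumption: for each class, take the standard deviation of accuracy across background-color subgroups, then average those values over classes. The function names `subgroup_accuracies` and `macro_std_sketch` and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from statistics import pstdev


def subgroup_accuracies(labels, preds, groups):
    """Return {class: {subgroup: accuracy}} from parallel sequences."""
    correct = defaultdict(lambda: defaultdict(int))
    total = defaultdict(lambda: defaultdict(int))
    for y, p, g in zip(labels, preds, groups):
        total[y][g] += 1
        correct[y][g] += int(y == p)
    return {
        y: {g: correct[y][g] / total[y][g] for g in total[y]}
        for y in total
    }


def macro_std_sketch(labels, preds, groups):
    """ASSUMED definition, not the paper's: mean over classes of the
    population standard deviation of per-subgroup accuracy."""
    acc = subgroup_accuracies(labels, preds, groups)
    per_class = [pstdev(list(a.values())) for a in acc.values()]
    return sum(per_class) / len(per_class)


# Toy example: two classes, two background-color subgroups.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
groups = ["blue", "blue", "green", "green", "blue", "blue", "green", "green"]
preds = [0, 0, 0, 1, 1, 1, 1, 1]  # one class-0 image on green is misclassified

score = macro_std_sketch(labels, preds, groups)
print(score)
```

Under this assumed definition, a score of 0 means every subgroup of every class is classified equally well, and larger values indicate that accuracy depends more strongly on the background color.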