Even small correlation and diversity shifts pose dataset-bias issues

Alceu Bissoto, Catarina Barata, Eduardo Valle, Sandra Avila
DOI: https://doi.org/10.1016/j.patrec.2024.01.026
IF: 4.757
2024-02-04
Pattern Recognition Letters
Abstract: Distribution shifts hinder the deployment of deep learning in real-world problems. Distribution shifts appear when train and test data come from different sources, which commonly happens in practice. Although shifts occur concurrently in many forms (e.g., correlation and diversity shifts) and at many intensities, the literature focuses only on severe, isolated shifts. In this work, we propose a comprehensive examination of distribution shifts across different intensity levels, investigating the nuanced impacts of both mild and severe shifts on the learning process and assessing the interplay between correlation and diversity shifts. We train models in three different scenarios considering synthetic and real correlation and diversity shifts, spanning eight levels of correlation shift, and evaluate them on both in-distribution and diversity-shifted test sets. Our experiments reveal three major findings: (1) even small correlation shifts pose dataset-bias issues, presenting a risk of accumulating and combining unaccountable weak biases; (2) models learn robust features in both high- and low-shift scenarios but still prefer spurious ones at test time; (3) diversity shift can attenuate the reliance on spurious correlations. Our work has implications for distribution shift research and practice, providing new insights into how models learn and rely on spurious correlations under different types and intensities of shifts.
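To make "levels of correlation shift" concrete, here is a minimal sketch of one common way to build training splits with a controlled spurious correlation. It assumes each sample has a binary label and a binary spurious attribute, and that shift intensity is expressed as the fraction rho of training samples whose attribute agrees with the label (rho = 0.5 means no correlation, rho = 1.0 means the attribute perfectly predicts the label). The functions `biased_indices` and `alignment` and the eight level values are hypothetical illustrations, not the authors' protocol.

```python
import random
from typing import List, Sequence


def biased_indices(labels: Sequence[int], attrs: Sequence[int], rho: float,
                   n: int, seed: int = 0) -> List[int]:
    """Hypothetical helper: pick n sample indices so that a fraction `rho`
    of them have spurious attribute == label (assumed binary label/attribute)."""
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must be in [0, 1]")
    # Split the pool into bias-aligned and bias-conflicting samples.
    aligned = [i for i, (y, a) in enumerate(zip(labels, attrs)) if y == a]
    conflicting = [i for i, (y, a) in enumerate(zip(labels, attrs)) if y != a]
    n_aligned = round(rho * n)
    n_conflicting = n - n_aligned
    if n_aligned > len(aligned) or n_conflicting > len(conflicting):
        raise ValueError("pool too small for the requested correlation level")
    rng = random.Random(seed)
    chosen = rng.sample(aligned, n_aligned) + rng.sample(conflicting, n_conflicting)
    rng.shuffle(chosen)  # avoid ordering the split by aligned/conflicting
    return chosen


def alignment(labels: Sequence[int], attrs: Sequence[int], idx: Sequence[int]) -> float:
    """Fraction of the selected samples whose attribute agrees with the label."""
    return sum(labels[i] == attrs[i] for i in idx) / len(idx)


if __name__ == "__main__":
    # Toy pool: 1000 samples, half bias-aligned and half bias-conflicting.
    labels = [i % 2 for i in range(1000)]
    attrs = [(i // 2) % 2 for i in range(1000)]
    # Hypothetical sweep over eight correlation levels, from mild to severe.
    for rho in [0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99, 1.0]:
        idx = biased_indices(labels, attrs, rho, n=100, seed=1)
        print(rho, alignment(labels, attrs, idx))
```

In this framing, a "small" correlation shift corresponds to rho only slightly above 0.5; the abstract's first finding is that even such mild levels can already introduce dataset bias.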
computer science, artificial intelligence