Fighting Bias with Bias: Promoting Model Robustness by Amplifying Dataset Biases

Yuval Reif, Roy Schwartz
2023-05-30
Abstract: NLP models often rely on superficial cues known as dataset biases to achieve impressive performance, and can fail on examples where these biases do not hold. Recent work sought to develop robust, unbiased models by filtering biased examples from training sets. In this work, we argue that such filtering can obscure the true capabilities of models to overcome biases, which might never be removed in full from the dataset. We suggest that in order to drive the development of models robust to subtle biases, dataset biases should be amplified in the training set. We introduce an evaluation framework defined by a bias-amplified training set and an anti-biased test set, both automatically extracted from existing datasets. Experiments across three notions of bias, four datasets and two models show that our framework is substantially more challenging for models than the original data splits, and even more challenging than hand-crafted challenge sets. Our evaluation framework can use any existing dataset, even those considered obsolete, to test model robustness. We hope our work will guide the development of robust models that do not rely on superficial biases and correlations. To this end, we publicly release our code and data.
Computation and Language
What problem does this paper attempt to address?
The paper addresses the tendency of Natural Language Processing (NLP) models to rely on superficial cues in their training data (i.e., dataset biases), which causes them to fail on examples where those biases do not hold. Rather than filtering biased examples out of the training set, the authors propose an evaluation framework that amplifies the biases in the training set and evaluates models on an anti-biased test set. Specifically, the framework involves:

1. **Dataset Division**: Dividing an existing dataset into biased instances and anti-biased instances.
2. **Training and Testing**: Training the model on a training set that mainly contains biased instances and evaluating it on a test set that mainly contains anti-biased instances.
3. **Method Implementation**: Using three methods, one per notion of bias, to identify biased and anti-biased instances:
   - **Dataset Cartography**: Identifying easy-to-learn and hard-to-learn instances by tracking the dynamics of the model training process.
   - **Partial-input Baselines**: Identifying bias-dependent instances by restricting the model to use only part of the input.
   - **Minority Examples**: Identifying minority examples that violate common statistical patterns through clustering methods.

Through this framework, the authors aim to promote the development of robust models that can overcome subtle biases in the dataset, thereby improving generalization. Experiments across four datasets and two models show that the framework is substantially more challenging for models than the original data splits, and even more challenging than hand-crafted challenge sets, which helps expose shortcomings that the original splits can hide.
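
The dataset division and train/test construction described above can be sketched as follows. This is a minimal illustration, not the authors' released code: the `Instance` class, the `bias_score` field, the `0.5` threshold, and all function names are hypothetical. The sketch assumes each instance already carries a score in [0, 1] where higher means "more likely solvable through dataset bias". The `mean_confidence` helper is one possible way to obtain such a score in the spirit of dataset cartography (averaging the probability a model assigns to the gold label across training epochs). For simplicity the sketch keeps only biased instances for training and only anti-biased instances for testing, whereas the summary says "mainly".

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Instance:
    text: str
    label: str
    # Hypothetical score in [0, 1]: higher = more likely solvable via bias.
    bias_score: float


def mean_confidence(gold_probs_per_epoch: Sequence[float]) -> float:
    """Average probability assigned to the gold label across epochs.

    Assumption: used here as a stand-in bias score (easy-to-learn
    instances get a high value, hard-to-learn ones a low value).
    """
    if not gold_probs_per_epoch:
        raise ValueError("need at least one epoch of probabilities")
    return sum(gold_probs_per_epoch) / len(gold_probs_per_epoch)


def split_by_bias(
    instances: Sequence[Instance], threshold: float = 0.5
) -> Tuple[List[Instance], List[Instance]]:
    """Partition instances into (biased, anti_biased) by score threshold."""
    biased = [x for x in instances if x.bias_score >= threshold]
    anti_biased = [x for x in instances if x.bias_score < threshold]
    return biased, anti_biased


def build_framework_splits(
    train: Sequence[Instance], test: Sequence[Instance], threshold: float = 0.5
) -> Tuple[List[Instance], List[Instance]]:
    """Return (bias-amplified training set, anti-biased test set)."""
    train_biased, _ = split_by_bias(train, threshold)
    _, test_anti_biased = split_by_bias(test, threshold)
    return train_biased, test_anti_biased


# Toy data with made-up scores, for illustration only.
train = [
    Instance("a", "entailment", 0.9),
    Instance("b", "neutral", 0.2),
    Instance("c", "contradiction", 0.7),
]
test = [
    Instance("d", "entailment", 0.95),
    Instance("e", "neutral", 0.1),
]

amplified_train, anti_biased_test = build_framework_splits(train, test)
print([x.text for x in amplified_train])   # ['a', 'c']
print([x.text for x in anti_biased_test])  # ['e']
```

A fixed threshold is the simplest choice here; a per-dataset percentile cutoff would be an equally plausible alternative, and the source summary does not say which the authors use.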