Abstract: In attempts to produce ML models less reliant on spurious patterns in NLP datasets, researchers have recently proposed curating counterfactually augmented data (CAD) via a human-in-the-loop process in which, given some documents and their (initial) labels, humans must revise the text to make a counterfactual label applicable. Importantly, edits that are not necessary to flip the applicable label are prohibited. Models trained on the augmented data appear, empirically, to rely less on semantically irrelevant words and to generalize better out of domain. While this work draws loosely on causal thinking, the underlying causal model (even at an abstract level) and the principles underlying the observed out-of-domain improvements remain unclear. In this paper, we introduce a toy analog based on linear Gaussian models and observe interesting relationships between causal models, measurement noise, out-of-domain generalization, and reliance on spurious signals. Our analysis provides some insights that help to explain the efficacy of CAD. Moreover, we develop the hypothesis that while adding noise to causal features should degrade both in-domain and out-of-domain performance, adding noise to non-causal features should lead to relative improvements in out-of-domain performance. This idea inspires a speculative test for determining whether a feature attribution technique has identified the causal spans: if adding noise (e.g., by random word flips) to the highlighted spans degrades both in-domain and out-of-domain performance on a battery of challenge datasets, but adding noise to the complement gives improvements out of domain, this suggests that we have identified causal spans. We present a large-scale empirical study comparing spans edited to create CAD to those selected by attention and saliency maps. Across numerous domains and models, we find that the hypothesized phenomenon is pronounced for CAD.
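The noise hypothesis in the abstract can be illustrated with a small simulation. The sketch below is not the paper's model or code; it assumes a hypothetical two-feature linear Gaussian setup in which `x1` causes the label `y`, `x2` is a non-causal feature that tracks `y` in domain, and out of domain `x2` is drawn independently of `y`. The function names, variances, and the choice of ordinary least squares are all illustrative assumptions.

```python
import numpy as np


def sample(n, rng, spurious_tracks_label=True):
    """Draw (X, y) from an assumed toy linear Gaussian model."""
    x1 = rng.normal(0.0, 1.0, n)           # causal feature
    y = x1 + rng.normal(0.0, 1.0, n)       # label depends only on x1
    if spurious_tracks_label:
        # In domain: x2 is a noisy copy of the label (non-causal signal).
        x2 = y + rng.normal(0.0, 0.5, n)
    else:
        # Out of domain (assumption): x2 keeps its marginal variance (2.25)
        # but is independent of the label.
        x2 = rng.normal(0.0, 1.5, n)
    return np.column_stack([x1, x2]), y


def fit_ols(X, y):
    """Least-squares weights; no intercept since all variables are zero-mean."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w


def mse(w, X, y):
    return float(np.mean((X @ w - y) ** 2))


def run(noise_std=2.0, n=50_000, seed=0):
    """Train with noise on one feature at a time; evaluate on clean test sets."""
    rng = np.random.default_rng(seed)
    X_tr, y_tr = sample(n, rng)
    X_in, y_in = sample(n, rng)
    X_ood, y_ood = sample(n, rng, spurious_tracks_label=False)

    settings = [("baseline", None), ("noise_on_causal", 0), ("noise_on_spurious", 1)]
    results = {}
    for name, col in settings:
        X_noisy = X_tr.copy()
        if col is not None:
            # Measurement noise is added to the chosen training column only.
            X_noisy[:, col] += rng.normal(0.0, noise_std, n)
        w = fit_ols(X_noisy, y_tr)
        results[name] = {
            "weights": w,
            "in_domain_mse": mse(w, X_in, y_in),
            "ood_mse": mse(w, X_ood, y_ood),
        }
    return results


if __name__ == "__main__":
    for name, r in run().items():
        print(name, np.round(r["weights"], 3),
              round(r["in_domain_mse"], 3), round(r["ood_mse"], 3))
```

Under these assumptions, noise on the causal column shifts weight toward `x2` and raises both errors, while noise on the non-causal column shifts weight toward `x1` and lowers the out-of-domain error relative to the baseline. The in-domain error still rises in the latter case, which is why the abstract speaks of relative improvements out of domain.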

Nuisances via Negativa: Adjusting for Spurious Correlations via Data Augmentation

Robustness to Spurious Correlations Improves Semantic Out-of-Distribution Detection

Robustness to Spurious Correlations via Human Annotations

A Novel Data Augmentation Technique for Out-of-Distribution Sample Detection using Compounded Corruptions

Data Augmentations for Improved (Large) Language Model Generalization

Out of spuriousity: Improving robustness to spurious correlations without group annotations

Implicit Counterfactual Data Augmentation for Robust Learning

Understanding and Mitigating Spurious Correlations in Text Classification with Neighborhood Analysis

Learning Robust Classifiers with Self-Guided Spurious Correlation Mitigation

Relation-based Counterfactual Data Augmentation and Contrastive Learning for Robustifying Natural Language Inference Models

Enhancing Multiple Reliability Measures via Nuisance-extended Information Bottleneck

On the Impact of Spurious Correlation for Out-of-Distribution Detection

Bias Challenges in Counterfactual Data Augmentation

Addressing Discrepancies in Semantic and Visual Alignment in Neural Networks

Stubborn Lexical Bias in Data and Models

ASPIRE: Language-Guided Data Augmentation for Improving Robustness Against Spurious Correlations

Mitigating Backdoor Poisoning Attacks through the Lens of Spurious Correlation

People Make Better Edits: Measuring the Efficacy of LLM-Generated Counterfactually Augmented Data for Harmful Language Detection

Beyond Invariance: Test-Time Label-Shift Adaptation for Distributions with "Spurious" Correlations

Explaining The Efficacy of Counterfactually Augmented Data

How Does Counterfactually Augmented Data Impact Models for Social Computing Constructs?