Pretrained Transformers Do not Always Improve Robustness

Swaroop Mishra,Bhavdeep Singh Sachdeva,Chitta Baral
DOI: https://doi.org/10.48550/arXiv.2210.07663
2022-10-14
Computation and Language
Abstract:Pretrained Transformers (PT) have been shown to improve Out of Distribution (OOD) robustness than traditional models such as Bag of Words (BOW), LSTMs, Convolutional Neural Networks (CNN) powered by Word2Vec and Glove embeddings. How does the robustness comparison hold in a real world setting where some part of the dataset can be noisy? Do PT also provide more robust representation than traditional models on exposure to noisy data? We perform a comparative study on 10 models and find an empirical evidence that PT provide less robust representation than traditional models on exposure to noisy data. We investigate further and augment PT with an adversarial filtering (AF) mechanism that has been shown to improve OOD generalization. However, increase in generalization does not necessarily increase robustness, as we find that noisy data fools the AF method powered by PT.
What problem does this paper attempt to address?