Abstract: Language models are black boxes that often hallucinate and are sensitive to input perturbations, which raises concerns about trust. Building trust requires a thorough understanding of a model's failure modes and effective strategies for improving its performance. In this study, we introduce a methodology for examining how input perturbations affect language models at various scales, from pre-trained models to large language models (LLMs). We use fine-tuning to improve a model's robustness to input perturbations, and we investigate whether exposure to one perturbation improves or degrades the model's performance on other perturbations. To address robustness against multiple perturbations, we present three distinct fine-tuning strategies. We further extend the methodology to LLMs using chain-of-thought (CoT) prompting augmented with exemplars. Using the Tabular-NLI task, we show that the proposed strategies train a robust model that handles diverse perturbations while maintaining accuracy on the original dataset.

Healing Powers of BERT: How Task-Specific Fine-Tuning Recovers Corrupted Language Models

An Improved Mask Approach Based on Pointer Network for Domain Adaptation of BERT

Can Fine-tuning Pre-trained Models Lead to Perfect NLP? A Study of the Generalizability of Relation Extraction.

InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective

On Robustness and Bias Analysis of BERT-Based Relation Extraction

Fine-tuning large neural language models for biomedical natural language processing

Towards Evaluating the Robustness of Chinese BERT Classifiers

RoChBert: Towards Robust BERT Fine-tuning for Chinese

BERTwich: Extending BERT's Capabilities to Model Dialectal and Noisy Text

How Should Pre-Trained Language Models Be Fine-Tuned Towards Adversarial Robustness?

Adv-BERT: BERT is not robust on misspellings! Generating nature adversarial samples on BERT

Enhancing Model Robustness Via Lexical Distilling

Model Extraction and Adversarial Transferability, Your BERT is Vulnerable!

BERT-Defense: A Probabilistic Model Based on BERT to Combat Cognitively Inspired Orthographic Adversarial Attacks

How to Prune Your Language Model: Recovering Accuracy on the "Sparsity May Cry" Benchmark

fairBERTs: Erasing Sensitive Information Through Semantic and Fairness-aware Perturbations

An Empirical Study on Robustness to Spurious Correlations using Pre-trained Language Models

Breaking BERT: Understanding its Vulnerabilities for Named Entity Recognition through Adversarial Attack

Evaluating Concurrent Robustness of Language Models Across Diverse Challenge Sets

Explorations of Self-Repair in Language Models