Co$^2$PT: Mitigating Bias in Pre-trained Language Models through Counterfactual Contrastive Prompt Tuning

Xiangjue Dong, Ziwei Zhu, Zhuoer Wang, Maria Teleki, James Caverlee
2023-10-19
Abstract: Pre-trained Language Models are widely used in many important real-world applications. However, recent studies show that these models can encode social biases from large pre-training corpora and even amplify biases in downstream applications. To address this challenge, we propose Co$^2$PT, an efficient and effective debias-while-prompt tuning method for mitigating biases via counterfactual contrastive prompt tuning on downstream tasks. Our experiments conducted on three extrinsic bias benchmarks demonstrate the effectiveness of Co$^2$PT on bias mitigation during the prompt tuning process and its adaptability to existing upstream debiased language models. These findings indicate the strength of Co$^2$PT and provide promising avenues for further enhancement in bias mitigation on downstream tasks.
Computation and Language
What problem does this paper attempt to address?
This paper addresses the problem of social bias in pre-trained language models (PLMs). Existing research shows that PLMs encode social biases from their large-scale pre-training corpora and can amplify those biases in downstream tasks. For example, a language model may assign "She is a nurse" a higher conditional probability than "He is a nurse", and a coreference resolution system may score the link between "nurse" and "she" higher than the link between "nurse" and "he". Because NLP applications such as machine translation, resume screening, dialogue systems, and speech recognition are used by millions of people worldwide, mitigating these biases is crucial to avoid discriminatory predictions or offensive outputs directed at specific groups.

To address this challenge, the paper proposes Co$^2$PT (Counterfactual Contrastive Prompt Tuning), an efficient and effective debias-while-prompt-tuning method. Co$^2$PT applies counterfactual contrastive prompt tuning on the downstream task, so bias is mitigated during prompt tuning itself, and it can also be applied to language models that have already been debiased upstream. Experiments on three extrinsic bias benchmarks show that Co$^2$PT mitigates bias on downstream tasks and adapts to existing upstream debiased language models. These findings indicate the strength of the approach and suggest promising avenues for further improving bias mitigation on downstream tasks.
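The summary above names the two ingredients of the method, counterfactual inputs and a contrastive objective, without giving their form. The following is a minimal sketch of how such a pairing could look, not the authors' implementation. The tiny `SWAPS` word list, the helper names, the temperature value, and the InfoNCE-style loss are all assumptions made for illustration; in a real prompt-tuning setup the vectors would be sentence representations from the language model, with only the prompt parameters being trained.

```python
import math
import re

# Hypothetical, deliberately tiny swap list (assumption); a real list would be far larger.
SWAPS = {"he": "she", "she": "he", "man": "woman", "woman": "man",
         "father": "mother", "mother": "father"}
_PATTERN = re.compile(r"\b(" + "|".join(SWAPS) + r")\b", re.IGNORECASE)


def counterfactual(sentence: str) -> str:
    """Swap each listed demographic term for its counterpart, keeping capitalization."""
    def swap(match: re.Match) -> str:
        word = match.group(0)
        replacement = SWAPS[word.lower()]
        return replacement.capitalize() if word[0].isupper() else replacement
    return _PATTERN.sub(swap, sentence)


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(originals: list[list[float]],
                     counterfactuals: list[list[float]],
                     temperature: float = 0.05) -> float:
    """InfoNCE-style loss (assumed form): the i-th original should be closest to the
    i-th counterfactual; the other counterfactuals in the batch act as negatives."""
    losses = []
    for i, anchor in enumerate(originals):
        logits = [cosine(anchor, cf) / temperature for cf in counterfactuals]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


print(counterfactual("She is a nurse."))  # He is a nurse.
```

Under these assumptions, the loss is small when each sentence and its counterfactual have nearly identical representations and large when they differ, which is the sense in which a contrastive term discourages the model from encoding the swapped attribute.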