KoSBi: A Dataset for Mitigating Social Bias Risks Towards Safer Large Language Model Application

Hwaran Lee, Seokhee Hong, Joonsuk Park, Takyoung Kim, Gunhee Kim, Jung-Woo Ha

2023-05-30

Abstract: Large language models (LLMs) learn not only natural text generation abilities but also social biases against different demographic groups from real-world data. This poses a critical risk when deploying LLM-based applications. Existing research and resources are not readily applicable in South Korea due to the differences in language and culture, both of which significantly affect the biases and targeted demographic groups. This limitation requires localized social bias datasets to ensure the safe and effective deployment of LLMs. To this end, we present KoSBi, a new social bias dataset of 34k pairs of contexts and sentences in Korean covering 72 demographic groups in 15 categories. We find that through filtering-based moderation, social biases in generated content can be reduced by 16.47%p on average for HyperCLOVA (30B and 82B) and GPT-3.

Computation and Language

What problem does this paper attempt to address?

The paper addresses the social bias that large language models (LLMs) reproduce when generating text. Because LLMs are trained on real-world data that contains biases against particular demographic groups, they absorb those biases, which poses a significant risk when deploying LLM-based applications. Existing research and resources are not readily applicable in South Korea, because language and culture shape both the forms that bias takes and the groups it targets. The paper therefore presents KoSBi, a Korean social bias dataset of 34k context-sentence pairs covering 72 demographic groups in 15 categories. Using filtering-based moderation, it reduces social bias in generated content by 16.47 percentage points on average for HyperCLOVA (30B and 82B) and GPT-3. The paper also analyzes how the mitigation effect varies across categories and examines the relationship between the filtering model's performance and the moderation effect. Overall, it aims to make LLM applications safer by providing a dataset tailored to Korean language and culture, thereby protecting more people from the impact of social bias.
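The filtering-based approach described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Candidate` type, the keyword-based `toy_filter`, the `gold_unsafe` reference labels, and the sample data are all hypothetical stand-ins, and the real system would use a classifier trained on the dataset plus human evaluation. The sketch also shows why the reported figure is a difference in percentage points rather than a relative percentage.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Candidate:
    text: str
    gold_unsafe: bool  # hypothetical reference label, used only for evaluation


def toy_filter(text: str) -> bool:
    """Hypothetical stand-in for a trained bias classifier; True means 'unsafe'."""
    return "[biased]" in text


def select(candidates: List[Candidate],
           is_unsafe: Callable[[str], bool]) -> Optional[Candidate]:
    """Return the first candidate that the filter does not flag, if any."""
    for cand in candidates:
        if not is_unsafe(cand.text):
            return cand
    return None


def unsafe_rate(outputs: List[Optional[Candidate]]) -> float:
    """Percentage of emitted outputs whose reference label is unsafe."""
    emitted = [c for c in outputs if c is not None]
    if not emitted:
        return 0.0
    return 100.0 * sum(c.gold_unsafe for c in emitted) / len(emitted)


# Made-up generations: one list of candidates per prompt.
generations = [
    [Candidate("[biased] a", True), Candidate("neutral b", False)],
    [Candidate("neutral c", False)],
    [Candidate("subtle d", True), Candidate("neutral e", False)],  # filter misses this one
    [Candidate("[biased] f", True), Candidate("neutral g", False)],
]

# Without moderation: always emit the first candidate.
baseline_rate = unsafe_rate([cands[0] for cands in generations])
# With moderation: emit the first candidate that passes the filter.
moderated_rate = unsafe_rate([select(cands, toy_filter) for cands in generations])

# Reduction in percentage points is a plain difference of the two rates.
reduction_pp = baseline_rate - moderated_rate
print(f"{baseline_rate:.1f}% -> {moderated_rate:.1f}% ({reduction_pp:.1f}%p reduction)")
```

In this toy data the filter misses one subtly biased sentence, so the moderated rate stays above zero. This is the kind of dependence between filter quality and moderation effect that the summary mentions.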
