'Since Lawyers are Males..': Examining Implicit Gender Bias in Hindi Language Generation by LLMs

Ishika Joshi, Ishita Gupta, Adrita Dey, Tapan Parikh
2024-09-20
Abstract: Large Language Models (LLMs) are increasingly being used to generate text across various languages, for tasks such as translation, customer support, and education. Despite these advancements, LLMs show notable gender biases in English, which become even more pronounced when generating content in relatively underrepresented languages like Hindi. This study explores implicit gender biases in Hindi text generation and compares them to those in English. We developed Hindi datasets inspired by WinoBias to examine stereotypical patterns in responses from models like GPT-4o and Claude-3 Sonnet. Our results reveal a significant gender bias of 87.8% in Hindi, compared to 33.4% in English GPT-4o generation, with Hindi responses frequently relying on gender stereotypes related to occupations, power hierarchies, and social class. This research underscores the variation in gender biases across languages and provides considerations for navigating these biases in generative AI systems.
Computation and Language, Artificial Intelligence, Human-Computer Interaction
What problem does this paper attempt to address?
The paper examines the implicit gender bias that large language models (LLMs) exhibit when generating Hindi text and compares it with the bias they exhibit in English. The authors developed two Hindi test datasets, HinStereo-100 and HEAStereo-50, adapted from existing bias detection frameworks such as WinoBias to capture the linguistic structures particular to Hindi. Using these datasets, they tested models such as GPT-4o and Claude-3 Sonnet. The study found that gender bias in generated Hindi text is substantially higher than in English (87.8% vs. 33.4% in GPT-4o generation), and that this bias is often tied to occupations, power hierarchies, and social class. For example, when prompts involve gender-stereotyped professions (such as doctors and nurses), the models tend to generate text that follows the stereotype. The research also explored how gendered grammatical features in Hindi affect model behavior and how these biases are amplified in multilingual settings. In conclusion, the study shows that gender bias varies across languages and highlights the limitations of existing debiasing methods for highly gendered languages like Hindi, offering considerations for developing more inclusive and culturally sensitive AI systems.
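To make percentages such as 87.8% and 33.4% concrete, the sketch below shows one plausible way a bias rate could be computed: the share of model responses that an annotator labeled as stereotypical. This is an assumption for illustration only. The function name `bias_rate`, the label strings, and the sample annotations are hypothetical, and the paper's actual scoring procedure may differ.

```python
from collections import Counter


def bias_rate(labels):
    """Return the percentage of responses labeled 'stereotypical'.

    Assumption: each model response has already been annotated with a
    single label string. This is not necessarily the paper's metric.
    """
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    return 100.0 * counts["stereotypical"] / len(labels)


# Hypothetical annotations for illustration; not data from the paper.
hindi_labels = ["stereotypical"] * 7 + ["neutral"] * 1
english_labels = ["stereotypical"] * 3 + ["neutral"] * 5

print(f"Hindi bias rate: {bias_rate(hindi_labels):.1f}%")
print(f"English bias rate: {bias_rate(english_labels):.1f}%")
```

Computing the same rate over matched Hindi and English prompts is what would make a cross-language comparison like the one reported above meaningful.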