Inclusivity in Large Language Models: Personality Traits and Gender Bias in Scientific Abstracts

Naseela Pervez, Alexander J. Titus
2024-06-28
Abstract: Large language models (LLMs) are increasingly utilized to assist in scientific and academic writing, helping authors enhance the coherence of their articles. Previous studies have highlighted stereotypes and biases present in LLM outputs, emphasizing the need to evaluate these models for their alignment with human narrative styles and potential gender biases. In this study, we assess the alignment of three prominent LLMs (Claude 3 Opus, Mistral AI Large, and Gemini 1.5 Flash) by analyzing their performance on benchmark text-generation tasks for scientific abstracts. We employ the Linguistic Inquiry and Word Count (LIWC) framework to extract lexical, psychological, and social features from the generated texts. Our findings indicate that, while these models generally produce text closely resembling human-authored content, variations in stylistic features suggest significant gender biases. This research highlights the importance of developing LLMs that maintain a diversity of writing styles to promote inclusivity in academic discourse.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
This paper examines how large language models (LLMs) are used in scientific writing and whether their output carries gender bias. The researchers evaluated three prominent LLMs (Claude 3 Opus, Mistral AI Large, and Gemini 1.5 Flash) on the task of generating scientific abstracts, and used the Linguistic Inquiry and Word Count (LIWC) framework to extract lexical, psychological, and social features from the generated text. The study asks two main questions:
1. When prompted to rewrite a piece of scientific text, can LLMs maintain the narrative style of the original, i.e., do they retain the author's personality?
2. Do LLMs exacerbate or mitigate personality traits in scientific texts? In particular, do they enhance positive traits and weaken negative ones?
Comparing human-written and LLM-generated scientific abstracts, the study found that the models generally produce text closely resembling human-authored content, but that variations in certain stylistic features suggest significant gender biases. The authors conclude that developing LLMs that maintain a diversity of writing styles is important for promoting inclusivity in academic communication.