Abstract: With the advent of Large Language Models (LLMs) possessing increasingly impressive capabilities, a number of Large Vision-Language Models (LVLMs) have been proposed to augment LLMs with visual inputs. Such models condition generated text on both an input image and a text prompt, enabling a variety of use cases such as visual question answering and multimodal chat. While prior studies have examined the social biases contained in text generated by LLMs, this topic has been relatively unexplored in LVLMs. Examining social biases in LVLMs is particularly challenging due to the confounding contributions of bias induced by information contained across the text and visual modalities. To address this challenging problem, we conduct a large-scale study of text generated by different LVLMs under counterfactual changes to input images. Specifically, we present LVLMs with identical open-ended text prompts while conditioning on images from different counterfactual sets, where each set contains images which are largely identical in their depiction of a common subject (e.g., a doctor), but vary only in terms of intersectional social attributes (e.g., race and gender). We comprehensively evaluate the text produced by different models under this counterfactual generation setting at scale, producing over 57 million responses from popular LVLMs. Our multi-dimensional analysis reveals that social attributes such as race, gender, and physical characteristics depicted in input images can significantly influence the generation of toxic content, competency-associated words, harmful stereotypes, and numerical ratings of depicted individuals. We additionally explore the relationship between social bias in LVLMs and their corresponding LLMs, as well as inference-time strategies to mitigate bias.

What problem does this paper attempt to address?

This paper examines the social biases that Large Vision-Language Models (LVLMs) exhibit when generating text conditioned on image inputs. While previous studies have explored social biases in text generated by Large Language Models (LLMs), LVLMs have received comparatively little attention. LVLMs generate text from both a text prompt and an input image, and are used in applications such as visual question answering and multimodal chat. Because bias can be induced by information in either the image or the text, the two modalities confound each other, which makes social bias in LVLMs difficult to isolate. To address this, the researchers conducted a large-scale study of the text generated by different LVLMs under counterfactual changes to the input images. Each model receives identical open-ended prompts paired with images from counterfactual sets: images that depict the same subject (e.g., a doctor) and are otherwise largely identical, differing only in social attributes such as race, gender, and physical characteristics. Holding everything else fixed isolates the influence of the social attributes depicted in the image on the generated text. The study covered over 57 million responses and also analyzed 78,000 generations from GPT-4o. The analysis revealed that social attributes depicted in the input images, such as race, gender, and physical characteristics, significantly influenced the generation of toxic content, competency-associated words, harmful stereotypes, and numerical ratings of the depicted individuals. All evaluated open-source LVLMs exhibited bias to varying degrees, with particularly notable bias toward individuals depicted as overweight or Black. The study also explored the relationship between bias in LVLMs and in their corresponding LLMs, as well as inference-time strategies for mitigating bias.
Compared to previous research, this work makes the following improvements: 1) examining intersectional biases across multiple social attributes; 2) using counterfactual images to control for the influence of other image details; 3) expanding the dataset size to provide more diverse generation samples; 4) evaluating bias along multiple dimensions, including toxicity, stereotypes, competency-associated words, and numerical ratings; 5) exploring the relationship between bias in LVLMs and LLMs, as well as inference-time mitigation strategies. Overall, the paper shows that LVLMs can exhibit harmful biases when deployed at scale, and it emphasizes the need for further research to improve the fairness of these models.
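The counterfactual comparison described above can be illustrated with a small sketch. It assumes each generated response has already been assigned a numeric score (for example, a toxicity score from some classifier) and groups responses by counterfactual set and by the social attribute depicted in the image. The function name `attribute_gaps`, the record layout, the group labels, and the scores are all hypothetical, and the max-minus-min gap is only one simple way to summarize disparity within a set, not necessarily the metric used in the paper.

```python
from collections import defaultdict
from statistics import mean


def attribute_gaps(records):
    """Summarize score disparity within each counterfactual set.

    records: iterable of (set_id, attribute, score) tuples, where set_id
    names the shared subject (e.g. "doctor"), attribute names the social
    attribute shown in the image, and score is a per-response number.
    Returns {set_id: (gap, highest_attribute, lowest_attribute)}, where
    gap is the difference between the highest and lowest mean score.
    """
    by_set = defaultdict(lambda: defaultdict(list))
    for set_id, attribute, score in records:
        by_set[set_id][attribute].append(score)

    gaps = {}
    for set_id, by_attr in by_set.items():
        # Mean score per attribute within this counterfactual set.
        means = {attr: mean(scores) for attr, scores in by_attr.items()}
        highest = max(means, key=means.get)
        lowest = min(means, key=means.get)
        gaps[set_id] = (means[highest] - means[lowest], highest, lowest)
    return gaps


# Made-up scores for illustration only; these are not results from the paper.
records = [
    ("doctor", "group_a", 0.10),
    ("doctor", "group_a", 0.20),
    ("doctor", "group_b", 0.40),
    ("doctor", "group_b", 0.50),
    ("teacher", "group_a", 0.30),
    ("teacher", "group_b", 0.30),
]
gaps = attribute_gaps(records)
for set_id, (gap, highest, lowest) in gaps.items():
    print(f"{set_id}: gap={gap:.2f} (highest={highest}, lowest={lowest})")
```

Because the prompt and the rest of the image are held fixed within a set, a large gap points to the depicted attribute as the source of the difference, whereas a gap near zero suggests the model responded similarly across the set.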

Uncovering Bias in Large Vision-Language Models at Scale with Counterfactuals

Uncovering Bias in Large Vision-Language Models with Counterfactuals

Probing Intersectional Biases in Vision-Language Models with Counterfactual Examples

SocialCounterfactuals: Probing and Mitigating Intersectional Social Biases in Vision-Language Models with Counterfactual Examples

VLBiasBench: A Comprehensive Benchmark for Evaluating Bias in Large Vision-Language Model

Bias and Fairness in Large Language Models: A Survey

How Can We Diagnose and Treat Bias in Large Language Models for Clinical Decision-Making?

GenderBias-VL: Benchmarking Gender Bias in Vision Language Models Via Counterfactual Probing

Evaluating Fairness in Large Vision-Language Models Across Diverse Demographic Attributes and Prompts

Evaluating and Mitigating Social Bias for Large Language Models in Open-ended Settings

Debiasing Multimodal Large Language Models

Examining Gender and Racial Bias in Large Vision-Language Models Using a Novel Dataset of Parallel Images

Counterfactually Measuring and Eliminating Social Bias in Vision-Language Pre-training Models

Bias Similarity Across Large Language Models

Social Debiasing for Fair Multi-modal LLMs

Cognitive Bias in Decision-Making with LLMs

Large Language Models Portray Socially Subordinate Groups as More Homogeneous, Consistent with a Bias Observed in Humans

"Im not Racist but...": Discovering Bias in the Internal Knowledge of Large Language Models

Debiasing Large Vision-Language Models by Ablating Protected Attribute Representations