Evaluating Fairness in Large Vision-Language Models Across Diverse Demographic Attributes and Prompts

Xuyang Wu, Yuan Wang, Hsin-Tai Wu, Zhiqiang Tao, Yi Fang
2024-10-17
Abstract: Large vision-language models (LVLMs) have recently achieved significant progress, demonstrating strong capabilities in open-world visual understanding. However, it is not yet clear how LVLMs address demographic biases in real life, especially the disparities across attributes such as gender, skin tone, age and race. In this paper, we empirically investigate visual fairness in several mainstream LVLMs by auditing their performance disparities across demographic attributes using public fairness benchmark datasets (e.g., FACET, UTKFace). Our fairness evaluation framework employs direct and single-choice question prompts on visual question-answering/classification tasks. Despite advancements in visual understanding, our zero-shot prompting results show that both open-source and closed-source LVLMs continue to exhibit fairness issues across different prompts and demographic groups. Furthermore, we propose a potential multi-modal chain-of-thought (CoT) based strategy for bias mitigation, applicable to both open-source and closed-source LVLMs. This approach enhances transparency and offers a scalable solution for addressing fairness, providing a solid foundation for future bias reduction efforts.
Computation and Language, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the fairness of large vision-language models (LVLMs) across demographic attributes such as gender, skin tone, age, and race. Although these models have made remarkable progress in open-world visual understanding, it remains unclear how well they handle demographic biases in real-world settings. The paper audits the performance disparities of mainstream LVLMs across demographic attributes on public fairness benchmark datasets, and it proposes a bias-mitigation strategy based on multi-modal chain-of-thought (CoT) prompting.

### Main research objectives:

1. **Evaluate the fairness of LVLMs**: Use public fairness benchmark datasets (such as FACET and UTKFace) to measure the performance disparities of LVLMs across demographic attributes.
2. **Propose a bias-mitigation strategy**: Develop a multi-modal chain-of-thought (CoT) strategy, applicable to both open-source and closed-source LVLMs, to reduce those disparities.

### Research background:

- **Development of LVLMs**: In recent years, LVLMs have made remarkable progress in integrating image and text information, but their fairness has not been evaluated thoroughly.
- **Limitations of existing research**: Existing research mainly relies on synthetic images to evaluate bias, which may itself skew the evaluation results. Moreover, most studies stop at detecting bias and do not offer effective mitigation strategies.

### Method overview:

- **Dataset construction**: Use the FACET and UTKFace datasets, which cover demographic attributes such as gender, skin tone, age, and race.
- **Evaluation framework**: Design direct-question prompts and single-choice-question prompts for visual question-answering and classification tasks, and compare model performance across demographic attributes.
- **Model inference and result formatting**: Generate predictions under each prompting method and use encoding functions to convert the outputs into a format suitable for fairness evaluation.
- **Evaluation metrics**: Use recall and group disparity (GD) to evaluate the fairness of the model.

### Experimental results:

- **Gender differences**: Most LVLMs show clear bias on the gender attribute, leaning toward the female attribute under direct-question prompts and toward the male attribute under single-choice-question prompts.
- **Skin tone and age differences**: The models also show clear preferences on the skin tone and age attributes, leaning toward lighter skin tones and younger individuals.
- **Effect of the bias-mitigation strategy**: The proposed multi-modal chain-of-thought (CoT) strategy improves fairness to some extent, but there is still room for improvement.

### Conclusion:

Through systematic experimental evaluation, the paper reveals the fairness issues of LVLMs across demographic attributes and proposes a multi-modal chain-of-thought bias-mitigation strategy. This research provides a basis for further reducing bias in LVLMs.
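
The result-formatting and metric steps named in the method overview can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the summary does not give the paper's encoding function or its exact definition of group disparity, so `encode_answer` is a hypothetical word-matching encoder, and group disparity is taken here as the gap between the highest and lowest per-group recall. The paper's own formulas may differ.

```python
import re
from collections import defaultdict


def encode_answer(raw_answer, label_set):
    """Map a free-form model answer to exactly one label, or None.

    Hypothetical encoder: a label counts as mentioned only when it appears
    as a whole word, and the answer is accepted only if exactly one label
    from label_set is mentioned.
    """
    text = raw_answer.lower()
    matches = [
        label
        for label in label_set
        if re.search(r"\b" + re.escape(label.lower()) + r"\b", text)
    ]
    return matches[0] if len(matches) == 1 else None


def recall_by_group(records, target_label):
    """Compute recall of target_label separately for each demographic group.

    records is an iterable of (group, true_label, predicted_label) tuples.
    Only samples whose true label is target_label contribute to recall.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for group, true_label, predicted_label in records:
        if true_label != target_label:
            continue
        totals[group] += 1
        if predicted_label == target_label:
            hits[group] += 1
    return {group: hits[group] / totals[group] for group in totals}


def group_disparity(recalls):
    """Assumed definition: largest minus smallest per-group recall."""
    values = list(recalls.values())
    return max(values) - min(values)
```

As a usage example, one would encode each raw model answer with `encode_answer`, collect `(group, true_label, predicted_label)` tuples, and pass them to `recall_by_group` for a chosen class. A group disparity near zero would indicate similar recall across groups under this assumed definition, while a larger value would indicate that the model recognizes the class more reliably for some groups than for others.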