Chameleon: Foundation Models for Fairness-Aware Multi-Modal Data Augmentation to Enhance Coverage of Minorities

Mahdi Erfanian, H. V. Jagadish, Abolfazl Asudeh
DOI: https://doi.org/10.14778/3681954.3682014
IF: 2.5
2024-07-01
Proceedings of the VLDB Endowment
Abstract: Potential harms from the under-representation of minorities in data, particularly in multi-modal settings, are a well-recognized concern. While there has been extensive effort in detecting such under-representation, resolving it has remained a challenge. With recent advances in generative AI, large language and foundation models have emerged as versatile tools across various domains. In this paper, we propose Chameleon, a system that efficiently utilizes these tools to augment a dataset with a minimal number of synthetically generated tuples, in order to enhance the coverage of under-represented groups. Our system applies quality and outlier-detection tests to ensure the quality and semantic integrity of the generated tuples. To minimize the chance that generated tuples are rejected, we propose multiple strategies for guiding the foundation model. Our experimental results, in addition to confirming the efficiency of our proposed algorithms, illustrate the effectiveness of our approach: the model's unfairness in a downstream task dropped significantly after data repair using Chameleon.
computer science, information systems, theory & methods