Synthetic Counterfactual Faces

Guruprasad V Ramesh, Harrison Rosenberg, Ashish Hooda, Shimaa Ahmed, Kassem Fawaz
2024-07-30
Abstract: Computer vision systems have been deployed in various applications involving biometrics like human faces. These systems can identify social media users, search for missing persons, and verify the identity of individuals. While computer vision models are often evaluated for accuracy on available benchmarks, more annotated data is necessary to learn about their robustness and fairness against semantic distributional shifts in input data, especially in face data. Among annotated data, counterfactual examples grant strong explainability characteristics. Because collecting natural face data is prohibitively expensive, we put forth a generative AI-based framework to construct targeted, counterfactual, high-quality synthetic face data. Our synthetic data pipeline has many use cases, including sensitivity evaluations of face recognition systems and probes of image understanding systems. The pipeline is validated with multiple user studies. We showcase the efficacy of our face generation pipeline on a leading commercial vision model. We identify facial attributes that cause vision systems to fail.
Computer Vision and Pattern Recognition, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
The paper addresses the fairness and robustness of computer vision systems that process facial data. Specifically, the researchers ask how to evaluate these systems across different populations and how sensitive they are to semantic distributional shifts in the input data. To this end, the authors propose a method for generating high-quality, targeted, counterfactual synthetic facial data. The method uses generative AI to construct images with specific facial features, so that researchers can create a set of facial images covering a wide range of demographic characteristics and facial attributes.

The main contributions of the paper are:
1. An end-to-end pipeline that combines text-to-image diffusion models with distortion and attribute detectors to generate high-quality synthetic counterfactual face images.
2. Validation of the generated images through multiple user studies.
3. Counterfactual evaluations of commercial image understanding systems using the generated dataset, in particular Instagram's Android image understanding model.

The authors also consider limitations of existing methods, such as low quality of generated images, bias, and edits that do not fully follow the instructions, and propose corresponding solutions. The resulting dataset contains 15,000 images covering 8 demographic groups and 19 facial attributes. The evaluation of Instagram's Android image understanding model found performance differences across gender and ethnic groups, which indicates that fairness and robustness issues remain even in deployed commercial systems.
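The summary above describes two steps that a short sketch may make more concrete: filtering generated images with distortion and attribute detectors, and comparing a model's output on a base image against its counterfactual for each demographic group. The Python below is only an illustration under stated assumptions. `Candidate`, `filter_candidates`, `flip_rate_by_group`, and the detector and predictor callables are hypothetical names, not the authors' code. The "flip rate" is also an assumed stand-in metric, since the summary does not say which measure the paper uses.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Candidate:
    """One generated image with the labels it was requested under (hypothetical type)."""
    group: str       # demographic group label
    attribute: str   # facial attribute the edit was supposed to add
    image: Any       # placeholder for the actual image data


def filter_candidates(
    candidates: Iterable[Candidate],
    is_distorted: Callable[[Any], bool],
    has_attribute: Callable[[Any, str], bool],
) -> List[Candidate]:
    """Keep only images that pass both detectors.

    Assumption: an image is kept when the distortion detector does not flag it
    and the attribute detector confirms the requested attribute is present.
    """
    return [
        c for c in candidates
        if not is_distorted(c.image) and has_attribute(c.image, c.attribute)
    ]


def flip_rate_by_group(
    pairs: Iterable[Tuple[str, Any, Any]],
    predict: Callable[[Any], Any],
) -> Dict[str, float]:
    """Per-group fraction of (base, counterfactual) pairs whose prediction changes.

    `pairs` yields (group, base_image, counterfactual_image). `predict` stands in
    for the vision model under test. The flip rate is an illustrative metric only.
    """
    flips: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for group, base, counterfactual in pairs:
        totals[group] += 1
        if predict(base) != predict(counterfactual):
            flips[group] += 1
    return {group: flips[group] / totals[group] for group in totals}
```

In practice the detectors and `predict` would wrap real models. Passing them in as plain callables keeps the sketch independent of any particular library. Large gaps between the per-group rates would be one sign of the kind of group-level difference the paper reports.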