Abstract: Mitigating biases in computer vision models is an essential step towards the trustworthiness of artificial intelligence models. Existing bias mitigation methods focus on a small set of predefined biases, limiting their applicability in visual datasets where multiple, possibly unknown biases exist. To address this limitation, we introduce MAVias, an open-set bias mitigation approach leveraging foundation models to discover spurious associations between visual attributes and target classes. MAVias first captures a wide variety of visual features in natural language via a foundation image tagging model, and then leverages a large language model to select those visual features defining the target class, resulting in a set of language-coded potential visual biases. We then translate this set of potential biases into vision-language embeddings and introduce an in-processing bias mitigation approach to prevent the model from encoding information related to them. Our experiments on diverse datasets, including CelebA, Waterbirds, ImageNet, and UrbanCars, show that MAVias effectively detects and mitigates a wide range of biases in visual recognition tasks, outperforming current state-of-the-art.
### What problem does this paper attempt to solve?
This paper addresses visual bias in computer vision (CV) models. Existing bias mitigation methods mainly target a small number of predefined biases, which limits their use on visual datasets that contain multiple, complex, and possibly unknown biases. To address this limitation, the authors introduce a new method, **MAVias**, short for "Mitigate Any Visual Bias".
#### Main problems:
1. **Unknown biases**: In large-scale general CV datasets, such as ImageNet, biases may be difficult to identify and are mostly still unknown because they vary greatly among different classes and are not significant enough to allow the training of bias-proxy models.
2. **Potential biases outside the scope of predefined biases**: For example, in the CelebA dataset, in addition to hair color, there may be other biases, such as clothing styles (e.g., business suits, ties).
3. **Poor representation of predefined biases**: In some cases, biases are simplified to a single label, such as "rural background" in the UrbanCars dataset. However, more detailed descriptors (e.g., paths, trees, forests, fire hydrants, red, etc.) can provide a richer context for modeling biases.
### Solution:
MAVias identifies and mitigates these open-set biases through the following steps:
1. **Extracting descriptive tags**: Use an image tagging model to extract descriptive tags from the input image. These tags capture the visual information in the image, including colors, objects, backgrounds, and other features.
2. **Language-driven bias modeling**: Use a large language model (LLM) to decide which tags are relevant to the target class; the tags judged irrelevant are treated as potential visual biases.
3. **Bias mitigation framework**: Convert these potential biases into vision-language embeddings and introduce a projection layer, so that the main model learns bias-invariant representations and its predictions depend less on the biases.
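
The three steps above can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the tag list, the set of class-relevant tags (standing in for an LLM's answer), the toy text encoder, and the random projection matrix `W` are all assumptions made for illustration. In particular, the last step removes the projected bias direction from a feature vector, which is a simple stand-in for "bias-invariant representations" and not the training objective used in the paper.

```python
import numpy as np

def filter_irrelevant_tags(tags, relevant_tags):
    """Step 2 (sketch): keep the tags that are NOT relevant to the class.

    `relevant_tags` stands in for what an LLM might return; here it is
    simply a hand-written set (an assumption).
    """
    return [t for t in tags if t not in relevant_tags]

def toy_text_embedding(tag, dim=8):
    """Deterministic stand-in for a vision-language text encoder (hypothetical)."""
    seed = sum(ord(c) for c in tag)          # same tag -> same vector
    rng = np.random.default_rng(seed)
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)             # unit-length embedding

def bias_embedding(bias_tags, dim=8):
    """Average the tag embeddings into a single bias vector for the image."""
    if not bias_tags:
        return np.zeros(dim)
    return np.mean([toy_text_embedding(t, dim) for t in bias_tags], axis=0)

def remove_bias_direction(feature, projected_bias):
    """Step 3 (stand-in): subtract the feature's component along the bias."""
    norm_sq = float(projected_bias @ projected_bias)
    if norm_sq == 0.0:                        # no bias tags: nothing to remove
        return feature.copy()
    return feature - (feature @ projected_bias) / norm_sq * projected_bias

# Step 1 (assumed tagger output for one image labeled "landbird").
tags = ["bird", "beak", "feathers", "water", "ocean", "boat"]
# Assumed LLM verdict on which tags define the class.
relevant = {"bird", "beak", "feathers"}

bias_tags = filter_irrelevant_tags(tags, relevant)

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 8))                  # hypothetical projection layer
feature = rng.normal(size=16)                 # hypothetical backbone feature

projected_bias = W @ bias_embedding(bias_tags)
clean_feature = remove_bias_direction(feature, projected_bias)

print("potential biases:", bias_tags)
print("alignment before:", float(feature @ projected_bias))
print("alignment after: ", float(clean_feature @ projected_bias))
```

In a real setup, the tagger, the LLM query, and the text encoder would be actual foundation models, and the projection layer would be trained jointly with the main model instead of being sampled at random.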
### Experimental verification:
The authors conducted experiments on multiple datasets, including CelebA, Waterbirds, UrbanCars, and ImageNet9. The results show that MAVias outperforms existing methods in detecting and mitigating a wide range of visual biases.
### Summary:
MAVias provides a flexible and extensible solution that can identify and mitigate biases in CV datasets containing multiple, complex, and possibly unknown biases.