Textual Training for the Hassle-Free Removal of Unwanted Visual Data: Case Studies on OOD and Hateful Image Detection

Saehyung Lee, Jisoo Mok, Sangha Park, Yongho Shin, Dahuin Jung, Sungroh Yoon
2024-10-24
Abstract: In our study, we explore methods for detecting unwanted content lurking in visual datasets. We provide a theoretical analysis demonstrating that a model capable of successfully partitioning visual data can be obtained using only textual data. Based on the analysis, we propose Hassle-Free Textual Training (HFTT), a streamlined method capable of acquiring detectors for unwanted visual content, using only synthetic textual data in conjunction with pre-trained vision-language models. HFTT features an innovative objective function that significantly reduces the necessity for human involvement in data annotation. Furthermore, HFTT employs a clever textual data synthesis method, effectively emulating the integration of unknown visual data distribution into the training process at no extra cost. The unique characteristics of HFTT extend its utility beyond traditional out-of-distribution detection, making it applicable to tasks that address more abstract concepts. We complement our analyses with experiments in out-of-distribution detection and hateful image detection. Our code is available at https://github.com/Saehyung-Lee/HFTT
Computer Vision and Pattern Recognition
### What problem does this paper attempt to solve?

This paper addresses the **detection of unwanted content** in large-scale visual datasets, a challenge that becomes more pressing as large-scale AI models are trained on them. Specifically, the authors focus on how to efficiently identify and remove out-of-distribution (OOD) samples and harmful content (such as hateful images) from visual datasets. These problems are particularly prominent in the era of large-scale AI because:

1. **Large-scale datasets**: As deep neural networks and their training datasets continue to grow, manually screening and removing unwanted content becomes impractical.
2. **High cost of data annotation**: Traditional supervised-learning methods require a large amount of manual annotation, which is time-consuming and can raise ethical problems, especially when annotators must handle sensitive content (such as hate speech).
3. **Limitations of existing methods**: Existing OOD detection methods usually rely on clearly defined distribution boundaries, which do not exist for abstract concepts (such as hateful content).

To solve these problems, the authors propose a new method named **Hassle-Free Textual Training (HFTT)**. The core idea of HFTT is to use **only text data** to train a model that can detect unwanted visual content. The advantages of this method are:

- **No need for additional visual data**: By building on pre-trained vision-language models (VLMs) such as CLIP, HFTT can be trained without additional visual data.
- **Reduced human intervention**: Through a new objective function and a textual data synthesis method, HFTT significantly reduces the need for manual annotation.
- **Applicable to abstract concepts**: HFTT applies to traditional OOD detection tasks and also extends to more abstract scenarios, such as hateful image detection.

### Summary

The main contributions of this paper are:

1. A theoretical analysis showing that a model capable of partitioning visual data can be obtained using only textual data.
2. A new loss function that eliminates the need for manual annotation of OOD data.
3. An efficient textual data synthesis method that simulates the entire visual data distribution.
4. Experiments verifying the effectiveness of HFTT on multiple tasks, including traditional OOD detection and hateful image detection.

These contributions make HFTT a lightweight and efficient method for detecting unwanted content in large-scale visual datasets.
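
The core idea described above, fitting a detector on text embeddings and then applying it to image embeddings from the same joint space, can be illustrated with a toy sketch. This is not the HFTT objective or its textual data synthesis: the vectors below are synthetic stand-ins for VLM embeddings, the concept directions and the constant modality offset (`gap`) are assumptions made for illustration, and a plain labeled logistic regression is swapped in for the paper's loss. All function names are hypothetical.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit length, as is common for CLIP-style embeddings."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def sample_embedding(center, noise, rng, offset=None):
    """Draw a noisy unit vector near `center`; `offset` mimics a modality gap."""
    v = [c + rng.gauss(0.0, noise) for c in center]
    if offset is not None:
        v = [a + b for a, b in zip(v, offset)]
    return normalize(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def train_detector(xs, ys, epochs=200, lr=1.0):
    """Fit logistic regression with full-batch gradient descent."""
    dim, n = len(xs[0]), len(xs)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(dot(w, x) + b)))
            err = p - y
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def run_demo(seed=0, dim=8, n_per_class=100):
    """Train on synthetic 'text' vectors only, evaluate on shifted 'image' vectors."""
    rng = random.Random(seed)
    wanted = [1.0] + [0.0] * (dim - 1)         # assumed "wanted" concept direction
    unwanted = [0.0, 1.0] + [0.0] * (dim - 2)  # assumed "unwanted" concept direction
    gap = [0.0] * (dim - 1) + [0.5]            # assumed constant text-to-image offset

    # Training set: "text" embeddings only (label 1 = unwanted).
    train_x, train_y = [], []
    for center, label in ((wanted, 0), (unwanted, 1)):
        for _ in range(n_per_class):
            train_x.append(sample_embedding(center, 0.15, rng))
            train_y.append(label)
    w, b = train_detector(train_x, train_y)

    # Test set: "image" embeddings, never seen during training.
    correct = 0
    for center, label in ((wanted, 0), (unwanted, 1)):
        for _ in range(n_per_class):
            x = sample_embedding(center, 0.15, rng, offset=gap)
            correct += int((dot(w, x) + b > 0) == bool(label))
    return correct / (2 * n_per_class)


if __name__ == "__main__":
    print(f"accuracy on held-out 'image' embeddings: {run_demo():.2f}")
```

In a real setting the synthetic vectors would be replaced by embeddings from a pre-trained VLM's text and image encoders. The toy only shows why a decision boundary learned on one modality can transfer to the other when both share an embedding space.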