Abstract: In our study, we explore methods for detecting unwanted content lurking in visual datasets. We provide a theoretical analysis demonstrating that a model capable of successfully partitioning visual data can be obtained using only textual data. Based on the analysis, we propose Hassle-Free Textual Training (HFTT), a streamlined method capable of acquiring detectors for unwanted visual content, using only synthetic textual data in conjunction with pre-trained vision-language models. HFTT features an innovative objective function that significantly reduces the necessity for human involvement in data annotation. Furthermore, HFTT employs a clever textual data synthesis method, effectively emulating the integration of unknown visual data distribution into the training process at no extra cost. The unique characteristics of HFTT extend its utility beyond traditional out-of-distribution detection, making it applicable to tasks that address more abstract concepts. We complement our analyses with experiments in out-of-distribution detection and hateful image detection. Our code is available at https://github.com/Saehyung-Lee/HFTT.
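The abstract's central claim is that a detector trained on textual data alone can partition visual data. The toy sketch below illustrates that general idea of cross-modal transfer under strong assumptions. It uses a hypothetical encoder in which text and image embeddings of the same concept share a concept vector and differ only by a modality-specific shift plus noise. Each modality is mean-centered, a simple heuristic for the modality gap and not necessarily what HFTT does. A logistic-regression detector is fitted on text embeddings only and then applied unchanged to image embeddings. It does not reproduce HFTT's objective function or its data synthesis.

```python
import math
import random

DIM = 16  # toy embedding dimension (assumption; real VLM embeddings are larger)


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def make_embeddings(concept, shift, n, rng, noise=0.15):
    """Hypothetical encoder: shared concept vector + modality-specific shift + noise."""
    return [[c + s + rng.gauss(0.0, noise) for c, s in zip(concept, shift)]
            for _ in range(n)]


def center(xs):
    """Subtract the per-modality mean, a simple heuristic for the modality gap."""
    mean = [sum(col) / len(xs) for col in zip(*xs)]
    return [[x - m for x, m in zip(row, mean)] for row in xs]


def train_logreg(xs, ys, lr=1.0, steps=200):
    """Full-batch gradient descent on the logistic loss; label 1 means unwanted."""
    w, b = [0.0] * DIM, 0.0
    n = len(xs)
    for _ in range(steps):
        gw, gb = [0.0] * DIM, 0.0
        for x, y in zip(xs, ys):
            g = sigmoid(dot(w, x) + b) - y
            gw = [gi + g * xi for gi, xi in zip(gw, x)]
            gb += g
        w = [wi - lr * gi / n for wi, gi in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def accuracy(w, b, xs, ys):
    preds = [1 if dot(w, x) + b > 0 else 0 for x in xs]
    return sum(p == y for p, y in zip(preds, ys)) / len(ys)


rng = random.Random(0)
wanted = [1.0 if i == 0 else 0.0 for i in range(DIM)]    # concept direction, class 0
unwanted = [1.0 if i == 1 else 0.0 for i in range(DIM)]  # concept direction, class 1
text_shift = [rng.gauss(0.0, 0.5) for _ in range(DIM)]   # hypothetical modality offsets
image_shift = [rng.gauss(0.0, 0.5) for _ in range(DIM)]

n = 40
labels = [0] * n + [1] * n

# Training sees text embeddings only.
text_x = center(make_embeddings(wanted, text_shift, n, rng)
                + make_embeddings(unwanted, text_shift, n, rng))
w, b = train_logreg(text_x, labels)

# The same detector is then applied, unchanged, to image embeddings.
image_x = center(make_embeddings(wanted, image_shift, n, rng)
                 + make_embeddings(unwanted, image_shift, n, rng))
image_acc = accuracy(w, b, image_x, labels)
print(f"accuracy on image embeddings: {image_acc:.2f}")
```

The transfer works here only because the toy encoder makes the concept directions identical across modalities. Real VLM embeddings satisfy this only approximately, which is the gap a theoretical analysis like the paper's has to address.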


### What problems does this paper attempt to solve?

This paper addresses the **detection of unwanted content** in large-scale visual datasets, a growing challenge when training large-scale AI models. Specifically, the authors focus on how to efficiently identify and remove out-of-distribution (OOD) samples and harmful content (such as hateful images) from visual datasets. These problems are particularly pressing in the era of large-scale AI because:

1. **Large-scale datasets**: As deep neural networks and their training datasets continue to grow, manually screening and removing unwanted content becomes impractical.
2. **High cost of data annotation**: Traditional supervised methods require large amounts of manual annotation, which is time-consuming and can raise ethical concerns, especially when annotators must handle sensitive material such as hateful content.
3. **Limitations of existing methods**: Existing OOD detection methods usually rely on clearly defined distribution boundaries, which do not apply to abstract concepts such as hatefulness.

To address these problems, the authors propose **Hassle-Free Textual Training (HFTT)**. The core idea of HFTT is to train a detector for unwanted visual content using **only textual data**. Its advantages are:

- **No need for additional visual data**: Because it builds on pre-trained vision-language models (VLMs) such as CLIP, HFTT can be trained without collecting additional visual data.
- **Reduced human intervention**: Its objective function and textual data synthesis method significantly reduce the need for manual annotation.
- **Applicable to abstract concepts**: Beyond traditional OOD detection, HFTT extends to tasks involving more abstract concepts, such as hateful image detection.

### Summary

The main contributions of this paper are:

1. A theoretical analysis showing that a model capable of partitioning visual data can be obtained by training on textual data alone.
2. A new objective function that eliminates the need to manually annotate OOD data.
3. An efficient textual data synthesis method that emulates incorporating the unknown visual data distribution into training at no extra cost.
4. Experiments verifying the effectiveness of HFTT on multiple tasks, including traditional OOD detection and hateful image detection.

Together, these contributions make HFTT a lightweight and efficient method for detecting unwanted content in large-scale visual datasets.
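For comparison, a common zero-shot way to use a VLM for OOD filtering scores each image by the maximum softmax over its cosine similarities to in-distribution class prompts, in the spirit of the MCM score, and flags images that score low. The sketch below is that baseline rule, not HFTT. It assumes embeddings have already been computed. The toy 3-d vectors, the temperature, and the threshold are hypothetical values chosen for illustration.

```python
import math


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def max_softmax_score(image_emb, class_text_embs, temperature=0.01):
    """Max softmax probability over cosine similarities to ID class prompts."""
    logits = [cosine(image_emb, t) / temperature for t in class_text_embs]
    top = max(logits)
    exps = [math.exp(l - top) for l in logits]  # shift by the max for stability
    return max(exps) / sum(exps)


def flag_unwanted(image_embs, class_text_embs, threshold=0.9):
    """Return indices of images whose in-distribution score is below the threshold."""
    return [i for i, e in enumerate(image_embs)
            if max_softmax_score(e, class_text_embs) < threshold]


# Hypothetical text embeddings for two ID class prompts
# (e.g. "a photo of a cat" and "a photo of a dog").
class_text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
images = [
    [0.9, 0.1, 0.1],   # close to class 0
    [0.1, 0.95, 0.0],  # close to class 1
    [0.5, 0.5, 0.7],   # equally similar to both classes
]
print(flag_unwanted(images, class_text_embs))  # prints [2]
```

The third image is equally similar to both prompts, so its score is exactly 0.5 and it is flagged, while the first two score close to 1. A rule like this needs a hand-written list of class prompts and a tuned threshold, and it only measures resemblance to known classes. That makes it a poor fit for abstract concepts such as hatefulness, which is the limitation the summary lists for existing methods.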

Textual Training for the Hassle-Free Removal of Unwanted Visual Data: Case Studies on OOD and Hateful Image Detection

Textual Training for the Hassle-Free Removal of Unwanted Visual Data

HOD: A Benchmark Dataset for Harmful Object Detection

TagFog: Textual Anchor Guidance and Fake Outlier Generation for Visual Out-of-Distribution Detection

T2Vs Meet VLMs: A Scalable Multimodal Dataset for Visual Harmfulness Recognition

Safeguard Text-to-Image Diffusion Models with Human Feedback Inversion

DiffUHaul: A Training-Free Method for Object Dragging in Images

DIAGNOSIS: Detecting Unauthorized Data Usages in Text-to-image Diffusion Models

Harnessing Out-Of-Distribution Examples via Augmenting Content and Style

Training with the Invisibles: Obfuscating Images to Share Safely for Learning Visual Recognition Models

TagOOD: A Novel Approach to Out-of-Distribution Detection via Vision-Language Representations and Class Center Learning

SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image And Video Generation

Detecting Tampered Scene Text in the Wild.

Exploiting Web Images for Fine-Grained Visual Recognition by Eliminating Open-Set Noise and Utilizing Hard Examples

Real-Time Object Detection in Occluded Environment with Background Cluttering Effects Using Deep Learning

OAL: Enhancing OOD Detection Using Latent Diffusion

Detecting and Removing Visual Distractors for Video Aesthetic Enhancement

Transfer Learning for Hate Speech Detection in Social Media

TextDestroyer: A Training- and Annotation-Free Diffusion Method for Destroying Anomal Text from Images

Two Video Data Sets for Tracking and Retrieval of Out of Distribution Objects

Color Histogram Contouring: A New Training-Less Approach to Object Detection