Towards Understanding Unsafe Video Generation

Yan Pang, Aiping Xiong, Yang Zhang, Tianhao Wang
2024-07-17
Abstract: Video generation models (VGMs) have demonstrated the capability to synthesize high-quality output. It is important to understand their potential to produce unsafe content, such as violent or terrifying videos. In this work, we provide a comprehensive understanding of unsafe video generation. First, to confirm the possibility that these models could indeed generate unsafe videos, we choose unsafe content generation prompts collected from 4chan and Lexica, and three open-source SOTA VGMs to generate unsafe videos. After filtering out duplicates and poorly generated content, we created an initial set of 2112 unsafe videos from an original pool of 5607 videos. Through clustering and thematic coding analysis of these generated videos, we identify 5 unsafe video categories: Distorted/Weird, Terrifying, Pornographic, Violent/Bloody, and Political. With IRB approval, we then recruit online participants to help label the generated videos. Based on the annotations submitted by 403 participants, we identified 937 unsafe videos from the initial video set. With the labeled information and the corresponding prompts, we created the first dataset of unsafe videos generated by VGMs. We then study possible defense mechanisms to prevent the generation of unsafe videos. Existing defense methods in image generation focus on filtering either the input prompt or the output results. We propose a new approach called Latent Variable Defense (LVD), which works within the model's internal sampling process. LVD can achieve 0.90 defense accuracy while reducing time and computing resources by 10x when sampling a large number of unsafe prompts.
Cryptography and Security, Artificial Intelligence, Computer Vision and Pattern Recognition, Computers and Society
What problem does this paper attempt to address?
The paper examines how readily video generation models (VGMs) produce unsafe content, such as violent, terrifying, or pornographic videos, and proposes a defense mechanism to prevent it. The researchers collected unsafe prompts from 4chan and Lexica and ran them through three open-source state-of-the-art VGMs, producing 5607 videos; filtering out duplicates and poorly generated content left an initial set of 2112 unsafe videos. Clustering and thematic coding of this set yielded five categories of unsafe video: distorted/weird, terrifying, pornographic, violent/bloody, and political. Online participants then labeled the videos, and based on the annotations of 403 participants the authors identified 937 unsafe videos, which together with their prompts form a labeled dataset.
To prevent unsafe generation, the paper proposes Latent Variable Defense (LVD). Unlike existing methods that filter the input prompt or the output, LVD works inside the model's sampling process: it leverages the deterministic properties of diffusion models and analyzes intermediate results to detect whether the video being generated is unsafe. In the reported experiments, LVD achieves 0.90 defense accuracy while reducing time and computing resources by 10x when sampling a large number of unsafe prompts. LVD also achieved near-perfect accuracy in tests against adversarial prompts and on image-to-video diffusion models, demonstrating its versatility and its ability to work alongside other defense mechanisms.
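The idea of inspecting intermediate results during sampling, rather than only the prompt or the finished video, can be illustrated with a minimal sketch. This is not the authors' implementation: the function `guarded_sampling`, the toy denoiser, the toy detector, the threshold, and the choice of checkpoint steps are all hypothetical stand-ins, and the text here does not specify how LVD's detector is built or at which steps it runs. The sketch only shows the control flow of stopping a sampling loop early once an intermediate latent is flagged.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Latent = List[float]  # toy stand-in for a video latent tensor (assumption)


def guarded_sampling(
    initial_latent: Latent,
    denoise_step: Callable[[Latent, int], Latent],
    is_unsafe: Callable[[Latent, int], bool],
    num_steps: int,
    check_steps: Sequence[int],
) -> Tuple[Optional[Latent], int]:
    """Run a sampling loop, checking the latent at selected steps.

    Returns (final_latent, steps_run), or (None, steps_run) if the
    hypothetical detector flagged an intermediate latent and the loop
    stopped early.
    """
    checks = set(check_steps)
    latent = initial_latent
    for step in range(1, num_steps + 1):
        latent = denoise_step(latent, step)
        # Inspect the intermediate result only at the chosen checkpoints.
        if step in checks and is_unsafe(latent, step):
            return None, step  # abort: remaining steps are never computed
    return latent, num_steps


def make_toy_denoiser(target: Latent) -> Callable[[Latent, int], Latent]:
    """Toy deterministic 'denoiser': moves the latent halfway to a target."""
    def step(latent: Latent, _t: int) -> Latent:
        return [x + 0.5 * (g - x) for x, g in zip(latent, target)]
    return step


def toy_detector(latent: Latent, _t: int) -> bool:
    """Toy detector (assumption): flags latents whose mean exceeds 0.5."""
    return sum(latent) / len(latent) > 0.5


if __name__ == "__main__":
    start = [0.0, 0.0, 0.0, 0.0]
    flagged = guarded_sampling(
        start, make_toy_denoiser([1.0] * 4), toy_detector, 20, [2, 4, 6]
    )
    passed = guarded_sampling(
        start, make_toy_denoiser([0.2] * 4), toy_detector, 20, [2, 4, 6]
    )
    print("flagged run:", flagged)
    print("passed run: steps =", passed[1])
```

In this toy setup, a run that is flagged at an early checkpoint skips all remaining denoising steps, which is the kind of saving an in-sampling check makes possible. How much is saved in practice depends on the real detector and on where the checkpoints sit, neither of which is specified here.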
In summary, the paper not only demonstrates the potential for VGMs to generate unsafe content but also proposes an effective defense framework, LVD, aimed at ensuring that VGMs do not produce harmful content, thereby contributing to the safe development of artificial intelligence.