DistilDIRE: A Small, Fast, Cheap and Lightweight Diffusion Synthesized Deepfake Detection

Yewon Lim, Changyeon Lee, Aerin Kim, Oren Etzioni
2024-06-03
Abstract: A dramatic influx of diffusion-generated images has marked recent years, posing unique challenges to current detection technologies. While the task of identifying these images falls under binary classification, a seemingly straightforward category, the computational load is significant when employing the "reconstruction then compare" technique. This approach, known as DIRE (Diffusion Reconstruction Error), not only identifies diffusion-generated images but also detects those produced by GANs, highlighting the technique's broad applicability. To address the computational challenges and improve efficiency, we propose distilling the knowledge embedded in diffusion models to develop rapid deepfake detection models. Our approach, aimed at creating a small, fast, cheap, and lightweight diffusion-synthesized deepfake detector, maintains robust performance while significantly reducing operational demands. Our experimental results indicate that it maintains performance while achieving an inference speed 3.2 times faster than the existing DIRE framework. This advance not only enhances the practicality of deploying these systems in real-world settings but also paves the way for future research endeavors that seek to leverage diffusion model knowledge.
Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
This paper addresses the computational inefficiency of current techniques for detecting deepfake images generated by diffusion models. Existing methods such as DIRE (Diffusion Reconstruction Error) identify diffusion-generated images effectively, but they rely on a "reconstruction then compare" procedure that is computationally expensive, which makes them hard to deploy in practice. To tackle this, the paper proposes DistilDIRE, which uses knowledge distillation to transfer the knowledge embedded in pre-trained diffusion models into a small, fast, cheap, and lightweight deepfake detection model. According to the reported experiments, the method maintains detection performance while achieving an inference speed 3.2 times faster than the existing DIRE framework and significantly reducing operational demands. This makes the model more suitable for large-scale real-world use, especially when handling large volumes of input data such as deepfake videos.
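The two ideas named above, "reconstruction then compare" and knowledge distillation, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the flat-list image representation, the choice of binary cross-entropy plus a feature-matching mean squared error, and the `weight` parameter are all assumptions made for illustration. In a real system the reconstruction would come from a diffusion model and the features from neural networks.

```python
import math


def reconstruction_error(image, reconstruction):
    """Per-pixel absolute difference between an image and its reconstruction.

    Both arguments are flat lists of pixel intensities (an assumption for
    this sketch). In a "reconstruction then compare" detector, the
    reconstruction comes from a diffusion model, which is the costly step.
    """
    return [abs(a - b) for a, b in zip(image, reconstruction)]


def binary_cross_entropy(prob_fake, label):
    """Classification loss for a real (0) versus fake (1) decision."""
    eps = 1e-12  # clamp to avoid log(0)
    p = min(max(prob_fake, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def feature_mse(student_features, teacher_features):
    """Mean squared error between student and teacher feature vectors."""
    n = len(student_features)
    return sum((s - t) ** 2 for s, t in zip(student_features, teacher_features)) / n


def distillation_loss(prob_fake, label, student_features, teacher_features, weight=1.0):
    """Hypothetical combined objective: classify correctly and mimic the teacher.

    The student never runs the full reconstruction at inference time; it only
    learns to reproduce features of a teacher that did see the expensive signal.
    """
    return binary_cross_entropy(prob_fake, label) + weight * feature_mse(
        student_features, teacher_features
    )


if __name__ == "__main__":
    err = reconstruction_error([0.2, 0.8], [0.2, 0.5])
    print("reconstruction error:", err)
    loss = distillation_loss(0.5, 1, [1.0, 2.0], [1.0, 4.0], weight=0.5)
    print("distillation loss:", loss)
```

The design point the sketch tries to capture is that the expensive reconstruction is only needed to supervise the student during training, which is one plausible reading of how a distilled detector could run faster at inference.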