AVSecure: an Audio-Visual Watermarking Framework for Proactive Deepfake Detection

Bofei Guo, Haoxuan Tai, Guibo Luo, Yuesheng Zhu
DOI: https://doi.org/10.1109/iceiec61773.2024.10561738
2024-01-01
Abstract: The rise of Deepfake technology presents a significant challenge to the integrity of information. Most existing Deepfake detection methods rely on visual artifacts to distinguish authentic from manipulated content, but they cannot cope with unseen tampering methods and are easily affected by post-processing. Although recent work has tried to proactively protect facial images using deep watermarking techniques, more deceptive Deepfakes often involve both visual and audio modalities. To address this issue, we propose a novel proactive Deepfake detection framework for both audio and visual modalities that uses a unified encoder-decoder architecture to embed audio-visual watermarks. An audio-visual feature encoder is also developed to align the audio and visual information. The multi-modal watermarking embeds a watermark in each modality as the detection clue and verifies both modalities jointly to detect Deepfaked multimedia. By adding a distortion layer between embedding and extraction during training, the embedded watermark becomes robust against common post-processing operations (e.g., JPEG compression) while remaining sensitive to Deepfake manipulations (e.g., SimSwap) during watermark verification. Experimental results on VidTIMIT demonstrate that the proposed watermarking framework effectively detects various advanced Deepfake manipulations and achieves good robustness to different kinds of common distortions compared with passive uni-modal and multi-modal Deepfake detection methods.
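The joint verification step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the decision rule, so the bit-error-rate comparison, the function names `bit_error_rate` and `is_deepfake`, and the threshold of 0.2 are all assumptions made for illustration. The idea is that a benign distortion flips few watermark bits, while a Deepfake manipulation destroys the watermark in at least one modality.

```python
from typing import Sequence


def bit_error_rate(embedded: Sequence[int], extracted: Sequence[int]) -> float:
    """Fraction of watermark bits that differ between embedding and extraction."""
    if len(embedded) != len(extracted) or len(embedded) == 0:
        raise ValueError("watermarks must be non-empty and of equal length")
    mismatches = sum(1 for a, b in zip(embedded, extracted) if a != b)
    return mismatches / len(embedded)


def is_deepfake(
    video_embedded: Sequence[int],
    video_extracted: Sequence[int],
    audio_embedded: Sequence[int],
    audio_extracted: Sequence[int],
    threshold: float = 0.2,  # hypothetical value, not taken from the paper
) -> bool:
    """Flag the clip if the watermark is badly damaged in either modality."""
    video_ber = bit_error_rate(video_embedded, video_extracted)
    audio_ber = bit_error_rate(audio_embedded, audio_extracted)
    return video_ber > threshold or audio_ber > threshold


# Hypothetical 10-bit watermarks for one clip.
video_wm = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
audio_wm = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0]

# Benign case: one bit flipped in the video watermark, audio intact.
video_after_jpeg = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
benign = is_deepfake(video_wm, video_after_jpeg, audio_wm, audio_wm)

# Manipulated case: half of the audio watermark bits are flipped.
audio_after_swap = [1, 0, 0, 1, 0, 0, 0, 1, 1, 0]
manipulated = is_deepfake(video_wm, video_wm, audio_wm, audio_after_swap)
```

Under these assumptions, checking both modalities with a logical OR means that tampering with only the audio or only the video track is still caught.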