D-Fence layer: an ensemble framework for comprehensive deepfake detection
Asha S, Vinod P, Irene Amerini, Varun G. Menon
DOI: https://doi.org/10.1007/s11042-024-18130-1
IF: 2.577
2024-01-29
Multimedia Tools and Applications
Abstract: The rapid advancement of deep learning and computer vision technologies has given rise to a concerning class of deceptive media, commonly known as deepfakes. This paper addresses emerging trends in deepfakes, including the creation of hyper-realistic facial manipulations, the incorporation of synthesized human voices, and the addition of fabricated subtitles to video content. To combat these multifaceted deepfake threats, we introduce an ensemble-based deepfake detection framework called the “D-Fence” layer. The D-Fence layer consists of two uni-modal classifiers that identify tampered facial and vocal elements, and two cross-modal classifiers that model interactions between the Video-Audio and Audio-Text domains to detect deepfakes across multiple modalities. To evaluate the effectiveness of our framework, we introduce two novel adversarial attacks: the “Bogus-in-the-middle” attack, which strategically inserts counterfeit video frames within authentic sequences, and the “Downsampling attack”, designed to create deceptive audio. A comparative study of the D-Fence layer against various state-of-the-art multi-modal deepfake detection systems shows that our ensemble architecture outperforms existing classifiers. Under diverse adversarial conditions, the D-Fence layer achieves a detection accuracy of 92%.
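The four-classifier ensemble described in the abstract can be pictured with a small sketch. The abstract does not say how the individual outputs are combined, so the rule below (flag the sample when the highest of the four scores reaches a threshold) is purely an assumption for illustration, and the names `DetectorScores` and `ensemble_verdict` are hypothetical rather than taken from the paper.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DetectorScores:
    """Fake-probabilities in [0, 1] from four hypothetical detectors."""
    face: float         # uni-modal: tampered facial content
    voice: float        # uni-modal: synthesized or tampered voice
    video_audio: float  # cross-modal: video/audio inconsistency
    audio_text: float   # cross-modal: audio/subtitle inconsistency


def ensemble_verdict(scores: DetectorScores, threshold: float = 0.5) -> tuple[bool, str]:
    """Return (is_fake, strongest_detector).

    Assumed fusion rule, not the paper's: the sample is flagged as fake
    when the highest of the four scores reaches the threshold.
    """
    by_name = vars(scores)                    # field name -> score
    strongest = max(by_name, key=by_name.get)  # detector with the top score
    return by_name[strongest] >= threshold, strongest


if __name__ == "__main__":
    sample = DetectorScores(face=0.1, voice=0.2, video_audio=0.1, audio_text=0.9)
    print(ensemble_verdict(sample))
```

Under this assumed rule a single confident detector is enough to flag a sample, which favours recall over precision; a weighted average or a learned meta-classifier would be equally plausible alternatives.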
computer science, information systems, theory & methods, engineering, electrical & electronic, software engineering