DeepFake Detection with Inconsistent Head Poses: Reproducibility and Analysis

Kevin Lutz, Robert Bassett
DOI: https://doi.org/10.48550/arXiv.2108.12715
2021-08-29
Abstract: Applications of deep learning to synthetic media generation allow the creation of convincing forgeries, called DeepFakes, with limited technical expertise. DeepFake detection is an increasingly active research area. In this paper, we analyze an existing DeepFake detection technique based on head pose estimation, which can be applied when fake images are generated with an autoencoder-based face swap. Existing literature suggests that this method is an effective DeepFake detector, and its motivating principles are attractively simple. With an eye towards using these principles to develop new DeepFake detectors, we conduct a reproducibility study of the existing method. We conclude that its merits are dramatically overstated, despite its celebrated status. By investigating this discrepancy we uncover a number of important and generalizable insights related to facial landmark detection, identity-agnostic head pose estimation, and algorithmic bias in DeepFake detectors. Our results correct the current literature's perception of state-of-the-art performance for DeepFake detection.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper evaluates the effectiveness and reproducibility of a DeepFake detection technique based on head pose estimation. Specifically, the authors examine how well the method detects DeepFake images produced by autoencoder-based face swapping. Although the existing literature regards the method as an effective DeepFake detector with a simple, intuitive principle, the authors' in-depth analysis and reproducibility study find that its actual performance has been seriously overestimated. The investigation yields several important insights related to facial landmark detection, identity-agnostic head pose estimation, and algorithmic bias in DeepFake detectors. The authors also identify several incorrect assumptions underlying the original method, such as the usefulness of head pose as a discriminative feature and the accuracy of the head pose estimates themselves, and they propose a simple correction that helps head pose estimation avoid converging to a local minimum. Although this correction does not rescue the detector's performance, it may be useful in other settings that require head pose estimation. Overall, the paper aims to correct the current literature's misperception of state-of-the-art DeepFake detection performance and to provide a more accurate assessment.
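The principle behind using head pose as a feature can be illustrated with a deliberately simplified toy. As we understand the original method, it compares the head pose estimated from landmarks in the central face region with the pose estimated from the whole face, on the premise that an autoencoder face swap replaces the central region while leaving the outer face contour intact. The sketch below assumes a yaw-only rotation, orthographic projection, noise-free landmarks, and a hypothetical landmark model (`CENTRAL`, `OUTER`); the helpers `project`, `estimate_yaw`, and `pose_inconsistency` are illustrative names, and a brute-force grid search stands in for the Perspective-n-Point solver a real pipeline would use. It is not the authors' code or their local-minimum correction.

```python
import math

# Hypothetical 3D landmark model as (x, z) pairs in head coordinates:
# x is horizontal, z points toward the camera. The vertical coordinate is
# dropped because, in this toy, a pure yaw rotation under orthographic
# projection does not change it.
CENTRAL = [(0.0, 0.6), (-0.3, 0.3), (0.3, 0.3), (0.0, 0.4)]  # nose, eyes, mouth
OUTER = [(-1.0, -0.2), (1.0, -0.2), (-0.8, -0.1), (0.8, -0.1)]  # face contour


def project(points, yaw):
    """Horizontal image coordinates after rotating by `yaw` radians about the vertical axis."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [x * c + z * s for x, z in points]


def estimate_yaw(points, observed, steps=3600):
    """Least-squares yaw estimate found by exhaustive grid search over [-pi, pi).

    The grid search stands in for a real Perspective-n-Point solver; it cannot
    get stuck in a local minimum, but it only resolves 360/steps degrees.
    """
    best_yaw, best_err = 0.0, float("inf")
    for k in range(steps):
        yaw = -math.pi + 2 * math.pi * k / steps
        err = sum((p - o) ** 2 for p, o in zip(project(points, yaw), observed))
        if err < best_err:
            best_yaw, best_err = yaw, err
    return best_yaw


def pose_inconsistency(central_obs, outer_obs):
    """Angle in degrees between the yaw fitted to all landmarks and the yaw fitted to central landmarks only."""
    yaw_all = estimate_yaw(CENTRAL + OUTER, central_obs + outer_obs)
    yaw_central = estimate_yaw(CENTRAL, central_obs)
    diff = (yaw_all - yaw_central + math.pi) % (2 * math.pi) - math.pi  # wrap to [-pi, pi)
    return abs(math.degrees(diff))


if __name__ == "__main__":
    head_yaw = math.radians(10)
    # Consistent face: every landmark comes from the same head pose.
    real = pose_inconsistency(project(CENTRAL, head_yaw), project(OUTER, head_yaw))
    # Simulated face swap: the central region was synthesized at a different pose.
    fake = pose_inconsistency(project(CENTRAL, math.radians(30)), project(OUTER, head_yaw))
    print(f"real: {real:.1f} deg, fake: {fake:.1f} deg")
```

On this noise-free toy data the consistent face scores zero and the simulated swap scores several degrees. The sketch omits landmark-detection error and pose-estimation inaccuracy, which are precisely the factors the paper identifies as problematic, so a clean separation here says nothing about the detector's real-world performance.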