Unmasking Illusions: Understanding Human Perception of Audiovisual Deepfakes

Ammarah Hashmi, Sahibzada Adil Shahzad, Chia-Wen Lin, Yu Tsao, Hsin-Min Wang
2024-05-07
Abstract: The emergence of contemporary deepfakes has attracted significant attention in machine learning research, as artificial intelligence (AI) generated synthetic media increases the incidence of misinterpretation and is difficult to distinguish from genuine content. Currently, machine learning techniques have been extensively studied for automatically detecting deepfakes. However, human perception has been less explored. Malicious deepfakes could ultimately cause public and social problems. Can we humans correctly perceive the authenticity of the content of the videos we watch? The answer is obviously uncertain; therefore, this paper aims to evaluate the human ability to discern deepfake videos through a subjective study. We present our findings by comparing human observers to five state-of-the-art audiovisual deepfake detection models. To this end, we used gamification concepts to provide 110 participants (55 native English speakers and 55 non-native English speakers) with a web-based platform where they could access a series of 40 videos (20 real and 20 fake) to determine their authenticity. Each participant performed the experiment twice with the same 40 videos in different random orders. The videos are manually selected from the FakeAVCeleb dataset. We found that all AI models performed better than humans when evaluated on the same 40 videos. The study also reveals that while deception is not impossible, humans tend to overestimate their detection capabilities. Our experimental results may help benchmark human versus machine performance, advance forensics analysis, and enable adaptive countermeasures.
Computer Vision and Pattern Recognition, Artificial Intelligence, Computers and Society, Machine Learning, Multimedia
What problem does this paper attempt to address?
The paper examines how well humans can perceive audiovisual deepfakes. As artificial intelligence advances, deepfake content is becoming increasingly difficult to distinguish from real content. This poses potential threats to individuals and society and challenges existing information security and trust in media. Although a significant amount of research focuses on the automatic detection of deepfakes, relatively little addresses how humans perceive such fake content.

To fill this research gap, the paper evaluates the human ability to identify audiovisual deepfakes through a subjective experiment. The researchers designed an online platform and invited 110 participants (55 native English speakers and 55 non-native English speakers) to watch a series of real and fake videos and judge their authenticity. Each participant completed the test twice, watching the same 40 videos each time (20 real and 20 fake) in a different random order, so that both accuracy and consistency could be assessed. The study also evaluated five state-of-the-art audiovisual deepfake detection models on the same 40 videos to compare human and machine performance on this task.

The main findings of the study include:
- Participants were able to distinguish real videos from deepfake videos at a level above random chance.
- Human observers performed worse than the state-of-the-art AI models in identifying deepfake videos.
- Participants had difficulty recognizing the specific form of manipulation (such as audio or video).
- The study also explored the impact of factors such as participants' age, native language, and technical proficiency on their ability to identify deepfake videos.

Through this research, the authors hope to provide valuable insights for improving cybersecurity measures, advancing forensic analysis, and developing strategies to combat deepfakes.
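
The two-session design described above (40 videos per session, 20 real and 20 fake) lends itself to three simple per-participant measures: accuracy against the ground truth, agreement between the two sessions, and an exact binomial test of whether the number of correct answers exceeds what guessing would give. The following is a minimal sketch of that scoring, not the authors' analysis code. The function names and the toy responses are hypothetical and do not come from the study.

```python
from math import comb


def accuracy(responses, labels):
    """Fraction of videos judged correctly (True means 'fake')."""
    return sum(r == y for r, y in zip(responses, labels)) / len(labels)


def consistency(first, second):
    """Fraction of videos that received the same judgment in both sessions."""
    return sum(a == b for a, b in zip(first, second)) / len(first)


def p_above_chance(n_correct, n_trials, p=0.5):
    """One-sided exact binomial p-value: P(X >= n_correct) under pure guessing."""
    return sum(
        comb(n_trials, k) * p**k * (1 - p) ** (n_trials - k)
        for k in range(n_correct, n_trials + 1)
    )


# Ground truth for one session: 20 fake (True) and 20 real (False) videos.
labels = [True] * 20 + [False] * 20

# Hypothetical responses of one participant (invented for illustration):
# session 1 gets videos 0-9 wrong, session 2 gets videos 5-16 wrong.
# Both lists are aligned by video, not by presentation order.
session_1 = [(not y) if i < 10 else y for i, y in enumerate(labels)]
session_2 = [(not y) if 5 <= i < 17 else y for i, y in enumerate(labels)]

if __name__ == "__main__":
    n_correct = sum(r == y for r, y in zip(session_1, labels))
    print("session 1 accuracy:", accuracy(session_1, labels))
    print("session 2 accuracy:", accuracy(session_2, labels))
    print("test-retest consistency:", consistency(session_1, session_2))
    print("p-value vs. chance (session 1):", p_above_chance(n_correct, len(labels)))
```

Because each session presents the videos in a different random order, responses would need to be re-aligned by video identifier before computing consistency. The sketch assumes that alignment has already been done.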