Listening for Expert Identified Linguistic Features: Assessment of Audio Deepfake Discernment among Undergraduate Students

Noshaba N. Bhalli, Nehal Naqvi, Chloe Evered, Christine Mallinson, Vandana P. Janeja
2024-11-22
Abstract: This paper evaluates the impact of training undergraduate students to improve their audio deepfake discernment ability by listening for expert-defined linguistic features. Such features have been shown to improve performance of AI algorithms; here, we ascertain whether this improvement in AI algorithms also translates to improvement of the perceptual awareness and discernment ability of listeners. With humans as the weakest link in any cybersecurity solution, we propose that listener discernment is a key factor for improving trustworthiness of audio content. In this study we determine whether training that familiarizes listeners with English language variation can improve their abilities to discern audio deepfakes. We focus on undergraduate students, as this demographic group is constantly exposed to social media and the potential for deception and misinformation online. To the best of our knowledge, our work is the first study to uniquely address English audio deepfake discernment through such techniques. Our research goes beyond informational training by introducing targeted linguistic cues to listeners as a deepfake discernment mechanism, via a training module. In a pre-/post- experimental design, we evaluated the impact of the training across 264 students as a representative cross section of all students at the University of Maryland, Baltimore County, and across experimental and control sections. Findings show that the experimental group showed a statistically significant decrease in their unsurety when evaluating audio clips and an improvement in their ability to correctly identify clips they were initially unsure about. While results are promising, future research will explore more robust and comprehensive trainings for greater impact.
Sound, Computers and Society, Audio and Speech Processing
What problem does this paper attempt to address?
The core question this paper asks is: **Can undergraduate students' ability to discern audio deepfakes be improved by training them to listen for expert-defined linguistic features?** Specifically, the researchers focus on the following aspects:

1. **Human Perception and Detection of Audio Deepfakes**
   - Current research focuses mainly on flagging audio deepfakes and improving their detection through algorithms, while relatively little work examines how humans perceive fake audio. This study aims to fill that gap by testing whether targeted training can improve listeners' perception and discernment.
2. **Effectiveness of the Training Module**
   - The researchers developed a sociolinguistics-based training module that introduces students to five expert-defined linguistic features (EDLFs): pitch, pause, initial or final plosives, breathing sounds, and overall sound quality. A pre-test/post-test experimental design evaluates whether this training significantly improves students' discernment.
3. **Differences among Groups**
   - The study also examines how training effects differ by gender (male vs. female) and by academic background (computer majors vs. non-computer majors). Female students showed a more pronounced reduction in uncertainty after training, and non-computer majors improved on all audio clips.

### Main Research Questions

- **RQ1**: Can sociolinguistics-oriented training improve students' ability to discern audio deepfakes?
- **RQ2**: How do training effects differ by gender?
- **RQ3**: How do training effects differ among students with different academic backgrounds?
### Methodology

The study used a pre-test/post-test experimental design with an experimental group and a control group. The experimental group received training on the five linguistic features, while the control group only read an introductory article on deepfakes. The participants were 264 undergraduates from the University of Maryland, Baltimore County, including both computer majors and non-computer majors.

### Experimental Results

- **Overall Results**: Students who received the training selected the uncertain option significantly less often, suggesting that they became more confident when judging the authenticity of audio clips.
- **Gender Differences**: Female students showed a significant reduction in uncertainty after training, but this reduction was not always accompanied by improved accuracy in identifying deepfakes.
- **Academic Background Differences**: Non-computer majors improved on all audio clips, while computer majors improved only on real audio.

### Conclusion

The results suggest that sociolinguistics-oriented training can improve students' ability to discern audio deepfakes, particularly by reducing uncertainty and increasing confidence. However, training effects vary across groups, and future research needs to explore more comprehensive and integrated training methods to address the increasingly complex deepfake problem.
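
As a rough illustration of the two outcome measures reported above (how often listeners choose the uncertain option, and how often clips they were initially unsure about are later identified correctly), the following Python sketch computes both from per-response records. The `Response` record, the function names, and the toy data are hypothetical and are not taken from the study. The sketch also assumes that the same clips are rated before and after training, which the summary does not state, and it does not reproduce the authors' statistical analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Response:
    """One listener judgment (hypothetical record layout, not the study's)."""
    participant: str
    clip: str
    phase: str   # "pre" or "post"
    truth: str   # "real" or "fake"
    answer: str  # "real", "fake", or "unsure"


def unsure_rate(responses, phase):
    """Fraction of responses in the given phase that were 'unsure'."""
    in_phase = [r for r in responses if r.phase == phase]
    if not in_phase:
        return 0.0
    return sum(r.answer == "unsure" for r in in_phase) / len(in_phase)


def resolved_unsure_accuracy(responses):
    """Among (participant, clip) pairs marked 'unsure' in the pre-test,
    the fraction answered correctly in the post-test."""
    post = {(r.participant, r.clip): r for r in responses if r.phase == "post"}
    initially_unsure = [
        (r.participant, r.clip)
        for r in responses
        if r.phase == "pre" and r.answer == "unsure"
    ]
    matched = [post[key] for key in initially_unsure if key in post]
    if not matched:
        return 0.0
    return sum(r.answer == r.truth for r in matched) / len(matched)


# Toy data, invented purely for illustration.
toy = [
    Response("p1", "c1", "pre", "fake", "unsure"),
    Response("p1", "c2", "pre", "real", "real"),
    Response("p2", "c1", "pre", "fake", "unsure"),
    Response("p2", "c2", "pre", "real", "unsure"),
    Response("p1", "c1", "post", "fake", "fake"),
    Response("p1", "c2", "post", "real", "real"),
    Response("p2", "c1", "post", "fake", "real"),
    Response("p2", "c2", "post", "real", "unsure"),
]

pre_rate = unsure_rate(toy, "pre")
post_rate = unsure_rate(toy, "post")
resolved = resolved_unsure_accuracy(toy)
print(f"unsure rate: pre={pre_rate:.2f}, post={post_rate:.2f}")
print(f"accuracy on initially-unsure clips: {resolved:.2f}")
```

Keeping the two measures separate reflects the distinction drawn in the results: a drop in uncertain responses does not by itself imply that the newly committed answers are correct.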