People are poorly equipped to detect AI-powered voice clones

Sarah Barrington, Hany Farid
2024-10-04
Abstract: As generative AI continues its ballistic trajectory, everything from text to audio, image, and video generation continues to improve in mimicking human-generated content. Through a series of perceptual studies, we report on the realism of AI-generated voices in terms of identity matching and naturalness. We find human participants cannot reliably identify short recordings (less than 20 seconds) of AI-generated voices. Specifically, participants mistook the identity of an AI-voice for its real counterpart 80% of the time, and correctly identified a voice as AI-generated only 60% of the time. In all cases, performance is independent of the demographics of the speaker or listener.
Human-Computer Interaction, Artificial Intelligence, Computers and Society, Sound, Audio and Speech Processing