Social evaluation of text-to-speech voices by adults and children
Kevin D. Lilley, Ellen Dossey, Michelle Cohn, Cynthia G. Clopper, Laura Wagner, Georgia Zellou
DOI: https://doi.org/10.1016/j.specom.2024.103163
IF: 2.723
2024-12-07
Speech Communication
Abstract: Humans socially evaluate one another based on their speech. They identify one another with social categories and judge one another's personalities. As more people incorporate voice technology into their everyday lives, adults and children interact more with increasingly sophisticated text-to-speech (TTS) voices, such as those of virtual assistants (e.g., Siri, Alexa). The current study examined how adults and children extend patterns of social evaluation of human voices to TTS voices. We investigated the evaluation of TTS voices by adults (N = 99, ages 18-64 years) and children (N = 87, ages 5-13 years) using a social rating task. Participants rated 10 different TTS voices along seven dimensions: friendliness and honesty (solidarity factors), intelligence and wealthiness (status factors), midwestern-ness and age (demographic factors), and robotic-ness. Half of the participants were told that the voices were from humans (N = 94), and half were told that the voices were from devices (N = 92). The results showed that individual TTS voices were rated differently on each scale, indicating variation in the social evaluation of TTS voices. In addition, robotic-ness was correlated with each of the other dimensions: more robotic TTS voices were rated as less friendly, less honest, less intelligent, less wealthy, less midwestern, and older than less robotic TTS voices. Ratings by adults and children were largely similar, and no differences were observed as a function of whether the TTS voices were introduced as belonging to humans or devices. These results indicate that humans socially evaluate TTS voices and that their evaluations are driven primarily by how robotic, or non-human-like, the TTS voice sounds.
computer science, interdisciplinary applications; acoustics