The Influence of Multisensory Input On Voice Perception and Production Using Immersive Virtual Reality
Ümit Daşdöğen, Shaheen N Awan, Pasquale Bottalico, Aquiles Iglesias, Nancy Getchell, Katherine Verdolini Abbott
DOI: https://doi.org/10.1016/j.jvoice.2023.07.026
2023-09-20
Abstract: Objectives: The purpose was to examine the influence of auditory vs visual vs combined audiovisual input on perception and production of one's own voice, using immersive virtual reality technology. Methods: Thirty-one vocally healthy men and women were investigated under 18 sensory input conditions, using immersive virtual reality technology. Conditions included two auditory rooms with varying reverberation times, two visual rooms with varying volumes, and the combination of audiovisual conditions. All conditions were repeated with and without background noise. Speech tasks included counting, sustained vowel phonation, an all-voiced sentence from the Consensus Auditory-Perceptual Evaluation of Voice, and the first sentence from the Rainbow Passage, randomly ordered. Perception outcome measures were participants' self-reported perceptions of their vocal loudness, vocal effort, and vocal comfort in speech. Production outcome measures were sound pressure level (SPL) and spectral moments (spectral mean and standard deviation in Hz, skewness, and kurtosis). Statistical analyses used self-reported vocal effort, vocal loudness, and vocal comfort in percent (0 = "not at all," 100 = "extremely"), SPL in dB, and the spectral moments (mean and standard deviation in Hz, along with skewness and kurtosis). The reference level was a baseline audiovisual deprivation condition. Results: Results suggested (i) increased self-perceived vocal loudness and effort, and decreased comfort, with increasing room volume, speaker-to-listener distance, audiovisual input, and background noise, and (ii) increased SPL and fluctuations in spectral moments across conditions. Conclusions: Not only auditory, but also visual and audiovisual input influenced voice perception and production in ways that have not been previously documented. Findings contribute to the basic science understanding of the role of visual, audiovisual, and auditory input in voice perception and production, and also to models of voice training and therapy. The findings also set the foundation for the use of virtual reality in voice and speech training, as a potentially powerful solution to the generalization problem.