Abstract: In daily communication, a speaker's voice usually carries a particular emotion. Emotional information is transmitted in two ways: through prosody and through the semantic content of speech. Previous studies have found that emotional prosody can release speech from auditory masking. The present study aimed (a) to test whether emotional semantic content can also release speech from informational masking, and if so, (b) to explore how the roles of emotional prosody and emotional content differ in releasing speech from informational masking. The study consisted of two experiments, each divided into two sub-experiments. A perceived spatial separation paradigm was adopted in all experiments to separate the effects of informational masking from those of energetic masking. Experiment 1 explored the mechanism by which emotional prosody releases speech from informational masking. A complete within-subject design of 2 (perceived spatial separation: absent, present) × 2 (emotional prosody: neutral, happy) × 4 (signal-to-noise ratio: -8 dB, -4 dB, 0 dB, 4 dB) was adopted in both sub-experiments. Experiment 1a employed time-reversed sentences with no semantic intelligibility as masking sounds (presumed to produce only perceptual informational masking). Experiment 1b used syntactically correct nonsense sentences as masking sounds (producing both perceptual and cognitive informational masking). Experiment 2, which also contained two sub-experiments, examined the role of the emotional semantics of speech in releasing speech from informational masking. A complete within-subject design of 2 (perceived spatial separation: absent, present) × 2 (emotional semantics: neutral, positive) × 4 (signal-to-noise ratio: -8 dB, -4 dB, 0 dB, 4 dB) was adopted in both sub-experiments. Experiment 2a employed time-reversed sentences with no semantic intelligibility as masking sounds. Experiment 2b used syntactically correct nonsense sentences as masking sounds.
Experiment 1a showed that recognition accuracy for target sentences uttered in emotional prosody was significantly higher than for target sentences uttered in neutral prosody. Experiment 1b showed the same pattern: recognition accuracy was significantly higher for target sentences uttered in emotional prosody than for those uttered in neutral prosody. The difference between the results of Experiments 1a and 1b was marginally significant. Experiment 2a showed no significant difference in recognition accuracy between target sentences with emotional semantics and those with neutral semantics. Experiment 2b showed that recognition accuracy for target sentences with emotional semantics was significantly higher than for target sentences with neutral semantics. No significant difference was found between Experiments 2a and 2b. In conclusion, the results suggest that emotional prosody and emotional semantics release speech from informational masking through different mechanisms. The emotional prosody of speech can preferentially attract more of the listener's attention and reduce perceptual informational masking, but it has only a minor effect on releasing cognitive informational masking. The emotional semantics of speech can preferentially occupy more of the listener's cognitive processing resources. Hence, it can reduce cognitive informational masking; however, it fails to release perceptual informational masking.
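
The factorial design described in the abstract can be made concrete with a short sketch. The Python below is only an illustration, not the authors' procedure: it enumerates the 2 × 2 × 4 within-subject cells of one sub-experiment and converts each signal-to-noise ratio from dB to a linear amplitude ratio using the standard relation 10^(dB/20). The names `design_cells` and `snr_to_amplitude_ratio`, and the level labels "absent" and "present", are assumptions introduced here for illustration.

```python
from itertools import product

# Factor levels taken from the abstract (Experiment 1 shown; Experiment 2
# replaces the prosody factor with emotional semantics: neutral, positive).
SEPARATION = ("absent", "present")   # perceived spatial separation
EMOTION = ("neutral", "happy")       # emotional prosody of the target
SNR_DB = (-8, -4, 0, 4)              # signal-to-noise ratio in dB


def design_cells(separation=SEPARATION, emotion=EMOTION, snr_db=SNR_DB):
    """Return every cell of the full within-subject factorial design."""
    return [
        {"separation": s, "emotion": e, "snr_db": d}
        for s, e, d in product(separation, emotion, snr_db)
    ]


def snr_to_amplitude_ratio(snr_db):
    """Convert an SNR in dB to a target/masker amplitude ratio."""
    return 10 ** (snr_db / 20)


cells = design_cells()
for cell in cells:
    ratio = snr_to_amplitude_ratio(cell["snr_db"])
    print(cell["separation"], cell["emotion"], cell["snr_db"], round(ratio, 3))
```

Each participant in a complete within-subject design contributes data to all 16 cells, which is why the four sub-experiments can compare masker types while holding the factor structure constant.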

Speech-Synchronized Visual Cues Release Speech from Informational Masking

Temporally Pre-Presented Lipreading Cues Release Speech from Informational Masking.

The Role of Visual Cues Indicating Onset Times of Target Speech Syllables in Release from Informational or Energetic Masking

Robust cortical encoding of slow temporal modulations of speech.

Disappearance of the Unmasking Effect of Temporally Pre-Presented Lipreading Cues on Speech Recognition in People with Chronic Schizophrenia

The Effect of Voice Cuing on Releasing Chinese Speech from Informational Masking

Voice-Associated Static Face Image Releases Speech from Informational Masking

Modulation of the voice‐cuing effect on releasing speech from informational masking

Effect of Speech Rate on Speech-on-speech Masking

Combined Manipulations of the Perceived Location and Spatial Extent of the Speech-Target Image Predominantly Affect Speech-on-speech Masking

Emotionally Conditioning the Target-Speech Voice Enhances Recognition of the Target Speech under “Cocktail-Party” Listening Conditions

Auditory Priming Releases Chinese Speech from Informational Masking

The effects of temporal cues, point-light displays, and faces on speech identification and listening effort

Attentional Modulation of the Early Cortical Representation of Speech Signals in Informational or Energetic Masking

Informational masking of speech in children: auditory-visual integration

Unmasking Effects of Speech Emotional Prosody and Semantics on Auditory Informational Masking

Informational masking of speech produced by speech-like sounds without linguistic content.

The Ability of Temporally Integrating Acoustic Waveforms is Associated with Release of Speech from Informational Masking under Reverberant Conditions

The Brain Network Mechanisms Underlying Perceptual Unmasking Cue-Induced Improvement of Speech Recognition under Cocktail-Party Listening Conditions

The Effect of Voice Cuing on Releasing Speech from Informational Masking Disappears in Older Adults

Speaking Rhythmically Improves Speech Recognition under "Cocktail-Party" Conditions