Extraction and Utilization of Excitation Information of Speech: A Review

Sudarsana Reddy Kadiri, Paavo Alku, B. Yegnanarayana
DOI: https://doi.org/10.1109/jproc.2021.3126493
IF: 20.6
2021-12-01
Proceedings of the IEEE
Abstract: Speech production can be regarded as a process where a time-varying vocal tract system (filter) is excited by a time-varying excitation. In addition to its linguistic message, the speech signal also carries information about, for example, the gender and age of the speaker. Moreover, the speech signal includes acoustical cues about several speaker traits, such as the emotional state and the state of health of the speaker. In order to understand the production of these acoustical cues by the human speech production mechanism and utilize this information in speech technology, it is necessary to extract features describing both the excitation and the filter of the human speech production mechanism. While the methods to estimate and parameterize the vocal tract system are well established, the excitation appears less studied. This article provides a review of signal processing approaches used for the extraction of excitation information from speech. This article highlights the importance of excitation information in the analysis and classification of phonation type and vocal emotions, in the analysis of nonverbal laughter sounds, and in studying pathological voices. Furthermore, recent developments of deep learning techniques in the context of extraction and utilization of the excitation information are discussed.
engineering, electrical & electronic