$F_0$-Noise-robust Glottal Source and Vocal Tract Analysis Based on ARX-LF Model

Yongwei Li, Jianhua Tao, Donna Erickson, Bin Liu, Masato Akagi
DOI: https://doi.org/10.1109/taslp.2021.3120585
2021-01-01
IEEE/ACM Transactions on Audio Speech and Language Processing
Abstract: This paper proposes a robust automatic speech analysis method based on a source-filter model that combines an Auto-Regressive eXogenous (ARX) model with the Liljencrants-Fant (LF) model. The method estimates the glottal source waveform and the vocal tract shape parameters in two steps: it first initializes the glottal source parameters using inverse filtering, and then jointly estimates the glottal source waveform and the vocal tract shape parameters using an iterative analysis-by-synthesis algorithm. The method was evaluated on synthetic voices with glottal noise levels (signal-to-noise ratios) from 0 dB to 50 dB and fundamental frequencies ($F_0$) from 80 Hz to 320 Hz. The results show that the proposed method achieves much higher estimation accuracy than state-of-the-art inverse filtering methods across both the glottal noise levels and the $F_0$ values tested.
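To make the ARX part of the source-filter model concrete, the following is a minimal sketch, not the authors' algorithm. It assumes the common ARX form s[n] = -sum_k a[k] s[n-k] + b0 u[n] + e[n], where u is the glottal source and the coefficients a[k] describe the vocal tract filter. The function names `synth_arx` and `fit_arx`, the gain `b0`, and the impulse-train source are illustrative assumptions; in the paper the source comes from the LF model and is refined iteratively, which this sketch does not attempt. It only shows the inner step of an analysis-by-synthesis loop: for a fixed candidate source, fit the filter by least squares.

```python
import numpy as np

def synth_arx(a, b0, u, noise=None):
    """Synthesize s[n] = -sum_k a[k-1]*s[n-k] + b0*u[n] + e[n] (assumed ARX form)."""
    p = len(a)
    s = np.zeros(len(u))
    for n in range(len(u)):
        acc = b0 * u[n]
        if noise is not None:
            acc += noise[n]
        for k in range(1, p + 1):
            if n - k >= 0:
                acc -= a[k - 1] * s[n - k]
        s[n] = acc
    return s

def fit_arx(s, u, p):
    """Least-squares estimate of (a, b0) for a fixed candidate source u."""
    # Each row is [-s[n-1], ..., -s[n-p], u[n]]; the target is s[n].
    rows = [np.concatenate((-s[n - p:n][::-1], [u[n]])) for n in range(p, len(s))]
    X = np.array(rows)
    theta, *_ = np.linalg.lstsq(X, s[p:], rcond=None)
    return theta[:p], theta[p]

# Toy example: an impulse train stands in for the LF glottal source (assumption).
a_true = np.array([-1.2, 0.8])   # stable second-order filter, chosen for illustration
u = np.zeros(400)
u[::80] = 1.0
s = synth_arx(a_true, 0.5, u)
a_hat, b_hat = fit_arx(s, u, p=2)
```

In a full analysis-by-synthesis loop, the residual between the observed signal and the signal resynthesized from `a_hat`, `b_hat`, and the candidate source would drive the update of the source parameters; that outer loop is omitted here.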