Exploring Pre-trained Speech Model for Articulatory Feature Extraction in Dysarthric Speech Using ASR

Yuqin Lin, Longbiao Wang, Jianwu Dang, Nobuaki Minematsu
DOI: https://doi.org/10.21437/interspeech.2024-665
2024-01-01
Abstract: Most speech technologies are beneficial for normal speakers, but less effective for speakers with dysphonia. Dysarthria is a motor speech disorder that involves impairments in the process of speech production. Articulatory information is therefore important for speech technologies aimed at this group. However, articulatory features are difficult to extract because articulation is hard to annotate. Recent studies explored phonemic features in Wav2vec 2.0 pretrained speech models and found that they carry some articulatory-related information. Building on this finding, this paper proposes DS-AAFE to extract more accurate articulatory features from phonemic features derived from a pretrained speech model. In DS-AAFE, partial articulatory features are isolated from phonemic features through joint optimization with ASR. Articulatory attribute detection is used to evaluate the articulatory information in the proposed features, and it shows a notable improvement in detection accuracy. Furthermore, experiments on the UASpeech and TORGO dysarthria datasets showed that the proposed features improved ASR performance for dysarthric speech.