Useful Blunders: Can Automated Speech Recognition Errors Improve Downstream Dementia Classification?

Changye Li, Weizhe Xu, Trevor Cohen, Serguei Pakhomov
2024-01-11
Abstract: **Objectives**: We aimed to investigate how errors from automatic speech recognition (ASR) systems affect dementia classification accuracy, specifically in the "Cookie Theft" picture description task. We aimed to assess whether imperfect ASR-generated transcripts could provide valuable information for distinguishing between language samples from cognitively healthy individuals and those with Alzheimer's disease (AD).
Computation and Language, Sound, Audio and Speech Processing
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper explores whether the errors that Automatic Speech Recognition (ASR) systems make when transcribing speech carry information that is useful for dementia classification, and whether they can therefore improve classification accuracy. Specifically, the study focuses on the "Cookie Theft" picture description task and compares imperfect ASR-generated transcripts with professional manual transcripts for distinguishing language samples from cognitively healthy individuals and those with Alzheimer's Disease (AD).

### Research Background

- **Challenges in Diagnosing Alzheimer's Disease (AD)**: AD is a neurodegenerative disease that affects language and speech abilities, and early diagnosis is difficult. A missed or delayed diagnosis can harm dementia patients and their caregivers.
- **Limitations of Existing Methods**: Current AD diagnostic methods draw on sources such as caregiver reports, structured interviews, and cognitive tests. These methods are time-consuming and must be conducted in controlled laboratory environments. They may not be sensitive to natural language patterns and may fail to capture early language deficits in everyday communication.
- **Potential of Automatic Speech Recognition (ASR)**: ASR technology can automatically transcribe audio recordings, removing the time and resource bottleneck of manual transcription and making large-scale deployment of language-informed dementia screening possible. However, ASR errors, typically measured by Word Error Rate (WER) and Character Error Rate (CER), may reduce the accuracy of predictive models that identify dementia.

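WER and CER, the two error measures named above, are both edit distances normalized by the length of the reference transcript. The following is a minimal sketch of the standard definitions, not the paper's evaluation code. The function names and the example sentences are invented for illustration, and text normalization (casing, punctuation) is left out.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences.

    Counts the minimum number of substitutions, insertions, and deletions
    needed to turn `ref` into `hyp`.
    """
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion from ref
                            curr[j - 1] + 1,      # insertion into ref
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character Error Rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Invented example: a manual transcript and a hypothetical ASR output.
manual = "the boy is taking cookies from the jar"
asr = "the boy is taken cookie from jar"

# Two substitutions and one deletion: 3 edits / 8 reference words = 0.375
print(wer(manual, asr))
print(cer(manual, asr))
```

Because both rates divide by the reference length, a hypothesis with many insertions can produce a value above 1.0.
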
### Research Objectives

- **Evaluate the Impact of ASR Errors**: Study how imperfect ASR-generated transcripts perform in dementia classification, particularly on the "Cookie Theft" picture description task.
- **Explore the Value of ASR Errors**: Analyze whether ASR errors carry information that improves the accuracy of dementia classification.
- **Improve ASR Models**: Improve ASR model performance through post-editing methods so that it approaches previously reported evaluation metrics on challenging spontaneous speech recordings from dementia patients and healthy individuals.

### Methods

- **Datasets**: Two publicly available datasets, ADReSS and WLS, which contain audio recordings and transcripts of participants performing the "Cookie Theft" picture description task.
- **Models**: Two pre-trained ASR models (Wav2Vec2 and HuBERT), and a BERT model for classifying the ASR-generated transcripts.
- **Evaluation**: ASR models are evaluated with Word Error Rate (WER) and Character Error Rate (CER); classification models are evaluated with accuracy and AUC.
- **Error Analysis**: SHapley Additive exPlanations (SHAP) are used to analyze the features that influence classification decisions, particularly the contributions of ASR-generated characters, words, and phrases.

### Conclusion

- **Main Findings**: Imperfect ASR-generated transcripts perform well in dementia classification, even surpassing manual transcripts. This suggests that ASR errors may contain valuable clues related to dementia.
- **Mechanism**: Language deficits in dementia patients may lead to systematic ASR errors, which classification models can exploit to further improve classification performance.
- **Future Directions**: The research results provide a foundation for developing ASR models and workflows more suitable for dementia screening, aiming to enhance their performance and applicability in clinical settings.
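
The classification metrics listed under Methods, accuracy and AUC, can be sketched in a few lines. This is a minimal illustration of the standard definitions rather than the paper's evaluation code. The labels and scores are invented, the 0.5 threshold is an assumption, and AUC is computed through its pairwise-ranking interpretation: the fraction of (positive, negative) pairs in which the positive sample receives the higher score, with ties counted as half.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of samples whose thresholded score matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise ranking of positives vs. negatives."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    # A correctly ordered pair counts 1, a tie counts 0.5, a misordered pair 0.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented example: 1 = dementia, 0 = healthy control, with hypothetical
# classifier scores for six transcripts.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]

print(accuracy(labels, scores))  # 4 of 6 predictions match at threshold 0.5
print(roc_auc(labels, scores))   # 7 of 9 pairs are ranked correctly
```

Accuracy depends on the chosen threshold, while AUC summarizes ranking quality across all thresholds, which is one reason the two are commonly reported together.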