Detecting Alzheimer's Disease from Continuous Speech Using Language Models.

Zhiqiang Guo, Zhenhua Ling, Yunxia Li
DOI: https://doi.org/10.3233/jad-190452
2019-01-01
Journal of Alzheimer's Disease
Abstract:
BACKGROUND: Many recent studies have sought to detect Alzheimer's disease (AD) from continuous speech through linguistic analysis and modeling. However, few of them use language models (LMs) to extract linguistic features or to investigate the lexical-level differences between AD and healthy speech.
OBJECTIVE: Our goals are to obtain state-of-the-art performance in automatic AD detection, to establish N-gram LMs as powerful tools for distinguishing AD patients' narratives from those of healthy controls, and to identify differences in lexical usage between AD patients and healthy people.
METHOD: We use a subset of the DementiaBank corpus comprising 242 control samples from 99 control participants and 256 AD samples from 169 "PossibleAD" or "ProbableAD" participants. Baseline models are built using area under the curve (AUC)-based feature selection, and five machine learning algorithms are compared. Perplexity features are then extracted using LMs to build enhanced detection models. Finally, differences in lexical usage between AD patients and healthy people are investigated with a proportion test based on unigram probabilities.
RESULTS: Our baseline model obtains a detection accuracy of 80.7%. This accuracy increases to 85.4% after integrating the perplexity features derived from LMs. Further investigation shows that AD patients tend to use more general, less informative, and less accurate words to describe characters and actions than healthy controls do.
CONCLUSION: The perplexity features extracted by LMs can benefit automatic AD detection from continuous speech. There are lexical-level differences between AD and healthy speech that can be captured by statistical N-gram LMs.
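The perplexity features mentioned in the METHOD section can be illustrated with a minimal sketch. The abstract does not state the N-gram order, the smoothing scheme, or how the perplexities are combined, so everything below is an assumption made for illustration: a bigram LM with add-one smoothing, one LM trained on control transcripts and one on AD transcripts, and a feature set made of the two perplexities plus their difference. The helper names (`train_bigram_lm`, `perplexity`, `perplexity_features`) are hypothetical, and the toy sentences are invented, not taken from DementiaBank.

```python
import math
from collections import Counter


def train_bigram_lm(sentences):
    """Count unigrams and bigrams over tokenized sentences (assumed bigram order)."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams, set(unigrams)


def perplexity(lm, tokens):
    """Perplexity of one tokenized sample under an add-one-smoothed bigram LM."""
    unigrams, bigrams, vocab = lm
    v = len(vocab) + 1  # one extra type reserves probability mass for unseen words
    padded = ["<s>"] + tokens + ["</s>"]
    log_prob = 0.0
    for prev, word in zip(padded, padded[1:]):
        p = (bigrams[(prev, word)] + 1) / (unigrams[prev] + v)
        log_prob += math.log(p)
    n_predictions = len(padded) - 1
    return math.exp(-log_prob / n_predictions)


def perplexity_features(tokens, control_lm, ad_lm):
    """Assumed feature set: perplexity under each LM and their difference."""
    ppl_control = perplexity(control_lm, tokens)
    ppl_ad = perplexity(ad_lm, tokens)
    return {
        "ppl_control": ppl_control,
        "ppl_ad": ppl_ad,
        "ppl_diff": ppl_control - ppl_ad,
    }


# Invented toy transcripts, for illustration only.
control_sentences = [
    ["the", "boy", "is", "taking", "cookies", "from", "the", "jar"],
    ["the", "mother", "is", "drying", "the", "dishes"],
]
ad_sentences = [
    ["he", "is", "getting", "the", "thing"],
    ["she", "is", "doing", "something", "there"],
]

control_lm = train_bigram_lm(control_sentences)
ad_lm = train_bigram_lm(ad_sentences)

sample = ["the", "boy", "is", "taking", "cookies"]
features = perplexity_features(sample, control_lm, ad_lm)
print(features)
```

In this sketch, a sample that resembles the control transcripts receives a lower perplexity under the control LM than under the AD LM, so the difference is negative. Features of this kind could be appended to a baseline feature vector before classification. In practice the LMs would have to be trained without the sample being scored (for example, within cross-validation folds) to avoid leaking labels into the features.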