Abstract:Background: Recently, rich computational methods that use deep learning or machine learning have been developed using linguistic biomarkers for the diagnosis of early-stage Alzheimer disease (AD). Moreover, some qualitative and quantitative studies have indicated that certain part-of-speech (PoS) features or tags could be good indicators of AD. However, there has not been a systematic attempt to discover the underlying relationships between PoS features and AD. Moreover, there has not been any attempt to quantify the relative importance of PoS features in detecting AD. Objective: Our goal was to disclose the underlying relationship between PoS features and AD, understand whether PoS features are useful in AD diagnosis, and explore which PoS features play a vital role in the diagnosis. Methods: The DementiaBank, containing 1049 transcripts from 208 patients with AD and 243 transcripts from 104 older control individuals, was used. A total of 27 PoS features were extracted from each record. Then, the relationship between AD and each of the PoS features was explored. A transformer-based deep learning model for AD prediction using PoS features was trained. Then, a global explainable artificial intelligence method was proposed and used to discover which PoS features were the most important in AD diagnosis using the transformer-based predictor. A global (model-level) feature importance measure was derived as a summary from the local (example-level) feature importance metric, which was obtained using the proposed causally aware counterfactual explanation method. The unique feature of this method is that it considers causal relations among PoS features and can, hence, preclude counterfactuals that are improbable and result in more reliable explanations. Results: The deep learning-based AD predictor achieved an accuracy of 92.2% and an F1-score of 0.955 when distinguishing patients with AD from healthy controls. The proposed explanation method identified 12 PoS features as being important for distinguishing patients with AD from healthy controls. Of these 12 features, 3 (25%) have been identified by other researchers in previous works in psychology and natural language processing. The remaining 75% (9/12) of PoS features have not been previously identified. We believe that this is an interesting finding that can be used in creating tests that might aid in the diagnosis of AD. Note that although our method is focused on PoS features, it should be possible to extend it to more types of features, perhaps even those derived from other biomarkers, such as syntactic features. Conclusions: The high classification accuracy of the proposed deep learner indicates that PoS features are strong clues in AD diagnosis. There are 12 PoS features that are strongly tied to AD, and because language is a noninvasive and potentially cheap method for detecting AD, this work shows some promising directions in this field.

Semantic Coherence Markers for the Early Diagnosis of the Alzheimer Disease

Semantic coherence markers: The contribution of perplexity metrics

A Digital Language Coherence Marker for Monitoring Dementia

Detecting Linguistic Characteristics of Alzheimer's Dementia by Interpreting Neural Models

Semantic Coherence Dataset: Speech transcripts

Detecting Alzheimer's Disease from Continuous Speech Using Language Models.

Linguistic Features Extracted by GPT-4 Improve Alzheimer's Disease Detection based on Spontaneous Speech

Automated text‐level semantic markers of Alzheimer's disease

Comparing Natural Language Processing Techniques for Alzheimer's Dementia Prediction in Spontaneous Speech

Different Markers of Semantic–Lexical Impairment Allow One to Obtain Different Information on the Conversion from MCI to AD: A Narrative Review of an Ongoing Research Program

Multimodal Deep Learning Models for Detecting Dementia From Speech and Transcripts

Leveraging Large Language Models for Identifying Interpretable Linguistic Markers and Enhancing Alzheimer's Disease Diagnostics

Automated free speech analysis reveals distinct markers of Alzheimer's and frontotemporal dementia

Temporal Integration of Text Transcripts and Acoustic Features for Alzheimer's Diagnosis Based on Spontaneous Speech

Speech Analysis by Natural Language Processing Techniques: A Possible Tool for Very Early Detection of Cognitive Decline?

Alzheimer's Dementia Recognition Using Acoustic, Lexical, Disfluency and Speech Pause Features Robust to Noisy Inputs

Automatic Identification of Alzheimer's Disease using Lexical Features extracted from Language Samples

Revealing the Roles of Part-of-Speech Taggers in Alzheimer Disease Detection: Scientific Discovery Using One-Intervention Causal Explanation

Verbal fluency discrepancies as a marker of the prehippocampal stages of Alzheimer's disease

An approach for assisting diagnosis of Alzheimer's disease based on natural language processing

Automatic analysis of Categorical Verbal Fluency for Mild Cognitive Impartment detection: a non-linear language independent approach