Abstract:

Background: In the past few years there has been growing interest in the use of verbal productions as digital biomarkers, namely objective, quantifiable behavioural data that can be collected and measured by means of digital devices, allowing for low-cost pathology detection, classification and monitoring. Numerous research papers have been published on the automatic detection of subtle verbal alterations from written texts, raw speech recordings and transcripts, and such linguistic analysis has been singled out as a cost-effective method for diagnosing dementia and other medical conditions common among elderly patients (e.g., cognitive dysfunctions associated with metabolic disorders, dysarthria).

Aims: To provide a critical appraisal and synthesis of evidence concerning the application of natural language processing (NLP) techniques for clinical purposes in the geriatric population. In particular, we discuss the state of the art in studying language in healthy and pathological ageing, focusing on the latest research efforts to build non-intrusive language-based tools for the early identification of cognitive frailty due to dementia. We also discuss some challenges and open problems raised by this approach.

Methods & procedures: We performed a scoping review to examine emerging evidence about this novel domain. Potentially relevant studies published up to November 2021 were identified in the MEDLINE, Cochrane and Web of Science databases. We also browsed the proceedings of leading international conferences (e.g., ACL, COLING, Interspeech, LREC) from 2017 to 2021, and checked the reference lists of relevant studies and reviews.

Main contribution: The paper provides an introductory, but complete, overview of the application of NLP techniques for studying language disruption due to dementia. We also suggest that these techniques can be fruitfully applied to other medical conditions (e.g., cognitive dysfunctions associated with dysarthria, cerebrovascular disease and mood disorders).

Conclusions & implications: Although several critical points still need to be addressed by the scientific community, a growing body of empirical evidence shows that NLP techniques are a promising tool for studying language changes in pathological ageing, with a high potential to lead to a significant shift in clinical practice.

What this paper adds:

What is already known on this subject: Speech and language abilities change due to non-pathological neurocognitive ageing and neurodegenerative processes. These subtle verbal modifications can be measured through NLP techniques and used as biomarkers for screening/diagnostic purposes in the geriatric population (i.e., digital linguistic biomarkers, DLBs).

What this paper adds to existing knowledge: The review shows that DLBs are a promising clinical tool, with a high potential to spark a major shift in dementia assessment in the elderly. Some challenges and open problems are also discussed.

What are the potential or actual clinical implications of this work? This methodological review represents a starting point for clinicians approaching the DLB research field for studying language in healthy and pathological ageing. It summarizes the state of the art and future research directions of this novel approach.