Abstract:
Background: Story recall is a simple and sensitive cognitive test that is commonly used to measure changes in episodic memory function in early Alzheimer disease (AD). Recent advances in digital technology and natural language processing methods make this test a candidate for automated administration and scoring. Multiple parallel test stimuli are required for higher-frequency disease monitoring.
Objective: This study aims to develop and validate a remote and fully automated story recall task, suitable for longitudinal assessment, in a population of older adults with and without mild cognitive impairment (MCI) or mild AD.
Methods: The "Amyloid Prediction in Early Stage Alzheimer's disease" (AMYPRED) studies recruited participants in the United Kingdom (AMYPRED-UK: NCT04828122) and the United States (AMYPRED-US: NCT04928976). Participants were asked to complete optional daily self-administered assessments remotely on their smart devices over 7 to 8 days. Assessments included immediate and delayed recall of 3 stories from the Automatic Story Recall Task (ASRT), a test with multiple parallel stimuli (18 short stories and 18 long stories) balanced for key linguistic and discourse metrics. Verbal responses were recorded, securely transferred from participants' personal devices, and automatically transcribed and scored using text similarity metrics between the source text and retelling to derive a generalized match score. Group differences in adherence and task performance were examined using logistic and linear mixed models, respectively. Correlational analysis examined parallel-forms reliability of ASRTs and convergent validity with cognitive tests (Logical Memory Test and Preclinical Alzheimer's Cognitive Composite with semantic processing). Acceptability and usability data were obtained using a remotely administered questionnaire.
Results: Of the 200 participants recruited in the AMYPRED studies, 151 (75.5%) engaged in optional remote assessments: 78 cognitively unimpaired (CU) participants and 73 with MCI or mild AD. Adherence to daily assessment was moderate and did not decline over time but was higher in CU participants (ASRTs were completed each day by 73 of 106 [68.9%] participants with MCI or mild AD and 78 of 94 [83%] CU participants). Participants reported favorable task usability: infrequent technical problems, easy use of the app, and a broad interest in the tasks. Task performance improved modestly across the week and was better for immediate recall. The generalized match scores were lower in participants with MCI or mild AD (Cohen d=1.54). Parallel-forms reliability of ASRT stories was moderate to strong for immediate recall (mean rho=0.73, range 0.56-0.88) and delayed recall (mean rho=0.73, range 0.54-0.86). The ASRTs showed moderate convergent validity with established cognitive tests.
Conclusions: The unsupervised, self-administered ASRT task is sensitive to cognitive impairments in MCI and mild AD. The task showed good usability, high parallel-forms reliability, and high convergent validity with established cognitive tests. Remote, low-cost, low-burden, and automatically scored speech assessments could support diagnostic screening, health care, and treatment monitoring.
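Illustrative note: The Methods describe scoring each retelling by its text similarity to the source story. The Python sketch below shows one simple way such a comparison could work, using a token-overlap F1 score. It is an assumption made for illustration and is not the generalized match score used in the study, which the abstract does not define. The function names `tokenize` and `overlap_f1` are hypothetical.

```python
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def overlap_f1(source: str, retelling: str) -> float:
    """Token-overlap F1 between a source story and a retelling.

    Illustrative stand-in only: NOT the study's generalized match score.
    """
    src = Counter(tokenize(source))
    ret = Counter(tokenize(retelling))
    # Multiset intersection counts each shared token at most as often
    # as it appears in both texts.
    shared = sum((src & ret).values())
    if shared == 0:
        return 0.0
    precision = shared / sum(ret.values())  # share of retold tokens found in the source
    recall = shared / sum(src.values())     # share of source tokens that were retold
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    story = "The dog ran home"
    print(overlap_f1(story, "the dog"))
```

A score near 1 would indicate a retelling that reuses most of the source wording, and a score near 0 would indicate little lexical overlap. A pure token-overlap measure does not credit paraphrases, which is one reason a production scoring method would likely differ.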

Automated scoring of the autobiographical interview with natural language processing

Semi-automated transcription and scoring of autobiographical memory narratives

A natural language model to automate scoring of autobiographical memories

Machine-learning as a validated tool to characterize individual differences in free recall of naturalistic events

Quantitative text feature analysis of autobiographical interview data: prediction of episodic details, semantic details and temporal discounting

Pulmonary involvement in angio-immunoblastic lymphadenopathy (A.I.L.D.). Case report and review of literature.

Automatic Scoring of Dream Reports' Emotional Content with Large Language Models

Validation of a Remote and Fully Automated Story Recall Task to Assess for Early Cognitive Impairment in Older Adults: Longitudinal Case-Control Observational Study

Towards the development of an automated robotic storyteller: comparing approaches for emotional story annotation for non-verbal expression via body language

Evaluating Web-Based Automatic Transcription for Alzheimer Speech Data: Transcript Comparison and Machine Learning Analysis

Automated Assessment of Encouragement and Warmth in Classrooms Leveraging Multimodal Emotional Features and ChatGPT

Automating PTSD Diagnostics in Clinical Interviews: Leveraging Large Language Models for Trauma Assessments

Measuring Latent Trust Patterns in Large Language Models in the Context of Human-AI Teaming

Clinically indicated replacement versus routine replacement of peripheral venous catheters in adults: A nonblinded, cluster‐randomized trial in China

Leveraging Narrative Feedback in Programmatic Assessment: The Potential of Automated Text Analysis to Support Coaching and Decision-Making in Programmatic Assessment

Automating clinical assessments of memory deficits: Deep Learning based scoring of the Rey-Osterrieth Complex Figure

Automated Speech Scoring System Under The Lens: Evaluating and interpreting the linguistic cues for language proficiency

A Comparison of Natural Language Processing Methods for Automated Coding of Motivational Interviewing

Evaluating the Efficacy of AI-Based Interactive Assessments Using Large Language Models for Depression Screening

Validation of an Automated Speech Analysis of Cognitive Tasks within a Semiautomated Phone Assessment

Beyond rating scales: With targeted evaluation, large language models are poised for psychological assessment