Abstract: Mild Cognitive Impairment (MCI) is an early stage of memory loss or other cognitive decline in individuals who can still perform most activities of daily living independently. It is considered a transitional stage between normal cognition and more severe cognitive decline such as dementia or Alzheimer's disease. According to reports from the National Institute on Aging (NIA), people with MCI are at greater risk of developing dementia, so detecting MCI as early as possible is important for mitigating its progression to Alzheimer's disease and dementia. Recent studies have harnessed Artificial Intelligence (AI) to develop automated methods for predicting and detecting MCI. Most existing research relies on unimodal data (e.g., only speech or prosody), but recent studies have shown that combining modalities yields more accurate MCI prediction. However, exploiting different modalities effectively remains a major challenge because efficient fusion methods are lacking. This study proposes a robust fusion architecture that uses embedding-level fusion via a co-attention mechanism to leverage multimodal data for MCI prediction. This approach addresses the limitations of early and late fusion methods, which often fail to preserve inter-modal relationships. Our embedding-level fusion aims to capture complementary information across modalities and thereby enhance predictive accuracy. We used the I-CONECT dataset, which contains a large number of recorded semi-structured conversations, held over internet/webcam, between interviewers and participants aged 75 and older. We introduce a multimodal speech-language-vision Deep Learning-based method to differentiate MCI from Normal Cognition (NC). Our proposed architecture includes co-attention blocks that fuse the three modalities at the embedding level, finding potential interactions between speech (audio), language (transcribed speech), and vision (facial videos) within the cross-Transformer layer. Experimental results demonstrate that our fusion method achieves an average AUC of 85.3% in distinguishing MCI from NC, significantly outperforming unimodal (60.9%) and bimodal (76.3%) baseline models. This superior performance highlights the effectiveness of our model in capturing and utilizing complementary information from multiple modalities, offering a more accurate and reliable approach for MCI prediction.
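To make the idea of embedding-level fusion via co-attention more concrete, the following is a minimal illustrative sketch, not the architecture evaluated in this study. It assumes each modality has already been encoded into a sequence of token embeddings of equal dimension, and it omits the learned query/key/value projections, multiple heads, and classifier that a real cross-Transformer layer would have. All function names (`cross_attend`, `co_attention_fuse`, `mean_pool`) and the toy embedding sizes are hypothetical.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, keys_values):
    """Scaled dot-product attention without learned projections (assumption):
    each query token attends over the tokens of the other modalities."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                  for k in keys_values]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, keys_values))
                    for j in range(d)])
    return out


def mean_pool(tokens):
    """Average a sequence of token embeddings into one vector."""
    n = len(tokens)
    return [sum(t[j] for t in tokens) / n for j in range(len(tokens[0]))]


def co_attention_fuse(modalities):
    """modalities: dict mapping a modality name to a list of token embeddings.
    Each modality queries the tokens of all other modalities, the attended
    output is added back as a residual, and the pooled results are
    concatenated into one fused vector."""
    fused = []
    for name, tokens in modalities.items():
        others = [t for other, toks in modalities.items()
                  if other != name for t in toks]
        attended = cross_attend(tokens, others)
        mixed = [[a + b for a, b in zip(tok, att)]
                 for tok, att in zip(tokens, attended)]
        fused.extend(mean_pool(mixed))
    return fused


# Toy usage with random stand-in embeddings (dimension 4, hypothetical lengths).
rng = random.Random(0)
make_tokens = lambda n, d=4: [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
fused = co_attention_fuse({
    "speech": make_tokens(5),
    "language": make_tokens(3),
    "vision": make_tokens(6),
})
```

In this sketch the fused vector has one pooled segment per modality, so a downstream classifier would see information from each modality already conditioned on the other two. This is what distinguishes embedding-level fusion from concatenating raw features (early fusion) or averaging per-modality predictions (late fusion).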

Pre-trained Feature Fusion and Matching for Mild Cognitive Impairment Detection

CogniVoice: Multimodal and Multilingual Fusion Networks for Mild Cognitive Impairment Assessment from Spontaneous Speech

Leveraging Multimodal Methods and Spontaneous Speech for Alzheimer's Disease Identification

Automatic detection of Mild Cognitive Impairment using high-dimensional acoustic features in spontaneous speech

A multimodal cross-transformer-based model to predict mild cognitive impairment using speech, language and vision

Multimodal fusion for alzheimer’s disease recognition

Screening for Mild Cognitive Impairment with Speech Interaction Based on Virtual Reality and Wearable Devices

Deep Spatial-Temporal Feature Fusion from Adaptive Dynamic Functional Connectivity for MCI Identification

Selecting and Analyzing Speech Features for the Screening of Mild Cognitive Impairment

Improving the Assessment of Mild Cognitive Impairment in Advanced Age With a Novel Multi-Feature Automated Speech and Language Analysis of Verbal Fluency

Detection of Mild Cognitive Impairment From Non-Semantic, Acoustic Voice Features: The Framingham Heart Study

Analysis of Disfluencies for automatic detection of Mild Cognitive Impartment: a deep learning approach

Improving Mild Cognitive Impairment Prediction via Reinforcement Learning and Dialogue Simulation

Classification Study of Mild Cognitive Impairment Based on Multi-feature Fusion for Cortical Morphology

Automatic speech analysis for detecting cognitive decline of older adults

Detecting Alzheimer's Disease Based on Acoustic Features Extracted from Pre-trained Models

A deep feature fusion network with global context and cross-dimensional dependencies for classification of mild cognitive impairment from brain MRI

Exploring linguistic feature and model combination for speech recognition based automatic AD detection

A Transfer Learning Method for Detecting Alzheimer's Disease Based on Speech and Natural Language Processing

Dementia Detection by Analyzing Spontaneous Mandarin Speech

Cross-lingual Alzheimer's Disease detection based on paralinguistic and pre-trained features