Abstract: Mild Cognitive Impairment (MCI) is an early stage of memory loss or other cognitive decline in individuals who can still independently perform most activities of daily living. It is considered a transitional stage between normal cognition and more severe cognitive decline such as dementia or Alzheimer's disease. According to the National Institute on Aging (NIA), people with MCI are at greater risk of developing dementia, so it is important to detect MCI as early as possible in order to slow its progression to Alzheimer's disease and dementia. Recent studies have harnessed Artificial Intelligence (AI) to develop automated methods to predict and detect MCI. Most existing research is based on unimodal data (e.g., only speech or prosody), but recent studies have shown that multimodality leads to more accurate prediction of MCI. However, effectively exploiting different modalities remains a major challenge because efficient fusion methods are lacking. This study proposes a robust fusion architecture that uses embedding-level fusion via a co-attention mechanism to leverage multimodal data for MCI prediction. This approach addresses the limitations of early and late fusion methods, which often fail to preserve inter-modal relationships. Our embedding-level fusion aims to capture complementary information across modalities, enhancing predictive accuracy. We used the I-CONECT dataset, which contains a large number of recorded semi-structured internet/webcam conversations between interviewers and participants aged 75 and older. We introduce a multimodal speech-language-vision Deep Learning-based method to differentiate MCI from Normal Cognition (NC). Our proposed architecture includes co-attention blocks that fuse three modalities at the embedding level to find potential interactions between speech (audio), language (transcribed speech), and vision (facial videos) within the cross-Transformer layer. Experimental results demonstrate that our fusion method achieves an average AUC of 85.3% in distinguishing MCI from NC, significantly outperforming unimodal (60.9%) and bimodal (76.3%) baseline models. This superior performance highlights the effectiveness of our model in capturing and utilizing complementary information from multiple modalities, offering a more accurate and reliable approach for MCI prediction.
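The embedding-level co-attention idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the random arrays stand in for pretrained speech, language, and vision encoder outputs, and the embedding size, sequence lengths, projection weights, residual-plus-mean-pooling step, and logistic output head are all assumptions made for illustration. Each modality acts as the query and attends over the concatenated embeddings of the other two, so inter-modal interactions are modeled before any decision is made, unlike late fusion of per-modality predictions.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(query_seq, context_seq, w_q, w_k, w_v):
    """Scaled dot-product attention: one modality attends over another."""
    q = query_seq @ w_q                       # (T_query, d)
    k = context_seq @ w_k                     # (T_context, d)
    v = context_seq @ w_v                     # (T_context, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])   # (T_query, T_context)
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


def co_attention_fusion(embeddings, params):
    """Fuse modalities at the embedding level (hypothetical helper).

    embeddings: dict mapping modality name -> (T_m, d) array.
    params:     dict mapping modality name -> (w_q, w_k, w_v) matrices.
    """
    fused = []
    for name in sorted(embeddings):
        query = embeddings[name]
        others = [embeddings[o] for o in sorted(embeddings) if o != name]
        context = np.concatenate(others, axis=0)
        attended, _ = cross_attention(query, context, *params[name])
        # Residual connection, then mean-pool over the time axis.
        fused.append((query + attended).mean(axis=0))
    return np.concatenate(fused)              # (num_modalities * d,)


rng = np.random.default_rng(0)
d = 16  # assumed shared embedding size

# Random stand-ins for encoder outputs (assumption, not real features).
embeddings = {
    "speech": rng.normal(size=(20, d)),
    "language": rng.normal(size=(12, d)),
    "vision": rng.normal(size=(30, d)),
}
params = {
    name: tuple(rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3))
    for name in embeddings
}

fused = co_attention_fusion(embeddings, params)

# Untrained logistic head mapping the fused vector to an MCI-vs-NC score.
w_out = rng.normal(scale=0.1, size=fused.shape[0])
prob_mci = 1.0 / (1.0 + np.exp(-(fused @ w_out)))
print(fused.shape, float(prob_mci))
```

In a trained model the projection matrices and the output head would be learned jointly, and the attention weights returned by `cross_attention` could be inspected to see which segments of one modality the model aligns with another.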

Leveraging Multimodal Methods and Spontaneous Speech for Alzheimer's Disease Identification

Multimodal fusion for Alzheimer’s disease recognition

Multi-modal fusion with gating using audio, lexical and disfluency features for Alzheimer's Dementia recognition from spontaneous speech

Alzheimer's Dementia Recognition Using Acoustic, Lexical, Disfluency and Speech Pause Features Robust to Noisy Inputs

Early Dementia Detection Using Multiple Spontaneous Speech Prompts: The PROCESS Challenge

A Multimodal Approach for Dementia Detection from Spontaneous Speech with Tensor Fusion Layer

Temporal Integration of Text Transcripts and Acoustic Features for Alzheimer's Diagnosis Based on Spontaneous Speech

Pre-trained Feature Fusion and Matching for Mild Cognitive Impairment Detection

CogniVoice: Multimodal and Multilingual Fusion Networks for Mild Cognitive Impairment Assessment from Spontaneous Speech

Towards Computer-Based Automated Screening of Dementia Through Spontaneous Speech

Multimodal Deep Learning Models for Detecting Dementia From Speech and Transcripts

Comparing Natural Language Processing Techniques for Alzheimer's Dementia Prediction in Spontaneous Speech

A multimodal cross-transformer-based model to predict mild cognitive impairment using speech, language and vision

Cross-lingual Alzheimer's Disease detection based on paralinguistic and pre-trained features

Exploring Multimodal Approaches for Alzheimer's Disease Detection Using Patient Speech Transcript and Audio Data

Multimodal Inductive Transfer Learning for Detection of Alzheimer's Dementia and its Severity

Leveraging Pretrained Representations with Task-related Keywords for Alzheimer's Disease Detection

Analyzing Multimodal Features of Spontaneous Voice Assistant Commands for Mild Cognitive Impairment Detection

Connected Multi-speech Task for Detecting Alzheimer’s Disease Using a Two-Layer Model