Cross-Modal Attention for Multimodal Information Fusion: A Novel Approach to Attention Deficit Hyperactivity Disorder Detection

Rajesh Nair, S. M. Naqvi, Christian Nash
DOI: https://doi.org/10.23919/FUSION59988.2024.10706381
2024-07-08
Abstract: This paper presents a novel method for differentiating subjects with Attention Deficit Hyperactivity Disorder (ADHD) from control participants through multimodal data fusion of video observations and questionnaire responses. Using the well-known Video Vision Transformer model, we analyse the video modality to capture the complex spatio-temporal patterns of ADHD symptoms. In parallel, a Multi-Layer Perceptron model processes the structured questionnaire data, capturing key cognitive and emotional indicators of ADHD. To fuse the two modalities, a cross-modal attention mechanism assigns adaptive weights to each feature according to its relevance for classification. This targeted weighting improves the model's decisions by concentrating on the most informative elements of the combined features. For training and testing, we use our novel Multimodal ADHD dataset, recorded under the Intelligent Sensing ADHD Trial in collaboration with Cumbria, Northumberland, Tyne and Wear NHS Foundation Trust, UK. The proposed model, ADViQ-AL, achieves 98.18% classification accuracy, 97.83% sensitivity, and 98.53% specificity in distinguishing the ADHD and control groups.
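As a rough illustration of the fusion step described in the abstract, the Python sketch below implements a generic single-head scaled dot-product cross-modal attention in which a questionnaire embedding acts as the query over a sequence of video tokens. It is not the authors' ADViQ-AL implementation. The function names (`cross_modal_attention`, `fuse`), the identity query/key/value projections, the attention direction (questionnaire attends to video), and concatenation as the fusion operator are all illustrative assumptions. The input vectors are toy stand-ins for the outputs of the Video Vision Transformer and the Multi-Layer Perceptron.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross_modal_attention(query: Vector, tokens: List[Vector]) -> Tuple[Vector, List[float]]:
    """Scaled dot-product attention from one modality's embedding over another's tokens.

    Assumption (illustrative only): learned query/key/value projections are replaced
    by the identity, so each video token serves as both key and value.
    Returns the attention-weighted summary of the tokens and the per-token weights.
    """
    d = len(query)
    # Relevance score of each video token to the questionnaire query.
    scores = [dot(query, t) / math.sqrt(d) for t in tokens]
    # Adaptive weights: non-negative and summing to 1.
    weights = softmax(scores)
    # Weighted sum of the tokens, dimension by dimension.
    summary = [sum(w * t[i] for w, t in zip(weights, tokens)) for i in range(d)]
    return summary, weights


def fuse(questionnaire_emb: Vector, video_tokens: List[Vector]) -> Vector:
    """Hypothetical fusion: concatenate the questionnaire embedding with its
    attention-weighted video summary (the input to a downstream classifier)."""
    summary, _ = cross_modal_attention(questionnaire_emb, video_tokens)
    return questionnaire_emb + summary


if __name__ == "__main__":
    # Toy stand-ins for the questionnaire (MLP) and video (transformer) outputs.
    q = [1.0, 0.0]
    video = [[1.0, 0.0], [0.0, 1.0]]
    _, w = cross_modal_attention(q, video)
    print([round(x, 2) for x in w])  # [0.67, 0.33]
    print(len(fuse(q, video)))  # 4
```

In this toy case the video token that points in the same direction as the questionnaire query receives about two thirds of the attention weight. In a trained model the projections would be learned, and the fused vector would typically feed a classification head that separates the ADHD and control groups.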
Psychology, Computer Science