M&M: Multimodal-Multitask Model Integrating Audiovisual Cues in Cognitive Load Assessment

Long Nguyen-Phuoc, Renald Gaboriau, Dimitri Delacroix, Laurent Navarro
DOI: https://doi.org/10.5220/0012575100003660
2024-03-14
Abstract: This paper introduces the M&M model, a novel multimodal-multitask learning framework, applied to the AVCAffe dataset for cognitive load assessment (CLA). M&M integrates audiovisual cues through a dual-pathway architecture with specialized streams for audio and video inputs. A key innovation is its cross-modality multihead attention mechanism, which fuses the two modalities for synchronized multitasking. Another notable feature is the model's three specialized branches, each tailored to a specific cognitive load label, enabling task-specific analysis. While it shows modest performance compared to AVCAffe's single-task baseline, M&M demonstrates a promising framework for integrated multimodal processing. This work paves the way for future enhancements in multimodal-multitask learning systems, emphasizing the fusion of diverse data types for complex task handling.
Computer Vision and Pattern Recognition, Multimedia, Sound, Audio and Speech Processing
What problem does this paper attempt to address?
The paper addresses how to integrate multimodal data (audio and video) effectively in Cognitive Load Assessment (CLA), and how a multitask learning framework might improve the accuracy and robustness of that assessment. It proposes a new model, M&M (Multimodal-Multitask Model), which approaches the problem in three ways:

1. **Multimodal Data Fusion**: M&M processes audio and video inputs separately through a dual-path architecture, then combines the two modalities with a Cross-Modality Multihead Attention Mechanism so that both contribute to the representation of cognitive load.
2. **Multitask Learning**: M&M has three specialized branches, each targeting one cognitive load label (such as mental demand, effort level, time demand). This lets the model perform task-specific analysis for each label, with the aim of improving overall accuracy and robustness.
3. **Compact and Efficient Model Design**: M&M aims to be a compact, efficient model suitable for environments with limited computational resources, which also simplifies deployment in scenarios such as human-computer interaction.

Overall, M&M offers a new framework for cognitive load assessment that combines multimodal data with multitask learning. Its reported performance is modest relative to the AVCAffe single-task baseline, but it demonstrates the potential of the approach for handling complex tasks.
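
The fusion and branching described above can be illustrated with a small NumPy sketch. It is not the authors' implementation: the feature dimension, sequence lengths, number of heads, the choice of audio as the query modality, the mean pooling, and the use of one sigmoid output per label are all assumptions made for illustration, and the weights are random rather than trained. The function names (`cross_modal_attention`, `forward`, and so on) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def split_heads(x, num_heads):
    # (T, d) -> (num_heads, T, d // num_heads)
    t, d = x.shape
    return x.reshape(t, num_heads, d // num_heads).transpose(1, 0, 2)

def init_attention(d):
    # Hypothetical random projection matrices; a real model would learn these.
    names = ("Wq", "Wk", "Wv", "Wo")
    return {n: rng.normal(scale=d ** -0.5, size=(d, d)) for n in names}

def cross_modal_attention(query_seq, context_seq, params, num_heads):
    """Multi-head attention in which one modality queries the other.

    query_seq: (Tq, d), context_seq: (Tk, d).
    Returns the fused sequence (Tq, d) and weights (num_heads, Tq, Tk).
    """
    d = query_seq.shape[-1]
    assert d % num_heads == 0, "feature size must be divisible by num_heads"
    head_dim = d // num_heads
    q = split_heads(query_seq @ params["Wq"], num_heads)
    k = split_heads(context_seq @ params["Wk"], num_heads)
    v = split_heads(context_seq @ params["Wv"], num_heads)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)
    weights = softmax(scores, axis=-1)
    heads_out = weights @ v                        # (num_heads, Tq, head_dim)
    merged = heads_out.transpose(1, 0, 2).reshape(-1, d)
    return merged @ params["Wo"], weights

def forward(audio_feats, video_feats, attn_params, task_heads, num_heads):
    # Assumption: audio features attend over video features.
    fused, _ = cross_modal_attention(audio_feats, video_feats,
                                     attn_params, num_heads)
    pooled = fused.mean(axis=0)                    # average over time steps
    # One branch per cognitive load label, all sharing the fused features.
    return {task: float(sigmoid(pooled @ w + b))
            for task, (w, b) in task_heads.items()}

D, NUM_HEADS = 32, 4
TASKS = ("mental_demand", "effort", "time_demand")
audio = rng.normal(size=(50, D))   # stand-in for audio-stream features
video = rng.normal(size=(16, D))   # stand-in for video-stream features
attn_params = init_attention(D)
task_heads = {t: (rng.normal(scale=D ** -0.5, size=D), 0.0) for t in TASKS}

probs = forward(audio, video, attn_params, task_heads, NUM_HEADS)
print(probs)
```

The three branches read the same fused representation, which is the point of the multitask design: the attention layer is shared, and only the final per-label heads differ. With random weights the printed probabilities are meaningless; the sketch only shows how the shapes flow from two modality streams to three outputs.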