Multimodal Methods for Analyzing Learning and Training Environments: A Systematic Literature Review

Clayton Cohn, Eduardo Davalos, Caleb Vatral, Joyce Horn Fonteles, Hanchen David Wang, Meiyi Ma, Gautam Biswas
2024-08-23
Abstract: Recent technological advancements have enhanced our ability to collect and analyze rich multimodal data (e.g., speech, video, and eye gaze) to better inform learning and training experiences. While previous reviews have focused on parts of the multimodal pipeline (e.g., conceptual models and data fusion), a comprehensive literature review on the methods informing multimodal learning and training environments has not been conducted. This literature review provides an in-depth analysis of research methods in these environments, proposing a taxonomy and framework that encapsulates recent methodological advances in this field and characterizes the multimodal domain in terms of five modality groups: Natural Language, Video, Sensors, Human-Centered, and Environment Logs. We introduce a novel data fusion category -- mid fusion -- and a graph-based technique for refining literature reviews, termed citation graph pruning. Our analysis reveals that leveraging multiple modalities offers a more holistic understanding of the behaviors and outcomes of learners and trainees. Even when multimodality does not enhance predictive accuracy, it often uncovers patterns that contextualize and elucidate unimodal data, revealing subtleties that a single modality may miss. However, there remains a need for further research to bridge the divide between multimodal learning and training studies and foundational AI research.
Machine Learning, Multimedia
What problem does this paper attempt to address?
The paper asks how multimodal methods can be used to analyze learners' behaviors and performance in learning and training environments more comprehensively, so that learners receive more meaningful support and achieve better outcomes. Specifically, it focuses on the following aspects:

1. **Data Types**: What kinds of data are needed to understand learners' behaviors and performance? How can these data support meaningful educational interventions?
2. **Multimodal Data Analysis**: How can data from different modalities (natural language, video, sensors, human-centered data, and environment logs) be collected, fused, and analyzed effectively to give a more comprehensive understanding of learners' behaviors and outcomes?
3. **Data Fusion Methods**: What are the advantages and disadvantages of existing data fusion methods (early fusion, late fusion, hybrid fusion) in multimodal learning and training environments? Is a new data fusion category needed to better reflect current research?
4. **Research Methods**: What research methods are currently used in multimodal learning and training environments? What challenges and shortcomings do they face in data collection, analysis, and interpretation?
5. **Future Research Directions**: How can the gap between multimodal learning and training research and foundational artificial intelligence research be bridged? What should future research focus on?

Through a systematic review of the existing literature, the paper proposes a comprehensive framework and taxonomy intended to guide and support research on multimodal learning and training environments. Specific contributions include:

- Proposing a new data fusion category, mid fusion, which sits between early fusion and late fusion and is suited to partially processed features.
- Introducing citation graph pruning, a citation-graph-based screening method used to programmatically filter the corpus of a literature review.
- Providing a detailed taxonomy covering five modality groups (natural language, video, sensors, human-centered data, and environment logs) and analyzing the characteristics and applications of each.
- Outlining the main research methods used in multimodal learning and training environments, including classification, regression, clustering, qualitative analysis, statistical analysis, network analysis, and pattern extraction.

Through these contributions, the paper aims to give researchers in multimodal learning and training a comprehensive reference framework and to promote the further development of the field.
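To make the distinction between the fusion categories concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's implementation: the summary above only says that mid fusion lies between early and late fusion and suits partially processed features, so the definitions of early fusion (combine raw signals before modeling) and late fusion (combine per-modality outputs) follow common usage. The toy signals `gaze` and `speech`, the helper names, and the fixed weights are all hypothetical.

```python
from statistics import mean

# Hypothetical toy data: two modalities for one learner, one value per time window.
gaze = [0.2, 0.4, 0.6, 0.8]    # assumed: fraction of gaze on task
speech = [1.0, 0.0, 1.0, 1.0]  # assumed: speech-activity flag


def summarize(signal):
    """Partial processing: reduce a raw signal to two summary features."""
    return [mean(signal), max(signal) - min(signal)]


def score(features, weight):
    """Stand-in for a trained model: a linear scorer with one made-up weight."""
    return sum(f * weight for f in features)


def early_fusion(a, b):
    # Concatenate the raw signals, then apply a single model to the joint vector.
    joint = a + b
    return score(joint, 1 / len(joint))


def mid_fusion(a, b):
    # Partially process each modality, concatenate the intermediate
    # features, then apply a single model to the joint feature vector.
    joint = summarize(a) + summarize(b)
    return score(joint, 1 / len(joint))


def late_fusion(a, b):
    # Run a separate model per modality, then combine the two outputs.
    return mean([score(a, 1 / len(a)), score(b, 1 / len(b))])


if __name__ == "__main__":
    for fuse in (early_fusion, mid_fusion, late_fusion):
        print(fuse.__name__, round(fuse(gaze, speech), 4))
```

The three functions differ only in where the modalities are joined: before any processing, after per-modality feature extraction, or after per-modality prediction. With a real model in place of `score`, that placement determines whether the model can learn cross-modal interactions from raw data, from intermediate features, or not at all.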