Abstract: “Big data” is typically collected from different sources that have different data structures. With the rapid development of information technologies, today’s valuable data resources are characteristically multi-modal. As a result, multi-modal learning, which builds on classical machine learning strategies, has become a valuable research topic that enables computers to process and understand “big data”. The cognitive processes of humans involve perception through different sense organs. Signals from the eyes, ears, nose, and hands (tactile sense) together form a person’s understanding of a specific scene or of the world as a whole. It is reasonable to believe that multi-modal methods, with their greater ability to process complex heterogeneous data, can further promote the progress of information technologies. The concept of multimodality originated in psychology and pedagogy hundreds of years ago and has become popular in computer science over the past decade. In contrast to the concept of “media”, a “mode” is a more fine-grained concept that is associated with a typical data source or data form. The effective utilization of multi-modal data can help a computer understand a specific environment in a more holistic way. In this context, we first introduced the definition and main tasks of multi-modal learning. Based on this information, the mechanism and origin of multi-modal machine learning were then briefly introduced. Subsequently, statistical learning methods and deep learning methods for multi-modal tasks were comprehensively summarized. We also introduced the main styles of data fusion in multi-modal perception tasks, including feature representation, shared mapping, and co-training. Additionally, novel adversarial learning strategies for cross-modal matching or generation were reviewed. The main methods for multi-modal learning were outlined in this paper with a focus on future research issues in this field.

Recent Advances of Multimodal Continual Learning: A Comprehensive Survey

Deep Vision Multimodal Learning: Methodology, Benchmark, and Trend

Continual Learning of Large Language Models: A Comprehensive Survey

Recent Advances of Foundation Language Models-based Continual Learning: A Survey

Continual Learning Meets Multimodal Foundation Models: Fundamentals and Advances

LLMs Meet Multimodal Generation and Editing: A Survey

A survey of multimodal federated learning: background, applications, and perspectives

Continual Learning with Pre-Trained Models: A Survey

Self-Supervised Multimodal Learning: A Survey

Deep Multimodal Learning with Missing Modality: A Survey

Recent Advances and Trends in Multimodal Deep Learning: A Review

A Survey of Multimodal Machine Learning

Continual learning in medical image analysis: A survey

Multimodal Methods for Analyzing Learning and Training Environments: A Systematic Literature Review

Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions

Foundations & Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions

Multimodality in meta-learning: A comprehensive survey

Multimodal Continual Learning Using Online Dictionary Updating

LLMs Can Evolve Continually on Modality for X-Modal Reasoning

Vision+X: A Survey on Multimodal Learning in the Light of Data

A Survey of Multimodal Large Language Model from A Data-centric Perspective