Abstract: In the era of Artificial Intelligence Generated Content (AIGC), conditional multimodal synthesis technologies (e.g., text-to-image, text-to-video, text-to-audio) are gradually reshaping the natural content in the real world. The key to multimodal synthesis technology is to establish the mapping relationship between different modalities. Brain signals, serving as potential reflections of how the brain interprets external information, exhibit a distinctive one-to-many correspondence with various external modalities. This correspondence makes brain signals a promising guiding condition for multimodal content synthesis. Brain-conditional multimodal synthesis refers to decoding brain signals back to perceptual experience, which is crucial for developing practical brain-computer interface systems and unraveling the complex mechanisms underlying how the brain perceives and comprehends external stimuli. This survey comprehensively examines the emerging field of AIGC-based Brain-conditional Multimodal Synthesis, termed AIGC-Brain, to delineate the current landscape and future directions. To begin, related brain neuroimaging datasets, functional brain regions, and mainstream generative models are introduced as the foundation of AIGC-Brain decoding and analysis. Next, we provide a comprehensive taxonomy for AIGC-Brain decoding models and present task-specific representative work and detailed implementation strategies to facilitate comparison and in-depth analysis. Quality assessments are then introduced for both qualitative and quantitative evaluation. Finally, this survey explores insights gained, discusses current challenges, and outlines prospects of AIGC-Brain. As the first survey in this domain, this paper offers a foundational overview to guide future AIGC-Brain research.

A survey on multimodal-guided visual content synthesis

Multimodal Image Synthesis and Editing: The Generative AI Era

Multimodal Image Synthesis and Editing: A Survey and Taxonomy

An Unsupervised Video Summarization Method Based on Multimodal Representation

Deep Vision Multimodal Learning: Methodology, Benchmark, and Trend

New Ideas and Trends in Deep Multimodal Content Understanding: A Review

Multi3D: 3D-Aware Multimodal Image Synthesis

A Survey on Audio Synthesis and Audio-Visual Multimodal Processing

Bi-Modality Medical Image Synthesis Using Semi-Supervised Sequential Generative Adversarial Networks

Vision+X: A Survey on Multimodal Learning in the Light of Data

Retrieving Multimodal Information for Augmented Generation: A Survey

A Comprehensive Survey on 3D Content Generation

A Survey of Cross-Modality Brain Image Synthesis

A Survey of Multimodal Composite Editing and Retrieval

AI-Generated Content (AIGC) for Various Data Modalities: A Survey

CMCGAN: A Uniform Framework for Cross-Modal Visual-Audio Mutual Generation

CMOS-GAN: Semi-Supervised Generative Adversarial Model for Cross-Modality Face Image Synthesis

Cross-Modality Neuroimage Synthesis: A Survey

Brain-Conditional Multimodal Synthesis: A Survey and Taxonomy

LLMs Meet Multimodal Generation and Editing: A Survey