Animate Your Thoughts: Decoupled Reconstruction of Dynamic Natural Vision from Slow Brain Activity

Yizhuo Lu, Changde Du, Chong Wang, Xuanliu Zhu, Liuyun Jiang, Huiguang He

2024-05-06

Abstract: Reconstructing human dynamic vision from brain activity is a challenging task with great scientific significance. The difficulty stems from two primary issues: (1) vision-processing mechanisms in the brain are highly intricate and not fully understood, making it challenging to directly learn a mapping between fMRI and video; (2) the temporal resolution of fMRI is significantly lower than that of natural videos. To overcome these issues, this paper proposes a two-stage model named Mind-Animator, which achieves state-of-the-art performance on three public datasets. Specifically, during the fMRI-to-feature stage, we decouple semantic, structural, and motion features from fMRI through fMRI-vision-language tri-modal contrastive learning and sparse causal attention. In the feature-to-video stage, these features are merged into videos by an inflated Stable Diffusion. Through permutation tests, we substantiate that the reconstructed video dynamics are indeed derived from fMRI rather than being hallucinations of the generative model. Additionally, the visualization of voxel-wise and ROI-wise importance maps confirms the neurobiological interpretability of our model.

Computer Vision and Pattern Recognition, Artificial Intelligence

What problem does this paper attempt to address?

### Problems the Paper Aims to Solve

This paper addresses the challenging problem of reconstructing dynamic visual stimuli from functional magnetic resonance imaging (fMRI) signals. It focuses on two main issues:

1. **Complex and Not Fully Revealed Visual Processing Mechanisms**: The visual processing mechanisms in the brain are highly complex and not yet fully understood, making it difficult to directly learn a mapping from fMRI signals to video.
2. **Low Temporal Resolution of fMRI**: The temporal resolution of fMRI is significantly lower than that of natural videos, leading to a substantial mismatch in the time dimension.

To overcome these issues, the researchers propose a two-stage model called Mind-Animator, which decouples semantic, structural, and motion information from fMRI signals and generates video frames through an inflated Stable Diffusion model. Permutation tests were conducted to verify that the motion information in the reconstructed videos indeed originates from the fMRI signals rather than being a hallucination of the generative model. Finally, the neurobiological interpretability of the model was confirmed through voxel-level and ROI-level importance maps.
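The permutation-test idea mentioned above can be illustrated with a minimal sketch. This is not the authors' procedure: the summary does not say which test statistic or shuffling scheme the paper uses. The sketch assumes a hypothetical setup in which each test clip has a decoded motion feature vector and a ground-truth one, uses mean cosine similarity as the statistic, and builds the null distribution by shuffling which decoded vector is paired with which clip. The function names `mean_cosine` and `permutation_test` are illustrative only.

```python
import numpy as np


def mean_cosine(a, b):
    """Mean row-wise cosine similarity between two (n_samples, dim) arrays."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return float(np.mean(np.sum(a * b, axis=1)))


def permutation_test(pred, target, n_perm=1000, seed=0):
    """One-sided permutation test on the pairing between pred and target.

    Assumption: if the decoded features carried no clip-specific
    information, shuffling the pairing would not lower the score.
    Returns the observed statistic and the permutation p-value.
    """
    rng = np.random.default_rng(seed)
    observed = mean_cosine(pred, target)
    null = np.empty(n_perm)
    for i in range(n_perm):
        # Break the correspondence between decoded and true features.
        shuffled = pred[rng.permutation(len(pred))]
        null[i] = mean_cosine(shuffled, target)
    # Add-one correction keeps the p-value strictly positive.
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, float(p_value)
```

A small p-value under this scheme would suggest that the decoded features are tied to the specific stimulus rather than being generic output that would match any clip equally well.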

Cinematic Mindscapes: High-quality Video Reconstruction from Brain Activity

Rethinking Visual Reconstruction: Experience-Based Content Completion Guided by Visual Cues

Neural Representations of Dynamic Visual Stimuli

Reconstructing Rapid Natural Vision with fMRI-Conditional Video Generative Adversarial Network

NeuroCine: Decoding Vivid Video Sequences from Human Brain Activties

Mind Artist: Creating Artistic Snapshots with Human Thought

Neural Encoding and Decoding with Deep Learning for Dynamic Natural Vision

Mind-bridge: Reconstructing Visual Images Based on Diffusion Model from Human Brain Activity

MindDiffuser: Controlled Image Reconstruction from Human Brain Activity with Semantic and Structural Diffusion

NeuroClips: Towards High-fidelity and Smooth fMRI-to-Video Reconstruction

Natural scene reconstruction from fMRI signals using generative latent diffusion

Seeing through the Brain: Image Reconstruction of Visual Perception from Human Brain Signals

MindLDM: Reconstruct Visual Stimuli from fMRI Using Latent Diffusion Model

Deep image reconstruction from human brain activity

Deep Natural Image Reconstruction from Human Brain Activity Based on Conditional Progressively Growing Generative Adversarial Networks

Movie reconstruction from mouse visual cortex activity

Relightable and Animatable Neural Avatar from Sparse-View Video

Reconstructing seen image from brain activity by visually-guided cognitive representation and adversarial learning

Reconstruction of Natural Images from Human fMRI Using a Three-Stage Multi-Level Deep Fusion Model

Brain decoding: toward real-time reconstruction of visual perception