NeuroCine: Decoding Vivid Video Sequences from Human Brain Activities

Jingyuan Sun, Mingxiao Li, Zijiao Chen, Marie-Francine Moens
2024-05-12
Abstract: In the pursuit to understand the intricacies of the human brain's visual processing, reconstructing dynamic visual experiences from brain activities emerges as a challenging yet fascinating endeavor. While recent advancements have achieved success in reconstructing static images from non-invasive brain recordings, the domain of translating continuous brain activities into video format remains underexplored. In this work, we introduce NeuroCine, a novel dual-phase framework targeting the inherent challenges of decoding fMRI data, such as noise, spatial redundancy and temporal lags. This framework proposes spatial masking and temporal interpolation-based augmentation for contrastive learning of fMRI representations and a diffusion model enhanced by dependent prior noise for video generation. Tested on a publicly available fMRI dataset, our method shows promising results, outperforming the previous state-of-the-art models by a notable margin of $20.97\%$, $31.00\%$ and $12.30\%$ respectively on decoding the brain activities of three subjects in the fMRI dataset, as measured by SSIM. Additionally, our attention analysis suggests that the model aligns with existing brain structures and functions, indicating its biological plausibility and interpretability.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses the reconstruction of vivid videos from human brain activity recorded with functional magnetic resonance imaging (fMRI). Recent work has made significant progress in reconstructing static images from non-invasive brain recordings, but translating continuous brain activity into video remains underexplored. The paper introduces a two-stage framework called NeuroCine, which targets challenges inherent to decoding fMRI data: noise, spatial redundancy, and temporal lag. Its main contributions are:

1. **Two-stage framework**: The first stage augments fMRI data through spatial masking and temporal interpolation, and trains the fMRI encoder with contrastive learning so that its representations are robust to these perturbations. The second stage uses the trained fMRI encoder to guide a video diffusion model, and introduces dependent prior noise to compensate for the low signal-to-noise ratio of fMRI data.
2. **Innovative techniques**: Contrastive learning is used to learn fMRI representations, and a diffusion model is used to generate videos. Together, these techniques help turn complex, noisy fMRI data into meaningful visual reconstructions.
3. **Experimental validation**: On a publicly available fMRI dataset, NeuroCine outperforms the previous state-of-the-art models in decoding the brain activity of three subjects, with SSIM improvements of 20.97%, 31.00%, and 12.30% respectively.
4. **Biological interpretability**: Analysis of the model's attention suggests that it aligns with known brain structures and functions, which supports the biological plausibility and interpretability of the method.

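
The first stage described above can be illustrated with a small NumPy sketch. This is not the authors' implementation: the summary does not give the exact augmentation or loss, so the sketch assumes that spatial masking zeroes a random subset of voxels, that temporal interpolation blends each fMRI frame with the next one, and that the contrastive objective is a standard InfoNCE loss. The function names, the mask ratio, the blending weight `alpha`, the temperature, and the linear stand-in encoder `W` are all hypothetical.

```python
import numpy as np

def spatial_mask(fmri, mask_ratio, rng):
    """Zero out a random subset of voxels (the same voxels at every time step)."""
    n_voxels = fmri.shape[-1]
    n_masked = int(round(mask_ratio * n_voxels))
    masked_idx = rng.choice(n_voxels, size=n_masked, replace=False)
    out = fmri.copy()
    out[..., masked_idx] = 0.0
    return out

def temporal_interpolate(fmri, alpha):
    """Blend each frame with its successor: (1 - alpha) * x[t] + alpha * x[t + 1].
    The last frame has no successor and is left unchanged."""
    nxt = np.concatenate([fmri[1:], fmri[-1:]], axis=0)
    return (1.0 - alpha) * fmri + alpha * nxt

def info_nce(z_a, z_b, temperature=0.1):
    """InfoNCE loss in which row i of z_a is the positive pair of row i of z_b."""
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

# Toy usage: 8 time steps, 1000 voxels, and a random linear map as the "encoder".
rng = np.random.default_rng(0)
fmri = rng.standard_normal((8, 1000))
view_a = spatial_mask(fmri, mask_ratio=0.5, rng=rng)
view_b = temporal_interpolate(fmri, alpha=0.3)
W = rng.standard_normal((1000, 64)) / np.sqrt(1000)  # hypothetical encoder weights
loss = info_nce(view_a @ W, view_b @ W)
```

Minimizing this loss pulls the two augmented views of the same time step together and pushes views of different time steps apart, which is one way an encoder can become insensitive to masked voxels and small temporal shifts.
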
In summary, the paper aims to advance neural decoding and visual reconstruction by combining contrastive learning of fMRI representations with a video diffusion model, so that videos can be reconstructed more accurately from human brain activity.
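
The "dependent prior noise" used in the second stage is not specified in this summary. One common way to build correlated noise for video diffusion is to mix a component shared by all frames with an independent per-frame component, and the sketch below assumes that construction. It may differ from what the paper does. The function name `dependent_noise` and the mixing parameter `alpha` are hypothetical.

```python
import numpy as np

def dependent_noise(n_frames, frame_shape, alpha, rng):
    """Per-frame Gaussian noise with unit variance whose correlation
    between any two frames is alpha**2 / (1 + alpha**2)."""
    shared = rng.standard_normal(frame_shape)                   # one draw for all frames
    independent = rng.standard_normal((n_frames, *frame_shape))  # one draw per frame
    w_shared = alpha / np.sqrt(1.0 + alpha ** 2)
    w_ind = 1.0 / np.sqrt(1.0 + alpha ** 2)
    # The squared weights sum to 1, so each frame is still standard normal.
    return w_shared * shared[None] + w_ind * independent

# Toy usage: noise for 6 latent frames of shape (4, 32, 32).
rng = np.random.default_rng(0)
eps = dependent_noise(n_frames=6, frame_shape=(4, 32, 32), alpha=1.0, rng=rng)
```

With `alpha = 0` the frames receive independent noise, as in a standard image diffusion prior. Larger values of `alpha` make the frames share more of their noise, which tends to encourage temporally consistent samples.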