MindLDM: Reconstruct Visual Stimuli from fMRI Using Latent Diffusion Model

Junhao Guo, Chanlin Yi, Fali Li, Peng Xu, Yin Tian
DOI: https://doi.org/10.1109/civemsa58715.2024.10586647
2024-01-01
Abstract: Decoding brain activity evoked by visual stimuli is a long-standing pursuit in cognitive neuroscience. Because the mechanisms of visual formation remain elusive, reconstructing visual stimuli from brain activity is challenging. With the advancement of deep learning, several studies have reconstructed scenes resembling the visual stimuli from functional magnetic resonance imaging (fMRI). However, these reconstructions still differ substantially from the stimuli in contour representation, and most existing research focuses on within-subject decoding. In this study, we propose MindLDM, a novel approach that permits cross-subject visual reconstruction. It first employs a Masked Autoencoder (MAE) to obtain latent features of the fMRI signal and aligns them to the Contrastive Language-Image Pre-Training (CLIP) text feature space. Then, a Very Deep Variational Autoencoder (VDVAE) is used to obtain the contour information of the visual input. Finally, a latent diffusion model combined with ControlNet reconstructs the visual stimuli. MindLDM achieves image reconstruction on the publicly available Natural Scenes Dataset, generating images that exhibit a high degree of semantic correlation with the visual stimuli and improved restoration of scene details. Quantitative and qualitative results demonstrate the effectiveness of the proposed method. An exhaustive ablation study was also conducted to analyze the framework.
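
The abstract does not say which objective is used to align the fMRI latent features with the CLIP text feature space. As an illustration only, the sketch below assumes a CLIP-style symmetric contrastive (InfoNCE) loss, which is one common way to align two embedding spaces. The function names, the temperature value, and the toy vectors are hypothetical and are not taken from the paper.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def symmetric_info_nce(fmri_emb, text_emb, temperature=0.07):
    """CLIP-style contrastive loss for a batch of paired embeddings.

    Assumption (not from the paper): row i of fmri_emb and row i of
    text_emb describe the same stimulus; every other row is a negative.
    """
    f = [l2_normalize(v) for v in fmri_emb]
    t = [l2_normalize(v) for v in text_emb]
    n = len(f)

    # logits[i][j]: cosine similarity of fMRI i and text j, over temperature
    logits = [
        [sum(a * b for a, b in zip(f[i], t[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    def diag_cross_entropy(rows):
        # Mean cross-entropy where the correct class of row i is index i
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)  # subtract the max for numerical stability
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_z - row[i]
        return total / len(rows)

    columns = [list(col) for col in zip(*logits)]
    # Average the fMRI-to-text and text-to-fMRI directions
    return 0.5 * (diag_cross_entropy(logits) + diag_cross_entropy(columns))


# Toy check: matched pairs should score a lower loss than swapped pairs.
fmri = [[1.0, 0.0], [0.0, 1.0]]
text_matched = [[1.0, 0.0], [0.0, 1.0]]
text_swapped = [[0.0, 1.0], [1.0, 0.0]]
loss_matched = symmetric_info_nce(fmri, text_matched)
loss_swapped = symmetric_info_nce(fmri, text_swapped)
```

In a real pipeline the embeddings would come from the fMRI encoder and a frozen CLIP text encoder, and the loss would be minimized with a deep learning framework. This pure-Python version only shows how the objective rewards matched pairs.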