MindDiffuser: Controlled Image Reconstruction from Human Brain Activity with Semantic and Structural Diffusion

Yizhuo Lu, Changde Du, Qiongyi Zhou, Dianpeng Wang, Huiguang He
2023-08-08
Abstract: Reconstructing visual stimuli from brain recordings is a meaningful and challenging task. In particular, precise and controllable image reconstruction is of great significance for advancing the development and application of brain-computer interfaces. Complex image reconstruction techniques have advanced, but it remains difficult to align the reconstruction with the image stimuli in both semantics (concepts and objects) and structure (position, orientation, and size). To address this issue, we propose a two-stage image reconstruction model called MindDiffuser. In Stage 1, the VQ-VAE latent representations and the CLIP text embeddings decoded from fMRI are fed into Stable Diffusion, which yields a preliminary image that contains semantic information. In Stage 2, we use the CLIP visual features decoded from fMRI as supervisory information and iteratively adjust the two feature vectors decoded in Stage 1 through backpropagation to align the structural information. The results of both qualitative and quantitative analyses demonstrate that our model surpasses the current state-of-the-art models on the Natural Scenes Dataset (NSD). Subsequent experimental findings corroborate the neurobiological plausibility of the model, as evidenced by the interpretability of the multimodal features employed, which align with the corresponding brain responses.
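Stage 1 depends on decoders that map fMRI voxel patterns to the VQ-VAE latent representation and the CLIP text embedding. The abstract does not name the regressor. A common choice for this kind of fMRI-to-feature decoding is L2-regularized (ridge) linear regression, so the sketch below assumes that. The functions `fit_ridge` and `predict`, the value of `alpha`, and all array sizes are hypothetical. The data are synthetic stand-ins, not NSD recordings.

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression from voxels to features (illustrative).

    X: (n_trials, n_voxels) fMRI responses; Y: (n_trials, n_features) targets.
    Voxels are z-scored and targets centered with training statistics, then
    W = (Xs^T Xs + alpha * I)^(-1) Xs^T (Y - mean(Y)).
    """
    x_mean, x_std = X.mean(axis=0), X.std(axis=0) + 1e-8
    y_mean = Y.mean(axis=0)
    Xs = (X - x_mean) / x_std
    gram = Xs.T @ Xs + alpha * np.eye(X.shape[1])
    W = np.linalg.solve(gram, Xs.T @ (Y - y_mean))
    return W, x_mean, x_std, y_mean

def predict(X, params):
    """Apply a fitted decoder to new fMRI responses using training statistics."""
    W, x_mean, x_std, y_mean = params
    return ((X - x_mean) / x_std) @ W + y_mean

# Synthetic data standing in for fMRI responses and flattened image features.
rng = np.random.default_rng(0)
n_train, n_test, n_voxels, n_features = 200, 20, 50, 8
W_true = rng.normal(size=(n_voxels, n_features))
X_train = rng.normal(size=(n_train, n_voxels))
Y_train = X_train @ W_true + 0.1 * rng.normal(size=(n_train, n_features))
X_test = rng.normal(size=(n_test, n_voxels))
Y_test = X_test @ W_true

params = fit_ridge(X_train, Y_train, alpha=1.0)
Y_pred = predict(X_test, params)
```

Under this assumption, one decoder would be fitted separately for each target (VQ-VAE latents, CLIP text embeddings, CLIP visual features). In practice `alpha` would normally be chosen by cross-validation rather than fixed.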
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
The problem this paper addresses is reconstructing visual stimulus images from human brain activity recorded with fMRI. Specifically, it seeks precise and controllable reconstruction of both the semantic information (concepts and objects) and the structural information (position, orientation, and size) of the images. Complex image reconstruction techniques have progressed, but aligning the reconstructed images with the original stimuli in both semantics and structure remains a challenge. To tackle this, the authors propose a two-stage image reconstruction model named MindDiffuser. It aims to combine the strengths of existing methods while overcoming their respective limitations, producing reconstructions that are both semantically similar to and structurally aligned with the stimuli.

Specifically, the goals of the paper include:

1. **Proposing a new image reconstruction model**: MindDiffuser, which integrates semantic information and adjusts structural information to achieve high-quality image reconstruction.
2. **Surpassing existing models**: Through detailed quantitative comparisons, demonstrating that the proposed model outperforms the current state-of-the-art models on the Natural Scenes Dataset (NSD).
3. **Adapting to individual differences**: Experiments show that MindDiffuser can adapt to differences in brain signals across subjects without additional adjustments, further supporting the model's effectiveness and generalizability.
4. **Explaining neurobiological plausibility**: Visualizing the feature decoding process provides evidence of the model's interpretability and neurobiological plausibility, showing consistency between the multimodal features and the corresponding brain responses.
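The structural adjustment in Stage 2 keeps the generative model fixed. It updates the two vectors decoded in Stage 1 by backpropagation, so that the CLIP visual features of the generated image move toward the CLIP visual features decoded from fMRI. The sketch below illustrates only that optimization pattern, under loud assumptions. `generate` is a single tanh layer standing in for Stable Diffusion, and `image_features` is a linear map standing in for the CLIP image encoder. The loss is squared error on features, and the gradients are derived by hand. The names, sizes, loss, learning rate, and step count are all hypothetical, not the paper's networks or settings.

```python
import numpy as np

rng = np.random.default_rng(0)
z_dim, c_dim, img_dim, feat_dim = 16, 8, 32, 4

# Hypothetical frozen stand-ins (NOT Stable Diffusion or CLIP).
M = rng.normal(size=(img_dim, z_dim + c_dim)) / np.sqrt(z_dim + c_dim)
F = rng.normal(size=(feat_dim, img_dim)) / np.sqrt(img_dim)

def generate(z, c):
    """Toy differentiable 'generator': image = tanh(M @ [z; c])."""
    return np.tanh(M @ np.concatenate([z, c]))

def image_features(img):
    """Toy linear 'image encoder' standing in for the CLIP visual encoder."""
    return F @ img

def loss_and_grads(z, c, target):
    """L = 0.5 * ||F tanh(M [z; c]) - target||^2 and its gradients w.r.t. z and c."""
    img = generate(z, c)
    r = image_features(img) - target
    loss = 0.5 * float(r @ r)
    d_h = (F.T @ r) * (1.0 - img ** 2)  # backprop through F, then through tanh
    d_x = M.T @ d_h                     # backprop through M to [z; c]
    return loss, d_x[:z_dim], d_x[z_dim:]

def align(z, c, target, lr=0.2, steps=500):
    """Gradient descent on (z, c) only; the stand-in networks stay frozen."""
    z, c = z.copy(), c.copy()
    history = []
    for _ in range(steps):
        loss, gz, gc = loss_and_grads(z, c, target)
        history.append(loss)
        z -= lr * gz
        c -= lr * gc
    history.append(loss_and_grads(z, c, target)[0])
    return z, c, history

# Synthetic setup: 'target' plays the role of the CLIP visual feature decoded
# from fMRI; (z0, c0) play the role of the noisy Stage 1 decoded vectors.
z_true, c_true = rng.normal(size=z_dim), rng.normal(size=c_dim)
target = image_features(generate(z_true, c_true))
z0 = z_true + 0.5 * rng.normal(size=z_dim)
c0 = c_true + 0.5 * rng.normal(size=c_dim)

z_opt, c_opt, history = align(z0, c0, target)
print(f"feature loss: {history[0]:.4f} -> {history[-1]:.4f}")
```

With real models, the same loop would more naturally be written with an automatic-differentiation framework. Only the latent and embedding tensors would be marked as trainable, and the learning rate and number of steps would need tuning.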
Overall, this paper aims to advance brain-machine interface technology by proposing an innovative image reconstruction method and providing new tools and perspectives for understanding the working mechanisms of the human visual system.