NeuroPictor: Refining fMRI-to-Image Reconstruction via Multi-individual Pretraining and Multi-level Modulation

Jingyang Huo, Yikai Wang, Xuelin Qian, Yun Wang, Chong Li, Jianfeng Feng, Yanwei Fu
2024-07-18
Abstract: Recent fMRI-to-image approaches have mainly focused on associating fMRI signals with specific conditions of pre-trained diffusion models. These approaches, while producing high-quality images, capture only a limited aspect of the complex information in fMRI signals and offer little detailed control over image creation. In contrast, this paper proposes to directly modulate the generation process of diffusion models using fMRI signals. Our approach, NeuroPictor, divides the fMRI-to-image process into three steps: i) fMRI calibrated-encoding, which performs multi-individual pre-training to build a shared latent space that minimizes individual differences and enables the subsequent multi-subject training; ii) fMRI-to-image multi-subject pre-training, which perceptually learns to guide the diffusion model with high- and low-level conditions across different individuals; iii) fMRI-to-image single-subject refining, which is similar to step ii but focuses on adapting to a particular individual. NeuroPictor extracts high-level semantic features from fMRI signals that characterize the visual stimulus and incrementally fine-tunes the diffusion model with a low-level manipulation network to provide precise structural instructions. Trained on about 67,000 fMRI-image pairs from various individuals, our model achieves superior fMRI-to-image decoding capacity, particularly in the within-subject setting, as evidenced on benchmark datasets. Our code and model are available at https://jingyanghuo.github.io/neuropictor/.
Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
This paper addresses the problem of accurately reconstructing images from functional magnetic resonance imaging (fMRI) signals. Existing methods primarily associate fMRI signals with specific conditions of pre-trained diffusion models. Although these methods can generate high-quality images, they capture only a limited part of the information in fMRI signals and offer little detailed control over image creation. To address these issues, the paper proposes the NeuroPictor framework, which modulates the generation process of the diffusion model directly with fMRI signals and improves fMRI-to-image reconstruction through multi-individual pre-training and multi-level modulation.

NeuroPictor is divided into three steps:

1. **fMRI Calibrated-Encoding**: Establishing a shared fMRI latent space through multi-individual pre-training, to minimize individual differences and enable the subsequent multi-subject training.
2. **Multi-Subject Pre-Training**: Learning to guide the diffusion model with high- and low-level conditions across different individuals.
3. **Single-Subject Refinement**: Repeating the procedure of step 2, but focused on adapting the model to a particular individual.

The core of NeuroPictor lies in combining two levels of guidance: it extracts high-level semantic features from fMRI signals that characterize the visual stimulus, and it incrementally fine-tunes the diffusion model with a low-level manipulation network that provides precise structural instructions. Trained on about 67,000 fMRI-image pairs from various individuals, NeuroPictor shows superior fMRI-to-image decoding performance on benchmark datasets, especially in the within-subject setting.
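The three-step procedure described above can be pictured as a training schedule in which the first two stages pool data from every individual and the last stage narrows to one. The following is only a minimal sketch of that data flow, not the authors' implementation: the names `Stage`, `build_schedule`, and the `Pair` type are hypothetical, and it assumes for illustration that the calibrated-encoding stage draws on the same pooled samples as the multi-subject stage, which the abstract does not state.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical sample type: an fMRI vector paired with an image identifier.
Pair = Tuple[List[float], str]


@dataclass
class Stage:
    """One stage of the (illustrative) three-stage schedule."""
    name: str
    subjects: List[str]   # individuals whose data the stage sees
    num_samples: int      # number of samples available to the stage


def build_schedule(pairs_by_subject: Dict[str, List[Pair]], target: str) -> List[Stage]:
    """Lay out which individuals each stage trains on.

    Stages 1 and 2 pool every individual; stage 3 uses only `target`.
    Assumption: stage 1 uses the same pooled samples as stage 2.
    """
    if target not in pairs_by_subject:
        raise KeyError(f"unknown subject: {target}")

    all_subjects = sorted(pairs_by_subject)
    pooled = sum(len(pairs) for pairs in pairs_by_subject.values())

    return [
        # i) shared latent space learned across individuals
        Stage("fmri_calibrated_encoding", all_subjects, pooled),
        # ii) diffusion guidance learned across individuals
        Stage("multi_subject_pretraining", all_subjects, pooled),
        # iii) same objective as ii, restricted to one individual
        Stage("single_subject_refining", [target], len(pairs_by_subject[target])),
    ]


if __name__ == "__main__":
    # Toy data: two individuals with two and one fMRI-image pairs.
    toy = {
        "subj_a": [([0.1, 0.2], "img_0"), ([0.3, 0.4], "img_1")],
        "subj_b": [([0.5, 0.6], "img_2")],
    }
    for stage in build_schedule(toy, target="subj_b"):
        print(stage.name, stage.subjects, stage.num_samples)
```

The point of the layout is that only the final stage is subject-specific, so the shared latent space and the multi-subject guidance can be reused when adapting to a new target individual.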