Abstract: Reconstructing visual experience from brain responses measured by functional magnetic resonance imaging (fMRI) is a challenging yet important research topic in brain decoding, and decoding visually similar stimuli, such as faces, has proved especially difficult. Although facial attributes are known to be key to face recognition, most existing methods ignore how to decode them precisely in perceived face reconstruction, which often leads to indistinguishable reconstructed faces. To solve this problem, we propose a novel neural decoding framework called VSPnet (voxel2style2pixel), which establishes hierarchical encoding and decoding networks with disentangled latent representations as the medium, so as to recover visual stimuli in finer detail. We also design a hierarchical visual encoder (HVE) to pre-extract features containing both high-level semantic knowledge and low-level visual details from the stimuli. The proposed VSPnet consists of two networks: a multi-branch cognitive encoder and a style-based image generator. The encoder network is built from multiple linear regression branches that map brain signals to the latent space provided by the pre-extracted visual features, yielding representations that contain hierarchical information consistent with the corresponding stimuli. The generator network, inspired by StyleGAN, untangles the complexity of fMRI representations and generates images. The HVE network is composed of a standard feature pyramid over a ResNet backbone. Extensive experimental results on the latest public datasets demonstrate that the reconstruction accuracy of our proposed method outperforms state-of-the-art approaches and that the identifiability of different reconstructed faces is greatly improved. In particular, we achieve feature editing for several facial attributes in the fMRI domain based on the multiview (i.e., visual stimuli and evoked fMRI) latent representations.
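The multi-branch encoder idea can be illustrated with a minimal sketch in which each branch is an independent linear map from voxels to one level of a hierarchical latent space. This is not the authors' implementation: the abstract does not state the number of branches, the latent dimensions, or the fitting procedure, so the branch names (`coarse`, `middle`, `fine`), the dimensions, the use of ridge regularization, and the synthetic data below are all assumptions made for illustration.

```python
import numpy as np


def fit_ridge_branch(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^{-1} X^T Y."""
    n_voxels = X.shape[1]
    gram = X.T @ X + alpha * np.eye(n_voxels)
    return np.linalg.solve(gram, X.T @ Y)


def fit_multibranch_encoder(X, targets, alpha=1.0):
    """Fit one linear branch per latent level (hypothetical level names)."""
    return {name: fit_ridge_branch(X, Y, alpha) for name, Y in targets.items()}


def predict_latents(X, weights):
    """Map voxel responses to every latent level with the fitted branches."""
    return {name: X @ W for name, W in weights.items()}


# Synthetic stand-in for fMRI data: 200 trials, 50 voxels (assumed sizes).
rng = np.random.default_rng(0)
n_trials, n_voxels = 200, 50
X = rng.standard_normal((n_trials, n_voxels))

# Assumed latent levels and dimensions; in the paper these would come from
# the features pre-extracted by the visual encoder, not from random weights.
dims = {"coarse": 8, "middle": 16, "fine": 32}
true_W = {k: rng.standard_normal((n_voxels, d)) for k, d in dims.items()}
targets = {
    k: X @ W + 0.01 * rng.standard_normal((n_trials, W.shape[1]))
    for k, W in true_W.items()
}

weights = fit_multibranch_encoder(X, targets, alpha=1e-3)
latents = predict_latents(X, weights)
```

In a full pipeline, the predicted latents would be passed to a style-based generator, with each level presumably controlling a different scale of the output image. That stage is omitted here because the abstract gives no detail on how it is wired.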

MindLDM: Reconstruct Visual Stimuli from fMRI Using Latent Diffusion Model

MindDiffuser: Controlled Image Reconstruction from Human Brain Activity with Semantic and Structural Diffusion

Mind-bridge: Reconstructing Visual Images Based on Diffusion Model from Human Brain Activity

Reconstructing the Mind's Eye: fMRI-to-Image with Contrastive Learning and Diffusion Priors

Decoding Realistic Images from Brain Activity with Contrastive Self-supervision and Latent Diffusion

NeuralDiffuser: Controllable fMRI Reconstruction with Primary Visual Feature Guided Diffusion

Natural scene reconstruction from fMRI signals using generative latent diffusion

Visual Image Decoding of Brain Activities Using a Dual Attention Hierarchical Latent Generative Network with Multiscale Feature Fusion

Reconstructing Visual Stimulus Images from EEG Signals Based on Deep Visual Representation Model

Optimized two-stage AI-based Neural Decoding for Enhanced Visual Stimulus Reconstruction from fMRI Data

A novel DRL-guided sparse voxel decoding model for reconstructing perceived images from brain activity

Reconstructing controllable faces from brain activity with hierarchical multiview representations

MindTuner: Cross-Subject Visual Decoding with Visual Fingerprint and Semantic Correction

MindGPT: Interpreting What You See with Non-invasive Brain Recordings

Reconstruction of Natural Images from Human fMRI Using a Three-Stage Multi-Level Deep Fusion Model

Reconstructing seen image from brain activity by visually-guided cognitive representation and adversarial learning

Controllable Mind Visual Diffusion Model

Deep image reconstruction from human brain activity

Neuro-Vision to Language: Enhancing Brain Recording-based Visual Reconstruction and Language Interaction

Foreground-attention in Neural Decoding: Guiding Loop-Enc-Dec to Reconstruct Visual Stimulus Images from fMRI