Abstract: Reconstructing visual experience from brain responses measured by functional magnetic resonance imaging (fMRI) is a challenging yet important research topic in brain decoding, and it has proved especially difficult for visually similar stimuli such as faces. Although facial attributes are known to be key to face recognition, most existing methods for perceived face reconstruction pay little attention to decoding those attributes precisely, which often leads to reconstructed faces that are indistinguishable from one another. To solve this problem, we propose a novel neural decoding framework called VSPnet (voxel2style2pixel), which establishes hierarchical encoding and decoding networks with disentangled latent representations as the medium, so as to recover visual stimuli in finer detail. We design a hierarchical visual encoder (HVE) to pre-extract features containing both high-level semantic knowledge and low-level visual details from the stimuli. The proposed VSPnet consists of two networks: a multi-branch cognitive encoder and a style-based image generator. The encoder network is built from multiple linear regression branches that map brain signals to the latent space provided by the pre-extracted visual features, yielding representations that contain hierarchical information consistent with the corresponding stimuli. The generator network is inspired by StyleGAN and is used to untangle the complexity of fMRI representations and generate images. The HVE network is composed of a standard feature pyramid over a ResNet backbone. Extensive experimental results on the latest public datasets demonstrate that the reconstruction accuracy of our proposed method outperforms state-of-the-art approaches and that the identifiability of different reconstructed faces is greatly improved. In particular, we achieve feature editing for several facial attributes in the fMRI domain based on the multiview (i.e., visual stimuli and evoked fMRI) latent representations.
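
The multi-branch cognitive encoder described in the abstract can be illustrated with a minimal sketch, given here only as an assumption-laden toy and not as the authors' implementation. It assumes each branch is a ridge-regularized linear regression (the abstract says only "linear regression"), that the targets are pre-extracted visual features at several hierarchy levels, and that the data are synthetic. The branch names `coarse`, `medium`, and `fine`, the feature sizes, and the helper functions are all hypothetical.

```python
import numpy as np

def fit_branch(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    n_voxels = X.shape[1]
    gram = X.T @ X + alpha * np.eye(n_voxels)
    return np.linalg.solve(gram, X.T @ Y)

def fit_multibranch(X, targets, alpha=1.0):
    """Fit one independent linear branch per hierarchy level (assumed design)."""
    return {name: fit_branch(X, Y, alpha) for name, Y in targets.items()}

def predict_multibranch(X, weights):
    """Map voxel responses to one latent vector per hierarchy level."""
    return {name: X @ W for name, W in weights.items()}

# Synthetic stand-ins: X plays the role of fMRI voxel responses, and the
# targets play the role of hierarchical features pre-extracted from stimuli.
rng = np.random.default_rng(0)
n_samples, n_voxels = 200, 50
X = rng.standard_normal((n_samples, n_voxels))

# Hypothetical latent sizes for three hierarchy levels.
latent_dims = {"coarse": 8, "medium": 16, "fine": 32}
true_W = {name: rng.standard_normal((n_voxels, d)) for name, d in latent_dims.items()}
targets = {
    name: X @ W + 0.01 * rng.standard_normal((n_samples, W.shape[1]))
    for name, W in true_W.items()
}

weights = fit_multibranch(X, targets, alpha=1e-3)
latents = predict_multibranch(X, weights)
```

In a pipeline of the kind the abstract outlines, the per-level latents would then be passed to a style-based generator; that stage is omitted here because the abstract does not specify its interface.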

Reconstructing Rapid Natural Vision with fMRI-Conditional Video Generative Adversarial Network

Animate Your Thoughts: Decoupled Reconstruction of Dynamic Natural Vision from Slow Brain Activity

Rethinking Visual Reconstruction: Experience-Based Content Completion Guided by Visual Cues

Reconstructing Retinal Visual Images from 3T fMRI Data Enhanced by Unsupervised Learning

Cinematic Mindscapes: High-quality Video Reconstruction from Brain Activity

BigGAN-based Bayesian reconstruction of natural images from human brain activity

NeuroCine: Decoding Vivid Video Sequences from Human Brain Activities

Reconstructing seen image from brain activity by visually-guided cognitive representation and adversarial learning

DCNN-GAN: Reconstructing Realistic Image from fMRI

Semantics-Guided Hierarchical Feature Encoding Generative Adversarial Network for Visual Image Reconstruction From Brain Activity

Deep Natural Image Reconstruction from Human Brain Activity Based on Conditional Progressively Growing Generative Adversarial Networks

Fundus to Fluorescein Angiography Video Generation as a Retinal Generative Foundation Model

Cross-Modal Synthesis of Structural MRI and Functional Connectivity Networks via Conditional ViT-GANs

Neural Encoding and Decoding with Deep Learning for Dynamic Natural Vision

MindGPT: Interpreting What You See with Non-invasive Brain Recordings

NeuroClips: Towards High-fidelity and Smooth fMRI-to-Video Reconstruction

Functional Alignment-Auxiliary Generative Adversarial Network-Based Visual Stimuli Reconstruction via Multi-Subject fMRI

Variational Autoencoder: An Unsupervised Model for Modeling and Decoding fMRI Activity in Visual Cortex

May I see what you see? Predicting visual features from neuronal activity

Reconstructing controllable faces from brain activity with hierarchical multiview representations

Reconstructing Natural Images from Human fMRI by Alternating Encoding and Decoding with Shared Autoencoder Regularization