Abstract: Neural encoding, a crucial aspect to understand the human brain information processing system, aims to establish a quantitative relationship between the stimuli and the evoked brain activities. In the field of visual neuroscience, with the ability to explain how neurons in the primary visual cortex work, population receptive field (pRF) models have enjoyed high popularity and made reliable progress in recent years. However, existing models rely on either the inflexible prior assumptions about pRF or the clumsy parameter estimation methods, severely limiting the expressiveness and interpretability. In this article, we propose a novel neural encoding framework by learning "what" and "where" with deep neural networks. It involves two separate aspects: 1) the spatial characteristic ("where") and 2) feature selection ("what") of neuron populations in the visual cortex. Specifically, our approach first encodes visual stimuli into hierarchically intermediate features through a pretrained deep neural network (DNN), then converts DNN features into refined features with the channel attention and spatial receptive field (RF) to learn "where", and finally regresses refined features simultaneously onto voxel activities to learn "what". The sparsity regularization and smoothness regularization are adopted in our modeling approach so that the crucial RF can be estimated automatically without prior assumptions about shapes. Furthermore, an attempt is made to extend the voxel-wise modeling approach to multi-voxel joint encoding models, and we show that it is conducive to rescuing voxels with poor signal-to-noise characteristics. Extensive empirical results demonstrate that the method developed herein provides an effective strategy to establish neural encoding for the human visual cortex, with the weaker prior constraints but the higher encoding performance.

What problem does this paper attempt to address?

The paper addresses two major challenges in building neural encoding models of the human visual cortex:

1. **Limitations of Prior Assumptions**: Existing neural encoding models (such as the population receptive field model) typically rely on inflexible prior assumptions about the spatial properties of receptive fields, such as assuming isotropic Gaussian topologies and ignoring potential inhibitory regions. These assumptions limit the expressiveness and interpretability of the models.
2. **Cumbersome Parameter Estimation Methods**: Current methods often rely on grid search over candidate parameters, which can lead to mislocalization of the receptive field center and miscalculation of its size. These methods require a significant amount of manual effort and are often suboptimal.

To address these issues, the paper proposes a new neural encoding framework that simultaneously learns "what" (feature selection) and "where" (spatial properties) through deep neural networks. Specifically, the method includes the following steps:

- **Nonlinear Feature Extraction**: Using a pre-trained deep neural network (such as AlexNet) to extract hierarchical visual features from images.
- **Nonlinear Feature Refinement**: Transforming the raw features into refined features through a channel attention mechanism and a spatial receptive field module. The channel attention mechanism selects important feature maps, while the spatial receptive field module determines which locations matter in visual information processing.
- **Voxel-Level Linear Mapping**: Regressing the refined features onto voxel activities simultaneously to learn which features are most important for predicting the activity of each voxel.

Additionally, the method employs sparsity regularization and smoothness regularization to estimate receptive fields automatically, without strong prior assumptions about their shape.

Through these techniques, the method enhances the model's expressiveness and interpretability, and demonstrates superior performance compared to other neural encoding models in experiments.
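
The refinement, linear mapping, and regularization steps described above can be sketched for a single voxel as follows. This is a minimal NumPy illustration under stated assumptions, not the paper's implementation: the function names, the fixed per-channel attention vector (the paper's attention module may instead be computed from the input), the L1 form of the sparsity term, the squared-difference form of the smoothness term, and the weights `lam_sparse` and `lam_smooth` are all hypothetical choices made for this example. DNN feature extraction is replaced by a random array.

```python
import numpy as np


def refine_features(features, channel_attention, rf_map):
    """Weight feature maps by channel, then pool them through a spatial RF.

    features:          (C, H, W) feature maps from one DNN layer
    channel_attention: (C,) per-channel weights (assumed fixed here)
    rf_map:            (H, W) free-form receptive field, no shape prior
    Returns a (C,) vector of refined features.
    """
    weighted = features * channel_attention[:, None, None]
    return (weighted * rf_map[None, :, :]).sum(axis=(1, 2))


def predict_voxel(features, channel_attention, rf_map, readout, bias=0.0):
    """Linear readout of the refined features for one voxel."""
    refined = refine_features(features, channel_attention, rf_map)
    return float(refined @ readout + bias)


def sparsity_penalty(rf_map):
    """L1 norm of the RF: pushes most locations toward zero."""
    return float(np.abs(rf_map).sum())


def smoothness_penalty(rf_map):
    """Sum of squared differences between neighboring RF entries."""
    dy = np.diff(rf_map, axis=0)
    dx = np.diff(rf_map, axis=1)
    return float((dy ** 2).sum() + (dx ** 2).sum())


def encoding_loss(pred, target, rf_map, lam_sparse=1e-3, lam_smooth=1e-3):
    """Mean squared error plus the two RF regularizers (assumed weights)."""
    mse = float(np.mean((np.asarray(pred) - np.asarray(target)) ** 2))
    return (mse
            + lam_sparse * sparsity_penalty(rf_map)
            + lam_smooth * smoothness_penalty(rf_map))


# Toy usage with random stand-ins for DNN features and parameters.
rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 6, 6))   # 8 channels on a 6x6 grid
attention = rng.random(8)
rf = rng.standard_normal((6, 6))         # signed, so inhibitory regions are allowed
readout = rng.standard_normal(8)
prediction = predict_voxel(feats, attention, rf, readout)
loss = encoding_loss([prediction], [0.5], rf)
```

Because `rf_map` is an unconstrained array rather than a Gaussian with a center and width, it can take negative values and arbitrary shapes; in this sketch the two penalties are the only things that keep it compact and smooth. In practice the parameters would be fitted by gradient descent in an autodiff framework rather than set by hand.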

Neural Encoding for Human Visual Cortex With Deep Neural Networks Learning “What” and “Where”

Neural encoding and interpretation for high-level visual cortices based on fMRI using image caption features

A visual encoding model based on deep neural networks and transfer learning

A Temporal Encoding Method Based on Expansion Representation

Neural Encoding and Decoding with Deep Learning for Dynamic Natural Vision

Exploring the Brain-like Properties of Deep Neural Networks: A Neural Encoding Perspective

Research on Neural Encoding Models for Biological Vision: Progress and Challenges

Decoding dynamic visual scenes across the brain hierarchy

GaborNet Visual Encoding: A Lightweight Region-Based Visual Encoding Model With Good Expressiveness and Biological Interpretability

Robust Transcoding Sensory Information with Neural Spikes

Neural encoding with unsupervised spiking convolutional neural network

Enhancing neural encoding models for naturalistic perception with a multi-level integration of deep neural networks and cortical networks

Hierarchical Spiking-Based Model for Efficient Image Classification with Enhanced Feature Extraction and Encoding

Unidirectional brain-computer interface: Artificial neural network encoding natural images to fMRI response in the visual cortex

Neural Decoding of Visual Information Across Different Neural Recording Modalities and Approaches

Structurally-constrained encoding framework using a multi-voxel reduced-rank latent model for human natural vision

A Brain-Inspired Spiking Neural Network Model with Temporal Encoding and Learning

Dynamics Based Neural Encoding with Inter-Intra Region Connectivity

Decoding Neural Responses in Mouse Visual Cortex through a Deep Neural Network

Category Decoding of Visual Stimuli From Human Brain Activity Using a Bidirectional Recurrent Neural Network to Simulate Bidirectional Information Flows in Human Visual Cortices

Decoding Visual Experience and Mapping Semantics through Whole-Brain Analysis Using fMRI Foundation Models