Layerwise complexity-matched learning yields an improved model of cortical area V2

Nikhil Parthasarathy, Olivier J. Hénaff, Eero P. Simoncelli
2024-07-19
Abstract: Human ability to recognize complex visual patterns arises through transformations performed by successive areas in the ventral visual cortex. Deep neural networks trained end-to-end for object recognition approach human capabilities, and offer the best descriptions to date of neural responses in the late stages of the hierarchy. But these networks provide a poor account of the early stages, compared to traditional hand-engineered models, or models optimized for coding efficiency or prediction. Moreover, the gradient backpropagation used in end-to-end learning is generally considered to be biologically implausible. Here, we overcome both of these limitations by developing a bottom-up self-supervised training methodology that operates independently on successive layers. Specifically, we maximize feature similarity between pairs of locally-deformed natural image patches, while decorrelating features across patches sampled from other images. Crucially, the deformation amplitudes are adjusted proportionally to receptive field sizes in each layer, thus matching the task complexity to the capacity at each stage of processing. In comparison with architecture-matched versions of previous models, we demonstrate that our layerwise complexity-matched learning (LCL) formulation produces a two-stage model (LCL-V2) that is better aligned with selectivity properties and neural activity in primate area V2. We demonstrate that the complexity-matched learning paradigm is responsible for much of the emergence of the improved biological alignment. Finally, when the two-stage model is used as a fixed front-end for a deep network trained to perform object recognition, the resultant model (LCL-V2Net) is significantly better than standard end-to-end self-supervised, supervised, and adversarially-trained models in terms of generalization to out-of-distribution tasks and alignment with human behavior.
Neurons and Cognition, Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve

This paper addresses two limitations of current models:

1. **Early Visual Area Modeling**: Deep neural networks (DNNs) trained end-to-end for object recognition approach human capabilities and give the best available descriptions of neural responses in the late stages of the ventral visual hierarchy, but they account poorly for the early stages (such as areas V1 and V2). Traditional hand-engineered models, and models optimized for coding efficiency or prediction, describe these stages better.
2. **Biological Implausibility of Backpropagation**: The gradient backpropagation used in end-to-end training is generally considered biologically implausible.

The authors aim to overcome both limitations with a new self-supervised training method.

### Research Methods and Contributions

- **Layerwise Complexity-matched Learning (LCL)**: The authors propose a bottom-up self-supervised training method that operates independently on successive layers. At each layer, the objective maximizes feature similarity between pairs of locally deformed natural image patches while decorrelating features across patches sampled from other images. Crucially, the deformation amplitude is scaled in proportion to the receptive field size of each layer, so that task complexity matches the capacity of each processing stage.
- **Two-Stage Model (LCL-V2)**: Compared with architecture-matched versions of previous models, the LCL formulation produces a two-stage model (LCL-V2) that is better aligned with the selectivity properties and neural activity of primate area V2. The authors also show that the complexity-matched learning paradigm is responsible for much of this improvement.
- **Enhanced Biological Consistency**: When the two-stage model is used as a fixed front-end for a deep network trained for object recognition, the resulting model (LCL-V2Net) significantly outperforms standard end-to-end self-supervised, supervised, and adversarially trained models in generalization to out-of-distribution tasks and in alignment with human behavior.

### Conclusion

With these methods, the authors improve the modeling of early visual areas (particularly area V2) and achieve substantially better biological alignment without sacrificing object recognition performance. The resulting network also generalizes better to out-of-distribution tasks and aligns more closely with human behavior.
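
The per-layer objective described in the abstract (pull together the features of two deformed views of the same patch, decorrelate features of patches from different images, and scale the deformation with receptive field size) can be illustrated with a small sketch. This is not the authors' implementation: the function names, the `fraction` and `decorrelation_weight` parameters, and the specific cosine-based loss are assumptions chosen only to make the idea concrete in plain Python.

```python
import math


def deformation_amplitude(receptive_field_px, fraction=0.1):
    """Hypothetical rule: amplitude grows in proportion to receptive field size."""
    return fraction * receptive_field_px


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm > 0 else 0.0


def layer_loss(feats_a, feats_b, decorrelation_weight=1.0):
    """Assumed stand-in loss for one layer.

    feats_a[i] and feats_b[i] are features of two deformed views of patch i.
    The first term rewards similarity within a pair; the second penalizes
    squared similarity between views of different patches (decorrelation).
    """
    n = len(feats_a)
    positive = sum(cosine(feats_a[i], feats_b[i]) for i in range(n)) / n
    cross = [
        cosine(feats_a[i], feats_b[j]) ** 2
        for i in range(n)
        for j in range(n)
        if i != j
    ]
    decorrelation = sum(cross) / len(cross) if cross else 0.0
    return -positive + decorrelation_weight * decorrelation


# Toy features: two patches, two feature dimensions.
aligned = [[1.0, 0.0], [0.0, 1.0]]
swapped = [[0.0, 1.0], [1.0, 0.0]]

print(layer_loss(aligned, aligned))  # -1.0: pairs match, other patches are orthogonal
print(layer_loss(aligned, swapped))  # 1.0: pairs are orthogonal, other patches match
```

In a layerwise scheme of this kind, each stage would minimize its own loss on views deformed with its own amplitude (for example `deformation_amplitude(rf)` for that stage's receptive field `rf`), with no gradient flowing back into earlier stages.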