Abstract: View-invariant object recognition is a challenging problem that has attracted much attention in the psychology, neuroscience, and computer vision communities. Humans are remarkably good at it, even if some variations are presumably more difficult to handle than others (e.g. 3D rotations). Humans are thought to solve the problem through hierarchical processing along the ventral stream, which progressively extracts increasingly invariant visual features. This feed-forward architecture has inspired a new generation of bio-inspired computer vision systems called deep convolutional neural networks (DCNNs), which are currently the best algorithms for object recognition in natural images. Here, for the first time, we systematically compared human feed-forward vision and DCNNs at view-invariant object recognition using the same images and controlling for both the kind of transformation and its magnitude. We used four object categories, with images rendered from 3D computer models. In total, 89 human subjects participated in 10 experiments in which they had to discriminate between two or four categories after rapid presentation with backward masking. We also tested two recent DCNNs on the same tasks. We found that humans and DCNNs largely agreed on the relative difficulty of each kind of variation: rotation in depth is by far the hardest transformation to handle, followed by scale, then rotation in plane, and finally position. This suggests that humans recognize objects mainly through 2D template matching rather than by constructing 3D object models, and that DCNNs are not unreasonable models of human feed-forward vision. Our results also show that the variation levels in rotation in depth and scale strongly modulate the recognition performance of both humans and DCNNs. We therefore argue that these variations should be controlled in the image datasets used in vision research.

Efficient Rotation Invariance in Deep Neural Networks through Artificial Mental Rotation

Achieving Rotation Invariance in Convolution Operations: Shifting from Data-Driven to Mechanism-Assured

RINet: Efficient 3D Lidar-Based Place Recognition Using Rotation Invariant Neural Network

Compensating for Large In-Plane Rotations in Natural Images

Rotation Equivariance and Invariance in Convolutional Neural Networks

Seeing eye-to-eye? A comparison of object recognition performance in humans and deep convolutional neural networks under image manipulation

RRL: Regional Rotation Layer in Convolutional Neural Networks

Tilt your Head: Activating the Hidden Spatial-Invariance of Classifiers

Rethinking Local-to-global Representation Learning for Rotation-Invariant Point Cloud Analysis

Rotation equivariant vector field networks

On Representation of 3D Rotation in the Context of Deep Learning

Isometric Transformation Invariant Graph-based Deep Neural Network

Deep Neural Networks with Efficient Guaranteed Invariances

Deep Rotation Equivariant Network

Humans and deep networks largely agree on which kinds of variation make object recognition harder

Harmonic Networks: Deep Translation and Rotation Equivariance

Rotation Invariance Neural Network

Emergence of brain-like mirror-symmetric viewpoint tuning in convolutional neural networks

Improving the Sample-Complexity of Deep Classification Networks with Invariant Integration

RIC-CNN: Rotation-Invariant Coordinate Convolutional Neural Network

A Rotation-Invariance Face Detector Based on RetinaNet