Analysing the Effects of Pooling Combinations on Invariance to Position and Deformation in Convolutional Neural Networks

Jiahuan Zhou, Weiqi Xu, Ryad Chellali
DOI: https://doi.org/10.1109/cbs.2017.8266104
2017-01-01
Abstract: Visual object recognition is an important task in advanced robotics systems for grasping or localization. Holistic solutions based on convolutional neural networks have shown impressive recognition performance; however, they perform unsatisfactorily under pose variability such as projective deformation and occlusion. In this paper, we evaluate the robustness of different ConvNet architectures and pooling methods in handling object recognition under deformation. To simulate viewing objects from different angles and positions, we introduce the concepts of random affine transformation and constraint random affine transformation. We study the performance of an AlexNet-based model and a VGG-based model while using different combinations of pooling methods. The results reveal that using max-pooling at the front of the network and average-pooling at the back achieves a higher recognition rate, while using average-pooling at the front of the network shows better robustness when the distortion of the input is within a limited range. Moreover, the results also show that ConvNets are more sensitive to the loss of information (occlusions) than to distortions (changes in the spatial distribution of the original image information).
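The abstract names random affine transformation and constraint random affine transformation but does not define them. The following is a minimal sketch of one plausible reading, assuming the constrained variant simply draws the same parameters from narrower ranges. The function names (`sample_affine`, `sample_constrained_affine`, `transform_points`) and all parameter ranges are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np


def sample_affine(rng, max_rot_deg=180.0, scale_range=(0.5, 1.5),
                  max_shear=0.5, max_shift=0.25):
    """Sample a 3x3 homogeneous affine matrix.

    All default ranges are illustrative assumptions, not values from the paper.
    """
    theta = np.deg2rad(rng.uniform(-max_rot_deg, max_rot_deg))
    sx, sy = rng.uniform(scale_range[0], scale_range[1], size=2)
    shear = rng.uniform(-max_shear, max_shear)
    tx, ty = rng.uniform(-max_shift, max_shift, size=2)

    rotation = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                         [np.sin(theta), np.cos(theta), 0.0],
                         [0.0, 0.0, 1.0]])
    shear_m = np.array([[1.0, shear, 0.0],
                        [0.0, 1.0, 0.0],
                        [0.0, 0.0, 1.0]])
    scale_m = np.diag([sx, sy, 1.0])
    translation = np.array([[1.0, 0.0, tx],
                            [0.0, 1.0, ty],
                            [0.0, 0.0, 1.0]])
    # Applied right to left: scale, then shear, then rotate, then translate.
    return translation @ rotation @ shear_m @ scale_m


def sample_constrained_affine(rng):
    """Assumed 'constrained' variant: same sampler, narrower parameter ranges."""
    return sample_affine(rng, max_rot_deg=15.0, scale_range=(0.9, 1.1),
                         max_shear=0.1, max_shift=0.05)


def transform_points(matrix, points):
    """Apply a 3x3 affine matrix to an (N, 2) array of 2D points."""
    homogeneous = np.hstack([points, np.ones((len(points), 1))])
    return (homogeneous @ matrix.T)[:, :2]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Image corners in normalized coordinates [-1, 1].
    corners = np.array([[-1.0, -1.0], [1.0, -1.0], [1.0, 1.0], [-1.0, 1.0]])
    print(transform_points(sample_affine(rng), corners))
    print(transform_points(sample_constrained_affine(rng), corners))
```

Under this assumption, comparing a model's accuracy on inputs warped by the two samplers would separate arbitrary distortions from distortions kept within a limited range, which is the distinction the abstract draws.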