Volumetric Spatial Transformer Network for Object Recognition.

Min Liu, Yifei Shi, Lintao Zheng, Yueshan Xiong, Kai Xu
DOI: https://doi.org/10.1145/3005274.3005328
2016-01-01
Abstract: Understanding 3D environments is a vital part of modern computer vision research because of its relevance to many vision systems, with applications ranging from self-driving cars to autonomous robots [Qi et al. 2016]. Currently, object recognition mainly relies on two approaches: volumetric CNNs [Wu Z 2015] and multi-view CNNs [Xu et al. 2015] [Xu et al. 2016]. In this paper, we propose a volumetric spatial transformer network for object recognition. It bridges the gap between 3D CNNs and 2D CNNs for the first time and can be trained end to end. Given a 3D shape, the network automatically selects the best view, that is, the one that maximizes object recognition accuracy.