Hierarchical Spatial Transformer Network

Chang Shu, Xi Chen, Qiwei Xie, Hua Han
DOI: https://doi.org/10.48550/arXiv.1801.09467
2018-01-30
Abstract: Computer vision researchers have long hoped that neural networks could learn spatial transformations and so eliminate the interference caused by geometric distortion. The spatial transformer network made this possible. The spatial transformer network and its variants handle global displacement well but cannot deal with local spatial variance, so finding a better way to model deformation inside a neural network has become a pressing problem. To address this issue, we analyze the advantages and disadvantages of approximation theory and optical flow theory, then combine them to propose a novel way to achieve image deformation, implemented with a hierarchical convolutional neural network. The new approach solves for a linear deformation together with an optical flow field to model image deformation. In experiments on cluttered MNIST handwritten digit classification and image plane alignment, our method outperforms the baseline methods by a large margin.
Computer Vision and Pattern Recognition
### What problem does this paper attempt to solve?

This paper primarily addresses the issue of how neural networks can better handle spatial deformations when processing images. Specifically:

1. **Background and Challenges**:
   - Deep learning has achieved significant success in the field of computer vision, but it still faces challenges in dealing with the effects of geometric distortions.
   - Existing methods (such as handcrafted features and data augmentation) do not fundamentally solve the problem.
2. **Limitations of Existing Solutions**:
   - Spatial Transformer Networks (STN) and their variants can handle global displacements well but lack the ability to manage local spatial changes.
   - Inverse Compositional Spatial Transformer Networks (IC-STN) have made some improvements but are still limited to linear transformations, unable to adequately address complex nonlinear deformations.
3. **Proposed New Method**:
   - The paper proposes a Hierarchical Spatial Transformer Network (HSTN), which decomposes image deformation into linear and nonlinear parts.
   - HSTN uses two modules: a linear transformation generator and an optical flow field generator, which estimate linear and nonlinear deformation parameters, respectively.
   - In experiments, HSTN significantly outperformed baseline methods in tasks such as handwritten digit classification (cluttered MNIST) and image plane alignment.

Through this approach, HSTN can better handle large-scale displacements and local detail changes, thereby enhancing the spatial invariance of neural networks.
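The decomposition into a linear part and a nonlinear part can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes the two parts are combined additively: a 2x3 affine matrix `theta` (standing in for the output of the linear transformation generator) maps a normalized sampling grid, and a per-pixel `flow` field (standing in for the output of the optical flow field generator) is added as a local offset before bilinear sampling. The function names, the normalized coordinate convention, and the border clamping are all assumptions made for illustration; in the paper both quantities are predicted by convolutional networks, which the sketch leaves out.

```python
import numpy as np

def make_grid(h, w):
    """Normalized (x, y) coordinates in [-1, 1], shape (h, w, 2)."""
    ys, xs = np.meshgrid(np.linspace(-1.0, 1.0, h),
                         np.linspace(-1.0, 1.0, w), indexing="ij")
    return np.stack([xs, ys], axis=-1)

def bilinear_sample(image, coords):
    """Sample a 2-D image at normalized (x, y) coords, clamping to the border."""
    h, w = image.shape
    # Convert normalized coordinates to pixel coordinates.
    x = np.clip((coords[..., 0] + 1.0) * 0.5 * (w - 1), 0.0, w - 1)
    y = np.clip((coords[..., 1] + 1.0) * 0.5 * (h - 1), 0.0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = x - x0
    wy = y - y0
    top = image[y0, x0] * (1 - wx) + image[y0, x1] * wx
    bottom = image[y1, x0] * (1 - wx) + image[y1, x1] * wx
    return top * (1 - wy) + bottom * wy

def warp(image, theta, flow):
    """Warp with an affine matrix (assumed linear part) plus a flow field
    (assumed nonlinear part), both in normalized coordinates."""
    h, w = image.shape
    grid = make_grid(h, w)
    homog = np.concatenate([grid, np.ones((h, w, 1))], axis=-1)  # (h, w, 3)
    linear = homog @ theta.T                                     # (h, w, 2)
    return bilinear_sample(image, linear + flow)

# Usage: identity affine part plus a flow that shifts sampling one pixel right.
img = np.arange(20, dtype=float).reshape(4, 5)
identity = np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0]])
flow = np.zeros((4, 5, 2))
flow[..., 0] = 2.0 / (5 - 1)  # one pixel in normalized x units
shifted = warp(img, identity, flow)
```

Splitting the sampling grid this way lets the affine term absorb large global displacements while the flow term only has to model small local residuals, which matches the division of labor described in the summary above.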