Hierarchical Spatial Transformer Network

Chang Shu, Xi Chen, Qiwei Xie, Hua Han

DOI: https://doi.org/10.48550/arXiv.1801.09467

2018-01-30

Abstract: Computer vision researchers have long hoped that neural networks could acquire a spatial transformation ability that eliminates the interference caused by geometric distortion. The emergence of the spatial transformer network made this hope a reality. The spatial transformer network and its variants handle global displacement well but lack the ability to deal with local spatial variance. Hence, finding a better way to model deformation within neural networks has become a pressing problem. To address this issue, we analyze the advantages and disadvantages of approximation theory and optical flow theory, and then combine them to propose a novel way to achieve image deformation, which we implement with a hierarchical convolutional neural network. This new approach solves for a linear deformation along with an optical flow field to model image deformation. In experiments on cluttered MNIST handwritten digit classification and image plane alignment, our method outperforms baseline methods by a large margin.
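The abstract models a deformation as a linear transform plus an optical flow field. A minimal NumPy sketch of how such a two-part warp could be applied is given below. It assumes the STN convention of a 2x3 affine matrix acting on a normalized [-1, 1] sampling grid, and it assumes the flow offsets are simply added to the affine-transformed coordinates before bilinear sampling; the paper's exact parameterization may differ. `hierarchical_warp` and `bilinear_sample` are hypothetical helpers, not the authors' code, and the CNN modules that would predict `theta` and `flow` are omitted.

```python
import numpy as np

def bilinear_sample(img, xs, ys):
    """Sample a 2-D image at float pixel coordinates with bilinear interpolation.
    Coordinates outside the image are clamped to the border."""
    h, w = img.shape
    xs = np.clip(xs, 0, w - 1)
    ys = np.clip(ys, 0, h - 1)
    x0 = np.floor(xs).astype(int)
    y0 = np.floor(ys).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = xs - x0  # horizontal interpolation weight
    wy = ys - y0  # vertical interpolation weight
    top = (1 - wx) * img[y0, x0] + wx * img[y0, x1]
    bottom = (1 - wx) * img[y1, x0] + wx * img[y1, x1]
    return (1 - wy) * top + wy * bottom

def hierarchical_warp(img, theta, flow):
    """Hypothetical two-part warp: affine (linear) part, then a dense flow field.

    theta: (2, 3) affine matrix in normalized [-1, 1] coordinates (STN convention).
    flow:  (2, H, W) per-pixel offsets in normalized units (assumed additive).
    """
    h, w = img.shape
    # Normalized target grid: ys varies along rows, xs along columns.
    ys, xs = np.meshgrid(np.linspace(-1, 1, h), np.linspace(-1, 1, w), indexing="ij")
    grid = np.stack([xs, ys, np.ones_like(xs)]).reshape(3, -1)  # (3, H*W)
    src = (theta @ grid).reshape(2, h, w)  # linear part: global affine map
    src = src + flow                       # nonlinear part: local per-pixel offsets
    # Map normalized source coordinates back to pixel indices and sample.
    px = (src[0] + 1) * (w - 1) / 2
    py = (src[1] + 1) * (h - 1) / 2
    return bilinear_sample(img, px, py)
```

For example, with `theta` set to the identity `[[1, 0, 0], [0, 1, 0]]` and a constant horizontal flow of `2 / (W - 1)`, each output pixel samples its right-hand neighbour. The same global shift could equally be expressed through the translation column of `theta`; the flow field only becomes necessary when the offsets vary from location to location.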

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

This paper primarily addresses how neural networks can better handle spatial deformations when processing images. Specifically:

1. **Background and challenges**:
   - Deep learning has achieved significant success in computer vision, but it still struggles with the effects of geometric distortion.
   - Existing remedies, such as handcrafted features and data augmentation, do not fundamentally solve the problem.
2. **Limitations of existing solutions**:
   - Spatial Transformer Networks (STN) and their variants handle global displacement well but cannot model local spatial variation.
   - Inverse Compositional Spatial Transformer Networks (IC-STN) improve on STN but are still limited to linear transformations, so they cannot adequately capture complex nonlinear deformations.
3. **Proposed method**:
   - The paper proposes the Hierarchical Spatial Transformer Network (HSTN), which decomposes image deformation into a linear part and a nonlinear part.
   - HSTN uses two modules, a linear transformation generator and an optical flow field generator, which estimate the linear and nonlinear deformation parameters, respectively.
   - In experiments on cluttered MNIST handwritten digit classification and image plane alignment, HSTN significantly outperformed the baseline methods.

Through this decomposition, HSTN can handle both large-scale displacements and local detail changes, thereby improving the spatial invariance of neural networks.
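The central idea above is that a deformation splits into a linear part, which a single global transform such as STN's can capture, and a nonlinear remainder that needs a per-location flow field. As a hedged illustration of that split only (HSTN itself predicts both parts with CNN generators, which are not reproduced here), the sketch below fits the best affine map to point correspondences by least squares and treats what is left over as the residual a flow field would have to model. `split_deformation` is a hypothetical helper introduced for this example.

```python
import numpy as np

def split_deformation(points, targets):
    """Split a point-wise deformation into a best-fit affine part and a residual.

    points, targets: (N, 2) arrays of corresponding coordinates.
    Returns (A, residual): A is the (2, 3) affine matrix, residual is (N, 2).
    """
    n = points.shape[0]
    homog = np.hstack([points, np.ones((n, 1))])  # (N, 3) homogeneous coordinates
    # Least-squares solve homog @ A.T ~= targets for the linear (affine) part.
    A_t, *_ = np.linalg.lstsq(homog, targets, rcond=None)  # (3, 2)
    linear = homog @ A_t
    residual = targets - linear  # what a flow field would still have to explain
    return A_t.T, residual

# Regular grid of sample points in normalized coordinates.
g = np.linspace(-1, 1, 21)
xx, yy = np.meshgrid(g, g)
pts = np.stack([xx.ravel(), yy.ravel()], axis=1)  # (441, 2)

# A purely affine deformation: the residual should vanish.
A_true = np.array([[1.1, 0.2, 0.05], [-0.1, 0.9, -0.2]])
tgt = pts @ A_true[:, :2].T + A_true[:, 2]

# The same deformation plus a local Gaussian bump in x near the origin.
bump = 0.1 * np.exp(-np.sum(pts**2, axis=1) / 0.05)[:, None] * np.array([1.0, 0.0])
A_bumped, res_bumped = split_deformation(pts, tgt + bump)
```

For the purely affine targets the fitted matrix matches `A_true` and the residual is numerically zero. With the bump added, the affine fit absorbs only a small average shift, and the residual stays concentrated near the origin, which is the kind of local spatial variance that a single global transform cannot represent.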
