Abstract: We introduce a method to classify imagery using a convolutional neural network (CNN) on multi-view image projections. The power of our method comes from using projections of multiple images at multiple depth planes near the reconstructed surface. This enables classification of categories whose salient aspect is appearance change under different viewpoints, such as water, trees, and other materials with complex reflection/light response properties. Our method does not require boundary labelling in images and works on pixel-level classification with a small (few pixels) context, which simplifies the creation of a training set. We demonstrate this application on large-scale aerial imagery collections, and extend the per-pixel classification to robustly create a consistent 2D classification which can be used to fill the gaps in non-reconstructible water regions. We also apply our method to classify tree regions. In both cases, the training data can quickly be generated using a small number of manually-created polygons on a map. We show that even with a very simple and standard network our CNN outperforms the state-of-the-art image classification, the Inception-V3 model retrained from a large collection of aerial images.
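
The abstract notes that training data can be generated from a small number of manually created polygons. The sketch below shows one way such polygons could be turned into sparse per-pixel labels. It is not the authors' implementation: the function names, the `(label, vertices)` polygon format, and the assumption that polygons are already expressed in pixel coordinates are all hypothetical, and the even-odd ray-casting test is a standard technique swapped in for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
LabelledPolygon = Tuple[str, Sequence[Point]]  # hypothetical (label, vertices) format


def point_in_polygon(x: float, y: float, polygon: Sequence[Point]) -> bool:
    """Even-odd ray casting: count edges crossed by a ray cast in the +x direction."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through (x, y) can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate at which the edge meets that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def label_pixels(width: int, height: int,
                 polygons: Sequence[LabelledPolygon],
                 step: int = 1) -> List[Tuple[int, int, str]]:
    """Return sparse (col, row, label) samples for pixel centres inside a polygon.

    Assumes polygon vertices are already in pixel coordinates; `step` thins
    the samples so that only every step-th pixel is considered.
    """
    samples = []
    for row in range(0, height, step):
        for col in range(0, width, step):
            for label, vertices in polygons:
                if point_in_polygon(col + 0.5, row + 0.5, vertices):
                    samples.append((col, row, label))
                    break  # the first matching polygon wins
    return samples


if __name__ == "__main__":
    water = ("water", [(0, 0), (2, 0), (2, 2), (0, 2)])
    print(label_pixels(4, 4, [water]))
```

Because only pixel centres inside a polygon are labelled, no object boundaries need to be traced, which matches the abstract's point that boundary labelling is not required.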

What problem does this paper attempt to address?

This paper addresses how to use multi-view image projections to improve classification accuracy in large-scale 3D scene classification, especially for categories such as water bodies and trees whose appearance changes substantially with viewing angle. Specifically, it proposes a method based on a convolutional neural network (CNN) that takes as input the projections of multiple images onto multiple depth planes. The method is designed to handle the following challenges:

1. **Scale of large-scale classification systems**: The system must generalize well, so that the training set does not have to grow significantly as the area of use expands.
2. **Creation and management of training sets**: To simplify training-set creation, the system does not require large amounts of manually labeled data, only a small number of sparse pixel labels.
3. **Robustness to reconstruction and photo-consistency**: The system must keep working when the underlying stereo pipeline changes.
4. **High precision**: Even a small error rate can produce a large number of visible artifacts, so the system must achieve very high classification accuracy.

The paper focuses on two application scenarios:

- **Water body classification**: Water is difficult to handle in stereo reconstruction because it is often moving and has specular highlights, which tend to produce misleading photo-consistency maxima. The proposed method can identify and fill in these errors, yielding more accurate water classification.
- **Tree classification**: Trees are common objects, and classifying them is useful in many applications, from visualization to modeling to geographic information systems (GIS).

The method overcomes the difficulties of single-image classification by exploiting the interaction between features from multiple views. The paper demonstrates the approach on large-scale aerial image collections and extends the per-pixel classification into a consistent 2D classification that is used to fill water holes in non-reconstructible areas. It also verifies experimentally that the method outperforms existing image-based classification, such as the Inception-V3 model, on the water classification task.
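
To make the idea of "projections of multiple images on multiple depth planes" more concrete, the following is a minimal sketch of how the input for one surface point might be assembled. It is an illustration under stated assumptions, not the paper's pipeline: `NadirCamera` is a hypothetical, heavily simplified camera that looks straight down, images are grayscale lists of rows, sampling is nearest-neighbour, and the depth planes are horizontal offsets above and below the surface point.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Image = List[List[float]]  # grayscale image stored as rows of pixel values


@dataclass
class NadirCamera:
    """Hypothetical simplified pinhole camera looking straight down the z axis."""
    cx: float      # camera centre, world x
    cy: float      # camera centre, world y
    height: float  # camera centre, world z
    focal: float   # focal length in pixels
    u0: float      # principal point, column
    v0: float      # principal point, row

    def project(self, x: float, y: float, z: float) -> Tuple[float, float]:
        depth = self.height - z  # distance along the viewing axis
        if depth <= 0:
            raise ValueError("point is at or above the camera")
        u = self.u0 + self.focal * (x - self.cx) / depth
        v = self.v0 + self.focal * (y - self.cy) / depth
        return u, v


def sample(image: Image, u: float, v: float, fill: float = 0.0) -> float:
    """Nearest-neighbour lookup; returns `fill` outside the image."""
    col, row = int(round(u)), int(round(v))
    if 0 <= row < len(image) and 0 <= col < len(image[0]):
        return image[row][col]
    return fill


def multiview_stack(images: Sequence[Image], cameras: Sequence[NadirCamera],
                    point: Tuple[float, float, float],
                    plane_offsets: Sequence[float],
                    radius: int = 1, spacing: float = 1.0):
    """Return stack[view][plane][row][col]: a small patch around `point`,
    re-sampled from every view on every depth plane."""
    x, y, z = point
    steps = range(-radius, radius + 1)
    stack = []
    for image, cam in zip(images, cameras):
        planes = []
        for dz in plane_offsets:
            # Patch positions lie on the plane z + dz; each is projected into this view.
            patch = [[sample(image, *cam.project(x + i * spacing, y + j * spacing, z + dz))
                      for i in steps] for j in steps]
            planes.append(patch)
        stack.append(planes)
    return stack


if __name__ == "__main__":
    image = [[10 * r + c for c in range(5)] for r in range(5)]
    cams = [NadirCamera(0, 0, 4, 4, 2, 2), NadirCamera(2, 0, 4, 4, 2, 2)]
    stack = multiview_stack([image, image], cams, (0, 0, 0), [-1, 0, 1])
    print(len(stack), len(stack[0]), len(stack[0][0]))
```

A stack of this kind, with views and depth planes arranged as input channels, is the sort of tensor a CNN could consume. Views that agree on one plane but disagree on the others suggest a stable surface, while view-dependent changes on every plane are the cue the summary associates with water and trees.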

Large-Scale 3D Scene Classification With Multi-View Volumetric CNN
