Abstract:Effectively recognizing different sceneries with complex backgrounds and varied lighting conditions plays an important role in modern AI systems. Competitive performance has recently been achieved by the deep scene categorization models. However, these models implicitly hypothesize that the image-level labels are 100% correct, which is too restrictive. Practically, the image-level labels for massive-scale scenery sets are usually calculated by external predictors such as ImageNet-CN. These labels can easily become contaminated because no predictors are completely accurate. This article proposes a new deep architecture that calculates scene categories by hierarchically deriving stable templates, which are discovered using a generative model. Specifically, we first construct a semantic space by incorporating image-level labels using subspace embedding. Afterward, it is noticeable that in the semantic space, the superpixel distributions from identically labeled images remain unchanged, regardless of the image-level label noises. On the basis of this observation, a probabilistic generative model learns the stable templates for each scene category. To deeply represent each scenery category, a novel aggregation network is developed to statistically concatenate the CNN features learned from scene annotations predicted by HSA. Finally, the learned deep representations are integrated into an image kernel, which is subsequently incorporated into a multiclass SVM for distinguishing scene categories. Thorough experiments have shown the performance of our method. As a byproduct, an empirical study of 33 SIFT-flow categories shows that the learned stable templates remain almost unchanged under a nearly 36% image label contamination rate.

Learning Representative and Discriminative Image Representation by Deep Appearance and Spatial Coding.

Spatial Locality-Preserving Feature Coding for Image Classification

Learning a Discriminative Dictionary with CNN for Image Classification.

Perceptual Visual Feature Learning With Applications in Sports Educational Image Understanding

Learnable receptive fields scheme in deep networks for image categorization

Hierarchical Deep Feature Representation For High-Resolution Scene Classification

End-to-End Trained Sparse Coding Network with Spatial Pyramid Pooling for Image Classification

Image Classification by Non-Negative Sparse Coding, Low-Rank and Sparse Decomposition

Learning Region-Wise Deep Feature Representation for Image Analysis

Towards Learning Spatially Discriminative Feature Representations

Exemplar Based Deep Discriminative and Shareable Feature Learning for Scene Image Classification

Classification and Representation Joint Learning Via Deep Networks.

Feature Encoding with Hybrid Heterogeneous Structure Model for Image Classification.

Learning Class-Specific Pooling Shapes for Image Classification.

Learning Important Spatial Pooling Regions for Scene Classification

Bag of Spatial Visual Words Model for Scene Classification

Deeply Encoding Stable Patterns From Contaminated Data for Scenery Image Recognition

Learning Discriminative Visual Dictionary for Natural Scene Categorization

Remote Sensing and Texture Image Classification Network Based on Deep Learning Integrated with Binary Coding and Sinkhorn Distance

Image Classification by Non-Negative Sparse Coding, Correlation Constrained Low-Rank and Sparse Decomposition.

Spatial Encoding of Visual Words for Image Classification.