Abstract: Learning robust and discriminative representations is essential for 3D object retrieval. In this paper, we present an improved Multi-view Convolutional Neural Network (MVCNN) for view-based 3D object representation learning. Our technical contributions are twofold. First, we propose to employ Group-view Similarity Learning (GSL) over the multi-view representations before the aggregation operation (i.e., max-pooling in MVCNN). We assume that the similarity information among the view groups of different 3D objects provides an important cue that has been largely neglected by previous methods. To exploit it, we add a branch to the original MVCNN architecture and learn to maintain such group-view similarity relationships. Second, we utilize an end-to-end metric learning loss function to improve the representation learning process. In particular, we propose an improved Triplet-Center Loss (TCL) named Adaptive Margin based Triplet-Center Loss (AMTCL). The original TCL assumes a fixed, common margin to control the relative distance between a sample and its corresponding class center and between that sample and the nearest negative center. Although TCL has demonstrated great capacity on the 3D object retrieval task, we argue that the margin should take different values depending on how distinguishable the samples of one class are from the samples of another. We therefore propose to adjust the margin hyperparameter adaptively and dynamically, based on the normalized confusion matrix obtained on the training set during the training process. Extensive experiments on several public 3D shape benchmarks show that our method, GSL + AMTCL, learns more suitable representations for 3D object retrieval and obtains superior performance against state-of-the-art methods.
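The adaptive-margin idea can be illustrated with a minimal NumPy sketch. The abstract does not give the AMTCL formula, so the margin rule used here, `base_margin * (1 + scale * confusion[y, j])`, is purely a hypothetical stand-in that enlarges the margin for class pairs that are confused more often; the function names, the use of squared Euclidean distance, and the parameters `base_margin` and `scale` are likewise assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def normalized_confusion(y_true, y_pred, num_classes):
    """Row-normalized confusion matrix.

    Entry [i, j] is the fraction of class-i samples predicted as class j.
    Rows for classes with no samples are left as zeros.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    cm = np.zeros((num_classes, num_classes), dtype=float)
    np.add.at(cm, (y_true, y_pred), 1.0)
    row_sums = cm.sum(axis=1, keepdims=True)
    return np.divide(cm, row_sums, out=np.zeros_like(cm), where=row_sums > 0)


def adaptive_margin_tcl(features, labels, centers, confusion,
                        base_margin=1.0, scale=1.0):
    """Triplet-center-style hinge loss with a per-sample margin.

    ASSUMPTION: the margin rule below is hypothetical; the abstract only
    states that the margin is adapted from the normalized confusion matrix.
    """
    features = np.asarray(features, dtype=float)
    centers = np.asarray(centers, dtype=float)
    labels = np.asarray(labels)

    # Squared Euclidean distance from every sample to every class center: (N, K).
    diff = features[:, None, :] - centers[None, :, :]
    dists = np.sum(diff ** 2, axis=2)

    idx = np.arange(features.shape[0])
    pos = dists[idx, labels]              # distance to the sample's own center

    neg_dists = dists.copy()
    neg_dists[idx, labels] = np.inf       # mask out the positive center
    nearest_neg = np.argmin(neg_dists, axis=1)
    neg = neg_dists[idx, nearest_neg]     # distance to the nearest negative center

    # Hypothetical adaptive margin: grows with how often the sample's class
    # is confused with the class of its nearest negative center.
    margins = base_margin * (1.0 + scale * confusion[labels, nearest_neg])

    return float(np.mean(np.maximum(pos + margins - neg, 0.0)))
```

Setting `scale=0` reduces the sketch to a fixed-margin triplet-center-style loss, which makes the effect of the adaptive term easy to isolate. In a training loop, `confusion` would be recomputed periodically from training-set predictions so that the margins track the current state of the model.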

Object-Based Video Multi-Label Classification with an Improved 3D Convolutional Neural Network

3D Object Classification Based on Multi Convolutional Neural Networks

Joint Multi-view 2D Convolutional Neural Networks for 3D Object Classification

Multi-view SoftPool Attention Convolutional Networks for 3D Model Classification

Spatial Context-Aware Object-Attentional Network for Multi-Label Image Classification

Exploit Bounding Box Annotations for Multi-Label Object Recognition

Multi-view dual attention network for 3D object recognition

Multi-label video classification via coupling attentional multiple instance learning with label relation graph

A Multi-Modal, Discriminative and Spatially Invariant CNN for RGB-D Object Labeling

Video Object Segmentation with 3D Convolution Network

Multi-Stream Multi-Class Fusion of Deep Networks for Video Classification

MV-C3D: A Spatial Correlated Multi-View 3D Convolutional Neural Networks

Multi-Head Self-Attention for 3D Point Cloud Classification

3D object recognition based on pairwise Multi-view Convolutional Neural Networks

An Improved Multi-View Convolutional Neural Network for 3D Object Retrieval

An Improved Convolutional Neural Network Algorithm and Its Application in Multilabel Image Labeling

Multi-Label Classification with Label Graph Superimposing

Voxel-based three-view hybrid parallel network for 3D object classification

Exploiting Temporal Information for DCNN-Based Fine-Grained Object Classification

Video object segmentation by Multi-Scale Pyramidal Multi-Dimensional LSTM with generated depth context

Fusing Multi-Stream Deep Networks for Video Classification