Abstract: Learning robust and discriminative representations is essential for 3D object retrieval. In this paper, we present an improved Multi-view Convolutional Neural Network (MVCNN) for view-based 3D object representation learning. Our technical contributions are twofold. First, we propose to employ Group-view Similarity Learning (GSL) over the multi-view representations before the aggregation operation (i.e., max-pooling in MVCNN). We assume that the similarity information among the view groups of different 3D objects provides an important cue that previous methods have largely neglected. To exploit it, we add a branch to the original MVCNN architecture and train it to preserve these group-view similarity relationships. Second, we utilize an end-to-end metric learning loss function to improve the representation learning process. In particular, we propose an improved Triplet-Center Loss (TCL) named Adaptive Margin based Triplet-Center Loss (AMTCL). The original TCL uses a single fixed margin, shared by all classes, to control the distance from a sample to its corresponding class center relative to its distance to the nearest negative center. Although TCL has demonstrated great capacity on the 3D object retrieval task, some pairs of classes are harder to tell apart than others, so we argue that the margin should take different values depending on how distinguishable the samples of the classes involved are. We therefore propose to adjust the margin hyperparameter adaptively and dynamically, based on the normalized confusion matrix computed on the training set during training. Extensive experiments on several public 3D shape benchmarks show that our method, GSL + AMTCL, learns more suitable representations for 3D object retrieval and outperforms state-of-the-art methods.
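To make the adaptive-margin idea concrete, the following is a minimal NumPy sketch rather than the authors' implementation. The abstract does not give the AMTCL formula, so the margin rule used here (`base_margin * (1 + scale * confusion[true_class, nearest_negative_class])`) is an assumption chosen only for illustration, as are the use of squared Euclidean distance and all function and parameter names.

```python
import numpy as np


def normalized_confusion(true_labels, pred_labels, num_classes):
    """Row-normalized confusion matrix: entry [i, j] is the fraction of
    class-i samples that were predicted as class j."""
    cm = np.zeros((num_classes, num_classes))
    np.add.at(cm, (true_labels, pred_labels), 1.0)
    row_sums = cm.sum(axis=1, keepdims=True)
    return cm / np.maximum(row_sums, 1.0)  # guard against empty classes


def center_distances(features, labels, centers):
    """Return the distance to the own-class center, the distance to the
    nearest negative center, and the index of that negative class."""
    # Squared Euclidean distances, shape (N, K). Assumption: the paper
    # may use a different distance or scaling.
    diffs = features[:, None, :] - centers[None, :, :]
    dists = np.sum(diffs ** 2, axis=2)
    idx = np.arange(len(labels))
    pos = dists[idx, labels]
    dists[idx, labels] = np.inf  # exclude the positive center
    nearest_neg = np.argmin(dists, axis=1)
    neg = dists[idx, nearest_neg]
    return pos, neg, nearest_neg


def triplet_center_loss(pos, neg, margin):
    """Hinge form of TCL; `margin` may be a scalar or a per-sample array."""
    return np.mean(np.maximum(0.0, pos + margin - neg))


def adaptive_margins(labels, nearest_neg, confusion, base_margin=1.0, scale=1.0):
    """Hypothetical rule: enlarge the margin for class pairs that the
    confusion matrix shows are often mixed up."""
    return base_margin * (1.0 + scale * confusion[labels, nearest_neg])


# Toy data: two samples, three class centers in 2D.
features = np.array([[0.0, 0.0], [3.0, 0.0]])
labels = np.array([0, 1])
centers = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 2.0]])

# Confusion statistics as they might be collected on the training set.
confusion = normalized_confusion(
    np.array([0, 0, 1, 2]), np.array([0, 2, 1, 2]), num_classes=3
)

pos, neg, nearest_neg = center_distances(features, labels, centers)
fixed_loss = triplet_center_loss(pos, neg, margin=5.0)
margins = adaptive_margins(labels, nearest_neg, confusion, base_margin=5.0)
adaptive_loss = triplet_center_loss(pos, neg, margins)
```

In this toy setup, class 0 is confused with class 2 half of the time, so the sample of class 0 receives a larger margin than under the fixed-margin TCL, while the sample of class 1 keeps the base margin. In actual training the confusion matrix would be refreshed periodically so that the margins follow the current state of the model.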

Multiple-view D2NNs Array: Realizing Robust 3D Object Recognition.

The mr-MDA: An Invariant to Shifting, Scaling, and Rotating Variance for 3D Object Recognition using Diffractive Deep Neural Network

3D-SSD: Learning Hierarchical Features from RGB-D Images for Amodal 3D Object Detection

Variable-Viewpoint Representations for 3D Object Recognition

Multiple Discrimination and Pairwise CNN for view-based 3D object retrieval

Multi-View Saliency Guided Deep Neural Network for 3-D Object Retrieval and Classification

Learning Disentangled Representation for Multi-View 3D Object Recognition.

Multi-View Linear Discriminant Analysis Network.

Deep Learning Multi-View Representation for Face Recognition

3D object recognition based on pairwise Multi-view Convolutional Neural Networks

Multi-view dual attention network for 3D object recognition

Double weighting convolutional neural networks for multi‐view 3D shape recognition

DRCNN: Dynamic Routing Convolutional Neural Network for Multi-View 3D Object Recognition

Multiscale diffractive U-Net: a robust all-optical deep learning framework modeled with sampling and skip connections.

MV-C3D: A Spatial Correlated Multi-View 3D Convolutional Neural Networks

Multi-View 3D Shape Recognition Via Correspondence-Aware Deep Learning

N2MVSNet: Non-Local Neighbors Aware Multi-View Stereo Network

Pyramid-ladder Diffractive Neural Network for Visual Recognition

Joint Multi-view 2D Convolutional Neural Networks for 3D Object Classification

An Improved Multi-View Convolutional Neural Network for 3D Object Retrieval.

Multi-View Hierarchical Fusion Network for 3D Object Retrieval and Classification