Abstract: Learning robust and discriminative representations is essential for 3D object retrieval. In this paper, we present an improved Multi-view Convolutional Neural Network (MVCNN) for view-based 3D object representation learning. Our technical contributions are twofold. First, we propose Group-view Similarity Learning (GSL), applied to the multi-view representations before the aggregation operation (i.e., max-pooling in MVCNN). We assume that the similarity information among the view groups of different 3D objects provides an important cue that previous methods have largely neglected. To exploit it, we add a branch to the original MVCNN architecture that learns to maintain these group-view similarity relationships. Second, we use an end-to-end metric learning loss function to improve the representation learning process. In particular, we propose an improved Triplet-Center Loss (TCL) named Adaptive Margin based Triplet-Center Loss (AMTCL). The original TCL uses a single fixed margin, common to all classes, to control the relative distance between a sample and its corresponding class center and between that sample and the nearest negative center. Although TCL has demonstrated great capacity on the 3D object retrieval task, we argue that the margin should take different values depending on how distinguishable the samples of one class are from the samples of another. We therefore propose to adjust the margin hyperparameter adaptively and dynamically based on the normalized confusion matrix, which is obtained on the training set during training. Extensive experiments on several public 3D shape benchmarks show that our method, GSL + AMTCL, learns more suitable representations for 3D object retrieval and obtains superior performance compared with state-of-the-art methods.
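
The contrast between a fixed margin and a confusion-dependent margin can be illustrated with a minimal sketch. The abstract does not state the exact rule by which AMTCL derives the margin from the normalized confusion matrix, so the linear rule below (`base + scale * confusion rate`), the function names, and all numeric values are hypothetical and serve only to show the idea, not the authors' formulation.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def normalize_rows(confusion):
    """Row-normalize a confusion matrix of counts so each non-empty row sums to 1."""
    normalized = []
    for row in confusion:
        total = sum(row)
        normalized.append([v / total if total else 0.0 for v in row])
    return normalized


def triplet_center_loss(feature, label, centers, margin_fn):
    """Hinge on: distance to own center + margin - distance to nearest negative center."""
    d_pos = euclidean(feature, centers[label])
    # Find the nearest center that belongs to a different class.
    neg, d_neg = min(
        ((j, euclidean(feature, c)) for j, c in enumerate(centers) if j != label),
        key=lambda pair: pair[1],
    )
    return max(d_pos + margin_fn(label, neg) - d_neg, 0.0)


def fixed_margin(m):
    """Original TCL behaviour: one margin shared by every class pair."""
    return lambda label, neg: m


def adaptive_margin(base, scale, norm_conf):
    """ASSUMPTION (not from the paper): margin grows linearly with how often
    class `label` is confused with class `neg` on the training set."""
    return lambda label, neg: base + scale * norm_conf[label][neg]


# Toy data: three class centers in 2D and one sample of class 0.
centers = [[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]]
feature = [1.0, 0.0]

# Hypothetical training-set confusion counts (rows: true class, columns: predicted).
confusion_counts = [[8, 2, 0], [1, 9, 0], [0, 0, 10]]
norm_conf = normalize_rows(confusion_counts)

loss_fixed = triplet_center_loss(feature, 0, centers, fixed_margin(0.5))
loss_adaptive = triplet_center_loss(
    feature, 0, centers, adaptive_margin(0.5, 5.0, norm_conf)
)
print(loss_fixed, loss_adaptive)
```

In this toy setting the sample already satisfies the fixed margin, so the fixed-margin loss vanishes, while the larger margin assigned to the frequently confused class pair (0, 1) keeps the adaptive loss positive and continues to push the two classes apart.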

MVContrast: Unsupervised Pretraining for Multi-view 3D Object Recognition

A contrastive learning based unsupervised multi-view stereo with multi-stage self-training strategy

Contrastive Multi-View Learning for 3D Shape Clustering

ProposalContrast: Unsupervised Pre-training for LiDAR-based 3D Object Detection

Align Yourself: Self-supervised Pre-training for Fine-grained Recognition via Saliency Alignment

Unsupervised Multi-View CNN for Salient View Selection and 3D Interest Point Detection

OVPT: Optimal Viewset Pooling Transformer for 3D Object Recognition

Learning Disentangled Representation for Multi-View 3D Object Recognition

Multi-view dual attention network for 3D object recognition

Vision-Language Pre-training with Object Contrastive Learning for 3D Scene Understanding

Multi-view Convolutional Vision Transformer for 3D Object Recognition

Self-Supervised Multi-View Learning via Auto-Encoding 3D Transformations

CL-MVSNet: Unsupervised Multi-View Stereo with Dual-Level Contrastive Learning

Building a Strong Pre-Training Baseline for Universal 3D Large-Scale Perception

Masked Scene Contrast: A Scalable Framework for Unsupervised 3D Representation Learning

Multi-View Representation is What You Need for Point-Cloud Pre-Training

MV-C3D: A Spatial Correlated Multi-View 3D Convolutional Neural Networks

An Improved Multi-View Convolutional Neural Network for 3D Object Retrieval

Joint Multi-view 2D Convolutional Neural Networks for 3D Object Classification

3D object recognition based on pairwise Multi-view Convolutional Neural Networks