Abstract: With the continuous improvement of image processing capabilities, three-dimensional (3D) models, which can carry rich information, are becoming the fourth type of multimedia data (in addition to sound, image, and video). Because 3D models have a wide range of applications, how to quickly and effectively retrieve the correct target model from massive data has become a key issue. To date, 3D model retrieval approaches have been proposed, and among them, view-based methods can achieve satisfactory performance. In the 3D model retrieval task, mining the latent relationships among all images of a 3D model, adaptively fusing different images, and extracting discriminative features are the main challenges, but in most existing solutions these issues are handled separately rather than explored in an end-to-end network architecture. To address them, we propose a novel and effective multi-level view associative convolution network (MLVACN) for view-based 3D model retrieval, in which the relationship exploration of multiple-view images, the fusion of different images, and discriminative feature learning are realized in a unified end-to-end framework. Specifically, we design the group association layer and the block association layer to study the latent relationships among different views at the view level and the block level, respectively. Moreover, the weight fusion layer is designed to adaptively fuse different views of a 3D model. These three layers are embedded into the MLVACN. Finally, the pairwise discrimination loss function is proposed to learn discriminative features of the 3D model. Extensive experimental results on three 3D model retrieval datasets, ModelNet40, ModelNet10, and ShapeNetCore55, demonstrate that MLVACN can outperform state-of-the-art methods in terms of mAP. On the ModelNet40 dataset, the mAP of MLVACN is improved by 13.25%, 7.75%, 3.95%, and 0.61% compared with those of the MVCNN, GVCNN, PVNet, and MLVCNN methods, respectively.
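
The abstract reports results in terms of mAP but does not spell out the formula. As a minimal sketch, assuming the standard retrieval definition (average precision per query, averaged over all queries, with the ranked list covering the whole gallery), the metric could be computed as follows. The function names and the toy labels are illustrative assumptions, not taken from the paper.

```python
def average_precision(ranked_labels, query_label):
    """Average precision for one query.

    ranked_labels: class labels of the retrieved models, best match first.
    query_label:   class label of the query model.
    Assumption: the ranked list covers the whole gallery, so the number of
    hits found equals the number of relevant models.
    """
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label == query_label:
            hits += 1
            # Precision at this rank, counted only where a relevant model appears.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(queries):
    """Mean of the per-query average precision.

    queries: list of (query_label, ranked_labels) pairs.
    """
    if not queries:
        return 0.0
    aps = [average_precision(ranked, label) for label, ranked in queries]
    return sum(aps) / len(aps)


if __name__ == "__main__":
    # Hypothetical toy ranking, for illustration only.
    toy_queries = [
        ("chair", ["chair", "table", "chair"]),
        ("table", ["chair", "table", "table"]),
    ]
    print(round(mean_average_precision(toy_queries), 4))
```

Under this definition, a method that ranks every same-class model ahead of all others scores 1.0, so the reported gains reflect relevant models moving toward the top of the ranked list.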

SCA-PVNet: Self-and-cross attention based aggregation of point cloud and multi-view for 3D object retrieval

SCA-Net: Spatial and channel attention-based network for 3D point clouds

PV-SSD: A Multi-Modal Point Cloud Feature Fusion Method for Projection Features and Variable Receptive Field Voxel Features

Selective Convolutional Descriptor Aggregation for Fine-Grained Image Retrieval

Multi-View 3D Object Retrieval with Deep Embedding Network

FA-MSVNet: multi-scale and multi-view feature aggregation methods for stereo 3D reconstruction

SCSA: Exploring the Synergistic Effects Between Spatial and Channel Attention

PVNet: A Joint Convolutional Network of Point Cloud and Multi-View for 3D Shape Recognition

MM-Point: Multi-View Information-Enhanced Multi-Modal Self-Supervised 3D Point Cloud Understanding

PVConvNet: Pixel-Voxel Sparse Convolution for multimodal 3D object detection

Multi-Level View Associative Convolution Network for View-Based 3D Model Retrieval

PointMM: Point Cloud Semantic Segmentation CNN under Multi-Spatial Feature Encoding and Multi-Head Attention Pooling

Pyramid Multi-view Stereo Net with Self-adaptive View Aggregation

Multiple Discrimination and Pairwise CNN for view-based 3D object retrieval

PVAFN: Point-Voxel Attention Fusion Network with Multi-Pooling Enhancing for 3D Object Detection

Point Attention Network for Point Cloud Semantic Segmentation

MixedSCNet: LiDAR-Based Place Recognition Using Multi-Channel Scan Context Neural Network

SCFANet: Semantics and Context Feature Aggregation Network for 360° Salient Object Detection

CAF-RCNN: multimodal 3D object detection with cross-attention

PV-RCNN++: Point-Voxel Feature Set Abstraction With Local Vector Representation for 3D Object Detection