Abstract: Current multi-view 3D object recognition methods lose the correlation between views and render 3D objects with redundant views. This makes it difficult to improve recognition performance and unnecessarily increases the computational cost and running time of the network. The effect on recognition performance is even greater when computing resources are limited. We developed an optimal viewset pooling transformer (OVPT) method for efficient and accurate 3D object recognition. The OVPT method constructs the optimal viewset based on information entropy to reduce the redundancy of the multi-view scheme. We use a convolutional neural network (CNN) to extract the multi-view low-level local features of the optimal viewset. A class token is prepended to the multi-view low-level local features and combined with position encoding to generate a local-view token sequence. This sequence is trained in parallel with a pooling transformer to generate a local-view information token sequence. At the same time, the global class token captures the global feature information of the local-view token sequence. The two are then aggregated into a single compact 3D global feature descriptor. On two public benchmarks, ModelNet10 and ModelNet40, our method requires only a small optimal viewset for each 3D object and achieves an overall recognition accuracy (OA) of 99.33% and 97.48%, respectively. Compared with other deep learning methods, our method still achieves state-of-the-art performance with limited computational resources. Our source code is available at https://github.com/shepherds001/OVPT.
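
The abstract does not spell out how information entropy is used to build the optimal viewset. As a rough illustration only, the sketch below ranks rendered views by the Shannon entropy of their grey-level histogram and keeps the top k. The function names `view_entropy` and `select_viewset`, the flat pixel-list representation, and the "keep the highest-entropy views" rule are all assumptions made for this example, not the authors' implementation.

```python
import math
from collections import Counter


def view_entropy(pixels):
    """Shannon entropy (in bits) of the grey-level histogram of one view.

    `pixels` is a flat list of grey levels (hypothetical representation).
    """
    counts = Counter(pixels)
    total = len(pixels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def select_viewset(views, k):
    """Return the indices of the k highest-entropy views, best first.

    Assumption: a higher-entropy view carries more information and is
    therefore preferred; the paper may use a different selection rule.
    """
    ranked = sorted(
        range(len(views)),
        key=lambda i: view_entropy(views[i]),
        reverse=True,
    )
    return ranked[:k]


# Toy "rendered views", each a tiny flat list of grey levels.
views = [
    [0, 0, 0, 0],       # a single grey level: least informative
    [0, 0, 255, 255],   # two equally frequent grey levels
    [0, 64, 128, 255],  # four distinct grey levels: most informative
]
best = select_viewset(views, k=2)
```

In this toy setup the selected indices would then determine which views are passed to the CNN feature extractor; in practice the views would be rendered images rather than four-pixel lists.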

Learning Descriptors with Cube Loss for View-Based 3-D Object Retrieval

Feature Representation for 3D Object Retrieval Based on Unconstrained Multi-View

Recurrent Volume-Based 3-D Feature Fusion for Real-Time Multiview Object Pose Estimation

Discriminatively Learning for Representing Local Image Features with Quadruplet Model

Viewpoint-Aware Representation for Sketch-Based 3D Model Retrieval

Multi-View 3D Object Retrieval with Deep Embedding Network

Learning Local Feature Descriptors with Quadruplet Ranking Loss

Recurrent Volume-based 3D Feature Fusion for Real-time Multi-view Object Pose Estimation

Multiple Discrimination and Pairwise CNN for view-based 3D object retrieval

Exploring Discriminative Views for 3D Object Retrieval

View-based 3D object retrieval with discriminative views

Group-Pair Convolutional Neural Networks for Multi-View Based 3D Object Retrieval

A Unified Feature Representation and Learning Framework for 3D Shape

A Dimensional Reduction Guiding Deep Learning Architecture for 3D Shape Retrieval

Learning Discriminative and Generative Shape Embeddings for Three-Dimensional Shape Retrieval

Group-pair deep feature learning for multi-view 3D model retrieval

Learning the Global Descriptor for 3-D Object Recognition Based on Multiple Views Decomposition

View-Based Discriminative Probabilistic Modeling for 3D Object Retrieval and Recognition

3-D Object Retrieval With Hausdorff Distance Learning

OVPT: Optimal Viewset Pooling Transformer for 3D Object Recognition

Efficient View-Based 3-D Object Retrieval Via Hypergraph Learning