Abstract: Existing view-based 3D object classification and recognition methods ignore the inherent hierarchical correlation and distinguishability of views, which makes it difficult to further improve classification accuracy. To solve this problem, this paper proposes an end-to-end multi-view dual attention network framework for high-precision recognition of 3D objects. On one hand, we obtain three feature layers (query, key, and value) through a convolution layer. A spatial attention matrix is generated from the query and key, and it assigns a different importance to each feature in the value branch of the original feature space. This explicitly captures the prominent detail features in each view, generates a view space shape descriptor, and focuses on the detailed parts of a view that discriminate between categories. On the other hand, a channel attention vector is obtained by compressing the channel information across different views, and the attention weight of each view feature is scaled to find the correlation between the target views and to focus on the views that carry important features. The two feature descriptors are then integrated to generate a global shape descriptor of the 3D model, which responds more strongly to the distinguishing features of the object model and can be used for high-precision 3D object recognition. Using ResNet50 as the base CNN model, the proposed method achieves an overall accuracy of 96.6% and an average accuracy of 95.5% on the open-source ModelNet40 dataset compiled by Princeton University. Compared with existing deep learning methods, the experimental results demonstrate that the proposed method achieves state-of-the-art performance in 3D object classification accuracy.
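The two attention branches described in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the abstract does not give layer sizes, the pooling scheme, or the fusion rule, so the max-pooling, the two-layer gating network, the concatenation at the end, and every function and parameter name below (`spatial_attention`, `view_attention`, `global_descriptor`, `Wq`, `Wk`, `Wv`, `W1`, `W2`) are assumptions chosen for illustration. Plain weight matrices stand in for the convolution layers.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(W, x):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def max_pool(vectors):
    # Element-wise maximum over a list of equal-length vectors.
    return [max(col) for col in zip(*vectors)]

def spatial_attention(positions, Wq, Wk, Wv):
    # positions: feature vectors of one view, one per spatial location.
    # Query/key scores weight the value vectors (assumed projection matrices).
    Q = [matvec(Wq, p) for p in positions]
    K = [matvec(Wk, p) for p in positions]
    V = [matvec(Wv, p) for p in positions]
    out = []
    for q in Q:
        weights = softmax([dot(q, k) for k in K])
        out.append([sum(w * v[c] for w, v in zip(weights, V))
                    for c in range(len(V[0]))])
    return out

def view_attention(view_features, W1, W2):
    # Compress each view to one scalar, then gate the views with a small
    # two-layer network (ReLU then sigmoid); the gating design is an assumption.
    squeezed = [sum(f) / len(f) for f in view_features]
    hidden = [max(0.0, h) for h in matvec(W1, squeezed)]
    gates = [1.0 / (1.0 + math.exp(-g)) for g in matvec(W2, hidden)]
    scaled = [[g * x for x in f] for g, f in zip(gates, view_features)]
    return scaled, gates

def global_descriptor(views, Wq, Wk, Wv, W1, W2):
    # Spatial branch: attend within each view, then pool over locations and views.
    spatial = max_pool([max_pool(spatial_attention(v, Wq, Wk, Wv)) for v in views])
    # View branch: pool each view, reweight the views, then pool across views.
    scaled, _ = view_attention([max_pool(v) for v in views], W1, W2)
    channel = max_pool(scaled)
    # Fusion by concatenation (assumed; the abstract only says "integrated").
    return spatial + channel

# Toy input: 2 views, 2 spatial locations per view, 2 channels per location.
identity = [[1.0, 0.0], [0.0, 1.0]]
views = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [2.0, 0.0]],
]
W1 = [[1.0, -1.0]]    # 2 views -> 1 hidden unit
W2 = [[1.0], [-1.0]]  # 1 hidden unit -> 2 view gates
descriptor = global_descriptor(views, identity, identity, identity, W1, W2)
print(len(descriptor))  # 4: two spatial-branch values plus two view-branch values
```

In a real network the projection matrices would be learned convolution layers and the view features would come from a CNN backbone such as ResNet50; the sketch only shows how a query-key attention matrix reweights values within a view and how a compressed per-view signal rescales whole views before the two descriptors are combined.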

Multi-View 3d Object Retrieval with Deep Embedding Network

Feature Representation for 3D Object Retrieval Based on Unconstrained Multi-View

Group-pair deep feature learning for multi-view 3d model retrieval

Multiple Discrimination and Pairwise CNN for view-based 3D object retrieval

Group-Pair Convolutional Neural Networks for Multi-View Based 3D Object Retrieval.

DeepCCFV: Camera Constraint-Free Multi-View Convolutional Neural Network for 3D Object Retrieval

Off-the-shelf CNN features for 3D object retrieval

OVPT: Optimal Viewset Pooling Transformer for 3D Object Recognition.

Learning Disentangled Representation for Multi-View 3D Object Recognition.

View-based 3D Object Retrieval Via Multi-Modal Graph Learning

Exploring Discriminative Views for 3D Object Retrieval.

View-based 3D object retrieval with discriminative views.

Multi-view Moments Embedding Network for 3D Shape Recognition

Multi-view dual attention network for 3D object recognition

Learning Descriptors with Cube Loss for View-Based 3-D Object Retrieval

Learning Discriminative and Generative Shape Embeddings for Three-Dimensional Shape Retrieval

Multi-view SoftPool Attention Convolutional Networks for 3D Model Classification.

A Unified Feature Representation and Learning Framework for 3D Shape

Learning Feature Embedding with Strong Neural Activations for Fine-Grained Retrieval

Learning the Global Descriptor for 3-D Object Recognition Based on Multiple Views Decomposition

Joint Multi-view 2D Convolutional Neural Networks for 3D Object Classification