Abstract: Existing view-based 3D object classification and recognition methods ignore the inherent hierarchical correlation and distinguishability of views, which makes it difficult to further improve classification accuracy. To address this problem, this paper proposes an end-to-end multi-view dual attention network framework for high-precision recognition of 3D objects. On the one hand, we obtain three feature layers (query, key, and value) through convolution layers. A spatial attention matrix is generated from the query and key, and it assigns a different importance to each feature in the value of the original feature space branch. This captures the prominent detail features in each view, generates a view space shape descriptor, and focuses on the detailed parts of the view that carry category-discriminative features. On the other hand, a channel attention vector is obtained by compressing the channel information across the different views, and the attention weight of each view feature is scaled to find the correlations among the target views and to focus on the views with important features. The two feature descriptors are then integrated to generate a global shape descriptor of the 3D model, which responds more strongly to the distinguishing features of the object model and can be used for high-precision 3D object recognition. Using ResNet50 as the base CNN model, the proposed method achieves an overall accuracy of 96.6% and an average accuracy of 95.5% on the open-source ModelNet40 dataset compiled by Princeton University. Compared with existing deep learning methods, the experimental results demonstrate that the proposed method achieves state-of-the-art 3D object classification accuracy.
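
The two attention branches described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the 1x1 convolutions are written as plain matrix multiplies, the channel branch is assumed to follow a squeeze-and-excitation pattern over the view axis, and the fusion step (max-pooling over views, then concatenation) is an assumption, since the abstract does not specify how the two descriptors are combined. All function names, weight shapes, and the reduction ratio are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def spatial_attention(feat, w_q, w_k, w_v):
    """Query/key/value attention over the spatial positions of one view.

    feat: (C, H, W) feature map; w_q, w_k: (Cq, C); w_v: (C, C).
    The weights stand in for 1x1 convolutions (an assumption).
    """
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)            # (C, N) with N = H * W
    q, k, v = w_q @ x, w_k @ x, w_v @ x   # (Cq, N), (Cq, N), (C, N)
    attn = softmax(q.T @ k, axis=-1)      # (N, N); row i weights all positions j
    out = v @ attn.T                      # out[:, i] = sum_j attn[i, j] * v[:, j]
    return out.reshape(c, h, w), attn


def view_attention(view_feats, w1, w2):
    """Squeeze each view to a scalar, then rescale views by learned weights.

    view_feats: (V, D); w1: (V // r, V); w2: (V, V // r). Assumed SE-style design.
    """
    z = view_feats.mean(axis=1)                    # (V,) one summary per view
    hidden = np.maximum(w1 @ z, 0.0)               # ReLU bottleneck
    weights = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))  # sigmoid, each in (0, 1)
    return view_feats * weights[:, None], weights


def dual_attention_descriptor(feats, w_q, w_k, w_v, w1, w2):
    """feats: (V, C, H, W) per-view CNN features. Returns a (2 * C,) descriptor."""
    # Spatial branch: attend within each view, then average-pool to (V, C).
    spatial = np.stack(
        [spatial_attention(f, w_q, w_k, w_v)[0].mean(axis=(1, 2)) for f in feats]
    )
    # View branch: pool each view to (V, C), then reweight whole views.
    scaled, _ = view_attention(feats.mean(axis=(2, 3)), w1, w2)
    # Assumed fusion: max over views for each branch, then concatenate.
    return np.concatenate([spatial.max(axis=0), scaled.max(axis=0)])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    V, C, H, W, r = 12, 8, 4, 4, 4  # hypothetical sizes
    feats = rng.standard_normal((V, C, H, W))
    desc = dual_attention_descriptor(
        feats,
        rng.standard_normal((C // 2, C)) * 0.1,
        rng.standard_normal((C // 2, C)) * 0.1,
        rng.standard_normal((C, C)) * 0.1,
        rng.standard_normal((V // r, V)) * 0.1,
        rng.standard_normal((V, V // r)) * 0.1,
    )
    print(desc.shape)
```

In this sketch the spatial branch decides where to look inside a view, while the view branch decides which views matter; a classifier head on the concatenated descriptor would complete the pipeline.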

ReINView: Re-interpreting Views for Multi-view 3D Object Recognition

View-relation Constrained Global Representation Learning for Multi-View-based 3D Object Recognition

Variable-Viewpoint Representations for 3D Object Recognition

Learning Disentangled Representation for Multi-View 3D Object Recognition

OVPT: Optimal Viewset Pooling Transformer for 3D Object Recognition

Multi-view Moments Embedding Network for 3D Shape Recognition

Multi-View Stereo Representation Revist: Region-Aware MVSNet

Multi-view dual attention network for 3D object recognition

Viewpoint Alignment and Discriminative Parts Enhancement in 3D Space for Vehicle ReID

Unsupervised Multi-View CNN for Salient View Selection and 3D Interest Point Detection

Learning Relationships for Multi-View 3D Object Recognition

Learning Canonical View Representation for 3D Shape Recognition with Arbitrary Views

Deep Models for Multi-View 3D Object Recognition: A Review

Learning the Global Descriptor for 3-D Object Recognition Based on Multiple Views Decomposition

ViewFormer: View Set Attention for Multi-view 3D Shape Understanding

View-based weight network for 3D object recognition

Generalizable Person Re-Identification via Viewpoint Alignment and Fusion

Deep Learning Multi-View Representation for Face Recognition

3M3D: Multi-view, Multi-path, Multi-representation for 3D Object Detection

3D Reconstruction for Multi-view Objects

RC-MVSNet: Unsupervised Multi-View Stereo with Neural Rendering