MSDCNN: A multiscale dilated convolution neural network for fine-grained 3D shape classification

Wei Zhou,Fujian Zheng,Yiheng Zhao,Yiran Pang,Jun Yi
DOI: https://doi.org/10.1016/j.neunet.2024.106141
IF: 7.8
2024-01-24
Neural Networks
Abstract:Multi-view deep neural networks have shown excellent performance on 3D shape classification tasks. However, global features aggregated from multiple views data often lack content information and spatial relationship, which leads to difficult identification the small variance among subcategories in the same category. To solve this problem, in this paper, a novel multiscale dilated convolution neural network termed as MSDCNN is proposed for multi-view fine-grained 3D shape classification. Firstly, a sequence of views are rendered from 12-viewpoints around the input 3D shape by the sequential view capturing module. Then, the first 22 convolution layers of ResNeXt50 is employed to extract the semantic features of each view, and a global mixed feature map is obtained through the element-wise maximum operation of the 12 output feature maps. Furthermore, attention dilated module (ADM), which combines four concatenated attention dilated block (ADB), is designed to extract larger receptive field features from global mixed feature map to enhance context information among the views. Specifically, each ADB is consisted by an attention mechanism module and a dilated convolution with different dilation rates. In addition, prediction module with label smoothing is proposed to classify features, which contains 3 × 3 convolution and adaptive average pooling. The performance of our method is validated experimentally on the ModelNet10, ModelNet40 and FG3D datasets. Experimental results demonstrate the effectiveness and superiority of the proposed MSDCNN framework for 3D shape fine-grained classification.
computer science, artificial intelligence,neurosciences
What problem does this paper attempt to address?