Design, Analysis and Application of A Volumetric Convolutional Neural Network

Xiaqing Pan, Yueru Chen, C.-C. Jay Kuo
DOI: https://doi.org/10.48550/arXiv.1702.00158
2017-02-01
Abstract: The design, analysis and application of a volumetric convolutional neural network (VCNN) are studied in this work. Although many CNNs have been proposed in the literature, their design is empirical. In the design of the VCNN, we propose a feed-forward K-means clustering algorithm to determine the filter number and size at each convolutional layer systematically. For the analysis of the VCNN, the cause of confusing classes in the output of the VCNN is explained by analyzing the relationship between the filter weights (also known as anchor vectors) from the last fully-connected layer to the output. Furthermore, a hierarchical clustering method followed by a random forest classification method is proposed to boost the classification performance among confusing classes. For the application of the VCNN, we examine the 3D shape classification problem and conduct experiments on a popular ModelNet40 dataset. The proposed VCNN offers the state-of-the-art performance among all volume-based CNN methods.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses two main problems:

1. **Narrowing the performance gap between volume-based CNNs (VCNNs) and view-based CNNs in 3D shape classification**: A volume-based CNN processes 3D data directly and can, in theory, retain more 3D structural information. Because of compute and memory constraints, however, its input resolution is usually lower than that of view-based methods, which leads to inferior classification performance. The paper proposes a new design method for volume-based CNNs that aims to improve their performance by selecting network parameters systematically.
2. **Resolving confusing classes in 3D shape classification**: Shapes from some classes are easily misclassified because they look alike. The paper proposes a method based on Shape Anchor Vectors (SAVs) to identify these confusing classes, and then improves classification among them through hierarchical clustering and random forest classifiers.

### Specific solutions

- **Network parameter selection**: The paper proposes a feed-forward K-means clustering algorithm to systematically determine the number and size of filters in each convolutional layer. This improves the classification performance of the network and reduces the dependence on empirical parameter selection.
- **Confusing class identification and re-classification**:
  - **Confusing class identification**: The filter weights from the last fully-connected layer to the output layer (the shape anchor vectors) are analyzed to identify which classes are likely to be confused. Specifically, if the angle between the shape anchor vectors of two classes is small and the internal variation of one of the classes is large, the two classes may be easily confused.
  - **Re-classification**: For the identified confusing classes, the paper proposes a hierarchical clustering method that divides the samples into multiple subsets, and then uses a random forest classifier to re-classify these subsets and improve classification accuracy.

### Experimental results

- **Effect of network parameter selection**: With the optimized network parameter selection, the proposed VCNN increases the average classification accuracy (ACA) and the average instance accuracy (AIA) by 2.65% and 1.94%, respectively, compared to VoxNet.
- **Effect of re-classification of confusing classes**: Adding the confusing-class re-classification module improves the VCNN further, with ACA and AIA increased by 0.57% and 0.44%, respectively.

In conclusion, the paper improves the performance of volume-based CNNs in 3D shape classification by optimizing the network design and addressing the problem of confusing classes.
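
The angle criterion used for confusing class identification can be illustrated with a minimal sketch. It covers only the first half of the criterion (a small angle between two shape anchor vectors); the second half, the internal variation of a class, is not modeled. The function names, the `max_angle_deg` threshold, and the toy vectors and class labels are assumptions made for illustration and are not taken from the paper.

```python
import math
from itertools import combinations


def angle_deg(u, v):
    """Angle in degrees between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos))


def candidate_confusing_pairs(anchors, max_angle_deg):
    """Return (class_a, class_b, angle) for anchor pairs below the threshold.

    `anchors` maps a class name to its anchor vector, i.e. the weights
    connecting the last fully-connected layer to that class's output.
    `max_angle_deg` is a hypothetical threshold, not a value from the paper.
    """
    pairs = []
    for (name_a, vec_a), (name_b, vec_b) in combinations(anchors.items(), 2):
        theta = angle_deg(vec_a, vec_b)
        if theta < max_angle_deg:
            pairs.append((name_a, name_b, theta))
    # Smallest angle first: these are the strongest candidates.
    return sorted(pairs, key=lambda p: p[2])


# Toy 3-dimensional anchors, invented for illustration only.
toy_anchors = {
    "desk": [1.0, 0.9, 0.1],
    "table": [0.9, 1.0, 0.1],
    "airplane": [-0.2, 0.1, 1.0],
}
pairs = candidate_confusing_pairs(toy_anchors, max_angle_deg=20.0)
for name_a, name_b, theta in pairs:
    print(f"{name_a} / {name_b}: {theta:.1f} degrees")
```

In this toy setup only the two nearly parallel anchors are flagged. In practice the vectors would be read from a trained network's final layer, and a flagged pair would still need the internal-variation check before being treated as confusing.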