Generative and Discriminative Voxel Modeling with Convolutional Neural Networks

Andrew Brock, Theodore Lim, J.M. Ritchie, Nick Weston
DOI: https://doi.org/10.48550/arXiv.1608.04236
2016-08-16
Abstract: When working with three-dimensional data, choice of representation is key. We explore voxel-based models, and present evidence for the viability of voxellated representations in applications including shape modeling and object classification. Our key contributions are methods for training voxel-based variational autoencoders, a user interface for exploring the latent space learned by the autoencoder, and a deep convolutional neural network architecture for object classification. We address challenges unique to voxel-based representations, and empirically evaluate our models on the ModelNet benchmark, where we demonstrate a 51.5% relative improvement in the state of the art for object classification.
Computer Vision and Pattern Recognition, Human-Computer Interaction, Machine Learning
What problem does this paper attempt to address?
This paper addresses several problems in 3D data processing, specifically shape modeling and object classification. It explores voxel representations for these tasks and proposes methods to overcome the challenges that voxels bring, such as high computational cost and the curse of dimensionality, which limits the usable resolution.

### Main problems to be solved include:

1. **Shape Modeling and Object Classification**:
   - The paper proposes a voxel-based variational autoencoder (VAE) for high-quality 3D shape interpolation and random sample generation.
   - It also proposes a deep convolutional neural network (ConvNet) architecture for 3D object classification.
2. **Challenges of Voxel Representation**:
   - Voxel grids are a natural input for convolutional neural networks, but they are computationally expensive and suffer from the curse of dimensionality. The paper addresses these challenges through efficient network architectures and training methods.
   - In particular, the paper proposes a modified binary cross-entropy (BCE) loss function that mitigates vanishing gradients during training and rebalances the weights of positive and negative samples.
3. **Performance Improvement**:
   - The paper runs experiments on the ModelNet benchmark and reports a significant improvement in object classification. Compared with existing methods, it achieves relative improvements of 51.5% on ModelNet40 and 53.2% on ModelNet10.

### Specific Contributions:

1. **Generative Model**:
   - Proposes a method for training voxel-based variational autoencoders that supports high-quality shape interpolation and random sample generation.
   - Designs a user interface that lets users explore the latent space learned by the autoencoder.
2. **Discriminative Model**:
   - Proposes a deep convolutional neural network architecture for 3D object classification.
   - Introduces Voxception and Voxception-ResNet modules, which combine ideas from Inception and ResNet to improve the model's expressive power and classification performance.
3. **Experimental Verification**:
   - Conducts extensive experiments on the ModelNet benchmark to verify the effectiveness of the proposed methods.

Through these methods and contributions, the paper demonstrates the feasibility and potential of voxel representations for 3D shape modeling and object classification.
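
The summary mentions a modified BCE loss that rebalances positive and negative samples but does not give its form. As a rough illustration only, the following NumPy sketch shows a generic class-weighted BCE over a voxel grid. The function name `weighted_bce`, the weight `gamma`, and the toy 4x4x4 grid are assumptions made for this example and are not taken from the paper; the sketch also does not attempt to reproduce the paper's handling of vanishing gradients.

```python
import numpy as np


def weighted_bce(pred, target, gamma=0.9, eps=1e-7):
    """Class-weighted binary cross-entropy averaged over a voxel grid.

    gamma weights the occupied-voxel term and (1 - gamma) the empty-voxel
    term. Both gamma and eps are illustrative values, not the paper's.
    """
    pred = np.clip(pred, eps, 1.0 - eps)  # keep log() finite
    occupied = -gamma * target * np.log(pred)
    empty = -(1.0 - gamma) * (1.0 - target) * np.log(1.0 - pred)
    return float(np.mean(occupied + empty))


# Toy sparse grid: a single occupied voxel out of 64.
target = np.zeros((4, 4, 4))
target[1, 2, 3] = 1.0

# A near-perfect prediction, then two ways of getting one voxel wrong.
base = np.where(target == 1.0, 0.99, 0.01)
miss = base.copy()
miss[1, 2, 3] = 0.05        # occupied voxel predicted as empty
false_pos = base.copy()
false_pos[0, 0, 0] = 0.95   # empty voxel predicted as occupied

for g in (0.5, 0.9):
    print(g, weighted_bce(miss, target, g), weighted_bce(false_pos, target, g))
```

With `gamma=0.5` the two mistakes cost the same. With `gamma=0.9`, missing the occupied voxel is penalized more heavily than a false positive, which is the kind of rebalancing that helps when most voxels in a grid are empty.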