Deep Graph Topology Learning for 3D Point Cloud Reconstruction
Chaojing Duan, Siheng Chen, Dong Tian, José M. F. Moura, Jelena Kovačević
Created: June 29, 2019. Subject: Artificial Intelligence, Computer Vision, Signal Processing
2020-01-01
Abstract: We propose an autoencoder with graph topology learning to learn compact representations of 3D point clouds in an unsupervised manner. As discrete representations of continuous surfaces, 3D point clouds are either acquired directly by 3D scanners such as Lidar sensors or generated from multi-view images or RGB-D data. Unlike 1D speech data or 2D images, which are associated with regular lattices, 3D point clouds are usually sparsely and irregularly scattered in 3D space, which makes them difficult for traditional lattice-based algorithms to handle. Most previous works discretize 3D point clouds by transforming them into either 3D voxels or multi-view images, which causes volume redundancy and quantization artifacts. As a pioneering work, PointNet is a deep-neural-network-based method that applies a pointwise multi-layer perceptron followed by maximum pooling to raw 3D points and achieves remarkable performance in many supervised tasks, including classification, part segmentation, and semantic segmentation of 3D point clouds.
Graph Signal Processing Workshop (GSP)
This work may not be copied or reproduced in whole or in part for any commercial purpose. Permission to copy in whole or in part without payment of fee is granted for nonprofit educational and research purposes provided that all such whole or partial copies include the following: a notice that such copying is by permission of Mitsubishi Electric Research Laboratories, Inc.; an acknowledgment of the authors and individual contributions to the work; and all applicable portions of the copyright notice. Copying, reproduction, or republishing for any other purpose shall require a license with payment of fee to Mitsubishi Electric Research Laboratories, Inc. All rights reserved.
Copyright © Mitsubishi Electric Research Laboratories, Inc., 2019
201 Broadway, Cambridge, Massachusetts 02139
Chaojing Duan, Siheng Chen, Dong Tian, José M. F. Moura, Jelena Kovačević
1 Carnegie Mellon University, 2 Mitsubishi Electric Research Laboratories (MERL), 3 InterDigital, 4 New York University
We propose an autoencoder with graph topology learning to learn compact representations of 3D point clouds in an unsupervised manner. As discrete representations of continuous surfaces, 3D point clouds are either acquired directly by 3D scanners such as Lidar sensors [5] or generated from multi-view images or RGB-D data [4]. Unlike 1D speech data or 2D images, which are associated with regular lattices [3], 3D point clouds are usually sparsely and irregularly scattered in 3D space, which makes them difficult for traditional lattice-based algorithms to handle. Most previous works discretize 3D point clouds by transforming them into either 3D voxels or multi-view images, which causes volume redundancy and quantization artifacts. As a pioneering work, PointNet is a deep-neural-network-based method that applies a pointwise multi-layer perceptron followed by maximum pooling to raw 3D points and achieves remarkable performance in many supervised tasks, including classification, part segmentation, and semantic segmentation of 3D point clouds [6].
In this work, we consider unsupervised learning of 3D point clouds; that is, learning compact representations of 3D point clouds via self-organization. Several recent works pursue a similar goal, such as LatentGAN [1], 3DGAN [7], and FoldingNet [8]. These methods adopt an encoder-decoder framework: the encoder follows an architecture similar to PointNet and extracts global features, and the decoder reconstructs the 3D point cloud from the global features produced by the encoder.
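The PointNet-style encoder described above (a pointwise multi-layer perceptron followed by maximum pooling) can be illustrated with a minimal sketch. This is not the authors' implementation: the layer sizes, the random untrained weights, and the helper names `init_layer`, `linear`, and `encode` are assumptions made for illustration, and the code uses only the Python standard library, whereas a practical model would use an array or deep-learning library. The sketch shows why the global feature does not depend on the order of the input points.

```python
import random


def init_layer(rng, n_in, n_out):
    """Random weights for illustration only; a trained encoder would learn these."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    bias = [rng.uniform(-1.0, 1.0) for _ in range(n_out)]
    return weights, bias


def linear(x, weights, bias):
    """Compute W x + b for one input vector x (W is stored as a list of rows)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def encode(points, layers):
    """Apply the same MLP to every point, then max-pool each channel over points."""
    features = []
    for point in points:
        h = list(point)
        for weights, bias in layers:
            h = [max(0.0, v) for v in linear(h, weights, bias)]  # ReLU activation
        features.append(h)
    # Channel-wise maximum over all points gives one global feature vector.
    return [max(channel) for channel in zip(*features)]


rng = random.Random(0)
# Hypothetical layer sizes: 3 input coordinates -> 8 -> 16 feature channels.
layers = [init_layer(rng, 3, 8), init_layer(rng, 8, 16)]
cloud = [(rng.random(), rng.random(), rng.random()) for _ in range(32)]

shuffled = cloud[:]
rng.shuffle(shuffled)

code = encode(cloud, layers)
code_shuffled = encode(shuffled, layers)
```

Because each point passes through the same weights and the maximum is taken over the set of points, `code` and `code_shuffled` are identical, which is the property that lets such an encoder consume raw, unordered 3D points.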
To design a decoder, LatentGAN [1] uses fully-connected layers, which do not exploit the geometric properties of 3D point clouds at all; FoldingNet [8] uses pointwise multi-layer perceptrons to fold a 2D lattice into a 3D surface, which assumes that the underlying surface of all points has genus less than 2. Table 1 shows that FoldingNet can hardly reconstruct a torus with high-order genus. Moreover, since the features obtained from the encoder provide only global information, previous works struggle to capture detailed local geometric structures.
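The folding idea attributed to FoldingNet above (a pointwise multi-layer perceptron that maps a 2D lattice to a 3D surface, conditioned on the global feature) can be sketched as follows. This is a simplified illustration under stated assumptions, not FoldingNet's actual architecture: it performs a single folding step with random untrained weights, and the grid size, feature length, layer widths, and the names `make_grid`, `init_layer`, `linear`, and `fold` are all hypothetical.

```python
import random


def make_grid(side):
    """Regular 2D lattice on [-1, 1]^2 with side * side points (side >= 2)."""
    step = 2.0 / (side - 1)
    return [(-1.0 + i * step, -1.0 + j * step) for i in range(side) for j in range(side)]


def init_layer(rng, n_in, n_out):
    """Random weights for illustration only; a trained decoder would learn these."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    bias = [rng.uniform(-1.0, 1.0) for _ in range(n_out)]
    return weights, bias


def linear(x, weights, bias):
    """Compute W x + b for one input vector x (W is stored as a list of rows)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def fold(codeword, grid, layers):
    """Map each lattice point, concatenated with the codeword, to a 3D point."""
    points = []
    for u, v in grid:
        h = [u, v] + list(codeword)
        for k, (weights, bias) in enumerate(layers):
            h = linear(h, weights, bias)
            if k < len(layers) - 1:
                h = [max(0.0, x) for x in h]  # ReLU on hidden layers only
        points.append(tuple(h))
    return points


rng = random.Random(0)
codeword = [rng.uniform(-1.0, 1.0) for _ in range(16)]  # stand-in global feature
# Hypothetical widths: (2 lattice coords + 16 features) -> 32 -> 3 output coords.
layers = [init_layer(rng, 2 + 16, 32), init_layer(rng, 32, 3)]
grid = make_grid(5)
cloud = fold(codeword, grid, layers)
```

Every output point is the image of one lattice point under the same continuous map, so the reconstructed surface inherits the connectivity of the 2D lattice. This is consistent with the limitation noted above for surfaces whose topology differs from that of the lattice.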