Abstract:<p class="a-plus-plus">Recently, deep CNN-based methods have achieved significant success on a variety of 2D computer vision problems. However, directly processing 3D point clouds with CNNs remains challenging because of their irregular structure, which leaves overall performance far from optimal. In this paper, we propose a novel trainable architecture for 3D point cloud-based object recognition that, for the first time, approaches the problem from the perspective of network depth and attention mechanisms. We first transform the input point cloud into a regular volumetric representation using a binary occupancy grid. The result is then fed into our proposed 3D Dense-Attention CNN framework, dubbed <span class="a-plus-plus inline-equation id-i-eq1">3DDACNN</span>, to obtain features with enhanced representation power. Extensive experiments on highly challenging datasets demonstrate the effectiveness of the proposed model.</p>
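The voxelization step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the grid resolution or the normalization scheme, so the function name `voxelize_binary`, the default `grid_size=32`, and the uniform bounding-box scaling below are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def voxelize_binary(points, grid_size=32):
    """Map an (N, 3) point cloud to a binary occupancy grid.

    Hypothetical helper: a voxel is set to 1 if at least one point
    falls inside it, and left at 0 otherwise.
    """
    points = np.asarray(points, dtype=np.float64)
    if points.ndim != 2 or points.shape[1] != 3:
        raise ValueError("points must have shape (N, 3)")

    grid = np.zeros((grid_size, grid_size, grid_size), dtype=np.uint8)
    if points.shape[0] == 0:
        return grid

    # Assumption: normalize by the longest side of the bounding box so the
    # object's aspect ratio is preserved inside the cubic grid.
    mins = points.min(axis=0)
    extent = (points.max(axis=0) - mins).max()
    if extent == 0:
        extent = 1.0  # all points coincide; avoid division by zero

    # Scale coordinates to [0, grid_size - 1] and convert to voxel indices.
    scaled = (points - mins) / extent * (grid_size - 1)
    idx = np.clip(np.floor(scaled).astype(np.int64), 0, grid_size - 1)

    # Mark every voxel that contains at least one point as occupied.
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1
    return grid
```

Calling `voxelize_binary(cloud)` on an array of shape `(N, 3)` yields a `32 x 32 x 32` array of zeros and ones that a 3D CNN can consume once batch and channel axes are added. A binary grid discards point density within each voxel, which keeps the input simple at the cost of some geometric detail.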

Toward Real-Time 3D Object Recognition: A Lightweight Volumetric CNN Framework Using Multitask Learning

LVNet: A lightweight volumetric convolutional neural network for real-time and high-performance recognition of 3D objects

3D-SSD: Learning Hierarchical Features from RGB-D Images for Amodal 3D Object Detection

RINet: Efficient 3D Lidar-Based Place Recognition Using Rotation Invariant Neural Network

SparseVoxNet: 3-D Object Recognition With Sparsely Aggregation of 3-D Dense Blocks

DTV-CNN: Neural network based on depth and thickness views for efficient 3D shape classification

Optimized CNNs for Rapid 3D Point Cloud Object Recognition

AGO-Net: Association-Guided 3D Point Cloud Object Detection Network

VoxNet: A 3D Convolutional Neural Network for real-time object recognition

From Multi-View to Hollow-3D: Hallucinated Hollow-3D R-CNN for 3D Object Detection

3D-A-Nets: 3D Deep Dense Descriptor for Volumetric Shapes with Adversarial Networks

Real-Time 3D Object Detection From Point Cloud Through Foreground Segmentation

Latent-MVCNN: 3D Shape Recognition Using Multiple Views from Pre-defined or Random Viewpoints

VP-Net: Voxels as Points for 3D Object Detection

Lightweight multi-scale convolutional neural network for real time stereo matching

3DDACNN: 3D dense attention convolutional neural network for point cloud based object recognition

Virtual Sparse Convolution for Multimodal 3D Object Detection

Voxel-based 3D Detection and Reconstruction of Multiple Objects from a Single Image

DAN: Deep-Attention Network for 3D Shape Recognition

Multi-level 3D CNN for Learning Multi-scale Spatial Features

Learning To Reconstruct High-Quality 3D Shapes With Cascaded Fully Convolutional Networks