Real-Time 3-D Semantic Scene Parsing With LiDAR Sensors

Fei Wang, Yan Zhuang, Hong Zhang, Hong Gu
DOI: https://doi.org/10.1109/tcyb.2020.2982947
IF: 11.8
2022-03-01
IEEE Transactions on Cybernetics
Abstract: This article proposes a novel deep-learning framework, called RSSP, for real-time 3-D scene understanding with LiDAR sensors. To this end, we introduce new sparse strided operations based on the sparse tensor representation of point clouds. Compared with conventional convolution operations, the time and space complexity of our sparse strided operations is proportional to the number of occupied voxels $N$ rather than the input spatial size $r^3$ (often $N \ll r^3$ for LiDAR data). This enables our method to process point clouds at high resolutions (e.g., $2048^3$) at high speed (130 ms for classifying a single frame from Velodyne HDL-64).
The main structure includes a CNN model built upon our sparse strided operations and a conditional random field (CRF) model to impose spatial consistency on the final predictions. A highly parallel implementation of our system is presented for both CPU-GPU and CPU-only environments. The efficiency and effectiveness of our approach are demonstrated on two public datasets (Semantic3D.net and KITTI). The experimental results and benchmark tests show that our system can be effectively applied to online 3-D data analysis with accuracy comparable to or better than that of state-of-the-art methods.
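The complexity claim in the abstract can be illustrated with a minimal sketch. This is not the RSSP implementation: the function names `voxelize` and `sparse_strided_max_pool`, the per-voxel point count used as a feature, and the choice of max pooling are all assumptions made for illustration. The sketch stores only occupied voxels in a hash map, so a strided downsampling pass touches $N$ entries instead of all $r^3$ grid cells.

```python
from collections import defaultdict


def voxelize(points, voxel_size):
    """Map each 3-D point to an integer voxel coordinate.

    Only occupied voxels are stored, so memory grows with the number of
    occupied voxels N, not with the full grid size r^3.
    The feature kept per voxel (a point count) is an illustrative assumption.
    """
    voxels = defaultdict(int)
    for x, y, z in points:
        key = (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
        voxels[key] += 1
    return dict(voxels)


def sparse_strided_max_pool(voxels, stride=2):
    """Downsample a sparse voxel map by `stride` along each axis.

    The loop visits occupied voxels only, so the cost is O(N).
    Max pooling is a stand-in for a generic strided operation.
    """
    pooled = {}
    for (i, j, k), feat in voxels.items():
        key = (i // stride, j // stride, k // stride)
        pooled[key] = max(pooled.get(key, feat), feat)
    return pooled


# Five points inside an 8 m cube; at 0.5 m voxels the dense grid would
# hold 16^3 = 4096 cells, while the sparse map keeps only 4 entries.
points = [
    (0.1, 0.2, 0.3),
    (0.15, 0.25, 0.35),
    (0.6, 0.1, 0.1),
    (1.6, 0.1, 0.2),
    (7.9, 7.9, 7.9),
]
voxels = voxelize(points, voxel_size=0.5)
pooled = sparse_strided_max_pool(voxels, stride=2)
print(len(voxels), len(pooled))
```

A dense strided operation over the same input would iterate over every cell of the grid regardless of occupancy, which is what becomes impractical at resolutions such as $2048^3$.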
automation & control systems, computer science, cybernetics, artificial intelligence
What problem does this paper attempt to address?