AM2FNet: Attention-based Multiscale & Multi-modality Fused Network

Rong Chen, Zhiyong Huang, Yuanlong Yu
DOI: https://doi.org/10.1109/robio49542.2019.8961556
2019-01-01
Abstract: How to infer the 3D geometry and 3D semantic label of each unit in a scene, including both visible surfaces and occluded parts, is an important issue in many robotic fields. In recent years, there have been several studies on segmenting and completing 3D scenes from 2D information, most of which complete a scene from a single depth image. Compared with a depth image, an RGB image contains richer color and contour features, which can help semantic labeling. However, designing an effective strategy to fuse RGB and depth features is a challenging issue. Our paper presents an attention-based multi-scale & multi-modality fused network, called AM2FNet, which includes six modules: a depth feature module, a color feature module, a 3D integration module for multi-modality feature fusion, a 3D refinement module for multi-scale feature fusion, attention modules, and a semantic mapping module. The integration module and the refinement module work together in 3D space to fuse color and depth features at the low, middle, and high levels in a top-down fashion. In addition, we use an attention module to efficiently bias input-related features. Experimental results show that our proposed network generates higher-quality semantic scene completion (SSC) and scene completion (SC) results, and outperforms state-of-the-art methods on the real NYU and synthetic NYUCAD datasets. Meanwhile, the contributions of the individual modules are also illustrated.
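The abstract names the modules but not their internal operators. The following is only a minimal NumPy sketch of how such a pipeline could be wired, under stated assumptions: depth and color features are already projected into the same 3D voxel grid at each level, multi-modality integration is an element-wise sum, "top-down" refinement means upsampling the coarsest (high-level) volume and adding the next finer level, the attention step is a squeeze-and-excitation-style channel attention (a swapped-in, generic technique, not necessarily the authors' design), and semantic mapping is a per-voxel linear projection to class scores. All function names, shapes, weights and the class count are hypothetical.

```python
import numpy as np

# All functions below are hypothetical illustrations, not the AM2FNet code.
# Feature volumes have shape (C, D, H, W).

def fuse_modalities(depth_feat, color_feat):
    """Assumed integration step: element-wise sum of aligned depth/color volumes."""
    assert depth_feat.shape == color_feat.shape
    return depth_feat + color_feat

def upsample_nearest_3d(vol, factor=2):
    """Nearest-neighbour upsampling along the three spatial axes."""
    return vol.repeat(factor, axis=1).repeat(factor, axis=2).repeat(factor, axis=3)

def top_down_refine(levels):
    """Assumed refinement: levels ordered coarse (high-level) -> fine (low-level),
    each finer level having twice the spatial size of the previous one."""
    out = levels[0]
    for finer in levels[1:]:
        out = upsample_nearest_3d(out) + finer
    return out

def channel_attention(vol, w1, w2):
    """Squeeze-and-excitation-style channel attention (swapped-in technique).
    w1: (C // r, C), w2: (C, C // r)."""
    squeeze = vol.mean(axis=(1, 2, 3))                # global average pool -> (C,)
    hidden = np.maximum(w1 @ squeeze, 0.0)            # ReLU -> (C // r,)
    weights = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))    # sigmoid -> (C,), values in (0, 1)
    return vol * weights[:, None, None, None]         # rescale each channel

def semantic_head(vol, w_cls):
    """Assumed semantic mapping: per-voxel linear projection, w_cls: (K, C)."""
    return np.einsum("kc,cdhw->kdhw", w_cls, vol)

# Usage with random features at three resolutions (high -> middle -> low level).
rng = np.random.default_rng(0)
C = 8
num_classes = 12  # illustrative class count
fused = []
for s in [4, 8, 16]:
    depth = rng.standard_normal((C, s, s, s))
    color = rng.standard_normal((C, s, s, s))
    fused.append(fuse_modalities(depth, color))

refined = top_down_refine(fused)
w1 = rng.standard_normal((C // 2, C)) * 0.1
w2 = rng.standard_normal((C, C // 2)) * 0.1
attended = channel_attention(refined, w1, w2)
logits = semantic_head(attended, rng.standard_normal((num_classes, C)))
labels = logits.argmax(axis=0)  # one predicted class index per voxel
print(attended.shape)  # (8, 16, 16, 16)
```

Because the sigmoid weights lie in (0, 1), this form of attention can only down-weight channels, which is one simple way to "bias" the volume toward input-relevant features; a learned network would train `w1`, `w2` and the classifier weights rather than sample them at random.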