CN-RMA: Combined Network with Ray Marching Aggregation for 3D Indoor Object Detection from Multi-view Images

Guanlin Shen, Jingwei Huang, Zhihua Hu, Bin Wang
DOI: https://doi.org/10.1109/cvpr52733.2024.02015
2024-01-01
Computer Vision and Pattern Recognition
Abstract: This paper introduces CN-RMA, a novel approach for 3D indoor object detection from multi-view images. We observe the key challenge as the ambiguity of image and 3D correspondence without explicit geometry to provide occlusion information. To address this issue, CN-RMA leverages the synergy of 3D reconstruction networks and 3D object detection networks, where the reconstruction network provides a rough Truncated Signed Distance Function (TSDF) and guides image features to vote to 3D space correctly in an end-to-end manner. Specifically, we associate weights to sampled points of each ray through ray marching, representing the contribution of a pixel in an image to corresponding 3D locations. Such weights are determined by the predicted signed distances so that image features vote only to regions near the reconstructed surface. Our method achieves state-of-the-art performance in 3D object detection from multi-view images, as measured by mAP@0.25 and mAP@0.5 on the ScanNet and ARKitScenes datasets. The code and models are released at https://github.com/SerCharles/CN-RMA.
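The abstract describes the ray marching weights only in words, so the following is a minimal illustrative sketch rather than the authors' formulation. It assumes a Gaussian-shaped opacity centered on the zero level set of the signed distance, combined with a volume-rendering-style transmittance term to model occlusion. The names `ray_marching_weights`, `vote_features`, and the bandwidth `sigma` are hypothetical and do not come from the paper or its repository.

```python
import numpy as np

def ray_marching_weights(sdf, sigma=0.1):
    """Per-sample weights along one ray (illustrative assumption, not the paper's formula).

    sdf:   (N,) predicted signed distances at samples ordered from near to far;
           positive in front of the surface, negative behind it.
    sigma: assumed bandwidth controlling how tightly weights hug the surface.
    """
    sdf = np.asarray(sdf, dtype=np.float64)
    # Opacity peaks at 1 where the sample lies on the surface (sdf == 0).
    alpha = np.exp(-0.5 * (sdf / sigma) ** 2)
    # Transmittance: fraction of the ray not yet absorbed by nearer samples.
    trans = np.concatenate(([1.0], np.cumprod(1.0 - alpha[:-1])))
    # A sample gets weight only if it is near the surface AND not occluded.
    return trans * alpha

def vote_features(pixel_feature, weights):
    """Spread one pixel's feature vector (C,) over N ray samples, giving (N, C)."""
    pixel_feature = np.asarray(pixel_feature, dtype=np.float64)
    return weights[:, None] * pixel_feature[None, :]

# A ray that crosses the surface at its middle sample.
sdf_along_ray = np.linspace(0.5, -0.5, 11)
weights = ray_marching_weights(sdf_along_ray)
voted = vote_features(np.ones(4), weights)
```

Under these assumptions, samples behind the first surface crossing receive essentially zero weight, which is the occlusion behaviour the abstract attributes to the predicted signed distances.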