OpenNav: Efficient Open Vocabulary 3D Object Detection for Smart Wheelchair Navigation

Muhammad Rameez ur Rahman, Piero Simonetto, Anna Polato, Francesco Pasti, Luca Tonin, Sebastiano Vascon
2024-08-26
Abstract: Open-vocabulary 3D object detection (OV3D) allows precise and extensible object recognition, which is crucial for adapting to the diverse environments encountered in assistive robotics. This paper presents OpenNav, a zero-shot 3D object detection pipeline based on RGB-D images for smart wheelchairs. Our pipeline integrates an open-vocabulary 2D object detector with a mask generator for semantic segmentation, followed by depth isolation and point cloud construction to create 3D bounding boxes. The smart wheelchair exploits these 3D bounding boxes to identify potential targets and navigate safely. We demonstrate OpenNav's performance through experiments on the Replica dataset and report preliminary results with a real wheelchair. OpenNav significantly improves on the state of the art on the Replica dataset at mAP25 (+9 pts) and mAP50 (+5 pts), with a marginal improvement at mAP. The code is publicly available at https://github.com/EasyWalk-PRIN/OpenNav.
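The abstract describes the 3D stage only at a high level: a 2D object mask is combined with the depth image, the depth is isolated, and the resulting point cloud yields a 3D bounding box. The following is a minimal sketch of that geometry, assuming a pinhole camera with intrinsics fx, fy, cx, cy, depth in meters, and a boolean mask already produced by the segmentation model. The depth-isolation rule (keep points within a fixed distance of the median depth), the axis-aligned box, and all function names are hypothetical illustrations, not necessarily what OpenNav implements.

```python
from statistics import median


def mask_to_points(depth, mask, fx, fy, cx, cy):
    """Back-project masked pixels to camera-frame 3D points (pinhole model).

    depth: 2D list of depths in meters (0 = missing reading)
    mask:  2D list of booleans from the segmentation model, same shape as depth
    """
    points = []
    for v, (depth_row, mask_row) in enumerate(zip(depth, mask)):
        for u, (z, inside) in enumerate(zip(depth_row, mask_row)):
            if inside and z > 0:  # skip pixels outside the mask and invalid depth
                points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def isolate_depth(points, max_dev=0.3):
    """Hypothetical depth isolation: keep points within max_dev meters of the median depth.

    This discards background pixels that leak into the mask along object borders.
    """
    if not points:
        return []
    mid = median(p[2] for p in points)
    return [p for p in points if abs(p[2] - mid) <= max_dev]


def axis_aligned_box(points):
    """Return ((xmin, ymin, zmin), (xmax, ymax, zmax)), or None for an empty cloud."""
    if not points:
        return None
    xs, ys, zs = zip(*points)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


# Toy 4x4 frame: an object at 2 m whose mask also covers two 5 m background pixels
depth = [[5.0] * 4 for _ in range(4)]
for v in (1, 2):
    for u in (1, 2):
        depth[v][u] = 2.0
mask = [[v in (1, 2) and u in (0, 1, 2) for u in range(4)] for v in range(4)]

points = mask_to_points(depth, mask, fx=2.0, fy=2.0, cx=1.5, cy=1.5)
box = axis_aligned_box(isolate_depth(points))
print(box)  # ((-0.5, -0.5, 2.0), (0.5, 0.5, 2.0))
```

In the toy frame, the two leaked background pixels would stretch the box 3 m deeper; filtering around the median depth removes them because most masked pixels belong to the object. A real system would likely need a more robust rule for thin or partially occluded objects.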
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The aim of this paper is to develop an open-vocabulary, zero-shot 3D object detection pipeline (OpenNav) that enhances the navigation capabilities of smart wheelchairs. Specifically, the paper's goals are:

1. **Open-Vocabulary Zero-Shot 3D Object Detection**: Proposing a pipeline, operating on RGB-D images, that combines an open-vocabulary 2D object detector with a mask generator for semantic segmentation and then creates 3D bounding boxes through depth isolation and point cloud construction. This allows the system to recognize and locate novel objects in real environments without extensive training on specific categories.
2. **Improving Navigation Performance**: Evaluating OpenNav through experiments on the Replica dataset, where it improves on the state of the art at mAP25 and mAP50, and reporting preliminary results on a real smart wheelchair, which uses the resulting 3D bounding boxes to identify potential targets and navigate safely.
3. **Adapting to Diverse Environments**: Recognizing objects accurately and flexibly, which is crucial for the variety of everyday scenarios a wheelchair user encounters. Because the vocabulary is open, the system can be extended to new objects or tailored to a specific user's needs without extensive retraining.

In summary, the main objective of this paper is to develop an efficient, flexible, and scalable 3D object detection pipeline that enhances the navigation capabilities and safety of smart wheelchairs across different environments.
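The mAP25 and mAP50 figures reported in the abstract follow the usual 3D detection convention: mean average precision where a predicted box counts as correct only if its 3D intersection over union (IoU) with a ground-truth box is at least 0.25 or 0.5, respectively. The following is a minimal sketch of that matching criterion, assuming axis-aligned boxes given as (min corner, max corner). It omits confidence ranking, one-to-one matching, and per-class AP averaging, and the function names are hypothetical rather than taken from the OpenNav code or the Replica evaluation scripts.

```python
def box_volume(box):
    """Volume of an axis-aligned box ((xmin, ymin, zmin), (xmax, ymax, zmax))."""
    lo, hi = box
    return (hi[0] - lo[0]) * (hi[1] - lo[1]) * (hi[2] - lo[2])


def iou_3d(box_a, box_b):
    """Intersection over union of two axis-aligned 3D boxes."""
    inter = 1.0
    for axis in range(3):
        # Overlap along this axis; the boxes do not intersect if any axis has none
        overlap = min(box_a[1][axis], box_b[1][axis]) - max(box_a[0][axis], box_b[0][axis])
        if overlap <= 0:
            return 0.0
        inter *= overlap
    union = box_volume(box_a) + box_volume(box_b) - inter
    return inter / union


def is_match(pred, gt, iou_threshold):
    """A prediction counts as a true positive at this threshold if IoU >= threshold."""
    return iou_3d(pred, gt) >= iou_threshold


gt = ((0.0, 0.0, 0.0), (2.0, 2.0, 2.0))
pred = ((1.0, 0.0, 0.0), (3.0, 2.0, 2.0))  # same size, shifted 1 m along x
print(round(iou_3d(pred, gt), 3))  # 0.333
print(is_match(pred, gt, 0.25), is_match(pred, gt, 0.50))  # True False
```

A 1 m shift on a 2 m cube still passes at the 0.25 threshold but fails at 0.5, which is why looser thresholds such as mAP25 typically yield higher scores than mAP50 for the same detections.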