OpenNav: Efficient Open Vocabulary 3D Object Detection for Smart Wheelchair Navigation

Muhammad Rameez ur Rahman, Piero Simonetto, Anna Polato, Francesco Pasti, Luca Tonin, Sebastiano Vascon

2024-08-26

Abstract: Open vocabulary 3D object detection (OV3D) allows precise and extensible object recognition, which is crucial for adapting to the diverse environments encountered in assistive robotics. This paper presents OpenNav, a zero-shot 3D object detection pipeline based on RGB-D images for smart wheelchairs. Our pipeline integrates an open-vocabulary 2D object detector with a mask generator for semantic segmentation, followed by depth isolation and point cloud construction to create 3D bounding boxes. The smart wheelchair exploits these 3D bounding boxes to identify potential targets and navigate safely. We demonstrate OpenNav's performance through experiments on the Replica dataset, and we report preliminary results with a real wheelchair. OpenNav significantly improves on the state of the art on the Replica dataset at mAP25 (+9pts) and mAP50 (+5pts), with a marginal improvement at mAP. The code is publicly available at https://github.com/EasyWalk-PRIN/OpenNav.
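The mAP25 and mAP50 figures are conventionally read as mean average precision at 3D intersection-over-union (IoU) thresholds of 0.25 and 0.5. The sketch below illustrates that overlap criterion for axis-aligned boxes only. It is an assumption made for illustration: the abstract does not say whether the evaluation uses box or mask IoU, and the function name and box layout are hypothetical.

```python
def iou_3d_axis_aligned(box_a, box_b):
    """IoU of two axis-aligned 3D boxes.

    Each box is (xmin, ymin, zmin, xmax, ymax, zmax). This layout is an
    illustrative assumption, not the format used by the OpenNav code.
    """
    inter = 1.0
    for axis in range(3):
        lo = max(box_a[axis], box_b[axis])
        hi = min(box_a[axis + 3], box_b[axis + 3])
        inter *= max(0.0, hi - lo)  # zero overlap on any axis gives zero volume

    def volume(box):
        return (box[3] - box[0]) * (box[4] - box[1]) * (box[5] - box[2])

    union = volume(box_a) + volume(box_b) - inter
    return inter / union if union > 0 else 0.0


# Toy example: the prediction is twice as tall as the ground truth.
pred = (0.0, 0.0, 0.0, 1.0, 1.0, 1.0)
gt = (0.0, 0.0, 0.0, 1.0, 1.0, 0.5)
iou = iou_3d_axis_aligned(pred, gt)

# A detection counts as a match at a given threshold if its IoU reaches it.
matches_at_25 = iou >= 0.25
matches_at_50 = iou >= 0.50
```

A looser threshold such as 0.25 rewards roughly localized boxes, while 0.5 demands tighter extents, which is one plausible reading of why the reported gain is larger at mAP25 than at mAP50.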

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

This paper develops OpenNav, an open-vocabulary, zero-shot 3D object detection pipeline intended to improve the navigation capabilities of smart wheelchairs. Specifically, its goals are:

1. **Open Vocabulary Zero-Shot 3D Object Detection**: Combine an open-vocabulary 2D object detector operating on RGB-D images with a mask generator for semantic segmentation, then create 3D bounding boxes through depth isolation and point cloud construction. This lets the system recognize and locate new objects in real environments without extensive training on specific categories.
2. **Improving Navigation Performance**: Validate OpenNav through experiments on the Replica dataset and report preliminary results on a real smart wheelchair, showing that the method is effective for 3D object detection and accurate object recognition.
3. **Adapting to Diverse Environments**: Identify objects accurately and flexibly across varied daily-life scenarios. Because the vocabulary is open, the system can be extended to recognize new objects or adapted to specific user needs without extensive retraining.

In summary, the main objective of this paper is an efficient, flexible, and scalable 3D object detection pipeline that improves the navigation capabilities and safety of smart wheelchairs in different environments.
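The last two stages of the pipeline (depth isolation and point cloud construction, followed by a 3D bounding box) can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes a pinhole camera with known intrinsics `fx`, `fy`, `cx`, `cy`, a binary instance mask already produced by the 2D detector and mask generator, depth in metres with 0 marking invalid pixels, and an axis-aligned box in the camera frame. All function names are hypothetical, and any outlier filtering the real pipeline may apply is omitted.

```python
def backproject_mask(depth, mask, fx, fy, cx, cy):
    """Back-project masked depth pixels to 3D camera-frame points.

    depth and mask are equally sized 2D lists (rows of pixels). Pure-Python
    loops keep the sketch dependency-free; a practical version would be
    vectorized.
    """
    points = []
    for v, (depth_row, mask_row) in enumerate(zip(depth, mask)):
        for u, (z, inside) in enumerate(zip(depth_row, mask_row)):
            if inside and z > 0:  # keep only object pixels with valid depth
                x = (u - cx) * z / fx
                y = (v - cy) * z / fy
                points.append((x, y, z))
    return points


def axis_aligned_box(points):
    """Return (min_corner, max_corner) enclosing the points, or None if empty."""
    if not points:
        return None
    xs, ys, zs = zip(*points)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


# Toy 3x3 depth image (metres) and a mask covering the lower-right 2x2 block.
depth = [
    [2.0, 2.0, 2.0],
    [2.0, 1.0, 1.0],
    [2.0, 1.0, 0.0],  # 0.0 marks a missing depth reading
]
mask = [
    [0, 0, 0],
    [0, 1, 1],
    [0, 1, 1],
]
points = backproject_mask(depth, mask, fx=1.0, fy=1.0, cx=1.0, cy=1.0)
box = axis_aligned_box(points)
```

Restricting back-projection to masked pixels keeps background depth out of the box, which is the purpose of generating a segmentation mask before lifting a 2D detection to 3D.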

Open3DTrack: Towards Open-Vocabulary 3D Multi-Object Tracking

Evaluation of 2D-/3D-Feet-Detection Methods for Semi-Autonomous Powered Wheelchair Navigation

OCC-VO: Dense Mapping via 3D Occupancy-Based Visual Odometry for Autonomous Driving

OpenFMNav: Towards Open-Set Zero-Shot Object Navigation via Vision-Language Foundation Models

Three-Dimensional Outdoor Object Detection in Quadrupedal Robots for Surveillance Navigations

HM3D-OVON: A Dataset and Benchmark for Open-Vocabulary Object Goal Navigation

Obstacle Detection System for Navigation Assistance of Visually Impaired People Based on Deep Learning Techniques

Find n' Propagate: Open-Vocabulary 3D Object Detection in Urban Environments

Training an Open-Vocabulary Monocular 3D Object Detection Model without 3D Data

OLiVia-Nav: An Online Lifelong Vision Language Approach for Mobile Robot Social Navigation

Ground-aware Monocular 3D Object Detection for Autonomous Driving

Hierarchical Open-Vocabulary 3D Scene Graphs for Language-Grounded Robot Navigation

Open-YOLO 3D: Towards Fast and Accurate Open-Vocabulary 3D Instance Segmentation

VoroNav: Voronoi-based Zero-shot Object Navigation with Large Language Model

Open-set 3D semantic instance maps for vision language navigation – O3D-SIM

Open 3D World in Autonomous Driving

High-Speed Robot Navigation using Predicted Occupancy Maps

Object Detection and Spatial Coordinates Extraction Using a Monocular Camera for a Wheelchair Mounted Robotic Arm

Robust Visual Teach and Repeat for UGVs Using 3D Semantic Maps