Abstract: The ability to perceive humans is an essential requirement for safe and efficient human-robot interaction. In real-world applications, the need for a robot to interact in real time with multiple humans in a dynamic, 3-D environment presents a significant challenge. The recent availability of commercial color-depth cameras allows for the creation of a system that makes use of the depth dimension, thus enabling a robot to observe its environment and perceive in 3-D space. Here we present a system for 3-D multiple human perception in real time from a moving robot equipped with a color-depth camera and a consumer-grade computer. Our approach reduces computation time to achieve real-time performance through a unique combination of new ideas and established techniques. We remove the ground and ceiling planes from the 3-D point cloud input to separate candidate point clusters. We introduce a novel concept, depth of interest, which we use to identify candidates for detection and which avoids the computationally expensive scanning-window methods of other approaches. We utilize a cascade of detectors to distinguish humans from objects, in which we reuse intermediary features in successive detectors to reduce computation. Because of the high computational cost of some methods, we represent our candidate tracking algorithm with a decision directed acyclic graph, which allows us to use the most computationally intense techniques only where necessary. We detail the successful implementation of our novel approach on a mobile robot and examine its performance in scenarios with real-world challenges, including occlusion, robot motion, nonupright humans, humans leaving and reentering the field of view (i.e., the reidentification challenge), and human-object and human-human interaction. We conclude that incorporating depth information, together with the use of modern techniques in new ways, makes it possible to create an accurate system for real-time 3-D perception of humans by a mobile robot.
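
The first two steps named in the abstract (removing the ground and ceiling planes, then picking candidates by depth rather than by a scanning window) can be illustrated with a minimal sketch. The abstract does not say how the planes are found or how depth of interest is computed, so everything here is an assumption made for illustration: the planes are taken to be horizontal at known heights, depth of interest is approximated as a local peak in a histogram of point depths, and the function names, axis convention, and thresholds are hypothetical.

```python
import numpy as np


def remove_ground_and_ceiling(points, ground_z=0.0, ceiling_z=2.5, margin=0.05):
    """Drop points lying within `margin` metres of two horizontal planes.

    Assumption: `points` is an (N, 3) array of (x, y, z) in metres with z as
    height, and both plane heights are already known. A real system would
    more likely estimate the planes from the data.
    """
    z = points[:, 2]
    keep = (z > ground_z + margin) & (z < ceiling_z - margin)
    return points[keep]


def depth_of_interest_candidates(points, max_depth=8.0, bin_width=0.25,
                                 min_points=50):
    """Return (near, far) depth intervals that hold a local peak of points.

    Assumption: x is the forward distance from the camera, and a "depth of
    interest" is approximated as a histogram bin that is a local maximum
    with at least `min_points` points. This is an illustrative stand-in,
    not the definition used in the paper.
    """
    depth = points[:, 0]
    n_bins = int(round(max_depth / bin_width))
    edges = np.linspace(0.0, max_depth, n_bins + 1)
    counts, _ = np.histogram(depth, bins=edges)

    # Pad with zeros so the first and last bins have a neighbour on each side.
    padded = np.concatenate(([0], counts, [0]))
    candidates = []
    for i, count in enumerate(counts):
        left, right = padded[i], padded[i + 2]
        # Strict on the left, non-strict on the right, so that a flat peak
        # spanning several bins is reported only once.
        if count >= min_points and count > left and count >= right:
            candidates.append((float(edges[i]), float(edges[i + 1])))
    return candidates
```

Only the points falling inside the returned depth intervals would then be passed to the detectors, which is where the saving over an exhaustive scanning window would come from in this simplified picture.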

An Object Perception and Positioning Method Via Deep Perception Learning Object Detection

A Depth Estimation Framework Based on Unsupervised Learning and Cross-Modal Translation

Vision-Based Environmental Perception for Autonomous Driving

Stereo RGB and Deeper LIDAR Based Network for 3D Object Detection

Multi-Class Detection and Segmentation of Objects in Depth

Target Recognition and Location Based on Deep Learning

Object Detection Based on Deep Learning and B-Spline Level Set in Color Images

Robust Real-Time Human Perception with Depth Camera

Probabilistic and Geometric Depth: Detecting Objects in Perspective

DeLS-3D: Deep Localization and Segmentation with a 3D Semantic Map

Depth Estimation Matters Most: Improving Per-Object Depth Estimation for Monocular 3D Detection and Tracking

A survey of Object Classification and Detection based on 2D/3D data

Non-line-of-sight imaging and tracking of moving objects based on deep learning

Spatio-Temporal Fusion of LiDAR and Camera Data for Omnidirectional Depth Perception

Exploiting Depth from Single Monocular Images for Object Detection and Semantic Segmentation

Deep learning‐based object identification with instance segmentation and pseudo‐LiDAR point cloud for work zone safety

ODSPC: deep learning-based 3D object detection using semantic point cloud

Real-time multiple human perception with color-depth cameras on a mobile robot

You Only Look Bottom-Up for Monocular 3D Object Detection

Deep learning based object detection from multi-modal sensors: an overview

3D Street Object Detection from Monocular Images Using Deep Learning and Depth Information