Abstract: Comprehensive perception of human beings is the prerequisite to ensure the safety of human-robot interaction. Currently, the prevailing visual sensing approach typically involves a single static camera, resulting in a restricted and occluded field of view. In our work, we develop an active vision system using multiple cameras to dynamically capture multi-source RGB-D data. An integrated human sensing strategy based on a hierarchically connected tree structure is proposed to fuse localized visual information. The tree model consists of nodes representing keypoints and edges representing keyparts, which are consistently interconnected to preserve the structural constraints during multi-source fusion. Utilizing RGB-D data and HRNet, the 3D positions of keypoints are analytically estimated, and their presence is inferred through a sliding window of confidence scores. Subsequently, the point clouds of reliable keyparts are extracted by drawing occlusion-resistant masks, enabling fine registration between the data clouds and a cylindrical model following the hierarchical order. Experimental results demonstrate that our method enhances keypart recognition recall from 69.20% to 90.10%, compared to employing a single static camera. Furthermore, by overcoming challenges related to localized and occluded perception, the robotic arm's obstacle avoidance capabilities are effectively improved.

What problem does this paper attempt to address?

The problem that this paper attempts to solve is how to achieve comprehensive perception of human obstacles through multi-view active perception technology in human-robot interaction (HRI), so as to improve the safety and task-execution ability of robots. Specifically, existing visual sensing methods usually rely on a single static camera, resulting in a limited field of view that is easily occluded. This not only affects the recognition and avoidance of human obstacles but also limits the ability of the robot arm to work efficiently in a dynamic environment. To overcome these challenges, the paper proposes a multi-view active perception system based on a hierarchical connection tree structure. This system uses multiple rotatable cameras to dynamically capture multi-source RGB-D data and constructs a human body model by fusing local visual information. This method improves the recall rate of key part recognition from 69.20% to 90.10% and performs well in dealing with local and occluded perception problems, thereby improving the obstacle-avoidance ability of the robot arm.

### Main Contributions

1. **Multi-camera Active Vision System**: This system can capture RGB-D data from multiple important areas, expanding the perception range.
2. **Hierarchical Connection Tree Structure**: Used to integrate visual information from multi-view dynamic sources and maintain structural constraints.
3. **Information Extraction Method for Anti-occlusion and Local Field of View**: It can effectively handle occlusion and local field of view problems in HRI scenarios.

### Method Overview

1. **Multi-view Active Vision Mechanism**: By adding rotational degrees of freedom to the cameras, dynamic capture of multiple key areas is achieved.
2. **State Estimation of Key Points and Key Parts**:
   - Use HRNet to infer 2D key point positions and their confidence levels from color images.
   - Use depth images to lift 2D key point positions to 3D space.
   - Determine the existence state of key points through the sliding window method.
   - Fuse the key point position information of multiple cameras to obtain more accurate 3D positions.
3. **Key Part Point Cloud Extraction**:
   - Generate occlusion-resistant masks based on the positions of key points.
   - Apply masks to extract point clouds of key parts from depth images.
   - Use the ICP algorithm to register the point cloud with the cylindrical model and update the state of the key part.
4. **Hierarchical Connection Tree Model**:
   - Construct a hierarchical connection tree model to maintain anatomical constraints.
   - Maintain the connectivity of the tree model through supplementary nodes.
   - Perform state estimation in hierarchical order to ensure the connection relationship between each key part and its parent part.

### Experimental Verification

The paper designs three typical production scenarios to verify the effectiveness and universality of the method:

1. **Simple Assembly Task**: A human operator performs a simple assembly task within the working area of the robot arm.
2. **Complex Assembly Task**: Involves more dynamic and complex interactions.
3. **Obstacle-avoidance Task**: Evaluates the obstacle-avoidance ability of the robot arm in different scenarios.

The experimental results show that the proposed multi-view active perception system improves both the key part recognition rate and the robot's obstacle-avoidance ability, enhancing the safety and efficiency of human-robot interaction.
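The key point state estimation steps listed in the Method Overview (lifting 2D key points to 3D with depth, deciding presence over a sliding window, and fusing estimates from several cameras) can be illustrated with a minimal Python sketch. This is not the paper's implementation: the pinhole back-projection, the mean-over-window presence rule, the confidence-weighted average, and all names and parameter values (`lift_to_3d`, `PresenceWindow`, `fuse_positions`, the window size, the threshold) are assumptions chosen for illustration.

```python
from collections import deque


def lift_to_3d(u, v, depth_m, fx, fy, cx, cy):
    """Back-project pixel (u, v) with metric depth into the camera frame.

    Assumes an ideal pinhole camera with focal lengths (fx, fy) and
    principal point (cx, cy); lens distortion is ignored.
    """
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)


class PresenceWindow:
    """Sliding window over the most recent confidence scores of one key point."""

    def __init__(self, size=5, threshold=0.5):
        # deque with maxlen drops the oldest score automatically.
        self.scores = deque(maxlen=size)
        self.threshold = threshold

    def update(self, confidence):
        """Add a new score and return True if the key point counts as present.

        Hypothetical rule: present when the windowed mean reaches the threshold.
        """
        self.scores.append(float(confidence))
        return sum(self.scores) / len(self.scores) >= self.threshold


def fuse_positions(points, confidences):
    """Confidence-weighted mean of per-camera 3D estimates.

    Assumes every point is already expressed in one shared world frame.
    Returns None when no camera reports a positive confidence.
    """
    total = sum(confidences)
    if total <= 0:
        return None
    return tuple(
        sum(w * p[i] for p, w in zip(points, confidences)) / total
        for i in range(3)
    )
```

Averaging over a short window rather than thresholding each frame makes the presence decision less sensitive to a single low-confidence detection, at the cost of a few frames of delay when a key point truly appears or disappears.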

Multi-View Active Sensing for Human-Robot Interaction via Hierarchically Connected Tree

MoreFusion: Multi-object Reasoning for 6D Pose Estimation from Volumetric Fusion

Multi-modal 3D Human Tracking for Robots in Complex Environment with Siamese Point-Video Transformer

Efficient Multi-person Hierarchical 3D Pose Estimation for Autonomous Driving

Beyond Traditional Driving Scenes: A Robotic-Centric Paradigm for 2D+3D Human Tracking Using Siamese Transformer Network

Multi-Session SLAM over Low Dynamic Workspace Using RGBD Sensor

Long-Range Traversability Awareness and Low-Lying Obstacle Negotiation with RealSense for the Visually Impaired

Real-time multiple human perception with color-depth cameras on a mobile robot

Efficient Bi-manipulation using RGBD Multi-model Fusion based on Attention Mechanism

A Two-Stage Monocular Vision Detection Method for 6D Pose Estimation in Multi-Heterogeneous Robot Systems

CEASE: Collision-Evaluation-based Active Sense System for Collaborative Robotic Arms

An Edge-Fog-Cloud-based Hierarchical Adaptive Model for Human-Robot Interaction

A Human–Robot Collaborative System for Robust Three-Dimensional Mapping

View Invariant Human Body Detection and Pose Estimation from Multiple Depth Sensors

Hierarchical Perception-Improving for Decentralized Multi-Robot Motion Planning in Complex Scenarios

An Active Strategy for Safe Human–Robot Interaction Based on Visual–Tactile Perception

On the Evaluation of Diverse Vision Systems towards Detecting Human Pose in Collaborative Robot Applications

Human Motion Tracking by Multiple RGBD Cameras.

Multi-View Fusion for Action Recognition in Child-Robot Interaction

Kinematics-based 3D Human-Object Interaction Reconstruction from Single View

Robot Active Neural Sensing and Planning in Unknown Cluttered Environments