Abstract:

Purpose: Assembly action recognition plays an important role in assembly process monitoring and human-robot collaborative assembly. Previous works overlook the interaction between hands and the objects they operate on and do not model subtle hand motions, which reduces accuracy in fine-grained action recognition. This paper aims to model hand-object interactions and hand movements to achieve high-accuracy assembly action recognition.

Design/methodology/approach: A novel multi-stream hand-object interaction network (MHOINet) is proposed for assembly action recognition. To learn the hand-object interaction relationship in an assembly sequence, the interaction stream uses an interaction modeling network (IMN) that comprises both geometric and visual modeling. The geometric modeling captures the spatial relation between the hand and the interacted parts/tools from their detected bounding boxes, and the visual modeling mines the visual context of hand and object at the pixel level through a position attention model. To model hand movements, the hand stream uses a temporal enhancement module (TEM) with multiple convolution kernels, which captures the temporal dependencies of hand sequences over short and long ranges. Finally, the assembly action is predicted by merging the outputs of the different streams through a weighted score-level fusion. A robotic arm component assembly dataset is created to evaluate the effectiveness of the proposed method.

Findings: The method achieves recognition accuracies of 97.31% and 95.32% for coarse and fine assembly actions, respectively, outperforming the other methods compared. Experiments on human-robot collaboration demonstrate that the method can be applied to industrial production.

Originality/value: This paper proposes a novel framework for assembly action recognition that simultaneously leverages the features of hands, objects and hand-object interactions. The TEM enhances the representation of hand dynamics and facilitates the recognition of assembly actions with various time spans. The IMN learns semantic information from hand-object interactions, which is significant for distinguishing fine assembly actions.
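The weighted score-level fusion mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the number of streams, the weight values, the use of softmax to turn logits into scores, and the names `softmax` and `fuse_scores` are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def softmax(logits):
    """Convert raw class logits into probabilities (numerically stable)."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def fuse_scores(stream_logits, weights):
    """Weighted score-level fusion over per-stream class scores.

    stream_logits: list of 1-D sequences, one per stream, same class order.
    weights: one non-negative weight per stream (normalized here to sum to 1).
    Returns the fused class probabilities and the predicted class index.
    """
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    probs = np.stack([softmax(np.asarray(l, dtype=float)) for l in stream_logits])
    fused = np.tensordot(weights, probs, axes=1)  # weighted sum over streams
    return fused, int(np.argmax(fused))


# Hypothetical logits for three action classes from three streams (assumed values).
hand = [2.0, 0.5, 0.1]
obj = [0.2, 1.5, 0.3]
interaction = [1.8, 1.7, 0.0]

fused, label = fuse_scores([hand, obj, interaction], weights=[0.4, 0.2, 0.4])
print(fused, label)
```

Because each stream's scores are normalized before weighting, the fused vector remains a probability distribution, and the weights express how much each stream is trusted. In practice such weights would typically be tuned on validation data; the abstract does not say how they were chosen.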

Robot recognizing humans intention and interacting with humans based on a multi-task model combining ST-GCN-LSTM model and YOLO model

A Human-Robot Collaboration System for Object Handover

Design of Kinect-Based Human Robot Interaction Systems for A Robocup Middle Size League Soccer Robot

InterRep: A Visual Interaction Representation for Robotic Grasping

An Attention-Based Deep Learning Approach for Inertial Motion Recognition and Estimation in Human-Robot Collaboration

Human–robot interaction-oriented video understanding of human actions

A Novel Multi-Stream Hand-Object Interaction Network for Assembly Action Recognition

Enhancing Human–Robot Collaboration through a Multi-Module Interaction Framework with Sensor Fusion: Object Recognition, Verbal Communication, User of Interest Detection, Gesture and Gaze Recognition

Hybrid Recurrent Neural Network Architecture-Based Intention Recognition for Human-Robot Collaboration

A gesture recognition system using Localist Attractor Networks for human-robot interaction

Learning Multimodal Confidence for Intention Recognition in Human-Robot Interaction

Robot-To-Human Handover with Obstacle Avoidance Via Continuous Time Recurrent Neural Network

Multi-Robot Behavior Adaptation to Humans' Intention in Human-Robot Interaction Using Information-Driven Fuzzy Friend-Q Learning

A Multimodal Information Fusion Model for Robot Action Recognition with Time Series

Human-Robot handover task intention recognition framework by fusing human digital twin and deep domain adaptation

Predicting Human Intentions in Human-Robot Hand-Over Tasks Through Multimodal Learning

Implementation of Engagement Detection for Human–Robot Interaction in Complex Environments

Cross-View Human Intention Recognition for Human-Robot Collaboration

Learning Human-to-Robot Dexterous Handovers for Anthropomorphic Hand

Detecting and Tracking Objects in HRI: YOLO Networks for the NAO "I See You" Function

Multi-target objects and complex color recognition model based on humanoid robot