Abstract

Purpose: Assembly action recognition plays an important role in assembly process monitoring and human-robot collaborative assembly. Previous works overlook the interaction between hands and the objects they operate on, and do not model subtle hand motions, which reduces accuracy in fine-grained action recognition. This paper aims to model hand-object interactions and hand movements to achieve high-accuracy assembly action recognition.

Design/methodology/approach: A novel multi-stream hand-object interaction network (MHOINet) is proposed for assembly action recognition. To learn the hand-object interaction relationship in an assembly sequence, the interaction stream uses an interaction modeling network (IMN) comprising both geometric and visual modeling. The geometric modeling captures the spatial relation between the hand and the interacted parts/tools from their detected bounding boxes, while the visual modeling mines the visual context of hand and object at the pixel level through a position attention model. To model hand movements, the hand stream uses a temporal enhancement module (TEM) with multiple convolution kernels, which captures both short-range and long-range temporal dependencies in hand sequences. Finally, the assembly action is predicted by merging the outputs of the different streams through weighted score-level fusion. A robotic arm component assembly dataset is created to evaluate the effectiveness of the proposed method.

Findings: The method achieves recognition accuracies of 97.31% and 95.32% for coarse and fine assembly actions, respectively, outperforming the compared methods. Experiments on human-robot collaboration indicate that the method can be applied to industrial production.

Originality/value: The paper proposes a novel framework for assembly action recognition that simultaneously leverages the features of hands, objects and hand-object interactions. The TEM enhances the representation of hand dynamics and facilitates the recognition of assembly actions with various time spans. The IMN learns semantic information from hand-object interactions, which is important for distinguishing fine assembly actions.
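
The weighted score-level fusion mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the stream names, the weights, or how scores are normalized, so everything below is an assumption for illustration: each stream is assumed to output one logit per action class, the logits are turned into probabilities with a softmax, and the hypothetical weights are normalized to sum to one before the weighted average is taken. This is not the authors' implementation.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scores(stream_logits, weights):
    """Weighted average of per-stream class probabilities.

    stream_logits: dict mapping a stream name to a list of class logits.
    weights: dict mapping the same stream names to non-negative weights
             (hypothetical values; the paper's weights are not given here).
    """
    if set(stream_logits) != set(weights):
        raise ValueError("streams and weights must have the same keys")
    weight_sum = sum(weights.values())
    if weight_sum <= 0:
        raise ValueError("weights must sum to a positive value")

    num_classes = len(next(iter(stream_logits.values())))
    fused = [0.0] * num_classes
    for name, logits in stream_logits.items():
        if len(logits) != num_classes:
            raise ValueError("all streams must score the same classes")
        share = weights[name] / weight_sum  # normalized stream weight
        for i, p in enumerate(softmax(logits)):
            fused[i] += share * p
    return fused


def predict(stream_logits, weights):
    """Return the index of the class with the highest fused score."""
    fused = fuse_scores(stream_logits, weights)
    return max(range(len(fused)), key=fused.__getitem__)


# Illustrative values only: three assumed streams scoring three classes.
example_logits = {
    "hand": [2.0, 0.5, 0.1],
    "object": [0.3, 1.2, 0.4],
    "interaction": [0.2, 2.5, 0.3],
}
example_weights = {"hand": 1.0, "object": 1.0, "interaction": 2.0}
predicted_class = predict(example_logits, example_weights)
```

Fusing at the score level keeps the streams independent until the final step, so each stream can be trained or replaced separately and only the weights need retuning.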

Data-efficient Multimodal Human Action Recognition for Proactive Human–robot Collaborative Assembly: A Cross-Domain Few-Shot Learning Approach

A Skeleton-Based Assembly Action Recognition Method with Feature Fusion for Human-Robot Collaborative Assembly

An Attention-Based Deep Learning Approach for Inertial Motion Recognition and Estimation in Human-Robot Collaboration

Toward Proactive Human–Robot Collaborative Assembly: A Multimodal Transfer-Learning-Enabled Action Prediction Approach

A Novel Multi-Stream Hand-Object Interaction Network for Assembly Action Recognition

Hybrid Convolutional Neural Network Approaches for Recognizing Collaborative Actions in Human–Robot Assembly Tasks

B2C-AFM: Bi-Directional Co-Temporal and Cross-Spatial Attention Fusion Model for Human Action Recognition

Online Robust Action Recognition Based on a Hierarchical Model

Hybrid machine learning for human action recognition and prediction in assembly

Action Fusion Recognition Model Based on GAT-GRU Binary Classification Networks for Human-Robot Collaborative Assembly

Attention-Oriented Action Recognition for Real-Time Human-Robot Interaction

Deep Learning-based Multimodal Control Interface for Human-Robot Collaboration

Action Recognition for Human–Robot Teaming: Exploring Mutual Performance Monitoring Possibilities

Deep learning based assembly process action recognition and progress prediction facing human-centric intelligent manufacturing

A Deep Learning-Enabled Human-Cyber-Physical Fusion Method Towards Human-Robot Collaborative Assembly

Research on Human-Machine Task Collaboration Based on Action Recognition

Deep Learning-Based Human Motion Recognition for Predictive Context-Aware Human-Robot Collaboration

Language Supervised Human Action Recognition with Salient Fusion: Construction Worker Action Recognition as a Use Case

Prediction-Based Human-Robot Collaboration in Assembly Tasks Using a Learning from Demonstration Model

Multi-sensor Fusion Based Industrial Action Recognition Method under the Environment of Intelligent Manufacturing