Data-efficient Multimodal Human Action Recognition for Proactive Human–robot Collaborative Assembly: A Cross-Domain Few-Shot Learning Approach

Tianyu Wang, Zhihao Liu, Lihui Wang, Mian Li, Xi Vincent Wang
DOI: https://doi.org/10.1016/j.rcim.2024.102785
IF: 10.103
2024-01-01
Robotics and Computer-Integrated Manufacturing
Abstract: With the recent vision of Industry 5.0, the cognitive capability of robots plays a crucial role in advancing proactive human-robot collaborative assembly. As a basis of mutual empathy, understanding a human operator's intention has been studied primarily through human action recognition. Existing deep learning-based methods demonstrate remarkable efficacy in handling information-rich data such as physiological measurements and videos, where the latter represents a more natural perception input. However, deploying these methods in new, unseen assembly scenarios first requires collecting abundant case-specific data, which leads to significant manual effort and poor flexibility. To address this issue, this paper proposes a novel cross-domain few-shot learning method for data-efficient multimodal human action recognition. A hierarchical data fusion mechanism is designed to jointly leverage skeletons, RGB images, and depth maps, which carry complementary information. A temporal CrossTransformer is then developed to enable action recognition with a very limited amount of data. Lightweight domain adapters are integrated to further improve generalization with fast finetuning. Extensive experiments on a real car engine assembly case show the superior performance of the proposed method over state-of-the-art methods in both accuracy and finetuning efficiency. Real-time demonstrations and an ablation study further indicate the potential of early recognition, which is beneficial for generating robot procedures in practical applications. In summary, this paper contributes to the rarely explored realm of data-efficient human action recognition for proactive human-robot collaboration.
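The few-shot recognition idea named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes per-frame feature vectors are already available, uses plain dot-product attention with no learned projections, and omits the multimodal fusion and domain adapters entirely. The function names and the action labels are hypothetical. Each query frame attends over the support frames of one class to build a query-aligned prototype, and the query is assigned to the class whose aligned prototype is closest.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention_distance(query, support):
    """Distance between a query clip and one class's support frames.

    query:   (T, D) array, one feature vector per query frame.
    support: (N, D) array, frames pooled from all support clips of a class.
    Assumption: identity key/query/value maps (no learned weights).
    """
    d = query.shape[-1]
    # Each query frame attends over every support frame of this class.
    attn = softmax(query @ support.T / np.sqrt(d), axis=-1)  # (T, N)
    # Query-aligned prototype: attention-weighted mix of support frames.
    aligned = attn @ support  # (T, D)
    # Mean squared Euclidean distance over query frames.
    return float(((query - aligned) ** 2).sum(axis=-1).mean())


def classify(query, support_by_class):
    """Return the label with the smallest aligned distance, plus all distances."""
    dists = {
        label: cross_attention_distance(query, feats)
        for label, feats in support_by_class.items()
    }
    return min(dists, key=dists.get), dists


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical labels; synthetic features stand in for fused embeddings.
    support_by_class = {
        "pick_bolt": rng.normal(3.0, 0.5, size=(20, 8)),
        "tighten_screw": rng.normal(-3.0, 0.5, size=(20, 8)),
    }
    query = rng.normal(3.0, 0.5, size=(10, 8))
    label, dists = classify(query, support_by_class)
    print(label, dists)
```

Because the prototype is rebuilt for every query, only a handful of labelled support clips per class are needed at test time, which is the property the abstract refers to as data efficiency.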