Abstract: Human-robot interaction is an essential capability for humanoid robots to enter the physical world and become companions in people's lives, learning, and work. Although most current research focuses on voice-based interaction, over 60% of communication occurs through nonverbal behaviors such as facial expressions and hand gestures. Endowing robots with the ability to communicate nonverbally not only enhances the interactive experience but also provides a potential communication tool for individuals with hearing or speech impairments. Here, we develop a humanoid robot that adjusts its facial movements by driving servos, and we design a novel framework that integrates sign language recognition and facial landmark detection algorithms. This framework enables the robot to recognize sign language and translate it into spoken language while imitating the signer's facial expressions. To achieve this, we also propose a lightweight deep learning network, RealTimeSignNet, for real-time sign language recognition. Leveraging lightweight 3D convolution modules and time-dependent constraints, the model adapts to various time scales and processes sign language recognition tasks efficiently. Experimental results show that RealTimeSignNet performs strongly on mainstream sign language datasets, achieving an accuracy of 88.1% on the large continuous sign language dataset (continuous SLR), 98.2% on the isolated sign language dataset (SLR 500), and 91.50% on the English sign language dataset (WLAS). The overall assessment demonstrates that our humanoid robot can recognize sign language and translate it into spoken language while imitating facial emotions, providing a comprehensive solution to the communication challenges faced by individuals with hearing and speech impairments.

Multimodal fusion-powered English speaking robot

A multimodal educational robots driven via dynamic attention

Multimodal Human-robot Interaction on Service Robot

Research on Multimodal Human-Robot Interaction Based on Speech and Gesture

Multimodal Human-Autonomous Agents Interaction Using Pre-Trained Language and Visual Foundation Models

Multimodal Activation: Awakening Dialog Robots Without Wake Words

A Multimodal Emotional Communication Based Humans-Robots Interaction System

A multimodal human-robot sign language interaction framework applied in social robots

A Lightweight Network-Based Sign Language Robot with Facial Mirroring and Speech System

No More Mumbles: Enhancing Robot Intelligibility through Speech Adaptation

NMM-HRI: Natural Multi-modal Human-Robot Interaction with Voice and Deictic Posture via Large Language Model

Multimodal information fusion for human-robot interaction

Enhancing Human–Robot Collaboration through a Multi-Module Interaction Framework with Sensor Fusion: Object Recognition, Verbal Communication, User of Interest Detection, Gesture and Gaze Recognition

When Robots Get Chatty: Grounding Multimodal Human-Robot Conversation and Collaboration

Safe Multimodal Communication in Human-Robot Collaboration

MHRC: Closed-loop Decentralized Multi-Heterogeneous Robot Collaboration with Large Language Models

Decision Making of Mobile Robot based on Multimodal Fusion

A perceptual manipulation system for audio-visual fusion of robots

Multimodal integration learning of robot behavior using deep neural networks

Research on multimodal human-computer interaction technology based on audiovisual fusion

Real-Time Multi-modal Human-Robot Collaboration Using Gestures and Speech