Abstract: A lightweight human-robot interaction model with high real-time performance, high accuracy, and strong anti-interference capability is well suited to future lunar surface exploration and construction work. Using feature information captured by a monocular camera, the astronaut's gesture and eye-movement signals can be acquired, processed, and fused. Compared with a single modality, the bimodal collaborative interaction model issues complex interactive commands more efficiently. The target detection model is optimized by inserting an attention mechanism into YOLOv4 and filtering image motion blur. A neural network identifies the center coordinates of the pupils to enable human-robot interaction in the eye-movement mode. The astronaut gesture signal and the eye-movement signal are fused at the end of the collaborative model, so that complex command interactions are achieved with a lightweight model. The dataset used for network training is augmented and extended to simulate a realistic lunar interaction environment. The interaction performance for complex commands in each single mode is compared with that of the bimodal collaboration. The experimental results show that the concatenated interaction model of astronaut gesture and eye-movement signals extracts the bimodal interaction signal more effectively, discriminates complex interaction commands more quickly, and resists signal interference better, owing to its stronger ability to mine feature information. Compared with command interaction using the gesture signal alone or the eye-movement signal alone, the interaction time of the bimodal collaborative model is shorter by about 79% to 91% of the time required under single-mode interaction. Regardless of which image interference is applied, the overall judgment accuracy of the proposed model remains at about 83% to 97%. These results verify the effectiveness of the proposed method.
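The fusion of gesture and eye-movement signals at the end of the model can be illustrated with a minimal decision-level sketch. This is not the authors' implementation: the gesture labels, gaze regions, command table, confidence threshold, and the function names `gaze_region` and `fuse_command` are all hypothetical, chosen only to show how a gesture class and a pupil-center coordinate might be combined into a single complex command.

```python
from typing import Dict, Optional, Tuple

# Hypothetical command table: the abstract does not list the actual gestures,
# gaze regions, or commands, so these entries are illustrative only.
COMMANDS: Dict[Tuple[str, str], str] = {
    ("point", "left"): "move_left",
    ("point", "right"): "move_right",
    ("palm", "center"): "stop",
    ("fist", "center"): "grasp",
}


def gaze_region(pupil_x: float, frame_width: float) -> str:
    """Map a pupil-center x coordinate (pixels) to a coarse gaze region."""
    ratio = pupil_x / frame_width
    if ratio < 1 / 3:
        return "left"
    if ratio < 2 / 3:
        return "center"
    return "right"


def fuse_command(
    gesture_scores: Dict[str, float],
    pupil_x: float,
    frame_width: float,
    min_conf: float = 0.5,  # assumed threshold, not taken from the paper
) -> Optional[str]:
    """Combine the top gesture class with the gaze region into one command.

    Returns None when the gesture detector is not confident enough or when
    the (gesture, gaze region) pair has no entry in the command table.
    """
    gesture, score = max(gesture_scores.items(), key=lambda kv: kv[1])
    if score < min_conf:
        return None
    return COMMANDS.get((gesture, gaze_region(pupil_x, frame_width)))
```

In this sketch, each modality alone selects among only a few options, while the pair of them indexes a larger command table, which is one plausible reading of why the bimodal model can issue complex commands more efficiently than either single mode.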

Tell and show: Combining multiple modalities to communicate manipulation tasks to a robot

Enhancing Human–Robot Collaboration through a Multi-Module Interaction Framework with Sensor Fusion: Object Recognition, Verbal Communication, User of Interest Detection, Gesture and Gaze Recognition

Multimodal Uncertainty Reduction for Intention Recognition in Human-Robot Interaction

See, Hear, and Feel: Smart Sensory Fusion for Robotic Manipulation

Multimodal information fusion for human-robot interaction

Decision Making of Mobile Robot based on Multimodal Fusion

Communicating human intent to a robotic companion by multi-type gesture sentences

Multi-Modal Human-Machine Communication for Instructing Robot Grasping Tasks

Communicating Inferred Goals with Passive Augmented Reality and Active Haptic Feedback

Multimodal representation models for prediction and control from partial information

Improved Mutual Understanding for Human-Robot Collaboration: Combining Human-Aware Motion Planning with Haptic Feedback Devices for Communicating Planned Trajectory

Probabilistic Multimodal Modeling for Human-Robot Interaction Tasks

Research on Multimodal Human-Robot Interaction Based on Speech and Gesture

Multimodal Human-robot Interaction on Service Robot

Bridging vision and touch: advancing robotic interaction prediction with self-supervised multimodal learning

Jointly Optimizing Sensing Pipelines for Multimodal Mixed Reality Interaction

A Comparison of Visualisation Methods for Disambiguating Verbal Requests in Human-Robot Interaction

Interactive method research of dual mode information coordination integration for astronaut gesture and eye movement signals based on hybrid model

Multimodal Robot Programming by Demonstration: A Preliminary Exploration

Advantages of Multimodal versus Verbal-Only Robot-to-Human Communication with an Anthropomorphic Robotic Mock Driver

Play to the Score: Stage-Guided Dynamic Multi-Sensory Fusion for Robotic Manipulation