Abstract: This paper presents a voice-interactive robot system that can conveniently execute assigned service tasks in real-life scenarios. The robot is equipped with a microphone so that users can control it with spoken commands; the commands are recognized by a trained deep neural network model for automatic speech recognition (ASR), and the robot then carries them out while navigating with a real-time simultaneous localization and mapping (SLAM) algorithm. The voice interaction recognition model is divided into two parts: (1) speaker separation and (2) ASR. Speaker separation is performed by a deep-learning system consisting of eight convolution layers, one LSTM layer, and two fully connected (FC) layers. This model uses the target speaker’s voice as a reference: it isolates and retains the required voiceprint and removes interference from other people’s voiceprints. The ASR stage uses a novel sandwich-type Conformer model with a stack of three layers, which combines convolution and self-attention to capture short-term and long-term interactions. Specifically, it contains a multi-head self-attention module and converts the voice data directly into text for command realization. For navigation, an RGB-D camera runs a real-time appearance-based mapping algorithm to build the environment map, and visual odometry is used for localization so that the robot can navigate on its own. Finally, the proposed ASR model was tested to check whether it produced the desired results, and a performance analysis was carried out to assess the robot’s environment isolation and voice recognition abilities. The results show that the practical robot system successfully completed the interactive service tasks in a real environment. The experiments demonstrate outstanding performance compared with other ASR methods and voice-controlled mobile robot systems. They also verify that the designed voice interaction recognition system enables the mobile robot to execute tasks in real time, making it a convenient way to complete the assigned service applications.
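
The data flow summarized in the abstract (speaker separation, then ASR, then command realization through navigation) could be sketched as below. This is only an illustrative outline under stated assumptions, not the authors' implementation: the class name `VoiceCommandPipeline`, the stage callables, the phrase-to-goal table, and the map coordinates are all hypothetical, and the two neural models are replaced by stubs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Audio = List[float]         # mono audio samples (placeholder type)
Goal = Tuple[float, float]  # hypothetical (x, y) goal on the SLAM map, in metres


@dataclass
class VoiceCommandPipeline:
    """Hypothetical glue code: separation -> ASR -> navigation goal."""
    separate: Callable[[Audio, Audio], Audio]  # (mixture, reference voice) -> target speech
    transcribe: Callable[[Audio], str]         # target speech -> recognized text
    goals: Dict[str, Goal]                     # assumed phrase -> goal lookup table

    def handle(self, mixture: Audio, reference: Audio) -> Optional[Goal]:
        # Stage 1: keep only the enrolled speaker's voice.
        clean = self.separate(mixture, reference)
        # Stage 2: convert the separated speech to text.
        text = self.transcribe(clean).lower().strip()
        # Stage 3: map the text to a navigation goal, if any phrase matches.
        for phrase, goal in self.goals.items():
            if phrase in text:
                return goal
        return None  # no known command recognized


# Stubs stand in for the separation network and the Conformer ASR model.
pipeline = VoiceCommandPipeline(
    separate=lambda mixture, reference: mixture,
    transcribe=lambda audio: "Please go to the kitchen",
    goals={"kitchen": (3.0, 1.5), "living room": (0.0, 0.0)},
)
goal = pipeline.handle([0.0] * 16000, [0.0] * 16000)
```

In a real system the returned goal would be handed to the navigation stack; passing the stages in as callables keeps the sketch independent of any particular model or robot framework.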

Cross-modal Task Understanding and Execution of Voice-fingertip Reading Instruction by Using Small Family Service Robotic

Contactless Interaction System Based on Facial Expression Recognition for Humanoid Piano Robot

A Small Family Service Robot System with Uncalibrated Monocular Camera for Visual Servoing Tracking Fast Moving Family Targets in Short Range

Audio-Visual Bimodal Combination-Based Speaker Tracking Method for Mobile Robot

Audio–visual Language Instruction Understanding for Robotic Sorting

Voice Interaction Recognition Design in Real-Life Scenario Mobile Robot Applications

"Pass the butter": A study on desktop-classic multitasking robotic arm based on advanced YOLOv7 and BERT

Robi Butler: Remote Multimodal Interactions with Household Robot Assistant

Multi-Modal Human-Machine Communication for Instructing Robot Grasping Tasks

Natural Language Instruction Understanding for Robotic Manipulation: a Multisensory Perception Approach.

Dynamic Hand Gesture-Featured Human Motor Adaptation in Tool Delivery using Voice Recognition

Language-Conditioned Robotic Manipulation with Fast and Slow Thinking

SIFToM: Robust Spoken Instruction Following through Theory of Mind

Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model

A perceptual manipulation system for audio-visual fusion of robots

Learning Visual-Audio Representations for Voice-Controlled Robots

A multimodal domestic service robot interaction system for people with declined abilities to express themselves

A Voice Recognition Sensor and Voice Control System in an Intelligent Toy Robot System

Object-Centric Instruction Augmentation for Robotic Manipulation

See, Hear, and Feel: Smart Sensory Fusion for Robotic Manipulation

Human-assisted Sound Event Recognition for Home Service Robots.