Abstract: This paper presents a voice-interactive robot system that can conveniently execute assigned service tasks in real-life scenarios. The robot is equipped with a microphone so that users can control it with spoken commands; the commands are recognized by a trained deep neural network model for automatic speech recognition (ASR), and the robot then executes them while navigating with a real-time simultaneous localization and mapping (SLAM) algorithm. The voice interaction recognition model is divided into two parts: (1) speaker separation and (2) ASR. Speaker separation is performed by a deep-learning system consisting of eight convolution layers, one LSTM layer, and two fully connected (FC) layers. This model uses the target speaker’s voice as a reference, retaining the required voiceprint and removing interference from other people’s voiceprints. The ASR stage uses a novel sandwich-type conformer model with a stack of three layers, combining convolution and self-attention to capture short-term and long-term interactions. Specifically, it contains a multi-head self-attention module and directly converts the voice data into text for command realization. An RGB-D camera uses a real-time appearance-based mapping algorithm to create the environment map and performs localization with visual odometry, allowing the robot to navigate by itself. Finally, the proposed ASR model was tested to check whether it produced the desired results, and a performance analysis was carried out to assess the robot’s environment isolation and voice recognition abilities. The results showed that the practical robot system successfully completed the interactive service tasks in a real environment. The experiments demonstrate outstanding performance compared with other ASR methods and voice-controlled mobile robot systems. They also verify that the designed voice interaction recognition system enables the mobile robot to execute tasks in real time, making it a convenient way to complete the assigned service applications.
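
The step from recognized text to a navigation task can be illustrated with a minimal sketch. It assumes the ASR stage has already produced a transcript and that the map holds a few named locations; the names `NavGoal`, `GOALS`, and `command_to_goal`, and the coordinates, are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NavGoal:
    """A named target on the robot's map (coordinates are made-up examples)."""
    name: str
    x: float  # assumed to be metres in the map frame
    y: float


# Hypothetical table of named locations; a real system would derive these
# from the map built by the SLAM component.
GOALS = {
    "kitchen": NavGoal("kitchen", 3.5, 1.2),
    "door": NavGoal("door", 0.0, 4.0),
}


def command_to_goal(transcript: str) -> Optional[NavGoal]:
    """Return the goal for the first known location named in the transcript.

    Returns None when no known location is mentioned, so the caller can
    ask the user to repeat the command instead of moving the robot.
    """
    # Lowercase and strip simple punctuation before splitting into words.
    words = transcript.lower().replace(",", " ").replace(".", " ").split()
    for word in words:
        if word in GOALS:
            return GOALS[word]
    return None
```

For example, `command_to_goal("Please go to the kitchen.")` returns the `kitchen` goal, while a transcript that names no known location returns `None`. Keyword lookup is chosen here only for brevity; the paper does not state how transcripts are mapped to tasks.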

A New Mmwave-Speech Multimodal Speech System for Voice User Interface

Wavoice: an Mmwave-Assisted Noise-Resistant Speech Recognition System

Wavoice: A mmWave-assisted Noise-resistant Speech Recognition System

Wavoice: A Noise-resistant Multi-modal Speech Recognition System Fusing mmWave and Audio Signals

Voice Enabling Mobile Applications with UIVoice

The MIT Voice Name System

Voicify Your UI: Towards Android App Control with Voice Commands

Automatically Generating and Improving Voice Command Interface from Operation Sequences on Smartphones

M$^{3}$V: A multi-modal multi-view approach for Device-Directed Speech Detection

SottoVoce: An Ultrasound Imaging-Based Silent Speech Interaction Using Deep Neural Networks

VoiceTalk: Multimedia-IoT Applications for Mixing Mandarin, Taiwanese and English

VOICE BASED VIRTUAL ASSISTANT

Voice Interaction Recognition Design in Real-Life Scenario Mobile Robot Applications

Robust Dual-Modal Speech Keyword Spotting for XR Headsets

New Multi-modeling Voice Interactive System

Intelligent automobile auxiliary propagation system based on speech recognition and AI driven feature extraction techniques

User Interaction Patterns and Breakdowns in Conversing with LLM-Powered Voice Assistants

GLM-4-Voice: Towards Intelligent and Human-Like End-to-End Spoken Chatbot

Human and LLM-Based Voice Assistant Interaction: An Analytical Framework for User Verbal and Nonverbal Behaviors

Multi-Task Deep Learning for User Intention Understanding in Speech Interaction Systems