Abstract: This paper presents a voice-interactive robot system that can conveniently execute assigned service tasks in real-life scenarios. The robot is equipped with a microphone so that users can control it with spoken commands; the commands are recognized by a trained deep neural network model for automatic speech recognition (ASR), and the robot then executes them while navigating with a real-time simultaneous localization and mapping (SLAM) algorithm. The voice interaction recognition model is divided into two parts: (1) speaker separation and (2) ASR. Speaker separation is performed by a deep-learning network consisting of eight convolution layers, one LSTM layer, and two fully connected (FC) layers. This model uses the target speaker's voice as a reference to isolate and retain the required voiceprint and to remove noise from other people's voiceprints. The ASR stage uses a novel sandwich-type conformer model with a stack of three layers, combining convolution and self-attention to capture short-term and long-term interactions. Specifically, it contains a multi-head self-attention module and converts the voice data directly into text for command realization. An RGB-D camera uses a real-time appearance-based mapping algorithm to create the environment map and relies on visual odometry for localization, allowing the robot to navigate by itself. Finally, the proposed ASR model was tested to check whether it produced the desired results. Performance analysis was applied to assess the robot's environment isolation and voice recognition abilities. The results showed that the practical robot system successfully completed the interactive service tasks in a real environment. The experiments demonstrate outstanding performance compared with other ASR methods and voice-controlled mobile robot systems. They also verify that the designed voice interaction recognition system enables the mobile robot to execute tasks in real time, making it a convenient way to complete the assigned service applications.
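The order of the stages described in the abstract (speaker separation, then ASR, then a navigation goal for the SLAM-based planner) can be illustrated with a minimal Python sketch. Every name here (`VoiceCommandPipeline`, `toy_separate`, `toy_transcribe`, the goal table and its coordinates) is hypothetical and stands in for the trained networks and the navigation stack; none of it is taken from the paper's implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Audio = List[float]          # hypothetical: raw samples as a list of floats
Goal = Tuple[float, float]   # hypothetical: (x, y) coordinate on the map


@dataclass
class VoiceCommandPipeline:
    """Chains the stages named in the abstract; all components are stand-ins."""
    separate: Callable[[Audio, Audio], Audio]  # (mixed audio, reference voice) -> target speaker
    transcribe: Callable[[Audio], str]         # ASR: audio -> text
    goals: Dict[str, Goal]                     # command text -> navigation goal

    def handle(self, mixed: Audio, reference: Audio) -> Optional[Goal]:
        clean = self.separate(mixed, reference)        # 1. speaker separation
        text = self.transcribe(clean).strip().lower()  # 2. speech recognition
        return self.goals.get(text)                    # 3. goal lookup, None if unknown


def toy_separate(mixed: Audio, reference: Audio) -> Audio:
    # Placeholder for the separation network (8 conv + 1 LSTM + 2 FC layers).
    return list(mixed)


def toy_transcribe(audio: Audio) -> str:
    # Placeholder for the conformer-based ASR model.
    return "Go to the kitchen" if audio else ""


pipeline = VoiceCommandPipeline(
    separate=toy_separate,
    transcribe=toy_transcribe,
    goals={"go to the kitchen": (3.0, 1.5)},  # assumed example coordinate
)
goal = pipeline.handle([0.1, -0.2, 0.05], reference=[0.1, 0.0])
print(goal)
```

Keeping each stage behind a plain callable means a real separation network or ASR model could be swapped in without changing the control flow; an unrecognized command yields `None` rather than a navigation goal.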

Automatically Generating and Improving Voice Command Interface from Operation Sequences on Smartphones

AutoTask: Executing Arbitrary Voice Commands by Exploring and Learning from Mobile GUI

Voicify Your UI: Towards Android App Control with Voice Commands

A New Mmwave-Speech Multimodal Speech System for Voice User Interface

Lip-Interact: Improving Mobile Device Interaction with Silent Speech Commands

Empowering LLM to use Smartphone for Intelligent Task Automation

GPTVoiceTasker: Advancing Multi-step Mobile Task Efficiency Through Dynamic Interface Exploration and Learning

AutoDroid: LLM-powered Task Automation in Android

Voice Enabling Mobile Applications with UIVoice

Voice Interaction Recognition Design in Real-Life Scenario Mobile Robot Applications

AVATAR: Robust Voice Search Engine Leveraging Autoregressive Document Retrieval and Contrastive Learning

Evaluating Personal Assistants on Mobile devices

VOICE BASED VIRTUAL ASSISTANT

From Voice to Value: Leveraging AI to Enhance Spoken Online Reviews on the Go

Intelligent Virtual Assistants with LLM-based Process Automation

Mobile robot: automatic speech recognition application for automation and STEM education

Test2VA: Reusing GUI Test Cases for Voice Assistant Features Development in Mobile Applications

Training a Vision Language Model as Smartphone Assistant

Commanding and Re-Dictation