Abstract: Voice-controlled systems are an attractive option for increasing technological integration, not only for people with little knowledge of technology or constrained Internet access, but also for people with certain disabilities. In addition, devices based on Alexa or Google Home provide an interesting alternative for interacting with Internet of Things (IoT) devices, but they usually rely on an Internet connection to a cloud server for their full operation. Furthermore, many voice-recognition systems are available only in a limited number of languages, typically those with the most speakers, which excludes minority-language speakers. To address these issues, this article presents a solution based on Edge Computing and voice commands that performs offline voice processing and can interact with IoT-based systems. The proposed system performs local speech inference and provides a communication interface with IoT devices in a Bluetooth mesh, quickly and without the need for an Internet connection. In addition, the proposed solution can be easily adapted to recognize speech in low-resource languages. This feature is demonstrated with the Galician language, which is spoken by fewer than 3 million people worldwide. In particular, different Automatic Speech Recognition (ASR) models based on three of the most popular ASR development frameworks (wav2vec2, DistilHubert, Whisper) were developed to transcribe short utterances and translate them into IoT commands that perform specific home-automation actions. The models were fine-tuned for Galician with a corpus of approximately 20 hours and were evaluated in static and mobile opportunistic scenarios in terms of accuracy, energy consumption and latency on an embedded platform (acting as an edge device) and on a cloud server.
The obtained results show that inference takes less than 2 seconds on a Raspberry Pi 4 for the two smallest models and less than 500 ms on a high-end Android smartphone when all data are processed locally with CPU-only inference (i.e., without hardware acceleration or external processing). The transcriptions are accurate enough that simple text-distance algorithms can detect keywords in the speech and trigger commands on IoT devices. In particular, a maximum success rate of 92% was achieved in detecting the indicated commands when using models optimized for execution on embedded devices. For selected home scenarios, command actions were sent via Bluetooth with average response times of up to 113 ms.
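
To illustrate how a text-distance algorithm can map an imperfect transcription to an IoT command, the following minimal sketch uses the Levenshtein edit distance to match each transcribed word against a small keyword vocabulary. It is not the implementation evaluated in the article: the vocabulary (`ACTIONS`, `DEVICES`), the command labels, the `max_ratio` threshold and the function names are all hypothetical and chosen only for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insert, delete, substitute; cost 1 each)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (0 if equal)
            ))
        previous = current
    return previous[-1]


# Hypothetical keyword vocabulary (illustrative Galician words, no accents).
ACTIONS = {"acender": "ON", "apagar": "OFF"}
DEVICES = {"luz": "LIGHT", "calefaccion": "HEATER"}


def closest(word, vocabulary, max_ratio=0.34):
    """Return the label of the nearest keyword, or None if none is close enough.

    A keyword is accepted when its edit distance to `word` is at most
    `max_ratio` times the keyword length (an assumed threshold).
    """
    best, best_dist = None, None
    for key in vocabulary:
        dist = levenshtein(word, key)
        if dist <= max_ratio * len(key) and (best_dist is None or dist < best_dist):
            best, best_dist = key, dist
    return vocabulary[best] if best is not None else None


def parse_command(transcription):
    """Return (action, device) if both are detected in the text, else None."""
    action = device = None
    for word in transcription.lower().split():
        action = action or closest(word, ACTIONS)
        device = device or closest(word, DEVICES)
    return (action, device) if action and device else None
```

Under these assumptions, a transcription containing small errors such as `"asender a lus"` still resolves to the pair `("ON", "LIGHT")`, while unrelated speech such as `"boa noite"` yields no command. Scaling the threshold with the keyword length is one possible design choice: it lets longer keywords tolerate more transcription errors than very short ones.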

Design, Implementation, and Practical Evaluation of a Voice Recognition Based IoT Home Automation System for Low-Resource Languages and Resource-Constrained Edge IoT Devices: A System for Galician and Mobile Opportunistic Scenarios
