A New mmWave-Speech Multimodal Speech System for Voice User Interface

Tiantian Liu, Feng Lin
DOI: https://doi.org/10.1145/3640087.3640096
2024-01-01
GetMobile: Mobile Computing and Communications
Abstract: Voice user interfaces (VUIs) play an essential role in intelligent environments such as smart homes. They provide hands-free and eyes-free interaction between humans and Internet of Things devices. Benefiting from advances in deep learning and natural language processing, automatic speech recognition (ASR) enables VUIs to accurately comprehend users' intentions. With such a convenient and flexible service, users can interact with various devices as they please. Commercial VUI products, such as smart speakers (e.g., Amazon Echo and Google Home), voice assistants in smartphones (e.g., Siri), and in-vehicle voice control (e.g., VUIs in Tesla Model S/X/3/Y), have gained popularity in recent years. Analysts forecast that by 2024, the deployment of VUI-based smart speakers will reach 640 million globally.