Abstract: In virtual teaching scenarios, head-mounted display (HMD) interaction typically relies on traditional controllers and UI menus, which are poorly suited to teaching scenarios that require hand training. Existing improvements have focused primarily on replacing controllers with gesture recognition. However, gesture recognition alone may have limitations in certain scenarios, such as complex operations or multitasking environments. This study designed and tested an interaction method that combines simple gestures with voice assistance, aiming to offer a more intuitive user experience and to enrich related research. A speech classification model was developed that is activated by a fist-clenching gesture and recognises specific Chinese voice commands to open various user interfaces, which are then controlled by pointing gestures. Virtual scenarios were constructed in Unity, with hand tracking provided by the HTC OpenXR SDK; hand rendering and gesture recognition were implemented within Unity, and UI interaction was handled by the Unity XR Interaction Toolkit. The interaction method is detailed and exemplified using a teacher training simulation system, with sample code provided. An empirical test with 20 participants then compared the gesture-plus-voice operation with traditional controller operation, both quantitatively and qualitatively. The data suggest that, although there is no significant difference in task completion time between the two methods, the combined gesture and voice method received positive user-experience feedback, indicating a promising direction for such interaction methods. Future work could add more gestures and expand the model's training dataset to realise additional interactive functions and meet diverse virtual teaching needs.
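The activation flow summarised in the abstract (a fist gesture arms the speech classifier, a recognised command opens a user interface, and pointing gestures then drive it) can be illustrated as a small state machine. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the actual system is built in Unity, and the class name `GestureVoiceController`, the gesture strings, the command labels, and the panel names are all hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List, Optional


class State(Enum):
    IDLE = auto()        # no voice capture, no panel shown
    LISTENING = auto()   # fist gesture detected, waiting for a voice command
    PANEL_OPEN = auto()  # a UI panel is shown and accepts pointing input


# Hypothetical mapping from classifier output labels to UI panels.
# The real system recognises specific Chinese voice commands; these
# labels and panel names are assumptions for illustration only.
COMMAND_TO_PANEL = {"open_menu": "MainMenu", "open_settings": "Settings"}


@dataclass
class GestureVoiceController:
    state: State = State.IDLE
    active_panel: Optional[str] = None
    log: List[str] = field(default_factory=list)

    def on_gesture(self, gesture: str) -> None:
        """Handle a recognised hand gesture ("fist" or "point")."""
        if gesture == "fist" and self.state is not State.LISTENING:
            # A fist clench arms the speech classifier.
            self.state = State.LISTENING
            self.log.append("listening")
        elif gesture == "point" and self.state is State.PANEL_OPEN:
            # Pointing only has an effect once a panel is open.
            self.log.append(f"pointer->{self.active_panel}")

    def on_voice(self, label: str) -> None:
        """Handle a label produced by the speech classification model."""
        if self.state is not State.LISTENING:
            return  # voice input is ignored unless armed by the fist gesture
        panel = COMMAND_TO_PANEL.get(label)
        if panel is None:
            # Unknown command: fall back to the previous resting state.
            self.state = State.PANEL_OPEN if self.active_panel else State.IDLE
            return
        self.active_panel = panel
        self.state = State.PANEL_OPEN
        self.log.append(f"open:{panel}")
```

Gating voice input behind an explicit gesture, as in this sketch, is one simple way to keep ordinary classroom speech from triggering commands; whether the original system uses the same gating logic in detail is not stated in the abstract.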

The self-taught vocal interface

A New Mmwave-Speech Multimodal Speech System for Voice User Interface

Efficient Self-Attention Model for Speech Recognition-Based Assistive Robots Control

Multichannel automatic recognition of voice command in a multi-room smart home: an experiment involving seniors and users with visual impairment

Wavoice: an Mmwave-Assisted Noise-Resistant Speech Recognition System

Development of speech recognition system for remote vocal music teaching based on Markov model

Using Voice Technologies to Support Disabled People

Automatic recognition of child speech for robotic applications in noisy environments

No More Mumbles: Enhancing Robot Intelligibility through Speech Adaptation

Design and implementation of smart voice assistant and recognizing academic words

Pre-training for low resource speech-to-intent applications

Automatic Speech Recognition of Non-Native Child Speech for Language Learning Applications

Vocal Sandbox: Continual Learning and Adaptation for Situated Human-Robot Collaboration

Task Oriented Dialogue as a Catalyst for Self-Supervised Automatic Speech Recognition

Multi-Stage Face-Voice Association Learning with Keynote Speaker Diarization

Giving Robots a Voice: Human-in-the-Loop Voice Creation and open-ended Labeling

An open-source voice type classifier for child-centered daylong recordings

VoicePilot: Harnessing LLMs as Speech Interfaces for Physically Assistive Robots

DART: Disentanglement of Accent and Speaker Representation in Multispeaker Text-to-Speech

Interactive Design With Gesture and Voice Recognition in Virtual Teaching Environments