Abstract: In virtual teaching scenarios, head-mounted display (HMD) interaction typically relies on traditional controllers and UI elements, which are poorly suited to teaching scenarios that require hand training. Existing improvements in this area have focused primarily on replacing controllers with gesture recognition. However, relying on gesture recognition alone has limitations in some scenarios, such as complex operations or multitasking environments. This study designed and tested an interaction method that combines simple gestures with voice assistance, aiming to offer a more intuitive user experience and to enrich related research. A speech classification model was developed that is activated by a fist-clenching gesture and recognizes specific Chinese voice commands to open various UI panels, which are then controlled by pointing gestures. Virtual scenarios were constructed in Unity, with hand tracking provided by the HTC OpenXR SDK. Hand rendering and gesture recognition were implemented in Unity, and UI interaction was handled by the Unity XR Interaction Toolkit. The interaction method is described in detail and illustrated with a teacher training simulation system, with sample code provided. An empirical test with 20 participants then compared the gesture-plus-voice operation with traditional controller operation, both quantitatively and qualitatively. The data suggest that, while there is no significant difference in task completion time between the two methods, the combined gesture and voice method received positive feedback on user experience, indicating a promising direction for such interaction methods. Future work could add more gestures and expand the model training dataset to support additional interactive functions and meet diverse virtual teaching needs.
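The interaction flow summarized in the abstract (a fist clench arms the speech classifier, a recognized command opens a UI panel, and pointing then drives that panel) can be sketched as a small state machine. This is a minimal illustration in Python, not the authors' Unity implementation: the gesture labels `"fist"` and `"point"`, the two command strings, the panel names, and the choice to disarm on an unrecognized command are all hypothetical placeholders.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()        # waiting for the activation gesture
    LISTENING = auto()   # speech classifier is armed
    UI_OPEN = auto()     # a panel is shown; pointing controls it


# Hypothetical mapping from recognized voice commands to UI panels.
COMMAND_TO_PANEL = {"打开菜单": "menu", "打开设置": "settings"}


class GestureVoiceController:
    """Gesture-gated voice control, reduced to its state transitions."""

    def __init__(self):
        self.state = State.IDLE
        self.panel = None
        self.pointer_target = None

    def on_gesture(self, gesture, target=None):
        """Handle a gesture label (assumed here: 'fist' or 'point')."""
        if gesture == "fist" and self.state is State.IDLE:
            self.state = State.LISTENING
        elif gesture == "point" and self.state is State.UI_OPEN:
            self.pointer_target = target

    def on_voice(self, command):
        """Handle a classified voice command; ignored unless armed."""
        if self.state is not State.LISTENING:
            return
        panel = COMMAND_TO_PANEL.get(command)
        if panel is None:
            # Assumption: an unrecognized command disarms the classifier.
            self.state = State.IDLE
        else:
            self.panel = panel
            self.state = State.UI_OPEN
```

Gating speech on the fist gesture means that voice input arriving in the idle state is simply dropped, which is one plausible way to avoid accidental activations while a user is talking during a lesson.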

Enabling Voice-Accompanying Hand-to-Face Gesture Recognition with Cross-Device Sensing

Exploring Interactive Gestures with Voice Assistant on HMDs in Social Situations

HCI on the Table: Robust Gesture Recognition Using Acoustic Sensing in Your Hand

Gesture Recognition with a 3-D Accelerometer

User-Defined Gestures for Gestural Interaction: Extending from Hands to Other Body Parts

Designing and Evaluating Hand-to-Hand Gestures with Dual Commodity Wrist-Worn Devices

iFace: Hand-Over-Face Gesture Recognition Leveraging Impedance Sensing

3D Intuitive Gesture Interaction via Motion Sensing

WristCam: A Wearable Sensor for Hand Trajectory Gesture Recognition and Intelligent Human–Robot Interaction

Acoustic Sensing-based Hand Gesture Detection for Wearable Device Interaction

Efficient High Cross-User Recognition Rate Ultrasonic Hand Gesture Recognition System

SignID: Acoustic-based Identification with Single Sign Gesture

WristSonic: Enabling Fine-grained Hand-Face Interactions on Smartwatches Using Active Acoustic Sensing

Interactive Design With Gesture and Voice Recognition in Virtual Teaching Environments

UltraGesture: Fine-Grained Gesture Sensing and Recognition

Hand Gesture Recognition using Deep Feature Fusion Network based on Wearable Sensors

Device-Free Gesture Tracking Using Acoustic Signals

Ipand: Accurate Gesture Input with Ambient Acoustic Sensing on Hand

Fusion of kinematic and physiological sensors for hand gesture recognition

Dynamic Hand Gesture-Featured Human Motor Adaptation in Tool Delivery using Voice Recognition

M-Gesture: Person-Independent Real-Time In-Air Gesture Recognition Using Commodity Millimeter Wave Radar