Eliciting Multimodal Gesture+Speech Interactions in a Multi-Object Augmented Reality Environment

Xiaoyan Zhou, Adam S. Williams, Francisco R. Ortega
DOI: https://doi.org/10.1145/3562939.3565637
2022-07-26
Abstract: As augmented reality technology and hardware become more mature and affordable, researchers have been exploring more intuitive and discoverable interaction techniques for immersive environments. In this paper, we investigate multimodal interaction for 3D object manipulation in a multi-object virtual environment. To identify the user-defined gestures, we conducted an elicitation study involving 24 participants for 22 referents with an augmented reality headset. It yielded 528 proposals and generated a winning gesture set with 25 gestures after binning and ranking all gesture proposals. We found that for the same task, the same gesture was preferred for both one and two object manipulation, although both hands were used in the two object scenario. We presented the gestures and speech results, and the differences compared to similar studies in a single object virtual environment. The study also explored the association between speech expressions and gesture stroke during object manipulation, which could improve the recognizer efficiency in augmented reality headsets.
Human-Computer Interaction
What problem does this paper attempt to address?
This paper asks how to design intuitive, easily discoverable multimodal gesture+speech interaction techniques for a multi-object augmented reality environment. Specifically, the research focuses on the following key issues:

1. **Multimodal interaction in multi-object virtual environments**: Whether multimodal interaction (especially the combination of gestures and speech) differs from interaction in a single-object environment when multiple objects are present.
2. **Impact of multi-object environments on user-proposed gestures and speech**: Whether the gestures and speech commands that users propose in a multi-object environment differ from those proposed in a single-object environment.
3. **Gestures preferred by users in multi-object manipulation**: Whether users are more inclined to use both hands when manipulating multiple objects, and how these gestures differ from those used in single-object manipulation.
4. **Association between speech expressions and gesture actions**: How the speech expressions users produce relate to their gesture strokes during object manipulation, which could improve recognizer efficiency in augmented reality headsets.

By exploring these issues, the research aims to characterize multimodal interaction in multi-object virtual environments and to provide guidance for future designs.
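The abstract mentions "binning and ranking all gesture proposals" to obtain a winning gesture set. The sketch below illustrates one simple way such a step could work, assuming proposals have already been binned into gesture labels and that the winner for each referent is the most frequently proposed label. The function name `winning_gestures`, the toy referents, and the gesture labels are all hypothetical; the text here does not describe the paper's actual procedure in detail, so this should not be read as the authors' method.

```python
from collections import Counter, defaultdict

# Figures taken from the abstract: each participant proposes once per referent.
PARTICIPANTS = 24
REFERENTS = 22
TOTAL_PROPOSALS = PARTICIPANTS * REFERENTS  # 528, matching the abstract


def winning_gestures(proposals):
    """Return the most frequent gesture label(s) for each referent.

    `proposals` is an iterable of (referent, gesture_label) pairs that
    have already been binned into equivalence classes (an assumption).
    Ties are kept, so a referent may have more than one winner.
    """
    counts_by_referent = defaultdict(Counter)
    for referent, gesture in proposals:
        counts_by_referent[referent][gesture] += 1

    winners = {}
    for referent, counts in counts_by_referent.items():
        top = max(counts.values())
        # Sort tied labels so the result is deterministic.
        winners[referent] = sorted(g for g, c in counts.items() if c == top)
    return winners


# Hypothetical toy data, not taken from the study.
proposals = [
    ("move", "grab-and-drag"),
    ("move", "grab-and-drag"),
    ("move", "push"),
    ("rotate", "twist"),
    ("rotate", "turn-wrist"),
]

if __name__ == "__main__":
    print(TOTAL_PROPOSALS)
    print(winning_gestures(proposals))
```

Keeping tied labels is a design choice in this sketch: it shows one way a winning set could contain more gestures than referents, though whether that explains the 25 gestures reported for 22 referents is not stated in the text here.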