A multimodal gesture recognition dataset for desktop human-computer interaction

Qi Wang, Fengchao Zhu, Guangming Zhu, Liang Zhang, Ning Li, Eryang Gao
2024-01-08
Abstract: Gesture recognition is an indispensable component of natural and efficient human-computer interaction technology, particularly in desktop-level applications, where it can significantly enhance people's productivity. However, the current gesture recognition community lacks a suitable desktop-level (top-view perspective) dataset for lightweight gesture capture devices. In this study, we have established a dataset named GR4DHCI. What distinguishes this dataset is its inherent naturalness, intuitive characteristics, and diversity. Its primary purpose is to serve as a valuable resource for the development of desktop-level portable applications. GR4DHCI comprises over 7,000 gesture samples and a total of 382,447 frames for both Stereo IR and skeletal modalities. We also address the variances in hand positioning during desktop interactions by incorporating 27 different hand positions into the dataset. Building upon the GR4DHCI dataset, we conducted a series of experimental studies, the results of which demonstrate that the fine-grained classification blocks proposed in this paper can enhance the model's recognition accuracy. Our dataset and experimental findings presented in this paper are anticipated to propel advancements in desktop-level gesture recognition research.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the lack of a suitable dataset for desktop-level gesture recognition, where gestures are captured from a top-view perspective by lightweight devices. It notes that current gesture recognition systems, especially in desktop applications, still need improvement in efficiency and naturalness. The researchers establish GR4DHCI, a large-scale multimodal gesture recognition dataset of 7,339 dynamic gesture samples recorded across 27 different hand positions, totaling 382,447 frames in infrared and skeleton modalities. The dataset emphasizes naturalness, intuitiveness, and diversity to accommodate long-term, fatigue-free use.

Building on GR4DHCI, the researchers propose a fine-grained classification block based on infrared images and skeleton motion. The experimental results show that it improves recognition accuracy by 2.64% for the infrared modality and 7.75% for the skeleton modality.

The paper also compares GR4DHCI with existing gesture recognition datasets and points out that it is the first dataset designed specifically for desktop-level (overhead view) gesture recognition; its coverage of varied hand poses and angle changes increases the diversity and authenticity of the data. In addition, the paper reviews existing gesture recognition methods, including spatiotemporal networks and graph convolutional networks. In the experimental evaluation, recent models such as Res3D + ConvLSTM + MobileNet and TL-GCN perform better on GR4DHCI when combined with the fine-grained classification block, which supports the effectiveness of both the dataset and the method. The authors expect GR4DHCI to contribute to the advancement of research in desktop-level gesture recognition.
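To give a rough sense of scale, the figures quoted in the summary (7,339 samples, 382,447 frames, 27 hand positions) can be turned into simple averages. The sketch below is only illustrative arithmetic. It assumes the frame total is counted once over all samples rather than once per modality, and it does not assume that samples are evenly distributed across hand positions, so the second number is a mean, not a per-position count. The function names are hypothetical and not part of any GR4DHCI tooling.

```python
# Figures quoted in the summary; how frames are counted across the
# two modalities is an assumption (see the note above).
TOTAL_SAMPLES = 7_339
TOTAL_FRAMES = 382_447
HAND_POSITIONS = 27


def mean_frames_per_sample(frames: int, samples: int) -> float:
    """Average clip length in frames, assuming one frame total over all samples."""
    return frames / samples


def mean_samples_per_position(samples: int, positions: int) -> float:
    """Average number of samples per hand position (the real split may be uneven)."""
    return samples / positions


if __name__ == "__main__":
    frames = mean_frames_per_sample(TOTAL_FRAMES, TOTAL_SAMPLES)
    per_position = mean_samples_per_position(TOTAL_SAMPLES, HAND_POSITIONS)
    print(f"about {frames:.1f} frames per gesture sample")
    print(f"about {per_position:.1f} samples per hand position on average")
```

Under these assumptions a gesture sample averages roughly 52 frames, and each hand position accounts for roughly 272 samples on average.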