Hand Segmentation With Dense Dilated U-Net and Structurally Incoherent Nonnegative Matrix Factorization-Based Gesture Recognition

Kankana Roy, Rajiv R. Sahay
DOI: https://doi.org/10.1109/thms.2024.3390415
2024-05-25
IEEE Transactions on Human-Machine Systems
Abstract: Robust segmentation of hands in cluttered environments for hand gesture recognition remains a challenge in computer vision. In this work, a two-stage gesture recognition framework is proposed. In the first stage, we segment hands using the proposed deep learning algorithm, and in the second stage, we use these segmented hands to classify gestures using a novel structurally incoherent nonnegative matrix factorization approach. We propose a new deep learning framework for hand segmentation called densely dilated U-Net, which exploits recently proposed dense blocks and dilated convolution layers. To cope with the scarcity of labeled datasets, we extend our densely dilated U-Net to semisupervised hand segmentation using hand bounding boxes as cues. We provide quantitative and qualitative evaluation of the proposed hand segmentation model on several public hand segmentation datasets, including EgoHands, GTEA, EYTH, EDSH, and HOF. Semisupervised segmentation results are also obtained on two hand detection datasets, VIVA and CVRR. As an extension of our work, we show semisupervised segmentation and gesture recognition results using segmented hands on the NUS-II cluttered hand gesture dataset. To validate the efficiency of our semisupervised algorithm, we evaluate it on the OUHands dataset with real ground truth labels. For gesture classification, we propose a novel structurally incoherent nonnegative matrix factorization algorithm, applied to CNN features extracted from the segmented images. Experimental results on the NUS-II and OUHands datasets demonstrate that our two-stage approach for gesture recognition yields superior results.
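For readers unfamiliar with nonnegative matrix factorization, the following is a minimal sketch of the standard Frobenius-norm NMF with multiplicative updates, applied to a synthetic nonnegative matrix standing in for CNN features. It is not the structurally incoherent variant proposed in the paper, whose formulation the abstract does not give. The function names, the rank, and the synthetic data are all illustrative assumptions.

```python
import numpy as np


def nmf_multiplicative(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Approximate nonnegative V (features x samples) as W @ H with W, H >= 0.

    Plain multiplicative-update NMF for the Frobenius objective; this is a
    generic baseline, NOT the paper's structurally incoherent algorithm.
    """
    rng = np.random.default_rng(seed)
    n_features, n_samples = V.shape
    # Strictly positive random initialization keeps the updates well defined.
    W = rng.random((n_features, rank)) + eps
    H = rng.random((rank, n_samples)) + eps
    for _ in range(n_iter):
        # Elementwise updates preserve nonnegativity of both factors.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def relative_error(V, W, H):
    """Frobenius-norm reconstruction error relative to the norm of V."""
    return np.linalg.norm(V - W @ H) / np.linalg.norm(V)


if __name__ == "__main__":
    # Hypothetical stand-in for nonnegative CNN features: 64 dims, 40 images.
    rng = np.random.default_rng(1)
    V = rng.random((64, 5)) @ rng.random((5, 40))
    W, H = nmf_multiplicative(V, rank=5)
    print("relative reconstruction error:", relative_error(V, W, H))
```

In a classification setting, the columns of `H` would serve as low-dimensional nonnegative codes for each image. How the paper enforces structural incoherence between class-specific bases is beyond what this sketch covers.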
computer science, cybernetics, artificial intelligence