Spatiotemporal features representation with dynamic mode decomposition for hand gesture recognition using deep neural networks

Bhavana Sharma, Jeebananda Panda
DOI: https://doi.org/10.1007/s11760-024-03038-y
IF: 1.583
2024-03-05
Signal Image and Video Processing
Abstract: Hand Gesture Recognition (HGR) in uncontrolled environments is a challenging task because of the complexity and diversity of hand images, caused by complex backgrounds, varying illumination, strong occlusions, and motion blur. This work provides a thorough examination of spatiotemporal feature extraction with a deep learning model in order to overcome practical variations in lighting and fluctuations of the hand's physical movement in both space and time. Hand skin color is first filtered in the YCbCr color space, and MediaPipe is used to isolate the specific gesture region in the hand images used for training. To handle spatial variations, spatiotemporal features are extracted with the Dynamic Mode Decomposition (DMD) technique, which decouples the key hand features into modes and their time dynamics in order to obtain a time–frequency analysis. As a result, the reconstructed signal has enhanced visibility of skin-color pixels. Extensive experiments using the deep neural network ResNet18 for classification are carried out on three publicly available datasets, namely the Ego hand dataset, the American Sign Language (ASL) dataset, and the Senz3D dataset. The approach outperforms existing state-of-the-art methods in spatiotemporal feature extraction, achieving a classification accuracy of 97.85% on the Ego hand dataset and 98.49% on the ASL dataset with three dynamic modes, and 98.51% on the Senz3D dataset with two dynamic modes. These results are competitive with the State-Of-The-Art (SOTA) techniques available for HGR.
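The abstract names YCbCr skin filtering and DMD with a small number of dynamic modes as the core feature steps. The following is a minimal NumPy sketch of those two generic techniques, not the authors' implementation. Several assumptions are made here and are not taken from the paper. The conversion uses full-range ITU-R BT.601 (JPEG) YCbCr. The skin-tone chroma ranges Cb in [77, 127] and Cr in [133, 173] are commonly cited values. The DMD is the standard "exact DMD" applied to a snapshot matrix whose columns are flattened frames or hand-feature vectors ordered in time. The truncation rank `r` plays the role of the "number of dynamic modes" (two or three in the reported results). The function names `rgb_to_ycbcr`, `skin_mask`, `dmd` and `dmd_reconstruct` are hypothetical. The MediaPipe and ResNet18 stages are not sketched.

```python
import numpy as np


def rgb_to_ycbcr(img: np.ndarray) -> np.ndarray:
    """Convert an (H, W, 3) RGB image in [0, 255] to full-range BT.601 (JPEG) YCbCr."""
    rgb = img.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return np.stack([y, cb, cr], axis=-1)


def skin_mask(img: np.ndarray, cb_range=(77, 127), cr_range=(133, 173)) -> np.ndarray:
    """Boolean mask of pixels whose chroma lies inside ASSUMED skin-tone ranges."""
    ycbcr = rgb_to_ycbcr(img)
    cb, cr = ycbcr[..., 1], ycbcr[..., 2]
    return ((cb >= cb_range[0]) & (cb <= cb_range[1])
            & (cr >= cr_range[0]) & (cr <= cr_range[1]))


def dmd(X: np.ndarray, r: int):
    """Exact DMD of a snapshot matrix X (n_features x n_frames), truncated to rank r.

    Returns the modes Phi (n x r), eigenvalues lam (r,) and amplitudes b (r,).
    """
    X1, X2 = X[:, :-1], X[:, 1:]  # snapshot pairs shifted by one time step
    U, s, Vh = np.linalg.svd(X1, full_matrices=False)
    U_r, s_r, V_r = U[:, :r], s[:r], Vh[:r].conj().T
    # Rank-r projection of the linear operator A that maps X1 onto X2
    A_tilde = U_r.conj().T @ X2 @ V_r / s_r
    lam, W = np.linalg.eig(A_tilde)
    Phi = X2 @ V_r / s_r @ W  # exact DMD modes (spatial structures)
    b = np.linalg.lstsq(Phi, X[:, 0], rcond=None)[0]  # fit amplitudes to the first snapshot
    return Phi, lam, b


def dmd_reconstruct(Phi, lam, b, n_frames: int) -> np.ndarray:
    """Rebuild snapshots as sum_j Phi_j * b_j * lam_j**k for k = 0 .. n_frames - 1."""
    time_dynamics = b[:, None] * lam[:, None] ** np.arange(n_frames)[None, :]
    return (Phi @ time_dynamics).real
```

For usage, frames could be masked with `skin_mask`, flattened into the columns of `X`, and reconstructed with `dmd_reconstruct(*dmd(X, r=3), X.shape[1])`. The choice of exact DMD, with modes computed from `X2` rather than `U_r`, is a standard variant. Whether the paper uses this variant is not stated in the abstract.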
engineering, electrical & electronic, imaging science & photographic technology