Integration of Convolutional Neural Network and Vision Transformer for Gesture Recognition Using sEMG

Xiaoguang Liu, Lijian Hu, Liang Tie, Li Jun, Xiaodong Wang, Xiuling Liu
DOI: https://doi.org/10.1016/j.bspc.2024.106686
IF: 5.1
2024-01-01
Biomedical Signal Processing and Control
Abstract: Among deep learning methods, gesture recognition currently relies mainly on Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). However, surface electromyography (sEMG) signals have distinctive spatial and temporal features that these methods do not extract effectively. To tackle this challenge, this paper proposes a novel model named CNN-VIT, which integrates a CNN and a Vision Transformer (VIT) through a weighted mechanism. By combining the strengths of both architectures, the model more fully exploits the motion information encoded in sEMG signals, improving the accuracy and reliability of gesture recognition. To validate the algorithm’s practical efficacy, we conducted experiments on the Ninapro DB2 Exercise B dataset, followed by tests on the DB-MYO dataset, which was collected using a myoelectric data bracelet. Additionally, we performed real-time prediction experiments to further assess the model’s performance. The results show classification accuracies of 83.05%, 90.40%, and 85.00%, affirming the superior classification performance of CNN-VIT.