Real-Time Fitness Exercise Classification and Counting from Video Frames

Riccardo Riccio
2024-11-18
Abstract: This paper introduces a novel method for real-time exercise classification using a Bidirectional Long Short-Term Memory (BiLSTM) neural network. Existing exercise recognition approaches often rely on synthetic datasets and on raw coordinate inputs that are sensitive to user and camera variations, and they fail to fully exploit the temporal dependencies in exercise movements. These issues limit their generalizability and robustness in real-world conditions, where lighting, camera angles, and user body types vary. To address these challenges, we propose a BiLSTM-based model that leverages invariant features, such as joint angles, alongside raw coordinates. By using both angles and (x, y, z) coordinates, the model adapts to changes in perspective, user positioning, and body differences, improving generalization. Training on 30-frame sequences enables the BiLSTM to capture the temporal context of exercises and recognize patterns that evolve over time. We compiled a dataset combining synthetic data from the InfiniteRep dataset with real-world videos from Kaggle and other sources. The dataset covers four common exercises: squat, push-up, shoulder press, and bicep curl. The model was trained and validated on these diverse data, achieving an accuracy of over 99% on the test set. To assess generalizability, the model was also evaluated on two separate test sets representative of typical usage conditions. Comparisons with previous approaches from the literature, presented in the Results section, show that the proposed model performs best. The classifier is integrated into a web application that provides real-time exercise classification and repetition counting without manual exercise selection. The demo and datasets are available in the GitHub repository: https://github.com/RiccardoRiccio/Fitness-AI-Trainer-With-Automatic-Exercise-Recognition-and-Counting.
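The abstract's central idea, combining joint angles with raw (x, y, z) coordinates, can be illustrated with a minimal Python sketch. It assumes each frame arrives as a dictionary of 3D landmarks (for example, from a pose estimator). The landmark names, the four angle triplets in `ANGLE_TRIPLETS`, and the helpers `joint_angle` and `frame_features` are hypothetical choices for illustration, not the paper's exact feature set. Because an angle depends only on the directions of two limb segments, it stays the same when the whole body is shifted, rotated, or uniformly scaled, which is the kind of invariance that raw coordinates lack.

```python
import math
from typing import Dict, List, Tuple

Point3 = Tuple[float, float, float]

# Hypothetical angle definitions: (angle name, first landmark, vertex, third landmark).
# These are illustrative only and are not taken from the paper.
ANGLE_TRIPLETS = [
    ("elbow", "shoulder", "elbow", "wrist"),
    ("shoulder", "hip", "shoulder", "elbow"),
    ("hip", "shoulder", "hip", "knee"),
    ("knee", "hip", "knee", "ankle"),
]


def joint_angle(a: Point3, b: Point3, c: Point3) -> float:
    """Return the angle at vertex b, in degrees, between segments b->a and b->c."""
    ba = [a[i] - b[i] for i in range(3)]
    bc = [c[i] - b[i] for i in range(3)]
    norm = math.hypot(*ba) * math.hypot(*bc)
    if norm == 0.0:
        return 0.0  # degenerate case: two landmarks coincide
    dot = sum(u * v for u, v in zip(ba, bc))
    # Clamp to [-1, 1] so floating-point drift cannot push acos out of its domain.
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_theta))


def frame_features(landmarks: Dict[str, Point3]) -> List[float]:
    """Concatenate raw (x, y, z) coordinates with joint angles for one frame."""
    # Sort landmark names so the coordinate order is identical for every frame.
    coords = [value for name in sorted(landmarks) for value in landmarks[name]]
    angles = [
        joint_angle(landmarks[first], landmarks[vertex], landmarks[third])
        for _, first, vertex, third in ANGLE_TRIPLETS
    ]
    return coords + angles
```

Sorting the landmark names gives a fixed coordinate order, so every frame yields a vector of the same length (with six landmarks, 6 × 3 coordinates + 4 angles = 22 values), which a sequence model requires.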
Computer Vision and Pattern Recognition, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
The paper addresses the limitations of existing exercise recognition methods in practical applications, specifically:

1. **Dependence on synthetic datasets**: Existing methods usually rely on synthetic datasets, which differ considerably from real-world environments, so the resulting models generalize poorly.
2. **Sensitivity to user and camera changes**: Many methods use raw coordinate inputs (such as the (x, y, z) coordinates of joints), which are highly sensitive to changes in user posture, camera angle, and camera distance, limiting the robustness of the model under varying conditions.
3. **Failure to fully exploit temporal dependencies**: Existing methods often fail to capture the temporal dependencies in exercise movements, especially for exercises that share similar starting postures but differ in their movement sequences, which lowers classification accuracy.

To solve these problems, the paper proposes a new method based on a Bidirectional Long Short-Term Memory (BiLSTM) network. Its main improvements are:

- **Introducing invariant features**: In addition to raw coordinates, joint angles are used as features. These angle features are largely invariant to changes in camera perspective and user position, which improves the robustness and generalization of the model.
- **Capturing temporal dependencies**: By processing 30-frame sequences, the BiLSTM captures the temporal context of each exercise and recognizes patterns that evolve over time, which helps it distinguish between exercises.
- **Combining real and synthetic datasets**: To ensure the model performs well under different conditions, the training data includes not only synthetic data (from the InfiniteRep dataset) but also real-world videos from platforms such as Kaggle, covering four common exercises: squats, push-ups, shoulder presses, and bicep curls.
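Classifying 30-frame sequences implies that a real-time system must buffer per-frame features before the BiLSTM can make a prediction. The following Python sketch shows one way to do this with a rolling window. The class name `SequenceBuffer` and the `stride` parameter (how often a new window is emitted once the buffer is full) are assumptions, and the BiLSTM call itself is omitted because the source gives no architectural details.

```python
from collections import deque
from typing import Deque, List, Optional, Sequence


class SequenceBuffer:
    """Rolling buffer that turns a stream of per-frame feature vectors into
    fixed-length windows for a sequence model such as a BiLSTM."""

    def __init__(self, window: int = 30, stride: int = 1) -> None:
        if window <= 0 or stride <= 0:
            raise ValueError("window and stride must be positive")
        self.window = window
        self.stride = stride
        # deque(maxlen=...) silently drops the oldest frame when a new one arrives.
        self._frames: Deque[List[float]] = deque(maxlen=window)
        self._count = 0  # total number of frames pushed so far

    def push(self, features: Sequence[float]) -> Optional[List[List[float]]]:
        """Add one frame; return a window (window x feature_dim) when one is due."""
        self._frames.append(list(features))
        self._count += 1
        # Emit the first full window, then one every `stride` frames after that.
        if self._count < self.window or (self._count - self.window) % self.stride:
            return None
        return [frame[:] for frame in self._frames]
```

With `stride=1`, a new window (and therefore a new prediction) is produced on every frame after the first 30; a larger stride trades responsiveness for less computation.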
Finally, the model achieves an accuracy of over 99% on the test set and is integrated into a user-friendly web application that classifies exercises and counts repetitions in real time, without requiring users to select the exercise type manually.
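The source does not describe how repetitions are counted. One simple approach, shown here purely as a hypothetical Python sketch, tracks a single joint angle and counts a repetition each time that angle goes from an extended position to a flexed one and back, using two separate thresholds (hysteresis) so that jitter around a single threshold does not produce extra counts. The class `RepCounter` and the threshold values of 160° and 50° are illustrative assumptions. In an application without manual exercise selection, the predicted exercise label could choose which angle to feed in (for instance, the knee angle for squats and the elbow angle for bicep curls).

```python
class RepCounter:
    """Counts repetitions from a per-frame joint angle using two thresholds
    (hysteresis). Hypothetical sketch; thresholds are illustrative only."""

    def __init__(self, extended_deg: float = 160.0, flexed_deg: float = 50.0) -> None:
        if flexed_deg >= extended_deg:
            raise ValueError("flexed_deg must be smaller than extended_deg")
        self.extended_deg = extended_deg
        self.flexed_deg = flexed_deg
        self.stage = "extended"  # assume the movement starts from the extended position
        self.count = 0

    def update(self, angle_deg: float) -> int:
        """Feed one frame's angle (degrees) and return the running repetition count."""
        if self.stage == "extended" and angle_deg <= self.flexed_deg:
            self.stage = "flexed"
        elif self.stage == "flexed" and angle_deg >= self.extended_deg:
            self.stage = "extended"
            self.count += 1  # one full flex-and-return cycle completed
        return self.count
```

Because the counter only changes state when the angle crosses the opposite threshold, small oscillations near either threshold leave the count unchanged.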