Abstract:This paper introduces a novel method for real-time exercise classification using a Bidirectional Long Short-Term Memory (BiLSTM) neural network. Existing exercise recognition approaches often rely on synthetic datasets, raw coordinate inputs sensitive to user and camera variations, and fail to fully exploit the temporal dependencies in exercise movements. These issues limit their generalizability and robustness in real-world conditions, where lighting, camera angles, and user body types vary. To address these challenges, we propose a BiLSTM-based model that leverages invariant features, such as joint angles, alongside raw coordinates. By using both angles and (x, y, z) coordinates, the model adapts to changes in perspective, user positioning, and body differences, improving generalization. Training on 30-frame sequences enables the BiLSTM to capture the temporal context of exercises and recognize patterns evolving over time. We compiled a dataset combining synthetic data from the InfiniteRep dataset and real-world videos from Kaggle and other sources. This dataset includes four common exercises: squat, push-up, shoulder press, and bicep curl. The model was trained and validated on these diverse datasets, achieving an accuracy of over 99% on the test set. To assess generalizability, the model was tested on 2 separate test sets representative of typical usage conditions. Comparisons with the previous approach from the literature are present in the result section showing that the proposed model is the best-performing one. The classifier is integrated into a web application providing real-time exercise classification and repetition counting without manual exercise selection. Demo and datasets are available at the following GitHub Repository: <a class="link-external link-https" href="https://github.com/RiccardoRiccio/Fitness-AI-Trainer-With-Automatic-Exercise-Recognition-and-Counting" rel="external noopener nofollow">this https URL</a>.

What problem does this paper attempt to address?

The main problems that this paper attempts to solve are the limitations of existing motion recognition methods in practical applications, specifically including: 1. **Dependence on synthetic datasets**: Existing motion recognition methods usually rely on synthetic datasets, which are quite different from the real - world environment, resulting in insufficient generalization ability of the model. 2. **Sensitivity to user and camera changes**: Many methods use raw coordinate inputs (such as the (x, y, z) coordinates of joints), and these inputs are very sensitive to changes in user postures, camera angles and distances, thus limiting the robustness of the model under different conditions. 3. **Failure to fully utilize temporal dependencies**: Existing methods often fail to fully capture the temporal dependencies during the motion process, especially for those motions with similar initial postures but different action sequences, which leads to a low classification accuracy. To solve these problems, this paper proposes a new method based on Bidirectional Long - Short - Term Memory Network (BiLSTM). The main improvement points of this method include: - **Introducing invariant features**: In addition to using the raw coordinates, joint angles are also introduced as features. These angle features are invariant to changes in camera perspectives and user positions, thereby improving the robustness and generalization ability of the model. - **Capturing temporal dependencies**: By processing 30 - frame sequences, BiLSTM can capture the temporal context of motion and recognize patterns that evolve over time, so as to better distinguish different motions. - **Combining real and synthetic datasets**: To ensure the performance of the model under different conditions, the training dataset includes not only synthetic data, but also real - video data from platforms such as Kaggle, covering four common motions: squats, push - ups, shoulder presses and bicep curls. Finally, this model achieves an accuracy of over 99% on the test set and is integrated into a user - friendly Web application, which can automatically classify motions and count them in a real - time environment without the need for users to manually select the motion types.

Real-Time Fitness Exercise Classification and Counting from Video Frames

Fast and robust video-based exercise classification via body pose tracking and scalable multivariate time series classifiers

An Examination of Wearable Sensors and Video Data Capture for Human Exercise Classification

Realtime Indoor Workout Analysis Using Machine Learning & Computer Vision

Recognizing Exercises and Counting Repetitions in Real Time

ExerSense: Real-Tme Physical Exercise Segmentation, Classification, and Counting Algorithm Using an IMU Sensor

Workout Classification Using a Convolutional Neural Network in Ensemble Learning

Real-Time Sensor-Based Human Activity Recognition for eFitness and eHealth Platforms

Multimodal sensor fusion models for real-time exercise repetition counting with IMU sensors and respiration data

Rehabilitation Exercise Repetition Segmentation and Counting using Skeletal Body Joints

Fitness Done Right: a Real-time Intelligent Personal Trainer for Exercise Correction

Intelligent Repetition Counting for Unseen Exercises: A Few-Shot Learning Approach with Sensor Signals

Video-based Exercise Classification and Activated Muscle Group Prediction with Hybrid X3D-SlowFast Network

AI Coach: Deep Human Pose Estimation and Analysis for Personalized Athletic Training Assistance

Pūioio: On-device Real-Time Smartphone-Based Automated Exercise Repetition Counting System

Exploration of deep learning architectures for real-time yoga pose recognition

IoT-Based Wearable Sensors and Bidirectional LSTM Network for Action Recognition of Aerobics Athletes

Wearable Sensor Data for Classification and Analysis of Functional Fitness Exercises Using Unsupervised Deep Learning Methodologies

Deep Learning for Fitness

Physiotherapy Exercise Classification with Single-Camera Pose Detection and Machine Learning

DS-MS-TCN: Otago Exercises Recognition with a Dual-Scale Multi-Stage Temporal Convolutional Network