Integration of a skier-specific keypoint detection model in a hybrid 3D motion capture pipeline
Michael Zwölfer,Martin Mössner,Helge Rhodin,Werner Nachbauer,Dieter Heinrich
DOI: https://doi.org/10.36950/2024.4ciss013
2024-09-23
Current Issues in Sport Science (CISS)
Abstract:
Introduction & Purpose: Alpine skiing, like many outdoor sports, presents significant challenges for motion capture because of its large capture volumes, high athlete speeds, variable environmental conditions, and occlusions (e.g., from snow spray). While traditional marker-based motion capture systems offer the highest precision in the lab, they are usually unsuitable for outdoor settings. Sensor-based methods, such as inertial measurement units, may suffer from inaccuracies due to sensor noise and drift, and they provide only relative segment positions (Fasel et al., 2018). Therefore, recent studies in alpine skiing have preferred video-based systems (Heinrich et al., 2023; Spörri, 2016). These methods rely on multi-camera setups that require synchronization and camera calibration. However, the extensive manual digitization required for both keypoints and reference points creates a substantial post-processing workload, particularly when cameras must pan, tilt, and zoom to cover large capture volumes (Spörri, 2016). Recent advances in computer vision have revealed great potential for human motion capture, especially for automating much of the manual work required by video-based systems (Fang et al., 2017; Redmon et al., 2016; Zwölfer et al., 2023a). We therefore developed a novel hybrid 3D motion capture approach that automates the detection of reference points with a reference point detection algorithm and the digitization of keypoints with a skier-specific keypoint detection algorithm. This reduces the reliance on manual digitization and improves the scalability and practicality of large-scale motion capture in outdoor environments. The aim of this study was to a) evaluate the performance of the skier-specific keypoint detection against manual digitization and b) determine the impact of the skier-specific keypoint detection on the overall performance of the hybrid 3D motion capture pipeline.
Methods: The experimental setup involved a multi-camera system of eight Sony AX53 cameras uniformly distributed around a capture volume on a ski slope in Zürs am Arlberg, Austria. The capture volume measured approximately 250 × 80 × 20 m. To calibrate the cameras, approximately 300 reference points were placed within this area and surveyed geodetically. Each reference point was equipped with a 9 × 9 cm cube displaying ArUco markers on all sides, enabling their automated detection by our reference point detection algorithm. This arrangement allowed each camera to be calibrated continuously in every frame, accommodating panning, tilting, and zooming. Camera calibration and 3D reconstruction were performed using the Direct Linear Transformation (DLT) method (Abdel-Aziz & Karara, 1971). In total, ten state-certified Austrian ski instructors performed eight runs according to the progression levels of the Austrian ski curriculum (Österreichischer Skischulverband, 2018). To develop a keypoint detection model capable of detecting a skier including equipment (e.g., skis and poles), we fine-tuned AlphaPose's HALPE26 model (Fang et al., 2017), which was designed to estimate 26 body keypoints for general motions, on a skier-specific dataset. Training ran for 200 epochs with a learning rate of 10⁻³ and data augmentation enabled. AlphaPose was chosen for its proven performance in alpine skiing scenarios (Zwölfer et al., 2023a). For the skier-specific dataset, we manually digitized six runs, marking 24 keypoints in each image: 18 body keypoints plus the ski tips, ski tails, and poles. The six runs were selected to include slow, medium, and high-speed skiing by one male and one female subject. These digitized images complement the datasets built by Bachmann et al. (2019) and later by Zwölfer et al. (2023b) and Heinrich et al. (2023), all of which use the same set of keypoints.
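The DLT calibration and reconstruction steps described above can be sketched in pure Python as follows. This is a minimal illustration, not the authors' pipeline: it assumes the classic 11-parameter DLT without lens-distortion terms, at least six non-coplanar reference points per camera, and an independent calibration per camera and frame (which is how changing pan, tilt, and zoom would be absorbed). The function names and the synthetic camera parameters in the usage block are hypothetical.

```python
import math


def lstsq(A, b):
    """Solve min ||A x - b|| with a modified Gram-Schmidt QR factorization (pure Python)."""
    m, n = len(A), len(A[0])
    cols = [[float(A[i][j]) for i in range(m)] for j in range(n)]
    R = [[0.0] * n for _ in range(n)]
    c, y = [0.0] * n, [float(v) for v in b]
    for j in range(n):
        R[j][j] = math.sqrt(sum(v * v for v in cols[j]))
        q = [v / R[j][j] for v in cols[j]]
        for k in range(j + 1, n):  # orthogonalize the remaining columns against q
            R[j][k] = sum(qi * vi for qi, vi in zip(q, cols[k]))
            cols[k] = [vi - R[j][k] * qi for qi, vi in zip(q, cols[k])]
        c[j] = sum(qi * yi for qi, yi in zip(q, y))  # same projection applied to b
        y = [yi - c[j] * qi for qi, yi in zip(q, y)]
    x = [0.0] * n
    for j in reversed(range(n)):  # back-substitution on R x = c
        x[j] = (c[j] - sum(R[j][k] * x[k] for k in range(j + 1, n))) / R[j][j]
    return x


def dlt_calibrate(world, image):
    """Estimate DLT parameters L1..L11 from reference points (X, Y, Z) and their pixels (u, v)."""
    A, b = [], []
    for (X, Y, Z), (u, v) in zip(world, image):
        A.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z])
        b.append(u)
        A.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z])
        b.append(v)
    return lstsq(A, b)


def dlt_project(L, point):
    """Map a 3D point to pixel coordinates with DLT parameters L (0-based: L[0] is L1)."""
    X, Y, Z = point
    w = L[8] * X + L[9] * Y + L[10] * Z + 1.0
    return ((L[0] * X + L[1] * Y + L[2] * Z + L[3]) / w,
            (L[4] * X + L[5] * Y + L[6] * Z + L[7]) / w)


def dlt_reconstruct(Ls, pixels):
    """Triangulate one 3D point from its pixels in two or more calibrated cameras."""
    A, b = [], []
    for L, (u, v) in zip(Ls, pixels):
        A.append([L[0] - u * L[8], L[1] - u * L[9], L[2] - u * L[10]])
        b.append(u - L[3])
        A.append([L[4] - v * L[8], L[5] - v * L[9], L[6] - v * L[10]])
        b.append(v - L[7])
    return lstsq(A, b)


if __name__ == "__main__":
    # Two made-up cameras and a 3 x 3 x 3 grid of reference points (meters).
    L_a = [800.0, 0.0, -300.0, 500.0, 50.0, -820.0, 100.0, 400.0, 0.01, 0.02, 0.05]
    L_b = [-200.0, 750.0, 100.0, 600.0, 60.0, 80.0, -800.0, 300.0, -0.02, 0.01, 0.03]
    refs = [(x, y, z) for x in (0.0, 2.0, 4.0) for y in (0.0, 2.0, 4.0) for z in (0.0, 2.0, 4.0)]
    est_a = dlt_calibrate(refs, [dlt_project(L_a, p) for p in refs])
    est_b = dlt_calibrate(refs, [dlt_project(L_b, p) for p in refs])
    knee = (1.3, 2.7, 0.9)
    print(dlt_reconstruct([est_a, est_b], [dlt_project(L_a, knee), dlt_project(L_b, knee)]))
```

With exact synthetic projections, the recovered parameters match the made-up ones and the triangulated point matches the input; with real detections, the residuals of the least-squares fit give a first indication of calibration quality.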
In total, our skier-specific dataset comprised about 15,000 images, approximately two-thirds of which originated from the current measurement. The accuracy of our model was evaluated in 2D image space by calculating the mean per joint position error (MPJPE), percentage of correct keypoints (PCK), and mean average precision (mAP) on a test set of about 2,000 images that were excluded from training. In addition, we determined the impact of the keypoint detection algorithm on the hybrid motion capture pipeline. To this end, we processed one of the manually digitized runs with both the skier-specific model and the HALPE26 model. Using the calibration matrices, we reconstructed the 3D motion of the skier for each method and calculated the mean lengths of eight body segments (left and right upper arms, forearms, thighs, and shanks). We compared the measured physical segment lengths with the mean segment lengths reconstructed from manually digitized keypoints, from keypoints detected by the HALPE26 model, and from keypoints detected by our skier-specific model. We also quantified the variation of segment lengths by calculating their mean standard deviation across all 250 frames of the run. Segment lengths (mean values and variations) were calculated without any smoothing or filtering.
Results: Our skier-specific keypoint detection model achieved a PCK of 98%, an mAP of 0.97, and an MPJPE of 10.32 pixels on the test set. Visual assessment of the detected keypoints supported these quantitative results, showing only a few flawed detections. Most inaccuracies involved ski tails or poles and were primarily due to occlusions. Representative images showcasing the model's performance are displayed in Figure 1 (left). The plausibility of the skier-specific model for 3D reconstruction is demonstrated by the 3D visualization of a sample run shown in Figure 1 (right).
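The 2D metrics reported above can be illustrated with a short sketch. MPJPE is taken here as the mean Euclidean pixel distance over all annotated keypoints, and PCK counts keypoints whose error is at most `alpha * ref_length` pixels. The normalization length (for example a bounding-box or torso size) and the function names are assumptions, since the abstract does not state which PCK threshold was used. mAP, which additionally requires a keypoint similarity measure and detection scores, is not covered.

```python
import math


def mpjpe(pred, gt):
    """Mean per joint position error: average Euclidean distance (pixels) over all keypoints."""
    errors = [math.dist(p, g) for p, g in zip(pred, gt)]
    return sum(errors) / len(errors)


def pck(pred, gt, alpha, ref_length):
    """Percentage of correct keypoints (as a fraction): error <= alpha * ref_length pixels."""
    threshold = alpha * ref_length
    correct = [math.dist(p, g) <= threshold for p, g in zip(pred, gt)]
    return sum(correct) / len(correct)


# Three keypoints of one image (x, y in pixels); their errors are 5, 0 and 20 px.
gt = [(100.0, 100.0), (200.0, 150.0), (50.0, 80.0)]
pred = [(103.0, 104.0), (200.0, 150.0), (70.0, 80.0)]
print(mpjpe(pred, gt))  # mean error in pixels
print(pck(pred, gt, alpha=0.1, ref_length=100.0))  # 2 of 3 keypoints lie within 10 px
```

Over a whole test set, the keypoints of all images are simply pooled into `pred` and `gt`; occluded or unannotated keypoints would need to be filtered out first.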
Differences between measured and reconstructed segment lengths (mean across all frames) ranged from 0.2 cm to 2.3 cm, with only small differences among the keypoint detection methods investigated. The evaluation of the variation in segment lengths revealed a mean standard deviation of 4.6 cm for manually annotated frames, compared to 4.5 cm for frames processed by the HALPE26 model. For frames processed by our skier-specific model, the mean standard deviation was reduced to 3.4 cm.
Discussion: Our results demonstrated that the accuracy and precision of our model are at least on par with manual digitization, as evidenced both visually and by quantitative evaluation. At the 2D detection level, the PCK, MPJPE, and mAP metrics reflected the model's high performance, in line with previous studies (Bachmann et al., 2019; Zwölfer et al., 2023a) that reported comparable MPJPE values when applying keypoint detection to regular skiing scenarios. At the 3D reconstruction level, differences between measured and reconstructed segment lengths for all three keypoint detection methods were within the range of typical measurement errors of about 2 cm. However, on this single run, our new model outperformed manual digitization and plain AlphaPose detections in terms of segment length variation. This was especially surprising because our 3D reconstruction enforced neither temporal smoothness nor kinematic constraints such as segment length consistency. This suggests that our model may benefit synergistically from both the pretrained data and the manual annotations, possibly averaging out the limited precision of manual digitization. While these variations in segment lengths may appear large, no smoothing or filtering was applied. For biomechanical analysis, the results can be improved considerably by smoothing the data, e.g., using splines.
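The segment-length analysis can be sketched as follows: for each segment, the 3D distance between its two joints is computed in every frame, and the mean and standard deviation are taken across frames without any smoothing. The joint names, the dictionary layout, the use of the sample standard deviation, and the example coordinates and "measured" lengths are assumptions; the abstract does not specify them.

```python
import math
import statistics


def segment_length_stats(frames, segments):
    """frames: list of {joint: (x, y, z)}; segments: {name: (joint_a, joint_b)}.
    Returns {name: (mean_length, sample_sd)} computed across all frames."""
    result = {}
    for name, (a, b) in segments.items():
        lengths = [math.dist(frame[a], frame[b]) for frame in frames]
        result[name] = (statistics.mean(lengths), statistics.stdev(lengths))
    return result


def mean_sd(stats):
    """Mean of the per-segment standard deviations (the variation measure reported above)."""
    return statistics.mean(sd for _, sd in stats.values())


frames = [
    {"hip": (0.0, 0.0, 1.0), "knee": (0.0, 0.0, 0.55), "ankle": (0.0, 0.0, 0.15)},
    {"hip": (0.0, 0.0, 1.0), "knee": (0.0, 0.0, 0.53), "ankle": (0.0, 0.0, 0.13)},
    {"hip": (0.0, 0.0, 1.0), "knee": (0.0, 0.0, 0.57), "ankle": (0.0, 0.0, 0.17)},
]
segments = {"thigh": ("hip", "knee"), "shank": ("knee", "ankle")}
stats = segment_length_stats(frames, segments)
measured = {"thigh": 0.44, "shank": 0.41}  # hypothetical physically measured lengths (m)
for name, (mean_len, sd) in stats.items():
    print(name, round(mean_len - measured[name], 3), round(sd, 3))
print("mean SD:", round(mean_sd(stats), 3))
```

Because rigid segments should keep a constant length, the per-frame standard deviation is a simple, reference-free indicator of keypoint jitter across views.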
By integrating our skier-specific keypoint detection model into our hybrid motion capture pipeline, we reduced the manual work required to digitize all frames in all perspectives from dozens of hours for a single run to just a few minutes of computing time. It is important to note that this evaluation was limited to a single run. Moreover, the accuracy of the 3D data relies heavily on accurate camera calibration and temporal synchronization. Therefore, ablation studies evaluating the influence of the reference point detection algorithm and the temporal synchronization method will be conducted in a future study.
Conclusion: We implemented a skier-specific keypoint detection model capable of detecting a skier, including skis and poles, which showed good performance in both 2D image space and 3D reconstruction. By eliminating the manual digitization workload, the hybrid 3D motion capture pipeline facilitates large-scale motion capture in similar outdoor settings and enhances the scalability of biomechanical research in outdoor sports such as alpine skiing. Additionally, this method allows us to automatically digitize the remaining runs recorded during this field study, yielding an extensive 3D dataset that is crucial for the future development of fully computer vision-based motion capture methods.
References:
Abdel-Aziz, Y. I., & Karara, H. M. (1971). Direct linear transformation from comparator coordinates into object space coordinates in close-range photogrammetry. Photogrammetric Engineering and Remote Sensing, 38(1), 49-55.
Bachmann, R., Spörri, J., Fua, P., & Rhodin, H. (2019). Motion capture from pan-tilt cameras with unknown orientation. arXiv, 1908.11676. https://doi.org/10.48550/arXiv.1908.11676
Fang, H. S., Xie, S., Tai, Y. W., & Lu, C. (2017). RMPE: Regional multi-person pose estimation. 2017 IEEE International Conference on Computer Vision (ICCV), 2353-2362. https://doi.ieeecomputersociety.org/10.1109/ICCV.2017.256
Fasel, B., Spörri, J., Chardonnens, J., Kröll, J., Müller, E., & Aminian, K. (2018). Joint inertial sensor orientation drift reduction for highly dynamic movements. IEEE Journal of Biomedical and Health Informatics, 22(1), 77-86. https://doi.org/10.1109/JBHI.2017.2659758
Heinrich, D., van den Bogert, A., Mössner, M., & Nachbauer, W. (2023). Model-based estimation of muscle and ACL forces during turning maneuvers in alpine skiing. Scientific Reports, 13, Article 9026. https://doi.org/10.1038/s41598-023-35775-4
Österreichischer Skischulverband. (2018). Snowsport Austria - Die Österreichische Skischule - Vom Einstieg zur Perfektion. In vier Stufen zum Erfolg (2nd ed.) [Snowsport Austria - The Austrian Ski School - From entry to perfection. Four steps to success]. Brüder Hollinek.
Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You only look once: Unified, real-time object detection. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 779-788. https://doi.ieeecomputersociety.org/10.1109/CVPR.2016.91
Spörri, J. (2016). Research dedicated to sports injury prevention – the ‘sequence of prevention’ on the example of alpine ski racing. University of Salzburg, Austria. http://dx.doi.org/10.13140/RG.2.2.28451.89126
Zwölfer, M., Heinrich, D., Wandt, B., Rhodin, H., Spörri, J., & Nachbauer, W. (2023a). Deep learning-based 2D keypoint detection in alpine skiing – A performance analysis of state-of-the-art algorithms applied to regular skiing and injury situations. JSAMS Plus, 2, Article 100034. https://doi.org/10.1016/j.jsampl.2023.100034
Zwölfer, M., Heinrich, D., Wandt, B., Rhodin, H., Spörri, J., & Nachbauer, W. (2023b). A graph-based approach can improve keypoint detection of complex poses: a proof-of-concept on injury occurrences in alpine ski racing. Scientific Reports, 13, Article 21465. https://doi.org/10.1038/s41598-023-47875-2
-
Motion Parameters Measurement of User-Defined Key Points Using 3D Pose Estimation
Xin Wu,Yonghui Wang,Lei Chen,Lin Zhang,Lianming Wang
DOI: https://doi.org/10.1016/j.engappai.2022.104667
IF: 8
2022-01-01
Engineering Applications of Artificial Intelligence
Abstract:Motion parameter measurement is essential for understanding animal behavior, exploring the laws of object motion, and studying control methods. Advanced computer vision based on machine learning now supports markerless object tracking in 2D videos. However, because objects move in three-dimensional space, this paper introduces a method for measuring motion parameters using 3D pose estimation. First, an enhanced iterative bundle adjustment algorithm is proposed for calibrating a multi-camera vision system by adding two control parameters, which dramatically reduces the reprojection error of multi-camera calibration and lays the foundation for high-precision triangulation. Then, a new spatiotemporal loss function is proposed that considers the relationships between key points that do not constitute limbs, thereby improving triangulation accuracy. The new multi-camera calibration algorithm is evaluated on ChArUco boards, and the 3D pose estimation is evaluated on a metronome, a planet pendulum, a human hand, a koi, and a cheetah. The experimental results show that: (1) the two hyperparameters in the enhanced iterative bundle adjustment algorithm effectively suppress the influence of noise and help reduce the reprojection error of multi-camera calibration; (2) the spatiotemporal loss function has a strong constraining ability: the temporal loss stabilizes the triangulation of high-frame-rate video to maintain accuracy, while the spatial loss improves triangulation accuracy for more complex structures; (3) multi-view data fusion also helps improve triangulation accuracy. Moreover, the method was successfully applied to real measurement scenarios: (1) accurately measuring the frequency of a metronome; and (2) successfully measuring the movement of a koi, which conforms to the basic model of fish swimming. Some dynamic measurement results are displayed at https://github.com/wux024/AdamPose .
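The abstract does not give the form of the spatiotemporal loss, so the following is only a hypothetical sketch of the idea: a temporal term that penalizes frame-to-frame displacement of each triangulated keypoint, and a spatial term that penalizes changes over time in the distance between chosen keypoint pairs, including pairs that do not form a limb. The weights `w_t` and `w_s` and all function names are assumptions; in an actual triangulation, such a term would be added to the reprojection error and minimized jointly.

```python
import math


def temporal_loss(seq):
    """Sum of squared displacements of every keypoint between consecutive frames.
    seq: list of frames, each a list of (x, y, z) keypoints in a fixed order."""
    total = 0.0
    for prev, cur in zip(seq, seq[1:]):
        for p, c in zip(prev, cur):
            total += math.dist(p, c) ** 2
    return total


def spatial_loss(seq, pairs):
    """Penalize changes over time in the distance between keypoint pairs (i, j)."""
    total = 0.0
    for i, j in pairs:
        d = [math.dist(frame[i], frame[j]) for frame in seq]
        mean_d = sum(d) / len(d)
        total += sum((x - mean_d) ** 2 for x in d)
    return total


def spatiotemporal_loss(seq, pairs, w_t=1.0, w_s=1.0):
    """Weighted sum of the temporal and spatial terms (hypothetical weighting)."""
    return w_t * temporal_loss(seq) + w_s * spatial_loss(seq, pairs)


# Two frames with two keypoints; the pair (0, 1) need not be a limb.
seq = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [(0.0, 0.0, 1.0), (2.0, 0.0, 1.0)]]
print(spatiotemporal_loss(seq, [(0, 1)]))
```

The temporal term favors smooth trajectories (useful at high frame rates), whereas the spatial term favors rigid inter-keypoint distances, which is one way to encode structure beyond limb lengths.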
-
A graph-based approach can improve keypoint detection of complex poses: a proof-of-concept on injury occurrences in alpine ski racing
Michael Zwölfer,Dieter Heinrich,Bastian Wandt,Helge Rhodin,Jörg Spörri,Werner Nachbauer
DOI: https://doi.org/10.1038/s41598-023-47875-2
2023-12-05
Abstract:For most applications, 2D keypoint detection works well and offers a simple and fast tool for analysing human movements. However, there remain many situations where even the best state-of-the-art algorithms reach their limits and fail to detect human keypoints correctly. Such situations may occur especially when individual body parts are occluded or twisted, or when the whole person is flipped. Such twisted and rotated body positions occur frequently, especially when analysing injuries in alpine ski racing. To improve keypoint detection for this application, we developed a novel method that refines keypoint estimates by rotating the input videos. We select the best rotation for every frame with a graph-based global solver. In this way, we improve the keypoint detection of an arbitrary pose estimation algorithm, in particular for 'hard' keypoints. In the current proof-of-concept study, we show that our approach outperforms standard keypoint detection in all categories and all metrics, by a large margin in injury-related out-of-balance and fall situations, and that it also outperforms previous methods in performance and robustness. The Injury Ski II dataset was made publicly available to facilitate the future investigation of sports accidents based on computer vision.
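One way to read "selecting the best rotation for every frame with a graph-based global solver" is as a shortest path through a layered graph whose nodes are (frame, rotation) pairs, which dynamic programming (Viterbi-style) solves exactly. The sketch below assumes a per-node cost (for example, one minus the mean keypoint confidence obtained with that input rotation) and a per-edge cost that penalizes changes between consecutive frames. These cost definitions and the function name are assumptions, not the published method.

```python
def best_rotation_path(node_cost, edge_cost):
    """Return one rotation index per frame minimizing the summed node and edge costs.

    node_cost[t][r]: cost of using candidate rotation r at frame t (lower is better).
    edge_cost(t, q, r): cost of moving from rotation q at frame t - 1 to rotation r at frame t.
    """
    n_frames, n_rot = len(node_cost), len(node_cost[0])
    acc = list(node_cost[0])  # best cost of any path ending in each rotation
    back = []                 # back-pointers, one list per frame after the first
    for t in range(1, n_frames):
        new_acc, ptr = [], []
        for r in range(n_rot):
            q_best = min(range(n_rot), key=lambda q: acc[q] + edge_cost(t, q, r))
            new_acc.append(acc[q_best] + edge_cost(t, q_best, r) + node_cost[t][r])
            ptr.append(q_best)
        acc = new_acc
        back.append(ptr)
    r = min(range(n_rot), key=lambda q: acc[q])
    path = [r]
    for ptr in reversed(back):  # follow back-pointers from the last frame to the first
        r = ptr[r]
        path.append(r)
    return path[::-1]


def switch_penalty(t, q, r):
    """Hypothetical edge cost: changing the input rotation between frames costs 0.5."""
    return 0.0 if q == r else 0.5


# Hypothetical costs for 4 frames and rotations of 0, 90 and 180 degrees (indices 0, 1, 2).
costs = [[0.1, 0.5, 0.9], [0.6, 0.4, 0.9], [0.1, 0.5, 0.9], [0.2, 0.3, 0.9]]
# A per-frame greedy choice would pick rotation 1 at frame 1; the global path stays at 0.
print(best_rotation_path(costs, switch_penalty))
```

The global solver trades a slightly worse cost in one frame for temporal consistency, which is the benefit of solving over the whole sequence rather than frame by frame.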
-
Motion Capture from Pan-Tilt Cameras with Unknown Orientation
Roman Bachmann,Jörg Spörri,Pascal Fua,Helge Rhodin
DOI: https://doi.org/10.48550/arXiv.1908.11676
2019-08-30
Abstract:In sports such as alpine skiing, coaches would like to know the speed and various biomechanical variables of their athletes and competitors. Existing methods use either body-worn sensors, which are cumbersome to set up, or manual image annotation, which is time-consuming. We propose a method for estimating an athlete's global 3D position and articulated pose using multiple cameras. In contrast to classical markerless motion capture solutions, we allow cameras to rotate freely so that large capture volumes can be covered. First, tight crops around the skier are predicted and fed to a 2D pose estimator network. The 3D pose is then reconstructed using a bundle adjustment method. Key to our solution is the rotation estimation of pan-tilt cameras in a joint optimization with the athlete pose, conditioned on relative background motion computed with feature tracking. Furthermore, we created a new alpine skiing dataset and annotated it with 2D pose labels to overcome shortcomings of existing ones. Our method estimates accurate global 3D poses from images only and provides coaches with an automatic and fast tool for measuring and improving an athlete's performance.
Computer Vision and Pattern Recognition
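To make the pan-tilt camera idea concrete, here is a minimal pinhole model in which a camera stays at a fixed position and only its pan and tilt angles change from frame to frame; in a joint optimization, those per-frame angles would be estimated together with the athlete's 3D pose by minimizing reprojection error. The rotation order (pan about the vertical y axis, then tilt about the x axis), the sign conventions, and the parameter names are assumptions for illustration.

```python
import math


def rot_x(a):
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def rot_y(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def project_pan_tilt(point, center, pan, tilt, f, cx, cy):
    """Project a world point into a pan-tilt camera fixed at `center`.
    World-to-camera rotation R = Rx(tilt) @ Ry(pan) (assumed convention); f in pixels."""
    R = matmul(rot_x(tilt), rot_y(pan))
    d = [p - c for p, c in zip(point, center)]
    x, y, z = (sum(R[i][k] * d[k] for k in range(3)) for i in range(3))
    if z <= 0:
        raise ValueError("point is behind the camera")
    return (cx + f * x / z, cy + f * y / z)


# A point 0.3 rad to the side: off-center with pan = 0, centered after panning by -0.3 rad.
pt = (10.0 * math.sin(0.3), 0.0, 10.0 * math.cos(0.3))
print(project_pan_tilt(pt, (0.0, 0.0, 0.0), 0.0, 0.0, 1000.0, 640.0, 360.0))
print(project_pan_tilt(pt, (0.0, 0.0, 0.0), -0.3, 0.0, 1000.0, 640.0, 360.0))
```

Because the camera center is fixed, each frame adds only two unknown angles (plus zoom, if the focal length also varies), which keeps the joint optimization well constrained.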
-
Estimation of alpine skier posture using machine learning techniques
Bojan Nemec,Tadej Petrič,Jan Babič,Matej Supej
DOI: https://doi.org/10.3390/s141018898
2014-10-13
Abstract:High-precision Global Navigation Satellite System (GNSS) measurements are becoming increasingly popular in alpine skiing because of their relatively undemanding setup and excellent performance. However, GNSS provides only single-point measurements, defined by the antenna, which is typically placed behind the skier's neck. A key issue is how to estimate other, more relevant parameters of the skier's body, such as the center of mass (COM) and ski trajectories. Previously, these parameters were estimated by modeling the skier's body with an inverted-pendulum model that oversimplified it. In this study, we propose two machine learning methods that overcome this shortcoming and estimate COM and ski trajectories based on a more faithful approximation of the skier's body with nine degrees of freedom. The first method uses the well-established approach of artificial neural networks, while the second is based on a state-of-the-art statistical generalization method. Both methods were evaluated using reference measurements obtained on a typical giant slalom course and compared with the inverted-pendulum method. Our methods outperform the commonly used inverted-pendulum method and demonstrate the applicability of machine learning techniques to biomechanical measurements in alpine skiing.
-
Detecting Arbitrary Keypoints on Limbs and Skis with Sparse Partly Correct Segmentation Masks
Katja Ludwig,Daniel Kienzle,Julian Lorenz,Rainer Lienhart
DOI: https://doi.org/10.48550/arXiv.2211.09446
2022-11-17
Computer Vision and Pattern Recognition
Abstract:Analyses based on body posture are crucial for top-class athletes in many sports disciplines. Because manual annotations are very costly, coaches label only the most important keypoints, if any at all. This paper proposes a method to detect arbitrary keypoints on the limbs and skis of professional ski jumpers that requires only a few, partly correct segmentation masks during training. Our model is based on the Vision Transformer architecture, with a special design of the input tokens to query for the desired keypoints. Since we use segmentation masks only to generate ground-truth labels for the freely selectable keypoints, partly correct segmentation masks are sufficient for our training procedure. Hence, there is no need for costly hand-annotated segmentation masks. We analyze different training techniques for freely selected and standard keypoints, including pseudo labels, and show in our experiments that only a few partly correct segmentation masks are sufficient for learning to detect arbitrary keypoints on limbs and skis.
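The idea of deriving labels for freely selectable keypoints from segmentation masks can be illustrated with a simple geometric sketch. Here a keypoint is parameterized by a relative position `t` along the axis between two standard keypoints and a relative position `s` across the body-part mask, measured along the axis normal. This parameterization, the pixel-marching search, and the function name are assumptions for illustration; the paper's exact label-generation procedure may differ.

```python
import math


def arbitrary_keypoint(mask, a, b, t, s, step=0.5, max_steps=10000):
    """Place a keypoint inside a binary body-part mask.

    mask: list of rows with 0/1 entries (mask[row][col]); pixel (row, col) covers
          [col, col + 1) x [row, row + 1) in image coordinates.
    a, b: (x, y) endpoints of the limb axis, e.g. two standard keypoints such as knee and ankle.
    t: relative position along a -> b (0..1); s: relative position across the mask width (0..1).
    """
    dx, dy = b[0] - a[0], b[1] - a[1]
    length = math.hypot(dx, dy)
    nx, ny = -dy / length, dx / length      # unit normal to the limb axis
    px, py = a[0] + t * dx, a[1] + t * dy   # point on the axis

    def inside(x, y):
        r, c = math.floor(y), math.floor(x)
        return 0 <= r < len(mask) and 0 <= c < len(mask[0]) and mask[r][c] == 1

    def border(sign):
        # March along the normal until the next sample would leave the mask.
        k = 0
        while k < max_steps and inside(px + sign * (k + 1) * step * nx, py + sign * (k + 1) * step * ny):
            k += 1
        return px + sign * k * step * nx, py + sign * k * step * ny

    (x0, y0), (x1, y1) = border(-1), border(+1)
    return x0 + s * (x1 - x0), y0 + s * (y1 - y0)


# A 10 x 10 mask whose columns 3..6 form a vertical "limb"; the axis runs down its middle.
mask = [[1 if 3 <= c <= 6 else 0 for c in range(10)] for _ in range(10)]
print(arbitrary_keypoint(mask, (5.0, 0.0), (5.0, 9.0), t=0.5, s=0.5))  # close to the limb center
print(arbitrary_keypoint(mask, (5.0, 0.0), (5.0, 9.0), t=0.5, s=0.0))  # near one border
```

The border positions are only accurate to about one marching step, so a finer `step` trades speed for precision.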
-
High-precision Human Body Acquisition Via Multi-View Binocular Stereopsis
Qing Ran,Kaimao Zhou,Yong-Liang Yang,Junpeng Kang,Linan Zhu,Yizhi Tang,Jieqing Feng
DOI: https://doi.org/10.1016/j.cag.2020.01.003
IF: 1.821
2020-01-01
Computers & Graphics
Abstract:Acquiring a human body shape with high precision and evaluating the reconstructed models effectively remain challenging, because the results can easily be affected by various factors (e.g., the performance of the capture device, unwanted movement of the subject, and self-occlusion of the articulated body structure). To tackle these challenges, this research presents a passive acquisition system comprising 60 spatially configured Digital Single Lens Reflex (DSLR) cameras and a carefully devised algorithmic pipeline for shape acquisition in a single shot. Unlike traditional multi-view stereo solutions, the constituent cameras are synchronized and organized into 30 binocular stereo rigs that capture images from multiple views simultaneously. Each binocular stereo rig is regarded as a depth sensor. The acquisition pipeline consists of three stages. First, camera calibration is performed to estimate the intrinsic and extrinsic parameters of all cameras, especially for paired binocular cameras. Second, depth inference based on stereo matching is employed to recover reliable depth information from RGB images. A novel hierarchical seed-propagation stereo matching framework is proposed, resulting in 30 dense and uniformly distributed partial point clouds. Finally, a point-based geometry processing step composed of multi-view registration and surface meshing is carried out to obtain high-quality watertight human body shapes. This research also proposes a novel method to assess the accuracy of reconstructed non-rigid human body models based on anthropometric parameters, which addresses the synchronization of ground-truth and measured values. Experimental results show that the system achieves an average reconstruction accuracy within 2.5 mm.
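Since each binocular rig is treated as a depth sensor, the underlying relation for a rectified pair is Z = f·B/d (focal length f in pixels, baseline B, disparity d in pixels). The sketch below back-projects a left-image pixel with a known disparity into 3D. It assumes rectified images with identical intrinsics, and the numbers in the example are made up rather than taken from the 60-camera system.

```python
def disparity_to_point(u, v, disparity, f_px, baseline, cx, cy):
    """Back-project left-image pixel (u, v) with disparity d (pixels) into the left camera frame.
    Assumes a rectified stereo pair sharing focal length f_px and principal point (cx, cy)."""
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the rig")
    z = f_px * baseline / disparity  # depth along the optical axis, in baseline units
    return ((u - cx) * z / f_px, (v - cy) * z / f_px, z)


# Hypothetical rig: f = 4000 px, baseline 0.2 m, principal point (2000, 1500).
print(disparity_to_point(2200.0, 1500.0, 400.0, 4000.0, 0.2, 2000.0, 1500.0))
```

Differentiating Z = fB/d gives |δZ| ≈ Z²·|δd|/(fB): depth error grows with the square of the distance and shrinks with a longer baseline or focal length, which is why close working distances and wider baselines favor precision.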
-
The accuracy of markerless motion capture combined with computer vision techniques for measuring running kinematics
Bas Van Hooren,Noah Pecasse,Kenneth Meijer,Johannes Maria Nicolaas Essers
DOI: https://doi.org/10.1111/sms.14319
2023-01-22
Scandinavian Journal of Medicine and Science in Sports
Abstract:Background: Markerless motion capture based on low-cost 2D video analysis combined with computer vision techniques has the potential to provide accurate analysis of running technique in both research and clinical settings. However, the accuracy of markerless motion capture for assessing running kinematics compared with a gold-standard approach remains largely unexplored. Objective: Here we investigate the accuracy of custom-trained (DeepLabCut) and existing (OpenPose) computer vision techniques for assessing sagittal-plane hip, knee, and ankle running kinematics at speeds of 2.78 and 3.33 m·s⁻¹, compared with gold-standard marker-based motion capture. Methods: Differences between the markerless and marker-based approaches were assessed using statistical parametric mapping and expressed as root mean squared errors (RMSEs). Results: After temporal alignment and offset removal, both DeepLabCut and OpenPose showed no significant differences from the marker-based approach at 2.78 m·s⁻¹, but some significant differences remained at 3.33 m·s⁻¹. At 2.78 m·s⁻¹, RMSEs for the hip, knee, and ankle were 5.07°, 7.91°, and 5.60° for DeepLabCut and 5.92°, 7.81°, and 5.66° for OpenPose. At 3.33 m·s⁻¹, the corresponding RMSEs were 7.40°, 10.9°, and 8.01° for DeepLabCut and 4.95°, 7.45°, and 5.76° for OpenPose. Conclusion: The differences between OpenPose and the marker-based method were in line with or smaller than those reported between other kinematic analysis methods and marker-based methods, whereas these differences were larger for DeepLabCut. Since the accuracy differed between individuals, OpenPose may be most useful for facilitating large-scale in-field data collection and the investigation of group effects rather than individual-level analyses.
sport sciences
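The "temporal alignment and offset removal" step before computing RMSE can be sketched as a brute-force search over integer sample lags: at each lag the mean offset is removed, and the lag with the lowest RMSE is kept. The abstract does not describe the authors' exact alignment procedure, so this approach, the `max_lag` and `min_overlap` settings, the function name, and the synthetic curves are assumptions.

```python
import math


def align_and_rmse(ref, test, max_lag=10, min_overlap=0.5):
    """Return (lag, rmse): the integer lag (samples) that best aligns `test` to `ref`
    after removing their mean offset, and the RMSE at that lag (same units as the input)."""
    best = None
    for lag in range(-max_lag, max_lag + 1):
        pairs = [(ref[i], test[i + lag]) for i in range(len(ref)) if 0 <= i + lag < len(test)]
        if len(pairs) < min_overlap * len(ref):
            continue  # too little overlap for a meaningful comparison
        offset = sum(t - r for r, t in pairs) / len(pairs)
        rmse = math.sqrt(sum((t - offset - r) ** 2 for r, t in pairs) / len(pairs))
        if best is None or rmse < best[1]:
            best = (lag, rmse)
    return best


# Synthetic knee angle (degrees): the "markerless" curve is delayed by 3 samples and offset by 5 degrees.
marker_based = [30.0 * math.sin(0.2 * i) for i in range(100)]
markerless = [30.0 * math.sin(0.2 * (i - 3)) + 5.0 for i in range(100)]
print(align_and_rmse(marker_based, markerless))
```

Removing the mean offset isolates waveform-shape errors from constant biases (for example, differing segment definitions), so the resulting RMSE should be read alongside the removed offset.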
-
Improved 2D Keypoint Detection in Out-of-Balance and Fall Situations -- combining input rotations and a kinematic model
Michael Zwölfer,Dieter Heinrich,Kurt Schindelwig,Bastian Wandt,Helge Rhodin,Joerg Spoerri,Werner Nachbauer
DOI: https://doi.org/10.48550/arXiv.2112.12193
2021-12-23
Abstract:Injury analysis may be one of the most beneficial applications of deep-learning-based human pose estimation. To facilitate further research on this topic, we provide an injury-specific 2D dataset for alpine skiing comprising 533 images in total. We further propose a post-processing routine that combines rotational information with a simple kinematic model. This routine improved detection results in fall situations by up to 21% in terms of the PCK@0.2 metric.
Computer Vision and Pattern Recognition
-
All Keypoints You Need: Detecting Arbitrary Keypoints on the Body of Triple, High, and Long Jump Athletes
Katja Ludwig,Julian Lorenz,Robin Schön,Rainer Lienhart
2023-05-10
Abstract:Performance analyses based on videos are commonly used by coaches of athletes in various sports disciplines. In individual sports, these analyses mainly concern body posture. This paper focuses on the disciplines of triple, high, and long jump, which require fine-grained locations on the athlete's body. Typical human pose estimation datasets provide only a very limited set of keypoints, which is not sufficient in this case. Therefore, we propose a method to detect arbitrary keypoints on the whole body of the athlete by leveraging the limited set of annotated keypoints and auto-generated segmentation masks of body parts. Evaluations show that our model is capable of detecting keypoints on the head, torso, hands, feet, arms, and legs, including bent elbows and knees. We analyze and compare different techniques for encoding the desired keypoints as the model's input and for embedding them for the Transformer backbone.
Computer Vision and Pattern Recognition
-
Monocular 3D Human Pose Estimation for Sports Broadcasts using Partial Sports Field Registration
Tobias Baumgartner,Stefanie Klatt
2023-04-10
Abstract:The filming of sporting events projects and flattens the movement of athletes in the world onto a 2D broadcast image. The pixel locations of joints in these images can be detected with high validity. Recovering the actual 3D movement of the athletes' limbs (kinematics) requires lifting these 2D pixel locations back into a third dimension, implying a certain scene geometry. The well-known line markings of sports fields allow for the calibration of the camera and for determining the actual geometry of the scene. Close-up shots of athletes are required to extract detailed kinematics, but these in turn obscure the field markings needed for camera calibration. We suggest partial sports field registration, which determines a set of scene-consistent camera calibrations up to a single degree of freedom. Through joint optimization of 3D pose estimation and camera calibration, we demonstrate the successful extraction of 3D running kinematics on a 400 m track. In this work, we combine advances in 2D human pose estimation and camera calibration via partial sports field registration to demonstrate an avenue for collecting valid large-scale kinematic datasets. We generate a synthetic dataset of more than 10k images in Unreal Engine 5 with different viewpoints, running styles, and body types to show the limitations of existing monocular 3D HPE methods. Synthetic data and code are available at https://github.com/tobibaum/PartialSportsFieldReg_3DHPE .
Computer Vision and Pattern Recognition
-
Feasibility of OpenPose markerless motion analysis in a real athletics competition
Neil J Cronin,Josh Walker,Catherine B Tucker,Gareth Nicholson,Mark Cooke,Stéphane Merlino,Athanassios Bissas
DOI: https://doi.org/10.3389/fspor.2023.1298003
2024-01-05
Abstract:This study tested the performance of OpenPose on footage collected by two cameras at 200 Hz in a real-life competitive setting by comparing it with data analyzed manually in SIMI motion. The same take-off recording from the men's long jump final at the 2017 World Athletics Championships was used for both approaches (markerless and manual) to reconstruct the 3D coordinates from each camera's 2D coordinates. Joint angle and center of mass (COM) variables during the final step and take-off phase of the jump were determined. Coefficients of multiple determination (CMD) for joint angle waveforms showed large variation between athletes, with knee angle values typically being higher (take-off leg: 0.727 ± 0.242; swing leg: 0.729 ± 0.190) than those for the hip (take-off leg: 0.388 ± 0.193; swing leg: 0.370 ± 0.227) and ankle angles (take-off leg: 0.247 ± 0.172; swing leg: 0.155 ± 0.228). COM data also showed considerable variation between athletes and parameters, with position (0.600 ± 0.322) and projection angle (0.658 ± 0.273) waveforms generally showing better agreement than COM velocity (0.217 ± 0.241). Agreement for discrete data was generally poor, with high random error for joint kinematics and COM parameters at take-off and an average ICC across variables of 0.17. The poor agreement statistics and a range of unrealistic values returned by the pose estimation underline that OpenPose is not suitable for in-competition performance analysis in events such as the long jump, something that manual analysis still achieves with high levels of accuracy and reliability.
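For reference, the coefficient of multiple determination used to compare waveforms can be computed as below. This follows one widely used formulation for N waveforms of T time-normalized samples each (here N = 2: markerless and manual); whether Cronin et al. used exactly this variant is an assumption, and the function name is hypothetical.

```python
def cmd(waveforms):
    """Coefficient of multiple determination for N >= 2 waveforms of equal length T.
    1 means identical curves; values near or below 0 indicate poor agreement."""
    n, t_len = len(waveforms), len(waveforms[0])
    grand_mean = sum(sum(w) for w in waveforms) / (n * t_len)
    point_means = [sum(w[t] for w in waveforms) / n for t in range(t_len)]
    # Scatter between the waveforms at each time point (numerator) ...
    within = sum((w[t] - point_means[t]) ** 2 for w in waveforms for t in range(t_len)) / (t_len * (n - 1))
    # ... relative to the total scatter around the grand mean (denominator).
    total = sum((w[t] - grand_mean) ** 2 for w in waveforms for t in range(t_len)) / (t_len * n - 1)
    return 1.0 - within / total


print(cmd([[0.0, 1.0, 2.0, 3.0], [1.0, 2.0, 3.0, 4.0]]))  # two curves offset by 1
```

Because the numerator measures between-curve scatter and the denominator the overall variation, curves with a small overall range can yield a low CMD even when their absolute differences are small; if the ratio exceeds 1, the CMD becomes negative.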
-
Exercise quantification from single camera view markerless 3D pose estimation
Clara Mercadal-Baudart,Chao-Jung Liu,Garreth Farrell,Molly Boyne,Jorge González Escribano,Aljosa Smolic,Ciaran Simms
DOI: https://doi.org/10.1016/j.heliyon.2024.e27596
IF: 3.776
2024-03-12
Heliyon
Abstract:Sports physiotherapists and coaches are tasked with evaluating the movement quality of athletes across the spectrum of ability and experience. However, the accuracy of visual observation is low, and existing technology outside of expensive lab-based solutions has seen limited adoption, leading to an unmet need for an efficient and accurate means of measuring static and dynamic joint angles during movement and converting them into movement metrics usable by practitioners. This paper proposes a set of pose landmarks for computing frequently used joint angles as metrics of interest to sports physiotherapists and coaches in assessing common strength-building exercise movements. It then proposes a set of rules for computing these metrics for a range of common exercises (single and double drop jumps and counter-movement jumps, deadlifts, and various squats) from anatomical keypoints detected in video, and evaluates the accuracy of these metrics using a published 3D human pose model trained with ground-truth data derived from VICON motion capture of common rehabilitation exercises. Results show a set of mathematically defined metrics, derived from the chosen pose landmarks, that are sufficient to compute the metrics for each of the exercises under consideration. Comparison with ground-truth data showed that root mean square angle errors were within 10° for all exercises for the following metrics: shin angle, knee varus/valgus and left/right flexion, hip flexion and pelvic tilt, trunk angle, spinal flexion lower/upper/mid, and rib flare. Larger errors (though still all within 15°) were observed for shoulder flexion and ASIS asymmetry in some exercises, notably front squats and drop jumps. In conclusion, the contribution of this paper is that a set of sufficient keypoints and associated metrics for exercise assessment from 3D human pose has been uniquely defined.
Further, we found generally very good accuracy of the Strided Transformer 3D pose model in predicting these metrics for the chosen set of exercises from a single mobile device camera, when trained on a suitable set of functional exercises recorded using a VICON motion capture system. Future assessment of generalization is needed.
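Joint-angle metrics of this kind start from the angle between two segments meeting at a joint. The sketch below computes the included angle at a joint from three 3D keypoints and an RMS error against reference angles. The landmark choice (for example hip, knee, ankle), the convention of reporting flexion as 180° minus the included angle, and the example values are illustrative assumptions, not the paper's exact metric definitions.

```python
import math


def joint_angle(a, b, c):
    """Included angle (degrees) at joint b between segments b->a and b->c, e.g. hip-knee-ankle."""
    u = [ai - bi for ai, bi in zip(a, b)]
    v = [ci - bi for ci, bi in zip(c, b)]
    cos_angle = sum(x * y for x, y in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    # Clamp to [-1, 1] so rounding errors cannot push acos out of its domain.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


def rms_error(pred, ref):
    """Root mean square difference between predicted and reference angles (degrees)."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))


hip, knee, ankle = (0.0, 0.9, 0.0), (0.0, 0.5, 0.1), (0.0, 0.1, 0.0)  # hypothetical keypoints (m)
included = joint_angle(hip, knee, ankle)
print(included, 180.0 - included)  # included angle and the corresponding flexion angle
print(rms_error([10.0, 20.0], [13.0, 16.0]))
```

Working with included angles keeps the computation independent of the global orientation of the body, which is convenient when the pose comes from a single-camera 3D model.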
-
Tracking Skiers from the Top to the Bottom
Matteo Dunnhofer,Luca Sordi,Niki Martinel,Christian Micheloni
DOI: https://doi.org/10.1109/WACV57701.2024.00832
2023-12-15
Abstract:Skiing is a popular winter sport discipline with a long history of competitive events. In this domain, computer vision has the potential to enhance the understanding of athletes' performance, but its application lags behind other sports due to limited studies and datasets. This paper takes a step toward filling such gaps. A thorough investigation is performed on the task of skier tracking in a video capturing his/her complete performance. Obtaining continuous and accurate skier localization is a prerequisite for further higher-level performance analyses. To enable the study, the largest and most annotated dataset for computer vision in skiing, SkiTB, is introduced. Several visual object tracking algorithms, including both established methodologies and a newly introduced skier-optimized baseline algorithm, are tested using the dataset. The results provide valuable insights into the applicability of different tracking methods for vision-based skiing analysis. SkiTB, code, and results are available at https://machinelearning.uniud.it/datasets/skitb.
Computer Vision and Pattern Recognition,Artificial Intelligence
-
Biodynamic Analysis of Alpine Skiing with a Skier-Ski-Snow Interaction Model
Nan Gao,Huitong Jin,Jianqiao Guo,Gexue Ren,Chun Yang
2024-11-08
Abstract:This study establishes a skier-ski-snow interaction (SSSI) model that integrates a 3D full-body musculoskeletal model, a flexible ski model, a ski-snow contact model, and an air resistance model. An experimental method is developed to collect kinematic and kinetic data using IMUs, GPS, and plantar pressure measurement insoles, which are cost-effective and capable of capturing motion in large-scale field conditions. The ski-snow interaction parameters are optimized for dynamic alignment with snow conditions and individual turning techniques. Forward-inverse dynamics simulation is performed using only the skier's posture as model input and leaving the translational degrees of freedom (DOFs) between the pelvis and the ground unconstrained. The effectiveness of our model is further verified by comparing the simulated results with the collected GPS and plantar pressure data. The correlation coefficient between the simulated ski-snow contact force and the measured plantar pressure data is 0.964, and the error between the predicted motion trajectory and GPS data is 0.7%. Kinematic and kinetic parameters extracted from skiers of different skill levels enable a quantitative performance analysis that helps quantify ski training. The SSSI model, combined with the algorithm for optimizing the ski-snow interaction parameters, allows for the description of skiing characteristics across varied snow conditions and different turning techniques, such as carving and skidding. Our research advances the understanding of alpine skiing dynamics, informing the development of training programs and facility designs to enhance athlete performance and safety.
Physics and Society,Emerging Technologies,Computational Physics
-
Extracting spatial knowledge from track and field broadcasts for monocular 3D human pose estimation
Tobias Baumgartner,Benjamin Paassen,Stefanie Klatt
DOI: https://doi.org/10.1038/s41598-023-41142-0
IF: 4.6
2023-08-29
Scientific Reports
Abstract:Collecting large datasets for investigations into human locomotion is an expensive and labor-intensive process. Methods for 3D human pose estimation in the wild are becoming increasingly accurate and could soon be sufficient to assist with the collection of datasets for analyses of running kinematics from TV broadcast data. In the domain of biomechanical research, small differences in 3D angles play an important role. More precisely, the error margins of the data collection process need to be smaller than the expected variation between athletes. In this work, we propose a method to infer the global geometry of track and field stadium recordings using lane demarcations. By projecting estimated 3D skeletons back into the image using this global geometry, we show that current state-of-the-art 3D human pose estimation methods are not (yet) accurate enough to be used in kinematics research.
multidisciplinary sciences
-
Estimating 3D kinematics and kinetics from virtual inertial sensor data through musculoskeletal movement simulations
Marlies Nitschke,Eva Dorschky,Sigrid Leyendecker,Bjoern M. Eskofier,Anne D. Koelewijn
DOI: https://doi.org/10.3389/fbioe.2024.1285845
IF: 5.7
2024-04-03
Frontiers in Bioengineering and Biotechnology
Abstract:Portable measurement systems using inertial sensors enable motion capture outside the lab, facilitating longitudinal and large-scale studies in natural environments. However, estimating 3D kinematics and kinetics from inertial data for a comprehensive biomechanical movement analysis is still challenging. Machine learning models or stepwise approaches performing Kalman filtering, inverse kinematics, and inverse dynamics can lead to inconsistencies between kinematics and kinetics. We investigated the reconstruction of 3D kinematics and kinetics of arbitrary running motions from inertial sensor data using optimal control simulations of full-body musculoskeletal models. To evaluate the feasibility of the proposed method, we used marker tracking simulations created from optical motion capture data as a reference and for computing virtual inertial data such that the desired solution was known exactly. We generated the inertial tracking simulations by formulating optimal control problems that tracked virtual acceleration and angular velocity while minimizing effort without requiring a task constraint or an initial state. To evaluate the proposed approach, we reconstructed three trials each of straight running, curved running, and a v-cut of 10 participants. We compared the estimated inertial signals and biomechanical variables of the marker and inertial tracking simulations. The inertial data were tracked closely, resulting in low mean root mean squared deviations for pelvis translation (≤20.2 mm), angles (≤1.8 deg), ground reaction forces (≤1.1 BW%), joint moments (≤0.1 BWBH%), and muscle forces (≤5.4 BW%) and high mean coefficients of multiple correlation for all biomechanical variables (≥0.99). Accordingly, our results showed that optimal control simulations tracking 3D inertial data could reconstruct the kinematics and kinetics of individual trials of all running motions. 
The simulations led to mutually and dynamically consistent kinematics and kinetics, which allows researching causal chains, for example, to analyze anterior cruciate ligament injury prevention. Our work proved the feasibility of the approach using virtual inertial data. When using the approach in the future with measured data, the sensor location and alignment on the segment must be estimated, and soft-tissue artifacts are potential error sources. Nevertheless, we demonstrated that optimal control simulation tracking inertial data is highly promising for estimating 3D kinematics and kinetics for a comprehensive biomechanical analysis.
multidisciplinary sciences
-
Validation of Inertial-Measurement-Unit-Based Ex Vivo Knee Kinematics during a Loaded Squat before and after Reference-Frame-Orientation Optimisation
Svenja Sagasser,Adrian Sauer,Christoph Thorwächter,Jana G. Weber,Allan Maas,Matthias Woiczinski,Thomas M. Grupp,Ariana Ortigas-Vásquez
DOI: https://doi.org/10.3390/s24113324
IF: 3.9
2024-05-24
Sensors
Abstract:Recently, inertial measurement units have been gaining popularity as a potential alternative to optical motion capture systems in the analysis of joint kinematics. In a previous study, the accuracy of knee joint angles calculated from inertial data and an extended Kalman filter and smoother algorithm was tested using ground truth data originating from a joint simulator guided by fluoroscopy-based signals. Although high levels of accuracy were achieved, the experimental setup leveraged multiple iterations of the same movement pattern and an absence of soft tissue artefacts. Here, the algorithm is tested against an optical marker-based system in a more challenging setting, with single iterations of a loaded squat cycle simulated on seven cadaveric specimens on a force-controlled knee rig. Prior to the optimisation of local coordinate systems using the REference FRame Alignment MEthod (REFRAME) to account for the effect of differences in local reference frame orientation, root-mean-square errors between the kinematic signals of the inertial and optical systems were as high as 3.8° ± 3.5° for flexion/extension, 20.4° ± 10.0° for abduction/adduction and 8.6° ± 5.7° for external/internal rotation. After REFRAME implementation, however, average root-mean-square errors decreased to 0.9° ± 0.4° and to 1.5° ± 0.7° for abduction/adduction and for external/internal rotation, respectively, with a slight increase to 4.2° ± 3.6° for flexion/extension. While these results demonstrate promising potential in the approach's ability to estimate knee joint angles during a single loaded squat cycle, they highlight the limiting effects that a reduced number of iterations and the lack of a reliable consistent reference pose inflict on the sensor fusion algorithm's performance. They similarly stress the importance of adapting underlying assumptions and correctly tuning filter parameters to ensure satisfactory performance. 
More importantly, our findings emphasise the notable impact that properly aligning reference-frame orientations before comparing joint kinematics can have on results and the conclusions derived from them.
engineering, electrical & electronic,chemistry, analytical,instruments & instrumentation
-
Automatic detection of skate strokes in short-track speed skating using one single IMU: validation of a new method
J. Clément,F. Croteau,M. Gagnon,S. Cros
DOI: https://doi.org/10.1080/14763141.2024.2331174
IF: 2.896
2024-04-12
Sports Biomechanics
Abstract:Greater impulse is a key performance indicator of success in short track speed skating. The main objective of this study was to develop a method to measure skating strokes using a single IMU. Eight elite or world-class speed skaters had one IMU placed against their skin on the lower back, and a camera setup was positioned to capture the test. A maximal speed trial was then executed by each participant, and the data were analysed to estimate agreement between the camera and IMU estimates of skate stroke events. Inter-evaluator reliability was assessed on a dataset of 22 athletes performing speed trials as well. The algorithm detected 100% of the strokes identified on the video capture system with a root mean square error of 0.06 s. Bland-Altman analysis showed a bias of 0.03 s between the two methods, which corresponds to the camera's frame interval. The inter-evaluator reliability yielded an intra-class correlation of 1.00 (ICC3,1) from a dataset of 7089 strokes. This study provides an example of on-ice evaluation of speed skating strokes using a single IMU. This equipment is less expensive than that employed by previous authors and can be implemented in training situations with low invasiveness.
engineering, biomedical,sport sciences
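The agreement statistics reported above can be illustrated with a short Python sketch. It assumes that IMU-detected and video-identified stroke events have already been paired one-to-one, computes the Bland-Altman bias (mean difference) with the conventional 95% limits of agreement (bias ± 1.96 SD), and computes the RMSE between the paired timestamps. The timestamps in the example are made up, and the authors' exact pairing and analysis procedure is not described in the abstract.

```python
import math
import statistics
from typing import Sequence, Tuple


def bland_altman(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float, float]:
    """Bias and 95% limits of agreement for paired measurements a and b.

    Differences are taken as a - b; the limits are bias +/- 1.96 times the
    sample standard deviation of the differences (needs at least two pairs).
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [x - y for x, y in zip(a, b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def rmse(a: Sequence[float], b: Sequence[float]) -> float:
    """Root mean square error between paired event times."""
    if len(a) != len(b) or not a:
        raise ValueError("series must be non-empty and equally long")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


if __name__ == "__main__":
    # Hypothetical paired stroke-event times in seconds.
    imu_events = [0.52, 1.04, 1.57, 2.08, 2.61]
    video_events = [0.50, 1.00, 1.53, 2.07, 2.57]
    bias, lower, upper = bland_altman(imu_events, video_events)
    print(f"bias {bias:.3f} s, LoA [{lower:.3f}, {upper:.3f}] s")
    print(f"RMSE {rmse(imu_events, video_events):.3f} s")
```

The pairing step is deliberately left out; with unmatched or missed events, a nearest-neighbour matching within a tolerance window would be needed first.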
-
Leveraging Anthropometric Measurements to Improve Human Mesh Estimation and Ensure Consistent Body Shapes
Katja Ludwig,Julian Lorenz,Daniel Kienzle,Tuan Bui,Rainer Lienhart
2024-09-27
Abstract:The basic body shape of a person does not change within a single video. However, most SOTA human mesh estimation (HME) models output a slightly different body shape for each video frame, which results in inconsistent body shapes for the same person. In contrast, we leverage anthropometric measurements like those tailors have been taking from people for centuries. We create a model called A2B that converts such anthropometric measurements to body shape parameters of human mesh models. Moreover, we find that fine-tuned SOTA 3D human pose estimation (HPE) models outperform HME models regarding the precision of the estimated keypoints. We show that applying inverse kinematics (IK) to the results of such a 3D HPE model and combining the resulting body pose with the A2B body shape leads to superior and consistent human meshes for challenging datasets like ASPset or fit3D, where we can lower the MPJPE by over 30 mm compared to SOTA HME models. Further, replacing HME models' estimates of the body shape parameters with A2B model results not only increases the performance of these HME models, but also leads to consistent body shapes.
Computer Vision and Pattern Recognition
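MPJPE (mean per-joint position error) is the average Euclidean distance between predicted and ground-truth 3D joint positions. The following Python sketch assumes joints are given per frame as (x, y, z) coordinates in millimetres, in a fixed joint order, and can optionally express both skeletons relative to a root joint first, which is a common but not universal convention; the evaluation protocol used in the paper for ASPset and fit3D may differ.

```python
import math
from typing import Optional, Sequence

Joint = Sequence[float]      # (x, y, z), e.g. in millimetres
Skeleton = Sequence[Joint]   # one pose: joints in a fixed order


def mpjpe(pred: Sequence[Skeleton], gt: Sequence[Skeleton],
          root_index: Optional[int] = None) -> float:
    """Mean per-joint position error over all frames and joints.

    If root_index is given, both skeletons are first expressed relative to
    that joint (e.g. the pelvis) in every frame.
    """
    if len(pred) != len(gt) or not pred:
        raise ValueError("need the same, non-zero number of frames")
    origin = (0.0, 0.0, 0.0)
    total, count = 0.0, 0
    for p_frame, g_frame in zip(pred, gt):
        if len(p_frame) != len(g_frame):
            raise ValueError("frames must contain the same joints")
        p_root = p_frame[root_index] if root_index is not None else origin
        g_root = g_frame[root_index] if root_index is not None else origin
        for p, g in zip(p_frame, g_frame):
            # Difference between root-relative (or absolute) joint positions.
            d = [(pi - pr) - (gi - gr) for pi, pr, gi, gr in zip(p, p_root, g, g_root)]
            total += math.hypot(*d)
            count += 1
    return total / count


if __name__ == "__main__":
    # Hypothetical two-joint poses: the prediction is shifted by 10 mm in x.
    gt_poses = [[(0.0, 0.0, 0.0), (0.0, 400.0, 0.0)]]
    pred_poses = [[(10.0, 0.0, 0.0), (10.0, 400.0, 0.0)]]
    print(f"absolute MPJPE: {mpjpe(pred_poses, gt_poses):.1f} mm")
    print(f"root-relative MPJPE: {mpjpe(pred_poses, gt_poses, root_index=0):.1f} mm")
```

The example shows why the alignment convention matters: a pure global offset counts fully in absolute MPJPE but vanishes once both skeletons are expressed relative to the root.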
-
Real-time Full Body Capture with Inter-part Correlations – Supplemental Document –
Yuxiao Zhou,Marc Habermann,Ikhsanul Habibie,Ayush Tewari,Christian Theobalt,Feng Xu
2021-01-01
Abstract:In Fig. 1, we present more qualitative results on in-the-wild videos. To process the image sequence, we first use the off-the-shelf human detector [8] to obtain the body bounding box of the first frame. After that, for each frame, its body bounding box is updated according to the 2D keypoint estimation of the previous frame. In this way, our method tracks the subject and performs 3D capture fully automatically. As a frame-based approach, our method inevitably suffers from temporal jitter, which is also shared by the previous work of Choutas et al. [2]. We adopt a basic temporal filter [1] for smooth visualization. Further, we compare our results with the state-of-the-art approaches of Choutas et al. [2] and Xiang et al. [10] in Fig. 2, where we present results of equal visual quality but much faster inference speed. We present failure cases in Fig. 3. In the first row, our method cannot handle the hand-hand interaction very well. This is because distinguishing the two hands from monocular color input is a very challenging task, and such samples are rare in our training data. In the second row, our approach does not estimate the face color and the hand pose very well due to the unseen appearance: the face is occluded by the goggles, while the hands are under the gloves. Finally, to illustrate the discrepancy in keypoint definitions of different datasets, we present the result of our model on the same image under different sets of dataset-specific extended keypoints in Fig. 4. The positions for the hips, shoulders, and neck are quite different, while the elbows, ankles, and knees are always consistent across datasets. Please refer to our supplementary video for more results.
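The detect-once-then-propagate tracking loop described above can be sketched as follows. This is a hypothetical Python illustration, not the authors' code: the keypoint estimator is passed in as a placeholder callable, the first-frame box is assumed to come from a separate person detector, and the default margin of 20% per side and the clipping to image bounds are assumptions.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]    # (x_min, y_min, x_max, y_max) in pixels
Keypoints = Sequence[Tuple[float, float]]  # 2D keypoints (x, y) in pixels


def box_from_keypoints(kps: Keypoints, width: int, height: int, margin: float = 0.2) -> Box:
    """Tight box around the keypoints, padded by `margin` of its size per side, clipped to the image."""
    if not kps:
        raise ValueError("need at least one keypoint")
    xs = [x for x, _ in kps]
    ys = [y for _, y in kps]
    x0, x1, y0, y1 = min(xs), max(xs), min(ys), max(ys)
    pad_x, pad_y = margin * (x1 - x0), margin * (y1 - y0)
    return (max(0.0, x0 - pad_x), max(0.0, y0 - pad_y),
            min(float(width), x1 + pad_x), min(float(height), y1 + pad_y))


def track(frames: Sequence[object], first_box: Box,
          estimate_keypoints: Callable[[object, Box], Keypoints],
          width: int, height: int, margin: float = 0.2) -> List[Keypoints]:
    """Estimate keypoints frame by frame; each crop comes from the previous frame's keypoints."""
    box = first_box  # assumed to come from a person detector run on the first frame
    results: List[Keypoints] = []
    for frame in frames:
        kps = estimate_keypoints(frame, box)
        results.append(kps)
        box = box_from_keypoints(kps, width, height, margin)
    return results


if __name__ == "__main__":
    # Stand-in estimator (hypothetical): a two-point "pose" drifting right by one pixel per frame.
    def fake_estimator(frame_index, box):
        return [(100.0 + frame_index, 50.0), (140.0 + frame_index, 210.0)]

    poses = track(range(3), (80.0, 30.0, 160.0, 230.0), fake_estimator, 640, 480)
    print(len(poses), box_from_keypoints(poses[-1], 640, 480))
```

Padding the box gives the estimator some slack for motion between consecutive frames; if the subject moves farther than the margin in one frame, this simple scheme can lose track and would need to fall back on the detector.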