Abstract: Objective: Marker-based motion capture, considered the gold standard in human motion analysis, is expensive and requires trained personnel. Advances in inertial sensing and computer vision offer new opportunities to obtain research-grade assessments in clinics and natural environments. A challenge that discourages clinical adoption, however, is the need for careful sensor-to-body alignment, which slows the data collection process in clinics and is prone to errors when patients take the sensors home. Methods: We propose deep learning models to estimate human movement with noisy data from videos (VideoNet), inertial sensors (IMUNet), and a combination of the two (FusionNet), obviating the need for careful calibration. The video and inertial sensing data used to train the models were generated synthetically from a marker-based motion capture dataset of a broad range of activities and augmented to account for sensor-misplacement and camera-occlusion errors. The models were tested using real data that included walking, jogging, squatting, sit-to-stand, and other activities. Results: On calibrated data, IMUNet was as accurate as state-of-the-art models, while VideoNet and FusionNet reduced mean ± std root-mean-squared errors by 7.6 ± 5.4° and 5.9 ± 3.3°, respectively. Importantly, all the newly proposed models were less sensitive to noise than existing approaches, reducing errors by up to 14.0 ± 5.3° for sensor-misplacement errors of up to 30.0 ± 13.7° and by up to 7.4 ± 5.5° for joint-center-estimation errors of up to 101.1 ± 11.2 mm, across joints. Conclusion: These tools offer clinicians and patients the opportunity to estimate movement with research-grade accuracy, without the need for time-consuming calibration steps or the high costs associated with commercial products such as Theia3D or Xsens, helping democratize the diagnosis, prognosis, and treatment of neuromusculoskeletal conditions.
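The sensor-misplacement augmentation mentioned in the Methods can be illustrated with a minimal sketch. The abstract does not describe how the augmentation was implemented, so everything below is an assumption made for illustration: the helper names (`misplace_sensor`, `misalignment_deg`), the uniform sampling of the offset angle, and the 30° bound used in the example are hypothetical. The sketch rotates a synthetic sensor orientation by a random offset and then recovers the resulting misalignment angle.

```python
import math
import random


def rotation_about_axis(axis, angle_rad):
    """Rodrigues' formula: 3x3 rotation matrix for a unit axis and an angle."""
    x, y, z = axis
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    t = 1.0 - c
    return [
        [c + x * x * t, x * y * t - z * s, x * z * t + y * s],
        [y * x * t + z * s, c + y * y * t, y * z * t - x * s],
        [z * x * t - y * s, z * y * t + x * s, c + z * z * t],
    ]


def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def random_unit_axis(rng):
    """Draw a rotation axis uniformly on the unit sphere."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 1e-9:
            return [c / norm for c in v]


def misplace_sensor(orientation, max_angle_deg, rng):
    """Apply a random rotational offset to a sensor orientation.

    Assumption: the offset angle is drawn uniformly from [0, max_angle_deg];
    the source does not state which distribution was used.
    Returns the perturbed orientation and the applied angle in degrees.
    """
    angle = math.radians(rng.uniform(0.0, max_angle_deg))
    offset = rotation_about_axis(random_unit_axis(rng), angle)
    return matmul(orientation, offset), math.degrees(angle)


def misalignment_deg(r_a, r_b):
    """Angle in degrees of the relative rotation r_a^T r_b, from its trace."""
    trace = sum(r_a[k][i] * r_b[k][i] for i in range(3) for k in range(3))
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))


if __name__ == "__main__":
    rng = random.Random(0)
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    perturbed, applied = misplace_sensor(identity, 30.0, rng)
    print(f"applied offset: {applied:.2f} deg")
    print(f"recovered misalignment: {misalignment_deg(identity, perturbed):.2f} deg")
```

Applying a fresh random offset to each training sample in this way would expose a model to a range of alignment errors, which is the general idea behind training for robustness to misplacement. The authors' actual procedure may differ.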

Markerless 3D human pose tracking through multiple cameras and AI: Enabling high accuracy, robustness, and real-time performance

Marker-Less 3D Human Motion Capture With Monocular Image Sequence And Height-Maps

Evaluation of 3D Markerless Motion Capture Accuracy Using OpenPose With Multiple Video Cameras

Towards Accurate Markerless Human Shape and Pose Estimation over Time

Markerless Motion Capture Using Appearance and Inertial Data

A Markless 3D Human Motion Data Acquisition Method Based on the Binocular Stereo Vision and Lightweight Open Pose Algorithm

Accurate realtime full-body motion capture using a single depth camera

Markerless Motion Tracking With Noisy Video and IMU Data

Markerless Human Body Motion Capture Using Multiple Cameras

Marker-less Motion Capture Technology Based on Binocular Stereo Vision and Deep Learning

Ultra Inertial Poser: Scalable Motion Capture and Tracking from Sparse Inertial Sensors and Ultra-Wideband Ranging

Markerless 3D Human Motion Tracking for Monocular Video Sequences

General Automatic Human Shape and Motion Capture Using Volumetric Contour Cues

New multi-view human motion capture framework

Simultaneous Multi-View Camera Pose Estimation and Object Tracking with Square Planar Markers

Model-Based Markerless Human Body Motion Capture using Multiple Cameras

Real-Time Human Motion Capture Based on Wearable Inertial Sensor Networks

Monocular 3D Human Pose Markerless Systems for Gait Assessment

MarkerPose: Robust Real-time Planar Target Tracking for Accurate Stereo Pose Estimation

Fusion Poser: 3D Human Pose Estimation Using Sparse IMUs and Head Trackers in Real Time

High-Accuracy Real-Time Whole-Body Human Motion Tracking Based on Constrained Nonlinear Kalman Filtering