Abstract: Capturing smooth motions from videos using markerless techniques typically involves complex processes such as temporal constraints, multiple stages with data-driven regression and optimization, and bundle solving over temporal windows. These processes can be inefficient and require tuning multiple objectives across stages. In contrast, BundleMoCap introduces a novel and efficient approach to this problem. It solves the motion capture task in a single stage, eliminating the need for temporal smoothness objectives while still delivering smooth motions. BundleMoCap outperforms the state-of-the-art without increasing complexity. The key concept behind BundleMoCap is manifold interpolation between latent keyframes. By relying on a local manifold smoothness assumption, we can efficiently solve a bundle of frames using a single code. Additionally, the method can be implemented as a sliding window optimization and requires only the first frame to be properly initialized, reducing the overall computational burden. BundleMoCap's strength lies in its ability to achieve high-quality motion capture results with simplicity and efficiency. More details can be found at https://moverseai.github.io/bundle/.

What problem does this paper attempt to address?

This paper addresses efficient, robust, and smooth motion capture from sparse multi-view videos. Traditional methods usually involve complex pipelines: temporal constraints, multiple stages of data-driven regression and optimization, and bundle solving over temporal windows. These pipelines are inefficient and require tuning multiple objectives across stages. In contrast, **BundleMoCap** solves the task in a single stage, which removes the need for temporal smoothness objectives while still producing smooth motions. Its core idea is manifold interpolation between latent keyframes.

### Main Contributions

1. **Efficiency**: BundleMoCap solves the motion capture task in a single stage, which reduces the computational burden.
2. **Robustness**: The method is more robust to outliers in sparse multi-view settings and can handle noisy observations.
3. **Smoothness**: Even with very sparse views and a high number of outliers, BundleMoCap captures smooth motions without any temporal smoothness objectives.

### Method Overview

- **Latent Keyframes**: BundleMoCap reconstructs a bundle of frames by optimizing a single latent code instead of solving the parameters of each frame individually.
- **Manifold Interpolation**: The frames within a temporal window are reconstructed by interpolating on the manifold, starting from the previously solved latent code.
- **Sliding-Window Optimization**: The method can be implemented as a sliding-window optimization in which only the first frame needs to be properly initialized, further reducing the overall computational burden.
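
The interpolation between two latent keyframes can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the latent codes are plain vectors and uses the common latent-space form of spherical linear interpolation. The names `slerp` and `interpolate_bundle` are hypothetical, and the decoder that maps each interpolated code to a pose is left out.

```python
import math
from typing import List, Sequence


def slerp(z0: Sequence[float], z1: Sequence[float], alpha: float) -> List[float]:
    """Spherical linear interpolation between two vectors, alpha in [0, 1]."""
    n0 = math.sqrt(sum(a * a for a in z0))
    n1 = math.sqrt(sum(b * b for b in z1))
    if n0 == 0.0 or n1 == 0.0:
        # No direction to rotate along: fall back to linear interpolation.
        return [(1.0 - alpha) * a + alpha * b for a, b in zip(z0, z1)]
    # Angle between the two codes, clamped against rounding error.
    cos_omega = sum(a * b for a, b in zip(z0, z1)) / (n0 * n1)
    omega = math.acos(max(-1.0, min(1.0, cos_omega)))
    s = math.sin(omega)
    if s < 1e-6:
        # Nearly parallel or anti-parallel codes: slerp weights are ill-defined.
        return [(1.0 - alpha) * a + alpha * b for a, b in zip(z0, z1)]
    w0 = math.sin((1.0 - alpha) * omega) / s
    w1 = math.sin(alpha * omega) / s
    return [w0 * a + w1 * b for a, b in zip(z0, z1)]


def interpolate_bundle(
    z_start: Sequence[float], z_end: Sequence[float], num_frames: int
) -> List[List[float]]:
    """Latent codes for every frame of a bundle, both keyframes included."""
    if num_frames < 2:
        raise ValueError("a bundle needs at least the two keyframes")
    return [
        slerp(z_start, z_end, t / (num_frames - 1)) for t in range(num_frames)
    ]


if __name__ == "__main__":
    # z_start plays the role of the previously solved keyframe,
    # z_end the keyframe currently being optimized.
    for code in interpolate_bundle([1.0, 0.0], [0.0, 1.0], 5):
        print([round(v, 3) for v in code])
```

In a sliding-window setting, the end keyframe of one window would be reused as the start keyframe of the next, so only the new end code is a free variable in each window.
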
### Experimental Results

- **Performance Evaluation**: On the Human3.6M and MPI-INF-3DHP datasets, BundleMoCap outperforms other multi-stage optimization methods and methods that use motion smoothness constraints.
- **Robustness**: BundleMoCap is more robust to occlusions and incorrect keypoint estimates, and can accurately capture human motion in complex scenarios.
- **Smoothness**: Despite noisy input keypoint estimates, BundleMoCap still generates smooth motions without any temporal smoothness objectives.

### Formula Presentation

- **Data Term and Prior Term**:
  \[ \underset{z_t, \beta_t, T_t}{\text{argmin}} \; E_t^{\text{data}} + E_t^{\text{prior}} \]
  where \( E_t^{\text{data}} \) is the data term and \( E_t^{\text{prior}} \) is the prior term.
- **Definition of Data Term**:
  \[ E_t^{\text{data}} = \lambda_{RC} \sum_c \sum_i w_i \, \rho\left(k_t^i - k_t^{\text{det}, i}\right) \]
  where \( \rho \) is the Geman-McClure penalty function, which handles noisy estimates.
- **Interpolation of Intermediate Frames**:
  \[ \theta_t = G(S_t(z_0, z_T)), \quad \begin{bmatrix} R_t & t_t \\ 0 & 1 \end{bmatrix} = h\left( S_t(R_0, R_T), L_t(t_0, t_T) \right) \]
  where \( S_t \) and \( L_t \) are the spherical and linear interpolation functions, respectively.

### Conclusion

BundleMoCap achieves efficient, robust, and smooth motion capture from sparse multi-view videos through manifold interpolation between latent keyframes. The method simplifies the optimization process while improving robustness to outliers and the smoothness of the captured motion.
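
The robust data term from the formulas above can be illustrated with a short sketch. It rests on several assumptions: the residual is taken as the 2D Euclidean distance between a projected and a detected keypoint, the penalty uses the scaled Geman-McClure form σ²r² / (σ² + r²), and the scale `sigma` and weight `lam` are arbitrary placeholder values. The function names are hypothetical and not taken from the paper.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def geman_mcclure(residual: float, sigma: float = 1.0) -> float:
    """Scaled Geman-McClure penalty: quadratic near zero, bounded by sigma**2."""
    r2 = residual * residual
    s2 = sigma * sigma
    return s2 * r2 / (s2 + r2)


def data_term(
    projected: Sequence[Sequence[Point]],
    detected: Sequence[Sequence[Point]],
    weights: Sequence[Sequence[float]],
    lam: float = 1.0,
    sigma: float = 1.0,
) -> float:
    """Weighted robust keypoint error summed over cameras and keypoints.

    Each argument is indexed as [camera][keypoint]. The weights stand in for
    per-keypoint detection confidences (an assumption of this sketch).
    """
    total = 0.0
    for proj_c, det_c, w_c in zip(projected, detected, weights):
        for (px, py), (dx, dy), w in zip(proj_c, det_c, w_c):
            residual = math.hypot(px - dx, py - dy)
            total += w * geman_mcclure(residual, sigma)
    return lam * total


if __name__ == "__main__":
    # One camera, two keypoints; the second detection is a gross outlier.
    projected = [[(10.0, 10.0), (20.0, 20.0)]]
    detected = [[(10.5, 10.0), (500.0, 500.0)]]
    weights = [[1.0, 1.0]]
    print(data_term(projected, detected, weights, sigma=5.0))
```

Because the penalty saturates at `sigma**2`, a single badly detected keypoint contributes a bounded amount to the energy, which is the property that makes this kind of term tolerant of noisy detections.
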

BundleMoCap: Efficient, Robust and Smooth Motion Capture from Sparse Multiview Videos

Noise-in, Bias-out: Balanced and Real-time MoCap Solving

HybridCap: Inertia-Aid Monocular Capture of Challenging Human Motions

Imocap: Motion Capture from Internet Videos

RoMo: A Robust Solver for Full-body Unlabeled Optical Motion Capture

A group of novel approaches and a toolkit for motion capture data reusing

DeMoCap: Low-Cost Marker-Based Motion Capture

MoCap-solver

4D Association Graph for Realtime Multi-person Motion Capture Using Multiple Video Cameras

Towards Unstructured Unlabeled Optical Mocap: A Video Helps!

MoCap-Solver: A Neural Solver for Optical Motion Capture Data

Mocap Everyone Everywhere: Lightweight Motion Capture With Smartwatches and a Head-Mounted Camera

Markerless motion capture of multiple characters using multiview image segmentation

DeepMoCap: Deep Optical Motion Capture Using Multiple Depth Sensors and Retro-Reflectors

Markerless Shape and Motion Capture From Multiview Video Sequences

MulayCap: Multi-layer Human Performance Capture Using A Monocular Video Camera

MoCapDeform: Monocular 3D Human Motion Capture in Deformable Scenes

Lightweight Multi-person Total Motion Capture Using Sparse Multi-view Cameras

MOVIN: Real-time Motion Capture using a Single LiDAR

ChallenCap: Monocular 3D Capture of Challenging Human Performances using Multi-Modal References

Three Axis Kinematics Study for Motion Capture Using Augmented Reality