BundleMoCap: Efficient, Robust and Smooth Motion Capture from Sparse Multiview Videos

Georgios Albanis, Nikolaos Zioulis, Kostas Kolomvatsos
DOI: https://doi.org/10.1145/3626495.3626511
2023-11-21
Abstract: Capturing smooth motions from videos using markerless techniques typically involves complex processes such as temporal constraints, multiple stages with data-driven regression and optimization, and bundle solving over temporal windows. These processes can be inefficient and require tuning multiple objectives across stages. In contrast, BundleMoCap introduces a novel and efficient approach to this problem. It solves the motion capture task in a single stage, eliminating the need for temporal smoothness objectives while still delivering smooth motions. BundleMoCap outperforms the state-of-the-art without increasing complexity. The key concept behind BundleMoCap is manifold interpolation between latent keyframes. By relying on a local manifold smoothness assumption, we can efficiently solve a bundle of frames using a single code. Additionally, the method can be implemented as a sliding window optimization and requires only the first frame to be properly initialized, reducing the overall computational burden. BundleMoCap's strength lies in its ability to achieve high-quality motion capture results with simplicity and efficiency. More details can be found at https://moverseai.github.io/bundle/.
Computer Vision and Pattern Recognition, Graphics, Machine Learning
What problem does this paper attempt to address?
This paper addresses efficient, robust, and smooth motion capture from sparse multi-view videos. Existing markerless methods typically rely on complex pipelines: temporal constraints, multiple stages of data-driven regression and optimization, and bundle solving over temporal windows. Such pipelines are inefficient and require tuning multiple objectives across stages. In contrast, **BundleMoCap** solves the task in a single stage, which removes the need for temporal smoothness objectives while still producing smooth motions. Its core idea is manifold interpolation between latent keyframes.

### Main Contributions

1. **Efficiency**: BundleMoCap solves the capture in a single stage, optimizing one latent code per bundle of frames, which significantly reduces the computational burden.
2. **Robustness**: The method is more robust to outliers in sparse multi-view settings and handles noisy observations effectively.
3. **Smoothness**: Even with very sparse views and a high number of outliers, BundleMoCap captures smooth motions without any temporal smoothness objective.

### Method Overview

- **Latent Keyframes**: BundleMoCap reconstructs a bundle of frames by optimizing a single latent code instead of solving parameters for each frame individually.
- **Manifold Interpolation**: The frames within a temporal window are reconstructed by interpolating on the manifold, starting from the previously solved latent code, which also constrains them.
- **Sliding-Window Optimization**: The method can be implemented as a sliding-window optimization that only requires the first frame to be properly initialized, further reducing the overall computational burden.
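
The interpolation idea in the method overview can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`slerp`, `lerp`, `interpolate_bundle`, `window_bounds`) and the toy 2-D latent codes are assumptions made for illustration, the decoder that maps latent codes to poses is omitted, and no optimization is performed. It only shows how intermediate frames can be derived from two keyframes, and how consecutive windows can share a keyframe.

```python
import numpy as np

def slerp(a, b, alpha, eps=1e-8):
    """Spherical interpolation between vectors a and b at fraction alpha in [0, 1]."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    cos_omega = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps)
    omega = np.arccos(np.clip(cos_omega, -1.0, 1.0))
    if np.sin(omega) < 1e-6:
        # Nearly parallel (or antiparallel) codes: fall back to linear interpolation.
        return (1.0 - alpha) * a + alpha * b
    return (np.sin((1.0 - alpha) * omega) * a + np.sin(alpha * omega) * b) / np.sin(omega)

def lerp(x, y, alpha):
    """Linear interpolation, used here for the root translation."""
    return (1.0 - alpha) * np.asarray(x, dtype=float) + alpha * np.asarray(y, dtype=float)

def interpolate_bundle(z_start, z_end, t_start, t_end, num_frames):
    """Derive every frame of a bundle from its two keyframes (hypothetical helper)."""
    alphas = np.linspace(0.0, 1.0, num_frames)
    codes = np.stack([slerp(z_start, z_end, a) for a in alphas])
    trans = np.stack([lerp(t_start, t_end, a) for a in alphas])
    return codes, trans

def window_bounds(num_frames, window):
    """Yield (start, end) keyframe indices; consecutive windows share a keyframe."""
    start = 0
    while start < num_frames - 1:
        end = min(start + window, num_frames - 1)
        yield start, end
        start = end

# Toy example: 2-D latent codes and a 3-D translation across a 5-frame bundle.
codes, trans = interpolate_bundle([1.0, 0.0], [0.0, 1.0],
                                  [0.0, 0.0, 0.0], [1.0, 0.0, 0.0], num_frames=5)
```

In a sliding-window setting along these lines, only the end keyframe of each window would be a free variable, since the start keyframe is the one solved in the previous window. Rotation interpolation, which the summary also describes as spherical, is left out of this sketch.
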

### Experimental Results

- **Performance Evaluation**: On the Human3.6M and MPI-INF-3DHP datasets, BundleMoCap outperforms other multi-stage optimization methods and methods that use motion-smoothing constraints.
- **Robustness**: BundleMoCap is more robust to occlusions and incorrect keypoint estimates, and accurately captures human motion in complex scenarios.
- **Smoothness**: Despite noisy input keypoint estimates, BundleMoCap generates smooth motions without any temporal smoothness objective.

### Formula Presentation

- **Data Term and Prior Term**:

  \[ \underset{z_t,\, \beta_t,\, T_t}{\text{argmin}} \; E_t^{\text{data}} + E_t^{\text{prior}} \]

  where \( E_t^{\text{data}} \) is the data term and \( E_t^{\text{prior}} \) is the prior term.

- **Definition of Data Term**:

  \[ E_t^{\text{data}} = \lambda_{RC} \sum_c \sum_i w_i \, \rho\left(k_t^i - k_t^{\text{det}, i}\right) \]

  where \( \rho \) is the Geman-McClure penalty function, which handles noisy estimates.

- **Interpolation of Intermediate Frames**:

  \[ \theta_t = G(S_t(z_0, z_T)), \quad \begin{bmatrix} R_t & t_t \\ 0 & 1 \end{bmatrix} = h\left( S_t(R_0, R_T),\; L_t(t_0, t_T) \right) \]

  where \( S_t \) and \( L_t \) are the spherical and linear interpolation functions, respectively.

### Conclusion

BundleMoCap achieves efficient, robust, and smooth motion capture from sparse multi-view videos through manifold interpolation between latent keyframes. The method simplifies the optimization process while improving both robustness to outliers and motion smoothness.
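
As a small illustration of the robust data term given under Formula Presentation, the sketch below evaluates a Geman-McClure-penalized reprojection error in NumPy. It rests on stated assumptions rather than the paper's code: the penalty is written in one common form, \( \rho(r) = r^2 / (r^2 + \sigma^2) \), the scale `sigma` and the array shapes are hypothetical, and the residual is taken as the Euclidean distance between projected and detected 2D keypoints.

```python
import numpy as np

def geman_mcclure(residual, sigma=1.0):
    """One common form of the Geman-McClure penalty; it saturates below 1 for large residuals."""
    r2 = np.square(residual)
    return r2 / (r2 + sigma ** 2)

def data_term(projected, detected, weights, lam=1.0, sigma=1.0):
    """Weighted robust reprojection error summed over cameras and keypoints.

    projected, detected: arrays of shape (C, J, 2), i.e. cameras x keypoints x (u, v).
    weights: array broadcastable to (C, J), e.g. detector confidences (an assumption).
    """
    residual = np.linalg.norm(projected - detected, axis=-1)  # shape (C, J)
    return lam * np.sum(weights * geman_mcclure(residual, sigma))

# Toy example: 2 cameras, 3 keypoints, with a single gross outlier detection.
projected = np.zeros((2, 3, 2))
detected = np.zeros((2, 3, 2))
detected[0, 0] = [1000.0, 0.0]  # outlier, 1000 pixels away
weights = np.ones((2, 3))

robust_energy = data_term(projected, detected, weights)
squared_energy = np.sum(np.square(np.linalg.norm(projected - detected, axis=-1)))
```

Because the penalty is bounded, the single outlier contributes less than 1 to `robust_energy`, whereas it contributes 1,000,000 to the plain squared error in `squared_energy`. That bounded influence is what makes this kind of penalty suited to noisy keypoint detections.
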