Free-view Rendering of Dynamic Human from Monocular Video Via Modeling Temporal Information Globally and Locally among Adjacent Frames

Cheng Shang, Jidong Tian, Jiannan Ye, Xubo Yang
DOI: https://doi.org/10.1109/icme57554.2024.10687427
2024-01-01
Abstract: Recent research on rendering dynamic humans with neural radiance fields has made remarkable progress. These methods typically learn implicit geometry and render image appearance for digital humans. However, preserving detail in complex and fast motions, such as those of fingers, clothes, and faces, remains a challenge. Inspired by the temporal information in human motion, we propose an architecture that models temporal information among adjacent frames at both global and local levels. At the global level, we propose a hidden Markov model (HMM)-based method to capture the global similarity among adjacent frames. At the local level, we introduce a module that applies a multi-head attention mechanism to a triplet canonical space structure to capture patch-level local temporal information. Experiments on two public dynamic human rendering datasets (ZJU-MoCap and People-Snapshot) demonstrate that the proposed method outperforms advanced methods both quantitatively and qualitatively.