Coherent Reconstruction of Multiple Humans from a Single Image

Wen Jiang, Nikos Kolotouros, Georgios Pavlakos, Xiaowei Zhou, Kostas Daniilidis
DOI: https://doi.org/10.1109/cvpr42600.2020.00562
2020-01-01
Abstract: In this work, we address the problem of multi-person 3D pose estimation from a single image. A typical regression approach in the top-down setting of this problem would first detect all humans and then reconstruct each one of them independently. However, this type of prediction suffers from incoherent results, e.g., interpenetration and inconsistent depth ordering between the people in the scene. Our goal is to train a single network that learns to avoid these problems and generate a coherent 3D reconstruction of all the humans in the scene. To this end, a key design choice is the incorporation of the SMPL parametric body model in our top-down framework, which enables the use of two novel losses. First, a distance field-based collision loss penalizes interpenetration among the reconstructed people. Second, a depth ordering-aware loss reasons about occlusions and promotes a depth ordering of people that leads to a rendering which is consistent with the annotated instance segmentation. This provides depth supervision signals to the network, even if the image has no explicit 3D annotations. The experiments show that our approach outperforms previous methods on standard 3D pose benchmarks, while our proposed losses enable more coherent reconstruction in natural images. The project website with videos, results, and code can be found at: https://jiangwenpl.github.io/multiperson
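To make the depth ordering-aware loss more concrete, the following is a minimal NumPy sketch of one plausible form of such a penalty. The abstract does not give the formula, so everything here is an assumption for illustration: the function name `depth_ordering_loss`, the per-person depth-map input, the softplus penalty, and the use of `-1` for background pixels are all hypothetical and are not taken from the authors' released code. The idea is that wherever the person rendered in front disagrees with the annotated instance mask, the annotated person is pushed closer to the camera than the one currently occluding them.

```python
import numpy as np


def depth_ordering_loss(depths, gt_ids):
    """Illustrative depth-ordering penalty (an assumed form, not the paper's code).

    depths: float array (N, H, W), rendered depth of each of N people per pixel,
            np.inf where a person does not cover the pixel.
    gt_ids: int array (H, W), annotated instance index per pixel, -1 for background.
    """
    # Person rendered in front at each pixel (smallest depth wins).
    pred_ids = np.argmin(depths, axis=0)
    covered = np.isfinite(depths).any(axis=0)

    # Pixels where the rendering disagrees with the annotated instance mask.
    wrong = (gt_ids >= 0) & covered & (pred_ids != gt_ids)
    rows, cols = np.nonzero(wrong)

    d_gt = depths[gt_ids[rows, cols], rows, cols]      # depth of annotated person
    d_pred = depths[pred_ids[rows, cols], rows, cols]  # depth of occluding person

    # Only compare where the annotated person actually projects to the pixel.
    both = np.isfinite(d_gt)
    diff = d_gt[both] - d_pred[both]

    # Softplus log(1 + exp(diff)): large when the annotated person is far behind.
    return float(np.sum(np.logaddexp(0.0, diff)))


# Two people on a 1x2 image; person 1 wrongly occludes person 0 at pixel 0.
example_depths = np.array([[[2.0, 1.0]], [[1.0, 3.0]]])
example_gt = np.array([[0, 0]])
example_loss = depth_ordering_loss(example_depths, example_gt)
```

Because the penalty depends only on rendered depths and the instance mask, a loss of this kind can in principle be evaluated on images that carry segmentation annotations but no 3D labels, which is the property the abstract highlights.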