Fine-Grained Human-Centric Tracklet Segmentation with Single Frame Supervision

Si Liu, Guanghui Ren, Yao Sun, Jinqiao Wang, Changhu Wang, Bo Li, Shuicheng Yan
DOI: https://doi.org/10.1109/tpami.2019.2911936
IF: 23.6
2022-02-01
IEEE Transactions on Pattern Analysis and Machine Intelligence
Abstract: In this paper, we target the Fine-grAined human-Centric Tracklet Segmentation (FACTS) problem, where 12 human parts, e.g., face, pants, left-leg, are segmented. To reduce heavy and tedious labeling effort, FACTS requires only one labeled frame per video during training. The small size of human parts and the scarcity of labels make FACTS very challenging. Because adjacent video frames are continuous and humans usually do not change clothes within a short time, we explicitly model pixel-level and frame-level context in the proposed Temporal Context segmentation Network (TCNet). On the one hand, optical flow is computed online to propagate pixel-level segmentation results to neighboring frames. On the other hand, frame-level classification likelihood vectors are also propagated to nearby frames. By fully exploiting pixel-level and frame-level context, TCNet indirectly uses the large number of unlabeled frames during training and produces smooth segmentation results during inference. Experimental results on four video datasets show that TCNet outperforms state-of-the-art methods. The newly annotated datasets can be downloaded from http://liusi-group.com/projects/FACTS for further study.
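The pixel-level propagation mentioned in the abstract can be illustrated with a minimal sketch. This is not TCNet's implementation: it only shows the general idea of carrying a part-label map from a neighboring frame into the current frame along optical flow. The function name `warp_labels`, the flow convention (displacement from the current frame to the neighboring frame, stored as `(dx, dy)`), and the nearest-neighbor sampling are all assumptions made for illustration.

```python
import numpy as np

def warp_labels(labels, flow):
    """Backward-warp a part-label map from a neighboring frame (hypothetical helper).

    labels: (H, W) integer part labels predicted on the neighboring frame.
    flow:   (H, W, 2) flow from the current frame to the neighboring frame,
            with flow[y, x] = (dx, dy) in pixels (assumed convention).
    Returns an (H, W) label map aligned with the current frame.
    """
    h, w = labels.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Nearest-neighbor sampling keeps labels discrete; clipping keeps
    # the sampled coordinates inside the image.
    src_x = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, h - 1)
    return labels[src_y, src_x]

# Toy example: every pixel moved one step to the right in the neighboring frame.
labels = np.array([[0, 1, 2],
                   [3, 4, 5]])
flow = np.zeros((2, 3, 2))
flow[..., 0] = 1.0
warped = warp_labels(labels, flow)
print(warped)  # [[1 2 2], [4 5 5]] (last column repeats because of clipping)
```

A learned model would typically warp soft per-class scores with bilinear sampling rather than hard labels; hard labels are used here only to keep the sketch short.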
computer science, artificial intelligence, engineering, electrical & electronic