Detection and Tracking Based Tubelet Generation for Video Object Detection.

Bin Wang, Sheng Tang, Jun-Bin Xiao, Quan-Feng Yan, Yong-Dong Zhang
DOI: https://doi.org/10.1016/j.jvcir.2018.11.014
IF: 2.887
2018-01-01
Journal of Visual Communication and Image Representation
Abstract: Video object detection (VID) is more challenging than still-image object detection: it requires not only detecting objects accurately in each frame but also tracking them over long periods of time. To detect objects in videos, we propose a Detection And Tracking (DAT) based tubelet generation framework. Under this framework, we first propose a detection-based tubelet generation method that produces tubelets with more accurate bounding boxes than traditional tracking-based methods. Tracking-based methods, on the other hand, generally achieve a higher recall of bounding boxes than the detection-based method. To take advantage of these complementary attributes, we further propose a novel tubelet fusion method that combines this multi-modal information (appearance information in independent images and contextual information in videos). Our extensive experiments on the well-known ILSVRC 2016 VID dataset show that the proposed method achieves state-of-the-art performance. (C) 2018 Elsevier Inc. All rights reserved.
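For readers unfamiliar with tubelets, the general idea of building them from per-frame detections can be illustrated with a minimal sketch. This is not the paper's algorithm, which the abstract does not specify. It assumes a simple greedy rule that links each tubelet to the best-overlapping box in the next frame. The names `iou` and `link_tubelets` and the `min_iou` threshold are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Tubelet = List[Tuple[int, Box]]          # list of (frame index, box)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def link_tubelets(frames: List[List[Box]], min_iou: float = 0.5) -> List[Tubelet]:
    """Greedily link per-frame detections into tubelets (illustrative assumption).

    A tubelet is extended with the unused box in the current frame that
    overlaps its last box the most, provided the overlap reaches min_iou.
    Tubelets that cannot be extended are closed; leftover boxes start new ones.
    """
    finished: List[Tubelet] = []
    active: List[Tubelet] = []
    for t, boxes in enumerate(frames):
        unused = list(boxes)
        next_active: List[Tubelet] = []
        for tube in active:
            last_box = tube[-1][1]
            best = max(unused, key=lambda b: iou(last_box, b), default=None)
            if best is not None and iou(last_box, best) >= min_iou:
                tube.append((t, best))
                unused.remove(best)
                next_active.append(tube)
            else:
                finished.append(tube)
        # Any detection not claimed by an existing tubelet starts a new one.
        next_active.extend([(t, b)] for b in unused)
        active = next_active
    return finished + active


if __name__ == "__main__":
    detections = [
        [(0, 0, 10, 10), (50, 50, 60, 60)],
        [(1, 1, 11, 11)],
        [(2, 2, 12, 12), (50, 50, 60, 60)],
    ]
    for tube in link_tubelets(detections):
        print(len(tube), tube)
```

In this sketch a missed detection in one frame closes the tubelet, which hints at why detection-only linking can lose recall and why the abstract pairs it with tracking-based tubelets.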