UTS-CMU-D2DCRC Submission at TRECVID 2016 Video Localization.

Linchao Zhu, Xuanyi Dong, Yi Yang, Alexander G. Hauptmann
2016-01-01
Abstract: In this report, we summarize our solution to the TRECVID 2016 Video Localization task. We mainly use Faster R-CNN to localize objects in the spatial domain, and combine it with frame-level and shot-level detectors to localize concepts in the temporal domain. We collected images with annotated bounding boxes from external sources, e.g., the ImageNet Detection dataset, and manually annotated bounding boxes for categories that had no annotations. We trained frame-level detectors on features from a ResNet-200 model pre-trained on ImageNet; for the classes “Running”, “Sitting Down”, and “Dancing”, we also used improved Dense Trajectories features. Finally, we fuse the bounding-box score, frame score, and shot score to obtain the final score for each bounding box.
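The abstract does not state how the three scores are fused, so the following is only a minimal sketch of one plausible late-fusion scheme: a normalized weighted average of the bounding-box, frame, and shot scores. The `Detection` class, the `fuse_scores` function, and the equal default weights are hypothetical names and choices made for illustration, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Detection:
    """One candidate bounding box with its three scores (hypothetical container)."""
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
    box_score: float    # spatial detector confidence for this box
    frame_score: float  # frame-level classifier score for the frame containing the box
    shot_score: float   # shot-level classifier score for the shot containing the frame


def fuse_scores(det: Detection,
                weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Return a normalized weighted average of the three scores.

    ASSUMPTION: weighted averaging with equal default weights is an
    illustrative choice; the source does not specify the fusion rule.
    """
    w_box, w_frame, w_shot = weights
    total = w_box + w_frame + w_shot
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = (w_box * det.box_score
                + w_frame * det.frame_score
                + w_shot * det.shot_score)
    return weighted / total


if __name__ == "__main__":
    det = Detection(box=(10.0, 20.0, 110.0, 220.0),
                    box_score=0.9, frame_score=0.6, shot_score=0.3)
    print(fuse_scores(det))
```

Under this assumption, a box with a high spatial score is ranked lower when its frame or shot shows little evidence of the concept, which is the intuition behind combining spatial and temporal detectors. The weights would normally be tuned on held-out data.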