Trimmed Action Recognition, Dense-Captioning Events in Videos, and Spatio-temporal Action Localization with Focus on ActivityNet Challenge 2019

Zhaofan Qiu, Dong Li, Yehao Li, Qi Cai, Yingwei Pan, Ting Yao
DOI: https://doi.org/10.48550/arXiv.1906.07016
2019-06-14
Computer Vision and Pattern Recognition
Abstract: This notebook paper presents an overview and comparative analysis of our systems designed for the following three tasks in ActivityNet Challenge 2019: trimmed action recognition, dense-captioning events in videos, and spatio-temporal action localization.