WildQA: In-the-Wild Video Question Answering

Santiago Castro,Naihao Deng,Pingxuan Huang,Mihai Burzo,Rada Mihalcea
DOI: https://doi.org/10.48550/arXiv.2209.06650
2022-09-14
Abstract:Existing video understanding datasets mostly focus on human interactions, with little attention being paid to the "in the wild" settings, where the videos are recorded outdoors. We propose WILDQA, a video understanding dataset of videos recorded in outside settings. In addition to video question answering (Video QA), we also introduce the new task of identifying visual support for a given question and answer (Video Evidence Selection). Through evaluations using a wide range of baseline models, we show that WILDQA poses new challenges to the vision and language research communities. The dataset is available at <a class="link-external link-https" href="https://lit.eecs.umich.edu/wildqa/" rel="external noopener nofollow">this https URL</a>.
Computer Vision and Pattern Recognition,Computation and Language
What problem does this paper attempt to address?