Unified QA-aware Knowledge Graph Generation Based on Multi-modal Modeling

Penggang Qin, Jiarui Yu, Yan Gao, Derong Xu, Yunkai Chen, Shiwei Wu, Tong Xu, Enhong Chen, Yanbin Hao
DOI: https://doi.org/10.1145/3503161.3551604
2022-01-01
Abstract: Understanding the storyline of long-duration videos is often considered a major challenge in video understanding. To promote research on longer videos, the deep video understanding (DVU) task was proposed; it requires recognizing interactions at the scene level and relationships at the movie level, as well as answering questions at both levels. In this work, we propose a unified QA-aware knowledge graph generation approach, which consists of a relation-centric graph and an interaction-centric graph, and which demonstrates the strong performance of multimodal pre-training models on such problems. Extensive validation on the HLVU dataset demonstrates the effectiveness of the proposed method.