ICDAR 2023 Competition on Born Digital Video Text Question Answering

Zhibo Yang, Xiaoge Song, Sophia Song, Tong Lü, Xiang Bai, Cheng-Lin Liu, Fei Huang, Cong Yao
DOI: https://doi.org/10.1007/978-3-031-41679-8_30
2023-01-01
Abstract: This paper presents the final results of the ICDAR 2023 Competition on Born Digital Video Text Question Answering (BDVT-QA), which comprises two task tracks: 1) End-to-End Video Text Spotting, and 2) Video Text Question Answering. BDVT-QA aims to spot text and answer questions in born-digital videos. The competition introduces a new dataset of 1,000 video clips, fully annotated with manually designed question/answer pairs whose answers are based on the text captions shown in the clips. A total of 23 final submissions were received. The top three results in each track are: 1) T1.1 - 57.53%, T1.2 - 53.3%, T1.3 - 52.35%, and 2) T2.1 - 31.2%, T2.2 - 28.84%, T2.3 - 21.19%. We summarize the submitted methods and provide an in-depth analysis. The paper also covers the dataset description, task definitions, and evaluation protocols. The dataset and the final ranking of submissions are publicly available on the challenge’s official website: https://tianchi.aliyun.com/specials/promotion/ICDAR_2023_Competition_on_Born_Digital_Video_Text_QA