Following the Lecturer: Hierarchical Knowledge Concepts Prediction for Educational Videos

Xin Zhang, Qi Liu, Wei Huang, Weidong He, Tong Xiao, Ye Huang
DOI: https://doi.org/10.1007/978-3-031-20500-2_13
2022-01-01
Abstract: With the growing trend toward intelligent learning, predicting knowledge concepts for educational videos has become a fundamental task that benefits personalized recommendation, retrieval, and learning. Prior video studies mainly address relatively short clips for human action and object recognition, whereas educational videos are minutes long and contain heterogeneous elements, such as texts, formulas, and hand-drawn graphics, that support the lecturer's narration. Because of these educational characteristics, most segmentation strategies designed for long videos do not apply well to educational videos. In addition, educational videos consist of progressive or referential sections and contain multimodal information. We therefore propose a novel framework, Spotlight Flow Network (SFNet), to predict hierarchical knowledge concepts for multimodal educational videos. Specifically, we first adopt an effective text-to-visual section segmentation strategy. Then, we model the mechanism by which the viewers' spotlight follows the lecturer, and we leverage the associations between sections to enhance the multimodal representation. We also incorporate explicit inter-level constraints of the hierarchical knowledge structure, together with the associations between sections and concepts, to improve prediction performance. Extensive experiments on real-world data demonstrate the effectiveness of SFNet.
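One common way to express an explicit inter-level constraint in hierarchical multi-label prediction is to penalize any child concept whose predicted probability exceeds that of its parent concept. The sketch below illustrates that generic idea under this assumption only; it is not SFNet's actual formulation, which the abstract does not specify. The function name `hierarchy_violation_penalty` and the toy concept hierarchy are hypothetical.

```python
from typing import Dict, List, Tuple


def hierarchy_violation_penalty(
    probs: Dict[str, float],
    edges: List[Tuple[str, str]],
) -> float:
    """Generic hierarchy-consistency penalty (an assumption, not SFNet's loss).

    For every (parent, child) edge, add the squared amount by which the
    child's predicted probability exceeds the parent's. The result is zero
    when every child probability is at most its parent's probability.
    """
    total = 0.0
    for parent, child in edges:
        excess = max(0.0, probs[child] - probs[parent])
        total += excess ** 2
    return total


# Hypothetical two-level hierarchy: "Calculus" -> {"Derivatives", "Integrals"}
edges = [("Calculus", "Derivatives"), ("Calculus", "Integrals")]

# Consistent prediction: both children score below their parent.
consistent = {"Calculus": 0.75, "Derivatives": 0.5, "Integrals": 0.25}

# Violating prediction: "Derivatives" exceeds its parent by 0.5.
violating = {"Calculus": 0.25, "Derivatives": 0.75, "Integrals": 0.125}

print(hierarchy_violation_penalty(consistent, edges))  # 0.0
print(hierarchy_violation_penalty(violating, edges))   # 0.25
```

In training, a term like this would typically be added to the per-level classification losses with a weighting factor, nudging the model toward predictions that respect the concept hierarchy.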