Abstract: Lecture videos are the most popular learning materials because of their pedagogical benefits. However, finding a topic or subtopic of interest requires manually examining the video frame by frame, which becomes more tedious as the number and length of videos increase. The main problem is therefore the efficient automatic segmentation and indexing of lecture videos to enable faster retrieval of specific and relevant content. In this paper, we present automatic indexing of lecture videos using topic hierarchies extracted from slide text and audio transcripts. Indexing videos based on slide text is more accurate because of higher character recognition rates, but the text content is very abstract and subjective. Audio transcripts, in contrast, provide comprehensive detail about the topics, but retrieval results are imprecise because of the higher word error rate (WER). To address this problem, we propose fusing the complementary strengths of slide text and audio transcripts using a semi-supervised latent Dirichlet allocation (LDA) algorithm. Further, we improve learning of the model by using words recognized from video slides as seed words and training the model to learn the distribution of the video transcriptions around these seed words. We test the proposed multimodal indexing scheme on 500 classroom videos downloaded from Coursera, NPTEL, and KLETU (KLE Technological University). The proposed multimodal fusion scheme achieves an average improvement of 44.49% in F-score over indexing with unimodal approaches.
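The seed-word idea described in the abstract can be illustrated with a small sketch. The code below is not the authors' implementation: it approximates semi-supervised LDA with a collapsed Gibbs sampler in which slide-text seed words receive extra prior mass in their assigned topic. The function names (`seeded_lda`, `dominant_topics`), the `boost` parameter, the hyperparameter values, and the toy data are all hypothetical and chosen only for illustration.

```python
import random

def seeded_lda(docs, seed_words, n_iter=200, alpha=0.1, beta=0.01,
               boost=5.0, rng_seed=0):
    """Collapsed Gibbs sampling for LDA with seeded topic-word priors.

    docs: list of token lists (e.g. transcript segments).
    seed_words: list of sets, one per topic (e.g. words read from slides).
    Returns (doc-topic counts, topic-word counts, vocabulary).
    """
    rng = random.Random(rng_seed)
    vocab = sorted({w for d in docs for w in d} |
                   {w for s in seed_words for w in s})
    w2i = {w: i for i, w in enumerate(vocab)}
    K, V = len(seed_words), len(vocab)

    # Assumption: seeding is modelled as an asymmetric Dirichlet prior,
    # where a seed word gets `boost` extra pseudo-counts in its topic.
    prior = [[beta + (boost if vocab[v] in seed_words[k] else 0.0)
              for v in range(V)] for k in range(K)]
    prior_sum = [sum(row) for row in prior]

    n_kw = [[0] * V for _ in range(K)]   # topic-word counts
    n_k = [0] * K                        # tokens per topic
    n_dk = [[0] * K for _ in docs]       # doc-topic counts
    z = []                               # topic assignment per token

    # Random initialisation of topic assignments.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            n_kw[k][w2i[w]] += 1
            n_k[k] += 1
            n_dk[d][k] += 1
        z.append(zd)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = w2i[w], z[d][i]
                # Remove the token's current assignment from the counts.
                n_kw[k][v] -= 1
                n_k[k] -= 1
                n_dk[d][k] -= 1
                # Conditional probability of each topic for this token.
                weights = [(n_dk[d][t] + alpha) *
                           (n_kw[t][v] + prior[t][v]) /
                           (n_k[t] + prior_sum[t]) for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                n_kw[k][v] += 1
                n_k[k] += 1
                n_dk[d][k] += 1
    return n_dk, n_kw, vocab

def dominant_topics(n_dk):
    """Index of the most frequent topic in each document."""
    return [max(range(len(row)), key=row.__getitem__) for row in n_dk]

# Toy data (hypothetical): transcript segments and per-topic slide words.
segments = [
    "we sort the array then merge the sorted halves".split(),
    "the gradient of the loss updates the weights".split(),
    "merge sort splits the array and the halves merge".split(),
]
seeds = [{"sort", "merge", "array"}, {"gradient", "loss", "weights"}]

n_dk, n_kw, vocab = seeded_lda(segments, seeds)
print(dominant_topics(n_dk))
```

Putting the seed words into the prior, rather than fixing their topic assignments, keeps the sampler free to move a noisy transcript token to another topic when the surrounding context disagrees. That is one plausible way to combine accurate but sparse slide text with detailed but error-prone transcripts; other seeding schemes are possible.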

Content Based Lecture Video Retrieval Using Speech and Video Text Information

A New Video Text Detection Method

Unsupervised Teacher-Student Model for Large-Scale Video Retrieval

A Cross-media Retrieval System for Lecture Videos

Content-Based Video Browsing by Text Region Localization and Classification

ECLIPSE: Efficient Long-range Video Retrieval using Sight and Sound

Content Extraction from Lecture Video via Speaker Action Classification Based on Pose Information

Multimodal Fusion of Speech and Text using Semi-supervised LDA for Indexing Lecture Videos

Hierarchical Visual Interface for Lecture Video Retrieval and Summarization

Structuring Lecture Videos by Automatic Projection Screen Localization and Analysis

Multimedia Analysis and Retrieval System

Content-Based Image And Video Indexing And Retrieval

Semantic-based surveillance video retrieval

Efficient Indexing, Browsing and Retrieval of Image/video Content

News Video Retrieval By Learning Multimodal Semantic Information

Retrieval of Sports Video Clips Using Audio-Visual Features and Text Information

The VISIONE Video Search System: Exploiting Off-the-Shelf Text Search Engines for Large-Scale Video Retrieval

Video parsing based on head tracking and face recognition

A Scalable Video Search Engine Based on Audio Content Indexing and Topic Segmentation

Fast and Robust Video Clip Search Using Index Structure

Learning Structured Concept-Segments for Interactive Video Retrieval