Structuring Lecture Videos by Automatic Projection Screen Localization and Analysis

Kai Li, Jue Wang, Haoqian Wang, Qionghai Dai
DOI: https://doi.org/10.1109/tpami.2014.2361133
IF: 23.6
2014-01-01
IEEE Transactions on Pattern Analysis and Machine Intelligence
Abstract: We present a fully automatic system for extracting the semantic structure of a typical academic presentation video, which captures the whole presentation stage with abundant camera motions such as panning, tilting, and zooming. Our system automatically detects and tracks both the projection screen and the presenter whenever they are visible in the video. By analyzing the image content of the tracked screen region, our system is able to detect slide progressions and extract a high-quality, non-occluded, geometrically-compensated image for each slide, resulting in a list of representative images that reconstruct the main presentation structure. Afterwards, our system recognizes text content and extracts keywords from the slides, which can be used for keyword-based video retrieval and browsing. Experimental results show that our system is able to generate more stable and accurate screen localization results than commonly-used object tracking methods. Our system also extracts more accurate presentation structures than general video summarization methods, for this specific type of video.
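The abstract does not say how slide progressions are detected, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes the screen region has already been tracked and rectified into equally sized grayscale frames, and it flags a new slide whenever a frame differs from the current slide's first frame by more than a threshold. The names `mean_abs_diff` and `detect_slide_changes`, the frame representation, and the threshold value are all hypothetical choices made for this example.

```python
from typing import List, Sequence

# Assumption: a frame is a grayscale image of the rectified screen region,
# stored as rows of pixel intensities in the range 0-255.
Frame = Sequence[Sequence[int]]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average per-pixel absolute difference between two equally sized frames."""
    total = 0
    count = 0
    for row_a, row_b in zip(a, b):
        for pixel_a, pixel_b in zip(row_a, row_b):
            total += abs(pixel_a - pixel_b)
            count += 1
    return total / count if count else 0.0


def detect_slide_changes(frames: Sequence[Frame], threshold: float = 20.0) -> List[int]:
    """Return the indices of frames at which a new slide appears to start.

    Each frame is compared with the first frame of the current slide, so a
    change is reported only when the content differs by more than `threshold`
    (a hypothetical value that would need tuning on real footage).
    """
    if not frames:
        return []
    changes = [0]          # the first frame always starts the first slide
    reference = frames[0]  # first frame of the slide currently on screen
    for index in range(1, len(frames)):
        if mean_abs_diff(reference, frames[index]) > threshold:
            changes.append(index)
            reference = frames[index]
    return changes
```

For example, given five 2x2 frames where the third and fourth show a bright slide and the others a dark one, `detect_slide_changes` returns `[0, 2, 4]`. A real system would also have to handle presenter occlusion and animated slides, which this sketch ignores.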