Automatic Preview Frame Selection for Online Videos

Boyan Zhang, Zhiyong Wang, Dacheng Tao, Xian-Sheng Hua, David Dagan Feng
DOI: https://doi.org/10.1109/DICTA.2015.7371237
2015-01-01
Abstract: The preview frame of an online video plays a critical role in helping a user quickly decide whether to watch the video. However, the preview frames of most online videos, such as those shared on social media platforms, are either selected heuristically (e.g., the first or middle frame of a video) or chosen manually by users or experienced editors. In this paper, we investigate the challenging task of automatic preview frame selection and formulate it as a classification problem. To the best of our knowledge, this is one of the first attempts at this topic, since most existing key frame selection methods do not explicitly aim to select only the single most representative frame. Considering that a preview frame for an entire video should be informative in the context of the video story, attention-catching, and of high visual quality, we propose three types of features to characterize each video frame: informativeness, attention, and aesthetics. Due to the imbalanced nature of the training data (i.e., only one preview frame versus thousands of non-preview frames in a video), we use random forests to learn the features of preview frames and to classify each frame as a preview frame or a non-preview frame. We also increase the number of positive training samples by identifying frames that are visually similar to the preview frame. We evaluated our proposed method both quantitatively and qualitatively on a set of 180 news videos manually collected from the BBC News website. Experimental results indicate that our method is promising. We also investigated the contribution of each visual feature to guide future studies.
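
The positive-sample expansion mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how visual similarity is measured, so the cosine similarity, the 0.95 threshold, the function names, and the toy three-dimensional feature vectors below are all assumptions made for illustration, not the authors' implementation. The resulting labels would then serve as training targets for a classifier such as a random forest.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def expand_positive_labels(frame_features, preview_index, threshold=0.95):
    """Label the preview frame and its near-duplicates as positive (1).

    The similarity measure and threshold are hypothetical choices; every
    other frame is labeled as non-preview (0).
    """
    preview = frame_features[preview_index]
    labels = []
    for i, features in enumerate(frame_features):
        similar = cosine_similarity(features, preview) >= threshold
        labels.append(1 if i == preview_index or similar else 0)
    return labels


# Toy per-frame feature vectors (made up for this example).
frames = [
    [0.90, 0.10, 0.00],
    [0.10, 0.80, 0.30],  # the annotated preview frame
    [0.12, 0.79, 0.31],  # visually close to the preview frame
    [0.00, 0.10, 0.90],
]
labels = expand_positive_labels(frames, preview_index=1)
print(labels)
```

In this toy case the third frame is labeled positive alongside the annotated preview frame, which doubles the number of positive samples for that video. A lower threshold would add more positives at the risk of labeling less representative frames.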