Abstract: Video processing and analysis have become urgent tasks, as a huge number of videos are uploaded online every day (e.g., to YouTube and Hulu). Extracting representative key frames is important in video processing and analysis because it greatly reduces the computing resources and time required. Although great progress has been made recently, large-scale video classification remains an open problem, as existing methods do not balance performance and efficiency well. To tackle this problem, this work presents an unsupervised method for retrieving key frames that combines a convolutional neural network with temporal segment density peaks clustering. The proposed temporal segment density peaks clustering is a generic and powerful framework with two advantages over previous work: it determines the number of key frames automatically, and it preserves the temporal information of the video. It thereby improves the efficiency of video classification. Furthermore, a long short-term memory network is added on top of the convolutional neural network to further improve classification performance, and a weight fusion strategy over different input networks is presented to boost performance further. By optimizing video classification and key frame extraction simultaneously, we achieve better classification performance and higher efficiency. We evaluate our method on two popular datasets, HMDB51 and UCF101, and the experimental results consistently demonstrate that our strategy achieves competitive performance and efficiency compared with state-of-the-art approaches.
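
To make the idea of temporal segment density peaks clustering more concrete, the following is a minimal sketch, not the authors' implementation. It assumes that per-frame CNN features are already available as a NumPy array, splits the frames into contiguous temporal segments, and applies a standard density peaks score (local density times distance to the nearest denser point) inside each segment. The function name `density_peaks_key_frames`, the parameters `num_segments` and `cutoff_percentile`, the Gaussian density kernel, and the mean-plus-two-standard-deviations rule for choosing the number of key frames are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np


def density_peaks_key_frames(features, num_segments=3, cutoff_percentile=20.0):
    """Return sorted frame indices picked as key frames (illustrative sketch)."""
    features = np.asarray(features, dtype=float)
    key_frames = []
    # Assumption: contiguous, roughly equal segments keep the temporal order.
    for segment in np.array_split(np.arange(len(features)), num_segments):
        if len(segment) == 0:
            continue
        if len(segment) == 1:
            key_frames.append(int(segment[0]))
            continue
        x = features[segment]
        # Pairwise Euclidean distances between frame features in this segment.
        dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
        # Cutoff distance d_c taken as a percentile of all pairwise distances.
        upper = dist[np.triu_indices(len(segment), k=1)]
        d_c = max(np.percentile(upper, cutoff_percentile), 1e-12)
        # Local density rho with a Gaussian kernel (self-contribution removed).
        rho = np.exp(-((dist / d_c) ** 2)).sum(axis=1) - 1.0
        # delta: distance to the nearest frame with strictly higher density;
        # the densest frame gets the largest distance in its row instead.
        delta = np.empty(len(segment))
        for i in range(len(segment)):
            higher = rho > rho[i]
            delta[i] = dist[i, higher].min() if higher.any() else dist[i].max()
        gamma = rho * delta
        # Hypothetical automatic rule: keep frames whose score is an outlier,
        # falling back to the single best frame if none exceeds the threshold.
        chosen = np.flatnonzero(gamma > gamma.mean() + 2.0 * gamma.std())
        if chosen.size == 0:
            chosen = np.array([gamma.argmax()])
        key_frames.extend(int(segment[i]) for i in chosen)
    return sorted(key_frames)
```

Under these assumptions, calling `density_peaks_key_frames(frame_features, num_segments=3)` on an array of shape `(num_frames, feature_dim)` returns at least one frame index per non-empty segment, in temporal order, so the number of key frames is not fixed in advance and the ordering of the video is retained.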

Efficient video face recognition based on frame selection and quality assessment

New Fusional Framework Combining Sparse Selection and Clustering for Key Frame Extraction.

Intelligent Frame Selection as a Privacy-Friendlier Alternative to Face Recognition

CNN Based Key Frame Extraction for Face in Video Recognition

Deep Unsupervised Key Frame Extraction for Efficient Video Classification

LEARNING-BASED MULTI-FRAME VIDEO QUALITY ENHANCEMENT

A Dynamic Frame Selection Framework for Fast Video Recognition.

Face acquiring optimization based on video sensor network

AdaFrame: Adaptive Frame Selection for Fast Video Recognition

Sample Less, Learn More: Efficient Action Recognition via Frame Feature Restoration

A Coarse-to-Fine Framework for Resource Efficient Video Recognition.

Deep Tiny Network for Recognition-Oriented Face Image Quality Assessment

Face Quality Assessment via Semi-supervised Learning.

Adaptive Focus for Efficient Video Recognition

Heterogeneous feature fusion-based optimal face image acquisition in visual sensor network

A deep learning approach for quality enhancement of surveillance video

A Reliable, Self-Adaptive Face Identification Framework via Lyapunov Optimization

Face Image Quality Assessment Based on Learning to Rank

Method of the Face Identification Using the ATM Video Based on SIFT Algorithm

Automatic Face Image Quality Prediction

Face Recognition of Remote Teaching Video Image Based on Improved Frame Difference Method