Abstract: Video summarization has great potential in many application areas because it enables fast browsing and efficient video indexing. Viewers prefer to browse a video summary containing the content they enjoy, since watching an entire video can be time-consuming. We therefore believe it is necessary to create an automated tool capable of generating personalized video summaries. In this paper, we propose a new event detection-based personalized video summarization framework and deploy it to create film and soccer video summaries. To obtain effective event detection performance, we introduce two transfer learning methods. The first combines convolutional neural networks with a support vector machine (CNNs–SVM). The second uses a fine-tuned summarization network (SumNet) that fuses fine-tuned object and scene networks. In this study, the training data consists of two datasets: (1) a 21K set of web images of back hugging, hand shaking, and standing talking, used to detect film events, and (2) a 30K set of web soccer match images of goals, fouls, and yellow cards, used to detect soccer events. Given an original video, we first segment it into shots and then apply the trained model for event detection. Finally, based on the user's specified preferences, we generate a personalized event-based summary. We test our framework on several film videos and soccer videos. Experimental results demonstrate that the proposed fine-tuned SumNet achieves the best performance of 96.88% and 98.50%, which shows that it is effective for generating personalized video summaries.
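The final step described above, selecting shots according to user preferences once events have been detected, can be illustrated with a minimal sketch. It assumes that shot segmentation and event detection have already produced one label and one confidence score per shot. The `Shot` record, the `personalized_summary` function, the label strings, and the `min_score` threshold are all hypothetical names introduced for illustration and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Shot:
    start: float  # shot start time in seconds
    end: float    # shot end time in seconds
    event: str    # label predicted by the event detector (hypothetical labels)
    score: float  # detector confidence in [0, 1] (assumed to be available)


def personalized_summary(
    shots: Iterable[Shot], preferred: Set[str], min_score: float = 0.5
) -> List[Shot]:
    """Keep shots whose detected event is in the user's preference set.

    Shots below the (assumed) confidence threshold are dropped, and the
    result is returned in temporal order so it can be concatenated.
    """
    kept = [s for s in shots if s.event in preferred and s.score >= min_score]
    return sorted(kept, key=lambda s: s.start)


# Illustrative detector output for a short soccer clip (made-up values).
shots = [
    Shot(0.0, 12.5, "other", 0.91),
    Shot(12.5, 30.0, "goal", 0.97),
    Shot(30.0, 41.0, "foul", 0.42),
    Shot(41.0, 55.0, "yellow_card", 0.88),
]

# A viewer who only wants goals and yellow cards.
summary = personalized_summary(shots, {"goal", "yellow_card"})
```

Filtering on the detected label keeps the personalization step independent of the detector, so under this assumption either the CNNs–SVM model or the fine-tuned SumNet could supply the per-shot labels.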

Creating Personalized Video Summaries Via Semantic Event Detection
