Abstract: This thesis presents an approach to automating video thumbnail selection for traditional broadcast content. Our methodology establishes stringent criteria for diverse, representative, and aesthetically pleasing thumbnails, taking into account space for logo placement, support for vertical aspect ratios, and accurate recognition of facial identities and emotions. We introduce a multistage pipeline that can either select candidate frames or generate novel images by blending video elements or using diffusion models. The pipeline incorporates state-of-the-art models for downsampling, redundancy reduction, automated cropping, face recognition, closed-eye and emotion detection, shot scale and aesthetic prediction, segmentation, matting, and harmonization. It also leverages large language models and visual transformers for semantic consistency. A GUI tool facilitates rapid navigation of the pipeline's output. To evaluate our method, we conducted three experiments. In a study of 69 videos, 53.6% of our proposed sets included thumbnails chosen by professional designers, and 73.9% contained similar images. A survey of 82 participants showed a 45.77% preference for our method, compared to 37.99% for manually chosen thumbnails and 16.36% for an alternative method. Professional designers reported a 3.57-fold increase in valid candidates compared to the alternative method, indicating that our approach meets the established criteria. These findings suggest that the proposed method accelerates thumbnail creation while maintaining high quality standards and fostering greater user engagement.

Automatic Preview Frame Selection for Online Videos

A Human-Machine Collaborative Video Summarization Framework Using Pupillary Response Signals

New Fusional Framework Combining Sparse Selection and Clustering for Key Frame Extraction

A Novel Compact Yet Rich Key Frame Creation Method for Compressed Video Summarization

Personalized Key Frame Recommendation

To Click or Not To Click: Automatic Selection of Beautiful Thumbnails from Videos

A Dynamic Frame Selection Framework for Fast Video Recognition

AdaFrame: Adaptive Frame Selection for Fast Video Recognition

User-based key frame detection in social web video

Watching a Small Portion Could Be As Good As Watching All: Towards Efficient Video Classification

Video abstraction based on the visual attention model and online clustering

Adaptive Selection of Reference Frames for Video Object Segmentation

A novel video abstraction method based on fast clustering of the regions of interest in key frames

Video Abstraction via Attention Model and On-Line Clustering

A Novel Framework for Web Video Thumbnail Generation

Fast video shot boundary detection framework employing pre-processing techniques

A Privacy-aware Framework for Assessing and Recommending Short Video Advertisement

Automating Video Thumbnails Selection and Generation with Multimodal and Multistage Analysis

Learning Fine-grained User Interests for Micro-video Recommendation

An Empirical Study of Frame Selection for Text-to-Video Retrieval

Frame importance and temporal memory effect-based fast video quality assessment for user-generated content