VVA: Video Values Analysis.

Yachun Mi, Yan Shu, Honglei Xu, Shaohui Liu, Feng Jiang
DOI: https://doi.org/10.1007/978-981-99-8540-1_28
2024-01-01
Abstract: User-generated content videos have attracted increasing attention because of their dominant role on social platforms. Analyzing the values conveyed in videos is crucial because the wide range of video content leads to significant variation in subjective quality. However, research on Video Values Analysis (VVA), which aims to evaluate how compatible video content is with mainstream social values, remains scarce. Meanwhile, existing video content analysis methods are mainly based on classification, which is too coarse-grained to handle VVA adequately. To tackle this challenge, we propose the Video Values Analysis Model (VVAM), a framework that generates fine-grained scores for diverse videos and consists of an R3D-based feature extractor, a Transformer-based feature aggregation module, and an MLP-based regression head. In addition, because text appearing in videos can be a key clue for VVA, we design a second pipeline, the Text-Guided Video Values Analysis Model (TG-VVAM), in which text in videos is spotted by OCR tools and a cross-modal fusion module combines the visual and text features. To further facilitate VVA, we construct a large-scale dataset, the Video Values Analysis Dataset (VVAD), which contains 53,705 short videos of various types from major social platforms. Experiments demonstrate that the proposed VVAM and TG-VVAM achieve promising results on VVAD.
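The three-stage design named in the abstract (3D-convolutional features, Transformer aggregation, MLP regression) can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation: the class name `VVAMSketch`, all layer sizes, the single `Conv3d` block standing in for R3D, and the use of cross-attention as the fusion module are assumptions made for illustration. The text input is assumed to be OCR tokens that have already been embedded to the same width as the video tokens.

```python
from typing import Optional

import torch
import torch.nn as nn


class VVAMSketch(nn.Module):
    """Hypothetical regressor: 3D-conv features -> Transformer -> MLP score."""

    def __init__(self, dim: int = 64, heads: int = 4, layers: int = 2):
        super().__init__()
        # Stand-in for the R3D backbone (assumption): one strided 3D conv block
        # that keeps the time axis and pools away the spatial axes.
        self.backbone = nn.Sequential(
            nn.Conv3d(3, dim, kernel_size=3, stride=(1, 2, 2), padding=1),
            nn.BatchNorm3d(dim),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d((None, 1, 1)),
        )
        # Assumed fusion: video tokens attend to pre-embedded OCR text tokens.
        self.fusion = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Transformer encoder aggregates the per-frame tokens.
        enc_layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=2 * dim, batch_first=True
        )
        self.aggregator = nn.TransformerEncoder(enc_layer, num_layers=layers)
        # MLP regression head producing one scalar score per video.
        self.head = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(
        self, clip: torch.Tensor, text: Optional[torch.Tensor] = None
    ) -> torch.Tensor:
        # clip: (batch, 3, frames, height, width); text: (batch, tokens, dim)
        feats = self.backbone(clip)                # (batch, dim, frames, 1, 1)
        tokens = feats.flatten(2).transpose(1, 2)  # (batch, frames, dim)
        if text is not None:
            fused, _ = self.fusion(tokens, text, text)
            tokens = tokens + fused                # residual cross-modal fusion
        tokens = self.aggregator(tokens)
        pooled = tokens.mean(dim=1)                # average over frames
        return self.head(pooled).squeeze(-1)       # (batch,)
```

Calling `VVAMSketch()(clip)` corresponds to the vision-only setting, while passing a `text` tensor as the second argument exercises the text-guided path; in both cases the output is one unbounded regression score per clip, and any score range or loss function would have to come from the paper itself.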