Abstract: Videos are ubiquitous and have significantly changed how we communicate and consume information. As data, video has helped researchers understand how human interactions and relationships develop and change, and how patterns emerge under various circumstances and interpretations. Given the expanding relevance of video data in social science and medical research and the constant introduction of new formats and sources, it is critical to be able to analyze this multimodal data thoroughly. However, the few methodologies appropriate to video data analysis (e.g., Actor Network Theory, Picture Theory) lack detailed guidelines on how to select, organize, and examine the multimodality of video data. This article aims to close this gap in practice and methodology by proposing and demonstrating the Visual-Verbal Video Analysis (VVVA) method, a six-step framework adapted from Multimodal Theory and Visual Grounded Theory for organizing and evaluating video material according to the following dimensions: general characteristics of the video; multimodal characteristics; visual characteristics; characteristics of primary and secondary characters; and content and compositional characteristics, including the transmission of messages, emotions, and discourses. This article also reviews the theories underlying video data analysis, focusing on Grounded Theory and Multimodality Theory, and provides multiple examples of coding and interpretive processes to deepen understanding. The VVVA data extraction matrices provide a systematic coding approach for verbal, visual, and textual content, allowing for structured, coherent extraction that supports the discovery of patterns and links among disparate types of information.
The VVVA method may be applied to a wide range of video data in the social and medical sciences, varying in length and originating from different sources (e.g., open access web sources, pre-recorded organizational videos, and recordings created for research purposes). The VVVA method also tracks the ongoing research process and can manage data sets of various sizes.
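
As a rough illustration of what one row of a data extraction matrix could look like in practice, the sketch below stores codes per video segment under the five dimensions listed in the abstract and then tallies them to surface recurring patterns. The class name `CodedSegment`, the function `code_frequencies`, the short dimension keys, and the time-stamp fields are all hypothetical; the abstract does not specify the actual layout of the VVVA matrices.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple

# Short keys for the five dimensions named in the abstract.
# The keys themselves are an assumption made for this sketch.
DIMENSIONS = (
    "general",              # general characteristics of the video
    "multimodal",           # multimodal characteristics
    "visual",               # visual characteristics
    "characters",           # primary and secondary characters
    "content_composition",  # messages, emotions, discourses
)


@dataclass
class CodedSegment:
    """One hypothetical matrix row: a time span of a video plus its codes."""
    video_id: str
    start_s: float
    end_s: float
    codes: Dict[str, List[str]] = field(default_factory=dict)

    def add_code(self, dimension: str, code: str) -> None:
        # Reject dimensions outside the predefined set so coding stays consistent.
        if dimension not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {dimension!r}")
        self.codes.setdefault(dimension, []).append(code)


def code_frequencies(segments: Iterable[CodedSegment]) -> Counter:
    """Count how often each (dimension, code) pair occurs across segments."""
    counts: Counter = Counter()
    for segment in segments:
        for dimension, codes in segment.codes.items():
            for code in codes:
                counts[(dimension, code)] += 1
    return counts


if __name__ == "__main__":
    first = CodedSegment("video_01", 0.0, 12.5)
    first.add_code("visual", "close-up")
    first.add_code("content_composition", "reassurance")

    second = CodedSegment("video_01", 12.5, 30.0)
    second.add_code("visual", "close-up")

    for (dimension, code), n in code_frequencies([first, second]).items():
        print(dimension, code, n)
```

Keeping the dimension set fixed and validating against it is one simple way to keep coding consistent across coders and across data sets of different sizes; a spreadsheet with one column per dimension would serve the same purpose.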

VVA: Video Values Analysis.

Video Quality Assessment: A Comprehensive Survey

MVVA-Net: a Video Aesthetic Quality Assessment Network with Cognitive Fusion of Multi-type Feature–Based Strong Generalization

Performing Qualitative Content Analysis of Video Data in Social Sciences and Medicine: The Visual-Verbal Video Analysis Method

Value Assessment of UGC Short Videos Through Element Mining and Data Analysis

VALUE: A Multi-Task Benchmark for Video-and-Language Understanding Evaluation

UGC-VQA: Benchmarking Blind Video Quality Assessment for User Generated Content

Subjective-Aligned Dataset and Metric for Text-to-Video Quality Assessment

Value-Spectrum: Quantifying Preferences of Vision-Language Models via Value Decomposition in Social Media Contexts

Exploring What Why and How: A Multifaceted Benchmark for Causation Understanding of Video Anomaly

MT-VQA: A Multi-task Approach for Quality Assessment of Short-form Videos

VAD: A Video Affective Dataset with Danmu

Towards Emotion Analysis in Short-form Videos: A Large-Scale Dataset and Baseline

FineVQ: Fine-Grained User Generated Content Video Quality Assessment

Towards Explainable In-the-Wild Video Quality Assessment: A Database and a Language-Prompted Approach

Towards A Better Metric for Text-to-Video Generation

XGC-VQA: A unified video quality assessment model for User, Professionally, and Occupationally-Generated Content

Low-Complexity Video Quality Assessment Using Temporal Quality Variations

Video Transformer based Video Quality Assessment with Spatiotemporally adaptive Token Selection and Assembly

UATVR: Uncertainty-Adaptive Text-Video Retrieval

User-generated Video Quality Assessment: A Subjective and Objective Study