Harnessing LLMs for Automated Video Content Analysis: An Exploratory Workflow of Short Videos on Depression

Jiaying Lizzy Liu, Yunlong Wang, Yao Lyu, Yiheng Su, Shuo Niu, Xuhai Orson Xu, Yan Zhang

DOI: https://doi.org/10.1145/3678884.3681850

2024-07-30

Abstract: Despite the growing interest in leveraging Large Language Models (LLMs) for content analysis, current studies have primarily focused on text-based content. In the present work, we explored the potential of LLMs in assisting video content analysis by conducting a case study that followed a new workflow of LLM-assisted multimodal content analysis. The workflow encompasses codebook design, prompt engineering, LLM processing, and human evaluation. We strategically crafted annotation prompts to get LLM Annotations in structured form and explanation prompts to generate LLM Explanations for a better understanding of LLM reasoning and transparency. To test LLM's video annotation capabilities, we analyzed 203 keyframes extracted from 25 YouTube short videos about depression. We compared the LLM Annotations with those of two human coders and found that LLM has higher accuracy in object and activity Annotations than emotion and genre Annotations. Moreover, we identified the potential and limitations of LLM's capabilities in annotating videos. Based on the findings, we explore opportunities and challenges for future research and improvements to the workflow. We also discuss ethical concerns surrounding future studies based on LLM-assisted video analysis.

Human-Computer Interaction, Artificial Intelligence, Computers and Society

What problem does this paper attempt to address?

This paper asks whether large language models (LLMs) can support video content analysis, specifically automated multimodal content analysis of depression-related short videos. Existing research focuses mainly on text content, so the authors explore the potential of LLMs for video through a four-step workflow: codebook design, prompt engineering, LLM processing, and human evaluation. To test the LLM's video annotation capabilities, they analyzed 203 keyframes extracted from 25 YouTube short videos and compared the LLM annotations with those of two human coders. The study found that the LLM was more accurate on object and activity annotations than on emotion and genre annotations. The paper also discusses opportunities and challenges for future research, including how to better integrate human participation in the workflow and how to improve LLMs' understanding of the dynamic context of videos.
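The human evaluation step can be pictured as a per-category comparison between LLM labels and human labels. The following is a minimal sketch, not the authors' evaluation code: the record layout (`frame_id`, `category`, `label`), the function name `per_category_accuracy`, and the toy data are all assumptions made for illustration, and the numbers it produces are not results from the paper. It also assumes the two human coders' labels have already been reconciled into a single reference label per keyframe and category.

```python
from collections import defaultdict


def per_category_accuracy(llm_rows, human_rows):
    """Return {category: share of keyframes where the LLM label matches the human label}.

    Each row is a dict with the hypothetical keys "frame_id", "category", "label".
    Labels are compared case-insensitively after trimming whitespace.
    """
    # Index the human reference labels by (keyframe, codebook category).
    human = {(r["frame_id"], r["category"]): r["label"] for r in human_rows}

    hits = defaultdict(int)
    totals = defaultdict(int)
    for r in llm_rows:
        key = (r["frame_id"], r["category"])
        if key not in human:
            continue  # skip keyframes that have no human reference label
        totals[r["category"]] += 1
        if r["label"].strip().lower() == human[key].strip().lower():
            hits[r["category"]] += 1

    return {category: hits[category] / totals[category] for category in totals}


# Toy data, invented for illustration only.
human_rows = [
    {"frame_id": 1, "category": "object", "label": "person"},
    {"frame_id": 1, "category": "emotion", "label": "sad"},
    {"frame_id": 2, "category": "object", "label": "phone"},
    {"frame_id": 2, "category": "emotion", "label": "neutral"},
]
llm_rows = [
    {"frame_id": 1, "category": "object", "label": "Person"},
    {"frame_id": 1, "category": "emotion", "label": "neutral"},
    {"frame_id": 2, "category": "object", "label": "phone"},
    {"frame_id": 2, "category": "emotion", "label": "neutral"},
]

accuracy = per_category_accuracy(llm_rows, human_rows)
print(accuracy)
```

Exact-match accuracy is the simplest choice here. A chance-corrected agreement statistic could be substituted if the label distribution within a category is heavily skewed.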
