Abstract: DanMu, an emerging type of user-generated comment, has become increasingly popular in recent years. Many online video platforms, such as Tudou.com, provide a DanMu function. Unlike traditional online reviews, such as those on YouTube.com, which appear outside the video, a DanMu is a scrolling marquee comment that is overlaid directly on the video and synchronized to a specific playback time. Such comments are displayed as streams of moving subtitles on the video screen. Viewers can easily write DanMus while watching a video, and each written DanMu is immediately overlaid onto the video and shown to its writer and to other viewers. DanMu systems thus enable users to communicate with each other much more directly, creating a real-time sharing experience. Although DanMu has several unique features and has had a great impact on online video systems, to the best of our knowledge, no prior work has provided a comprehensive study of DanMu. In this article, as a pilot study, we analyze the unique characteristics of DanMu from various perspectives. Specifically, we first illustrate some unique distributions of DanMus by comparing them with traditional reviews (TReviews) that we collected from a real DanMu-enabled online video system. Second, we discover two interesting patterns in DanMu data, a herding effect and multiple-burst phenomena, that are significantly different from those in TReviews and reveal important insights about the growth of DanMus on a video. To explore the antecedents of both the herding effect and the multiple-burst phenomena, we propose to detect leading DanMus within bursts, because those leading DanMus contribute the most to both patterns. We propose a framework for detecting leading DanMus that effectively combines multiple factors contributing to leading DanMus.
Based on the identified characteristics of DanMu, we then propose to predict the distribution of future DanMus (i.e., the growth of DanMus), which is important for many DanMu-enabled online video systems; for example, the predicted DanMu distribution could serve as an indicator of video popularity. This prediction task has two aspects: one is to predict which videos future DanMus will be posted for, and the other is to predict which segments of a video future DanMus will be posted on. We develop two models to address these two problems. Finally, extensive experiments are conducted on a real-world dataset to validate all methods developed in this article.

Gossiping the Videos: An Embedding-Based Generative Adversarial Framework for Time-Sync Comments Generation

To Create What You Tell: Generating Videos from Captions

Understanding the Users and Videos by Mining a Novel Danmu Dataset

VCMaster: Generating Diverse and Fluent Live Video Comments Based on Multimodal Contexts

Comprehending the Gossips: Meme Explanation in Time-Sync Video Comment via Multimodal Cues

Exploring the Emerging Type of Comment for Online Videos: DanMu

Live Video Comment Generation Based on Surrounding Frames and Live Comments

Bridging Video Content And Comments: Synchronized Video Description With Temporal Summarization Of Crowdsourced Time-Sync Comments

Sentiment-oriented Transformer-based Variational Autoencoder Network for Live Video Commenting

Visual-Texual Emotion Analysis with Deep Coupled Video and Danmu Neural Networks

Towards Generating Diverse Audio Captions via Adversarial Training

Dysen-VDM: Empowering Dynamics-aware Text-to-Video Diffusion with LLMs

Diverse Audio Captioning via Adversarial Training

Scripted Video Generation With a Bottom-Up Generative Adversarial Network

Share-and-Chat: Achieving Human-Level Video Commenting by Search and Multi-View Embedding

DanmuVis: Visualizing Danmu Content Dynamics and Associated Viewer Behaviors in Online Videos

LiveChat: Video Comment Generation from Audio-Visual Multimodal Contexts

Video-to-Audio Generation with Hidden Alignment

MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation

Generating Videos with Dynamics-aware Implicit Generative Adversarial Networks

Enhancing Multimodal Affective Analysis with Learned Live Comment Features