Predicting Content Similarity Via Multimodal Modeling for Video-In-Video Advertising.

Xue Song, Baohan Xu, Yu-Gang Jiang
DOI: https://doi.org/10.1109/tcsvt.2020.2979928
IF: 5.859
2021-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: Rapid development of mobile devices has led to explosive growth in videos and online platforms, creating great demand for online advertising in videos. Existing advertising methods often select the insertion position at a random time point, so the video content is likely unrelated to the ad content, resulting in an unsatisfactory user experience. Whereas previous works have neglected the rich semantics and multimodal information in video advertising, we present an innovative method for video-in-video advertising using multimodal modeling. First, different pre-trained models are used to extract multimodal representations. Then, through multimodal modeling, we learn the complementarity among the different representations and obtain a unified video-level description. Finally, the unified representations of ads and videos are used to find the best matching result for each advertisement. Our method emphasizes the content similarity between ad and video, which makes the transition between video and ad more natural. Comprehensive experiments with both objective and subjective evaluations demonstrate the effectiveness and user-friendliness of the proposed video-in-video advertising framework.
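The three steps in the abstract (extract per-modality features, fuse them into one representation, match ads to videos by content similarity) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the feature vectors are toy values standing in for the outputs of pre-trained models, `fuse` uses plain normalize-and-concatenate rather than the learned multimodal model the paper describes, and the function names are hypothetical.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def fuse(modalities):
    """Hypothetical fusion: normalize each modality vector, concatenate,
    then normalize the result. This stands in for the paper's learned
    multimodal modeling, which the abstract does not specify."""
    fused = []
    for vec in modalities:
        fused.extend(l2_normalize(vec))
    return l2_normalize(fused)

def cosine(a, b):
    """Cosine similarity of two unit-length vectors (a plain dot product)."""
    return sum(x * y for x, y in zip(a, b))

def best_insertion(ad_modalities, segment_modalities):
    """Return the index of the video segment most similar to the ad,
    together with the similarity score of every segment."""
    ad = fuse(ad_modalities)
    scores = [cosine(ad, fuse(seg)) for seg in segment_modalities]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores

# Toy data (assumed): each item is [visual_features, audio_features].
ad = [[1.0, 0.0], [0.0, 1.0]]
segments = [
    [[0.0, 1.0], [1.0, 0.0]],  # unrelated content
    [[1.0, 0.0], [0.0, 1.0]],  # same content as the ad
    [[1.0, 1.0], [1.0, 1.0]],  # partially related content
]
best, scores = best_insertion(ad, segments)
print(best, [round(s, 3) for s in scores])
```

In this sketch the ad would be inserted at the segment with the highest score; a real system would replace the toy vectors with features from pre-trained visual, audio, or text models and the concatenation with a trained fusion network.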