Abstract: The amount of multimedia content in microblogs is growing significantly. In certain large systems, more than 20% of microblogs link to a picture or video. The rich semantics of microblogs provide an opportunity to endow images with higher-level semantics beyond object labels. However, this raises new challenges for understanding the association between multimodal content in multimedia-rich microblogs. Pictures and words in such microblogs are only loosely associated, and a one-to-one correspondence between them cannot always be established, which violates the fundamental assumptions of traditional annotation, tagging, and retrieval systems. To address these challenges, we present the first study that analyzes and models the associations between multimodal content in microblog streams, aiming to discover multimodal topics by establishing correspondences between pictures and words. We first use a data-driven approach to analyze the new characteristics of the words, the pictures, and their association types in microblogs. We then propose a novel generative model called the Bilateral Correspondence Latent Dirichlet Allocation (BC-LDA) model. BC-LDA assigns flexible associations between pictures and words: it allows picture-word co-occurrence in both directions as well as single-modal association. This flexibility lets the model fit the data distribution closely, so that it can discover various types of joint topics and generate pictures and words from those topics accordingly. We evaluate the model extensively on a large-scale, real-world dataset of multimedia-rich microblogs and demonstrate its advantages in several application scenarios, including image tagging, text illustration, and topic discovery. The experimental results show that the proposed model significantly and consistently outperforms traditional approaches.
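The abstract does not spell out the generative process of BC-LDA, so the following is only a toy sketch of the general idea of flexible association, not the published model. It assumes a per-microblog switch among three hypothetical association modes ("image_to_word", "word_to_image", "independent"); the function names, the uniform switch prior, and the Dirichlet parameter are all illustrative assumptions.

```python
import random

def sample_categorical(probs, rng):
    """Draw an index from a discrete distribution given as a list of weights."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def sample_dirichlet(alpha, k, rng):
    """Symmetric Dirichlet draw via normalized Gamma variates."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]

# Hypothetical association modes; the names are assumptions for illustration.
MODES = ("image_to_word", "word_to_image", "independent")

def generate_microblog(word_topics, visual_topics, n_words, n_patches, rng,
                       alpha=0.5, switch_probs=(1.0, 1.0, 1.0)):
    """Generate one toy microblog (requires n_words >= 1 and n_patches >= 1).

    word_topics[z] and visual_topics[z] are per-topic distributions over a
    word vocabulary and a visual-word vocabulary, respectively.
    """
    k = len(word_topics)
    theta = sample_dirichlet(alpha, k, rng)  # per-microblog topic mixture
    mode = MODES[sample_categorical(switch_probs, rng)]

    if mode == "image_to_word":
        # Picture topics come first; every word reuses the topic of some patch.
        patch_z = [sample_categorical(theta, rng) for _ in range(n_patches)]
        word_z = [rng.choice(patch_z) for _ in range(n_words)]
    elif mode == "word_to_image":
        # Word topics come first; every patch reuses the topic of some word.
        word_z = [sample_categorical(theta, rng) for _ in range(n_words)]
        patch_z = [rng.choice(word_z) for _ in range(n_patches)]
    else:
        # No token-level correspondence: each modality draws its own topics.
        word_z = [sample_categorical(theta, rng) for _ in range(n_words)]
        patch_z = [sample_categorical(theta, rng) for _ in range(n_patches)]

    words = [sample_categorical(word_topics[z], rng) for z in word_z]
    patches = [sample_categorical(visual_topics[z], rng) for z in patch_z]
    return mode, word_z, patch_z, words, patches
```

In the two correspondence modes, the topics of the dependent modality are always a subset of the topics of the leading modality, which is the property a one-directional correspondence model would enforce for every document. Letting the direction vary per microblog is what the sketch means by "bilateral".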

Bilateral Correspondence Model for Words-and-Pictures Association in Multimedia-Rich Microblogs

A Joint Model of Conversational Discourse and Latent Topics on Microblogs

Large scale microblog mining using distributed MB-LDA.

A novel label-based multimodal topic model for social media analysis

Cross-Modality Microblog Sentiment Prediction Via Bi-Layer Multimodal Hypergraph Learning

A Joint Model of Extended LDA and IBTM over Streaming Chinese Short Texts

What You Say and How You Say it: Joint Modeling of Topics and Discourse in Microblog Conversations

Semantic Link Network-Based Model for Organizing Multimedia Big Data

Cross-Media Keyphrase Prediction: A Unified Framework with Multi-Modality Multi-Head Attention and Image Wordings

Content-oriented Multimedia Document Understanding Through Cross-Media Correlation

Multimodal association mining for personalized image browsing

Do Photos Help Express Our Feelings: Incorporating Multimodal Features into Microblog Sentiment Analysis.

Constrained-hLDA for Topic Discovery in Chinese Microblogs.

A Probabilistic Semantic Model for Image Annotation and Multi-Modal Image Retrieval

Inferring Correspondences from Multiple Sources for Microblog User Tags

Multi-modal Deep Analysis for Multimedia

Model Composition for Multimodal Large Language Models

Annotation Efficient Cross-Modal Retrieval with Adversarial Attentive Alignment

Automatic Image Annotation Based on Wordnet and Hierarchical Ensembles