Learning Socially Embedded Visual Representation from Scratch

Shaowei Liu, Peng Cui, Wenwu Zhu, Shiqiang Yang
DOI: https://doi.org/10.1145/2733373.2806247
2015-01-01
Abstract: Learning image representations with deep models has recently achieved remarkable results in semantic-oriented applications such as image classification. However, for user-centric tasks such as image search and recommendation, simply employing representations learnt from semantic-oriented tasks may fail to capture user intentions. In this paper, we propose a novel Socially Embedded VIsual Representation Learning (SEVIR) approach, in which an Asymmetric Multi-task CNN (amtCNN) model embeds the user intention learning task into the semantic learning task. Specifically, to address the sparsity and unreliability of social behavioral data, we propose to use user clustering, reliability evaluation, and random dropout in the output layer of our amtCNN. With its partially shared network architecture, the learnt representation can capture both semantics and user intentions. Comprehensive experiments are conducted to investigate the effectiveness of our approach in user favoring prediction, personalized image recommendation, and image reranking. Compared to state-of-the-art image representation techniques, our approach achieves significant improvement in performance.