Abstract: Given a query photo issued by a user (q-user), landmark retrieval aims to return a set of photos whose landmarks are similar to those in the query. Existing studies on landmark retrieval focus on exploiting landmark geometry to match candidate photos against the query photo. We observe, however, that photos of the same landmark shared by different users in social media communities may convey different geometric information depending on viewpoint and angle, and may consequently yield very different retrieval results. Handling landmarks whose shapes are poorly captured by the q-user's photography is often nontrivial and has seldom been studied. In this paper, we propose a novel framework, namely multi-query expansions, to retrieve semantically robust landmarks in two steps. First, we identify the top-k photos with respect to the latent topics of a query landmark and use them to construct a multi-query set that remedies the query's possibly low-quality shape. For this purpose, we significantly extend the techniques of Latent Dirichlet Allocation. Second, motivated by typical collaborative filtering methods, we propose collaborative deep networks that learn semantic, nonlinear, high-level features from the latent factors of landmark photos; these latent factors, which form the training set, are obtained by matrix factorization over the collaborative user-photo matrix of the multi-query set. The learned deep network is then applied to generate features for all other photos, which also yields a compact multi-query set in that feature space. Finally, ranking scores are computed in this high-level feature space between the multi-query set and every other photo, and the photos are sorted by these scores to produce the final landmark retrieval list.
Extensive experiments on real-world social media data, comprising landmark photos together with their user information, show superior performance over existing methods, notably our recently proposed multi-query-based mid-level pattern representation method [1].
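The abstract describes the retrieval pipeline only in prose. The Python/NumPy sketch below is a toy illustration of its data flow under stated assumptions, not the authors' implementation: per-photo LDA topic distributions are assumed to be precomputed; the user-photo matrix R is assumed dense (users x photos, e.g. 1 where a user shared or favorited a photo); plain ridge-regularized alternating least squares stands in for the paper's matrix factorization; a linear ridge regression from visual features X to latent factors replaces the collaborative deep network; and each remaining photo is scored by its mean cosine similarity to the multi-query set. All function names are hypothetical.

```python
import numpy as np


def _unit_rows(M):
    """L2-normalize along the last axis; zero-norm rows stay zero."""
    norms = np.linalg.norm(M, axis=-1, keepdims=True)
    return M / np.maximum(norms, 1e-12)


def top_k_by_topic(query_topics, photo_topics, k):
    """Step 1 (assumed form): pick the k photos whose LDA topic vectors are
    most cosine-similar to the query's topic vector."""
    sims = _unit_rows(photo_topics) @ _unit_rows(query_topics)
    return np.argsort(-sims)[:k]


def factorize(R, rank, n_iters=50, reg=0.1, seed=0):
    """Stand-in for the paper's matrix factorization: ridge-regularized
    alternating least squares on a dense user-photo matrix, R ~ U @ V.T."""
    rng = np.random.default_rng(seed)
    n_users, n_photos = R.shape
    U = rng.normal(scale=0.1, size=(n_users, rank))
    V = rng.normal(scale=0.1, size=(n_photos, rank))
    ridge = reg * np.eye(rank)
    for _ in range(n_iters):
        # Closed-form update of U with V fixed, then of V with U fixed.
        U = np.linalg.solve(V.T @ V + ridge, V.T @ R.T).T
        V = np.linalg.solve(U.T @ U + ridge, U.T @ R).T
    return U, V


def fit_linear_map(X, Y, reg=1.0):
    """Assumption: ridge regression from visual features X to latent
    factors Y, used here in place of the collaborative deep network."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + reg * np.eye(d), X.T @ Y)


def rank_photos(query_feats, cand_feats):
    """Score each candidate by its mean cosine similarity to the multi-query
    set; return candidate indices (best first) and the unsorted scores."""
    scores = (_unit_rows(cand_feats) @ _unit_rows(query_feats).T).mean(axis=1)
    return np.argsort(-scores), scores


def multi_query_retrieve(query_topics, photo_topics, R, X, k=5, rank=4):
    """Toy pipeline: topic-based expansion, MF, feature mapping, ranking."""
    multi = top_k_by_topic(query_topics, photo_topics, k)
    _, V = factorize(R[:, multi], rank)   # latent factors of multi-query photos
    W = fit_linear_map(X[multi], V)       # visual features -> latent space
    F = X @ W                             # project every photo into that space
    others = np.setdiff1d(np.arange(X.shape[0]), multi)
    order, scores = rank_photos(F[multi], F[others])
    return others[order], scores[order]   # global photo ids, sorted scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ids, scores = multi_query_retrieve(
        query_topics=rng.dirichlet(np.ones(5)),
        photo_topics=rng.dirichlet(np.ones(5), size=30),
        R=(rng.random((10, 30)) < 0.3).astype(float),
        X=rng.normal(size=(30, 8)),
    )
    print(ids[:10])
```

Replacing the linear map with a multilayer network trained on the same (visual feature, latent factor) pairs would bring the sketch closer to the method described above; averaging similarities over the multi-query set is only one simple aggregation choice (taking the maximum is another).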

Semantic Retrieval of Personal Photos Using Matrix Factorization and Two-Layer Random Walk Fusing Sparse Speech Annotations with Visual Features

Semantic Retrieval Of Personal Photos Using A Deep Autoencoder Fusing Visual Features With Speech Annotations Represented As Word/Paragraph Vectors

Semantic Image Retrieval Based on Multiple-Instance Learning

Image Retrieval Based on Fuzzy Semantic Relevance Matrix

Bridging the Semantic Gap Between Image Contents and Tags

Semantic Reconstruction based on RGB Image and Sparse Depth

Distance Metric Learning from Uncertain Side Information with Application to Automated Photo Tagging

Modeling Image Data for Effective Indexing and Retrieval in Large General Image Databases

Towards Semantic Embedding In Visual Vocabulary

Retargeting Semantically-Rich Photos

A face retrieval technique combining large models and artificial neural networks

Rethinking Sparse Lexical Representations for Image Retrieval in the Age of Rising Multi-Modal Large Language Models

Text-based Person Search in Full Images via Semantic-Driven Proposal Generation

A Probabilistic Semantic Model for Image Annotation and Multi-Modal Image Retrieval

Effective Multi-Query Expansions: Collaborative Deep Networks for Robust Landmark Retrieval

Flickr Image Community Analytics by Deep Noise-Refined Matrix Factorization

AI-Based Semantic Multimedia Indexing and Retrieval for Social Media on Smartphones

Towards Multi-Semantic Image Annotation with Graph Regularized Exclusive Group Lasso

A Semantic-Based Method for Visualizing Large Image Collections

An Efficient Approach Based on Image Pixel and Semantic Features Towards Video Retrieval

Image-text matching using multi-subspace joint representation