Abstract: The Bag-of-Visual-Words (BoW) representation has been applied to various problems in multimedia and computer vision. The basic idea is to represent images as visual documents composed of repeatable and distinctive visual elements, which are analogous to text words. Despite its great success and wide adoption, a visual vocabulary created from single-image local descriptors is often less effective than desired. In this paper, descriptive visual words (DVWs) and descriptive visual phrases (DVPs) are proposed as the visual counterparts of text words and phrases, where visual phrases are frequently co-occurring visual word pairs. Since images are the carriers of visual objects and scenes, a descriptive visual element set can be composed of the visual words and visual word combinations that are effective in representing certain visual objects or scenes. Based on this idea, a general framework is proposed for generating DVWs and DVPs for image applications. In a large-scale image database containing 1506 object and scene categories, the visual words and visual word pairs descriptive of certain objects or scenes are identified and collected as the DVWs and DVPs. Experiments show that the DVWs and DVPs are informative and descriptive and are, thus, more comparable to text words than classic visual words are. We apply the identified DVWs and DVPs in several applications, including large-scale near-duplicate image retrieval, image search re-ranking, and object recognition. The combination of DVWs and DVPs outperforms the state of the art in large-scale near-duplicate image retrieval in terms of accuracy, efficiency, and memory consumption. The proposed image search re-ranking algorithm, DWPRank, outperforms the state-of-the-art algorithm by 12.4% in mean average precision and runs about 11 times faster.
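The idea of a visual phrase as a frequently co-occurring visual word pair, kept only when it is descriptive of a category, can be illustrated with a small sketch. This is not the authors' DVW/DVP generation framework; it is a simplified stand-in under stated assumptions. Each image is assumed to be already quantized into `(word_id, x, y)` keypoints. Two distinct words are treated as co-occurring when they lie within a hypothetical pixel `radius`. A pair is kept for a category when it appears in at least `min_support` of that category's images and at least `min_lift` times more often there than across the whole collection. The function names, input format, and thresholds are all illustrative assumptions.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import hypot


def image_pairs(keypoints, radius):
    """Return unordered pairs of distinct visual words lying within `radius` pixels.

    keypoints: list of (word_id, x, y) tuples for one image (assumed input format).
    Each pair is counted at most once per image (a set), like document frequency.
    """
    pairs = set()
    for (w1, x1, y1), (w2, x2, y2) in combinations(keypoints, 2):
        if w1 != w2 and hypot(x1 - x2, y1 - y2) <= radius:
            pairs.add((min(w1, w2), max(w1, w2)))
    return pairs


def mine_phrases(images, radius=30.0, min_support=0.5, min_lift=1.5):
    """Select category-descriptive visual word pairs (a simplified, hypothetical criterion).

    images: list of (category, keypoints).
    Returns {category: {pair: in-category support}}.
    """
    cat_counts = defaultdict(Counter)  # category -> pair -> number of images containing it
    cat_totals = Counter()             # category -> number of images
    all_counts = Counter()             # pair -> number of images containing it (all categories)
    for category, keypoints in images:
        pairs = image_pairs(keypoints, radius)
        cat_totals[category] += 1
        cat_counts[category].update(pairs)
        all_counts.update(pairs)

    n_images = sum(cat_totals.values())
    phrases = {}
    for category, counts in cat_counts.items():
        kept = {}
        for pair, n in counts.items():
            support = n / cat_totals[category]          # frequency within the category
            global_support = all_counts[pair] / n_images  # frequency over the whole collection
            # Frequent in the category AND noticeably more frequent there than overall.
            if support >= min_support and support >= min_lift * global_support:
                kept[pair] = support
        phrases[category] = kept
    return phrases
```

For example, if word pair `(1, 2)` appears close together in every "car" image but rarely elsewhere, it is returned as a candidate phrase for "car". The lift test is one simple way to prefer pairs that are specific to a category over pairs that are merely common everywhere.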

Allocating images and selecting image collections for distributed visual search

Image retrieval based on incremental subspace learning

Learning to Distribute Vocabulary Indexing for Scalable Visual Search

Distributed Architecture for Large Scale Image-Based Search

Modeling Image Data for Effective Indexing and Retrieval in Large General Image Databases

Efficient Indexing for Large Scale Visual Search

Evaluating Inverted Files for Visual Compact Codes on a Large Scale

Fast Object Retrieval Using Direct Spatial Matching

Modeling spatial and semantic cues for large-scale near-duplicated image retrieval

Predicting The Effectiveness Of Queries For Visual Search

Fine-Grained Image Search

Multi-Scale Visual Words For Object-Based Web Image Search

Cascade Category-Aware Visual Search

Database Saliency for Fast Image Retrieval

Bridging the Gap between Local Semantic Concepts and Bag of Visual Words for Natural Scene Image Retrieval

Exploiting Distribution Constraints for Scalable and Efficient Image Retrieval

Enhancing Remote Sensing Image Retrieval: A Hierarchical Approach Integrating Visual and Semantic Similarities

Generating descriptive visual words and visual phrases for large-scale image applications

Direction-Oriented Visual-semantic Embedding Model for Remote Sensing Image-text Retrieval

Scale-Semantic Joint Decoupling Network for Image-text Retrieval in Remote Sensing

Semantic-rebased cross-modal hashing for scalable unsupervised text-visual retrieval