Effective Hashing for Searching Large-scale Multimedia Databases
Jingkuan Song
2014-01-01
Abstract: During the last decade, multimedia databases have become increasingly important in many research areas such as multimedia search, computer vision, pattern recognition, social media analysis, data management, information systems and medical imaging. The large volume, velocity, variety and high complexity of multimedia data require effective indexing to organize and manage such large-scale datasets and to facilitate fast access and accurate retrieval. As a result, significant research attention has been devoted to developing indexing techniques that expedite multimedia similarity search and retrieval. Among the indexing methods, hashing has shown promising performance due to its high efficiency in terms of both storage and computational cost. However, a vast number of previously proposed hashing approaches make use of randomized algorithms to generate hash codes without considering the knowledge in the data. Intuitively, exploiting the distribution information of the data can improve the hashing performance. Moreover, while much research effort has gone into designing hash functions for a single feature and a single media type, there have been very few attempts to utilize multiple features of the multimedia data, or to enable inter-media retrieval across different media types from heterogeneous data sources. Also, even though computing the Hamming distance between pairs of binary codes can be implemented efficiently, exhaustive search on large-scale binary code databases could be impractical. This thesis focuses on designing more effective hashing methods and conducting efficient search over the derived binary code databases to support large-scale multimedia retrieval. It consists of the following four phases. First, we designed a machine-learning-based hashing method to improve the performance of multimedia similarity search. The method is based on local structure reconstruction, and the solution is based on eigenvalue decomposition. 
After that, we explored applying local-model-based hashing methods to multiple types of features and to different media types. These two methods have similar solution techniques to the first method, because they also rely on eigenvalue decomposition. Lastly, based on the observation that a sequential scan over the binary codes is still time- and memory-consuming, we designed a new indexing method on the binary codes for efficient binary code retrieval. In terms of application, we have addressed hashing problems in different environments, including near-duplicate video retrieval, cross-media retrieval and binary code retrieval. More specifically: The first phase of this thesis aims to design Robust Hashing with Local Models (RHLM) to improve the hashing performance by exploiting the local structure of the data. RHLM learns a set of robust hash functions to map the high-dimensional data points into binary hash codes by effectively utilizing local structural information. For each individual data point in the training dataset, a local hashing model is learned and used to predict the hash codes of its neighboring data points. The local models from all the data points are globally aligned so that an optimal hash code can be assigned to each data point. After all the hash codes of the training data points are obtained, l2,1-norm minimization on the loss function is employed to learn effective hash functions. Given a query data point, the search process first maps the query into its query hash code using the hash functions and then explores the buckets whose hash codes are similar to the query hash code. The second phase of this thesis aims to design a Multiple Features Hashing (MFH) algorithm to improve the accuracy of traditional single-feature-based hashing methods by incorporating multiple features into hashing. 
MFH preserves the local structure information of each individual feature and globally considers the local structures of all the features to learn a group of hash functions. Those hash functions map the data points into the Hamming space and generate a series of binary codes to represent the dataset. These generated binary codes can then be utilized to conduct efficient search. The third phase of this thesis aims to enable large-scale inter-media retrieval. Inter-media Hashing (IMH) is proposed to explore the correlations among multiple media types from different data sources and to tackle the scalability issue. In IMH, multimedia data from heterogeneous data sources are transformed into a common Hamming space, in which fast search can be easily implemented by XOR and bit-count operations. In addition, a linear regression model is designed to enable the prediction of hash codes for unseen data points. The fourth and final phase of this thesis aims to design a Distance-Computation-Free Search (DCFSearch) approach for efficient binary code search in the Hamming space. Binary codes are advantageous because they are compact and can be compared efficiently. However, exhaustive search on the whole binary code database can still be slow and hence impractical in the case of large-scale datasets. A new Hamming distance search scheme is proposed, which returns the exact results without performing any Hamming distance computations. Because database binary codes need not be compared with queries, the search performance can be improved and databases can be externally maintained. More specifically, the inverted multi-index data structure is adopted to index binary codes. Importantly, the Hamming distance information embedded in the structure is utilized in the designed search scheme such that the verification of exact results no longer relies on Hamming distance computations. 
As a further step, the performance of the inverted multi-index structure is optimized by taking the code distributions among different bits into account for index construction.
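
To make the search setting described above concrete, the following is a minimal Python sketch, not the thesis's implementation. It shows (1) Hamming distance computed with XOR and a bit count, (2) the exhaustive scan that the abstract calls impractical at scale, and (3) a generic substring index that prunes candidates before verification. All names (`hamming_distance`, `linear_scan`, `build_substring_index`, `indexed_search`) and the toy 8-bit codes are assumptions made for illustration. Note that, unlike DCFSearch, this sketch still verifies candidates by computing Hamming distances; it only illustrates why indexing substrings of the codes reduces the number of comparisons.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def hamming_distance(a: int, b: int) -> int:
    """Count differing bits: XOR marks them, then count the 1 bits."""
    return bin(a ^ b).count("1")


def linear_scan(query: int, database: List[int], radius: int) -> List[int]:
    """Exhaustive baseline: compare the query with every database code."""
    return [i for i, code in enumerate(database)
            if hamming_distance(query, code) <= radius]


def build_substring_index(
    database: List[int], n_bits: int, n_parts: int
) -> Tuple[List[Dict[int, List[int]]], int, int]:
    """Split each code into n_parts equal substrings and index each one.

    Assumes n_bits is divisible by n_parts (a simplification for this sketch).
    """
    width = n_bits // n_parts
    mask = (1 << width) - 1
    tables: List[Dict[int, List[int]]] = [defaultdict(list) for _ in range(n_parts)]
    for i, code in enumerate(database):
        for p in range(n_parts):
            tables[p][(code >> (p * width)) & mask].append(i)
    return tables, width, mask


def indexed_search(query: int, database: List[int],
                   tables: List[Dict[int, List[int]]],
                   width: int, mask: int, radius: int) -> List[int]:
    """Exact search for radius < number of substrings.

    Pigeonhole argument: if two codes differ in at most `radius` bits and are
    split into more than `radius` substrings, at least one substring is
    identical, so every true neighbor appears in some exact-match bucket.
    """
    if radius >= len(tables):
        raise ValueError("radius must be smaller than the number of substrings")
    candidates = set()
    for p, table in enumerate(tables):
        candidates.update(table.get((query >> (p * width)) & mask, []))
    # Verification step: this is where distances are still computed here.
    return sorted(i for i in candidates
                  if hamming_distance(query, database[i]) <= radius)


codes = [0b10110010, 0b10110011, 0b01001101, 0b10010010]
query = 0b10110010
tables, width, mask = build_substring_index(codes, n_bits=8, n_parts=2)
print(linear_scan(query, codes, radius=1))
print(indexed_search(query, codes, tables, width, mask, radius=1))
```

Both calls return the same set of neighbors, but the indexed version only verifies codes that share at least one substring with the query, which is the intuition behind indexing binary codes instead of scanning them sequentially.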