Graph Sparse Learning in Multimedia Data
Xiaofeng Zhu
2013-01-01
Abstract:nnn Sparse coding, originally applied to model the human visual cortex, has been shown useful for many applications, such as image analysis and signal processing. However, existing sparse learning models (including sparse coding, group sparse coding, joint sparse coding, etc) do not fully exploit the geometric structure of the data. Research has shown that many kinds of data (such as natural images, audio data, etc) can be characterized geometrically. Moreover, research has also shown that the geometric information of the data is important for learning from multimedia data. Recently the graph sparse coding model has been proposed by cascading a graph Laplacian regularizer into the sparse coding model for classification and clustering, showing a better performance than the state-of-the-art sparse coding models that do not consider the graph Laplacian regularizer in multimedia applications. In this thesis, we study three research problems for different applications, all of which are largely based on sparse learning with a graph Laplacian regularizer, shorted as graph sparse learning model.nnn In the first problem, we aim to improve the performance of unsupervised hashing by proposing a novel sparse hashing method. A graph Laplacian regularizer is added into the sparse coding model to form a non-negative sparse coding model, aiming at preserving the geometric structure of original data as much as possible in the Hamming space. Specifically, we first employ our proposed model to convert the original high-dimensional data into low-dimensional data as well as devise a new optimization method to solve the derived objective function. Second, a new binarization rule is designed to convert the derived low-dimensional representation of the original high-dimensional data into binary codes. Furthermore, we learn the hash functions to efficiently and effectively predict unseen data, making the best use of prior knowledge of the original data. Finally, one can conduct approximate nearest neighbor search in the memory of a PC via calculating the Hammingdistance between the query and training data.nnn In the second problem, we focus on automatical shot tagging given a collection of videos with the video-level tags. To this end, we propose a new sparse learning model embedding the geometric information of training data into the sparse group lasso framework, aiming at effectively discovering the correlation between the test shot and training videos. In our solution, we first search for the correlation between the visual feature of the test shot and the training videos, through embedding the geometric information of training videos (i.e., a graph Laplacian regularizer) and other information (e.g., the intra-group sparsity constraint and the inter-group sparsity constraint) into the existing sparse coding model to form our objective function, which is solved by devising a new optimization method. We then propose a new tagging propagation rule to assign the videolevel tags of the training videos to the test shot by the learnt correlation between the test shot and training videos. nnn nnn In the third problem, we focus on performing unsupervised dimensionality reduction on the high-dimensional yet small-sized target data. We utilize external data and the geometric information embedded the original data to improve the performance. To this end, we first learn the bases of small-sized target data from sufficient external data, and then represent (or reconstruct) them by the learnt bases via designing a new graph sparse learning model. In the proposed model, we add the graph Laplacian regularizer into the joint sparse coding model to form our objective function, which is solved by designing a novel optimization method. With the proposed model, we obtain new compact representation of then high-dimensional yet small-sized target data, which preserves the geometric structure of the target data.