Video hash learning based on feature fusion and Manhattan quantization

Xiushan Nie, Shuting Wang, Yilong Yin
DOI: https://doi.org/10.13232/j.cnki.jnju.2016.04.015
2016-01-01
Abstract: With the development of computer and multimedia technologies, video storage, transmission and retrieval face a huge challenge on the Internet, especially the mobile Internet, because of the complex structure and high dimensionality of video. Video hash learning is one of the important ways to address this challenge, and it has become a hot topic in the field of multimedia processing. Existing methods generate video hashes using different types of features. In fact, there are potential relationships among different types of video features. Therefore, to make full use of these relationships and overcome the limitations of traditional video hashing methods, we propose a video hash learning method based on feature fusion and Manhattan quantization. In the proposed method, global, local and temporal features are first extracted from the video content, and the video clip is treated as a third-order tensor. Then tensor decomposition, which is widely applied in multi-dimensional data processing, is used to fuse the global, local and temporal features. Three low-order tensors are obtained after tensor decomposition, and we concatenate them as the fused representation of the video content. Subsequently, the fused video feature is quantized by Manhattan quantization to obtain the video hash codes, which are used to construct the final video hash. Compared with traditional video hashing methods, the proposed method not only makes full use of the relationships among different video features, but also encodes the different dimensions separately, which preserves the structural similarity among different video features well. Two kinds of experiments are conducted to evaluate the proposed method, and the results show that it performs well compared with existing methods.
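As a rough illustration of the quantization step mentioned in the abstract, the sketch below shows one common form of Manhattan-style multi-bit quantization: each real-valued dimension is split into ordered regions, each value is replaced by the integer index of its region, and codes are compared by the Manhattan distance between indices. This is a minimal sketch under stated assumptions, not the authors' implementation. The function names (`fit_thresholds`, `quantize`, `to_bits`, `manhattan_distance`), the `bits` parameter, and the use of quantile cut points to define the regions are all hypothetical choices made for illustration; the abstract does not specify how the regions are learned.

```python
import numpy as np


def fit_thresholds(features, bits=2):
    """Per-dimension cut points that split each dimension into 2**bits regions.

    Assumption: quantile cut points stand in for whatever region-learning
    step the paper actually uses.
    features has shape (n_samples, n_dims).
    """
    levels = 2 ** bits
    probs = np.arange(1, levels) / levels        # 0.25, 0.5, 0.75 for bits=2
    return np.quantile(features, probs, axis=0)  # shape (levels - 1, n_dims)


def quantize(features, thresholds):
    """Map each value to the integer index of its region (0 .. levels - 1)."""
    # For every value, count how many cut points of its dimension it exceeds.
    exceeded = features[None, :, :] > thresholds[:, None, :]
    return exceeded.sum(axis=0)                  # shape (n_samples, n_dims)


def to_bits(codes, bits=2):
    """Write each region index as a natural binary code and concatenate."""
    shifts = np.arange(bits - 1, -1, -1)         # most significant bit first
    bit_planes = (codes[..., None] >> shifts) & 1
    return bit_planes.reshape(codes.shape[0], -1)


def manhattan_distance(code_a, code_b):
    """Sum of absolute differences between region indices."""
    return int(np.abs(code_a - code_b).sum())
```

The point of comparing integer indices rather than raw bits is that neighbouring regions stay close: with 2 bits, indices 1 and 2 are one step apart under the Manhattan distance, whereas their binary codes `01` and `10` differ in both bit positions under the Hamming distance.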