Clustering and retrieval of video shots based on natural stimulus fMRI

Junwei Han, Xiang Ji, Xintao Hu, Jungong Han, Tianming Liu
DOI: https://doi.org/10.1016/j.neucom.2013.11.052
IF: 6
2014-01-01
Neurocomputing
Abstract: Functional magnetic resonance imaging (fMRI) is a powerful tool for probing the human brain's perception and cognition. Besides being extensively exploited in clinical applications, fMRI is also useful in people's everyday life. In this paper, we investigate a novel application: leveraging fMRI techniques for video clustering and retrieval. We integrate semantic, human-centric features derived from natural stimulus fMRI data with low-level visual-audio features to facilitate video clustering and retrieval, whereas previous works relied on either fMRI-derived features or low-level visual-audio features alone. Our system consists of several algorithmic modules. First, fMRI data are acquired while subjects watch video shot samples. Then a newly developed brain network localization system is employed to locate the cortical regions of interest (ROIs) for each individual subject. The functional interactions among these ROIs are quantified by wavelet transform coherence, and the human-centric features are derived from them. Afterwards, a Gaussian process regression model that maps the visual-audio feature space to the fMRI-derived feature space is trained on the training samples. The trained model is then used to predict fMRI-derived features for videos that have no fMRI data. Finally, a multi-modal spectral clustering algorithm and a multi-modal ranking algorithm integrate these two heterogeneous feature types for video clustering and retrieval, respectively. Our experiments on the TRECVID database demonstrate that the precision of video clustering and retrieval can be substantially improved by integrating visual-audio features with fMRI-derived features.
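The regression step in the abstract (learning a mapping from visual-audio features to fMRI-derived features, then predicting the latter for videos without scans) can be illustrated with a minimal Gaussian process regression sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the squared-exponential kernel, the `length_scale` and `noise` values, the feature dimensions, the function names, and the synthetic data that stands in for real features.

```python
import numpy as np


def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-0.5 * np.maximum(sq_dists, 0.0) / length_scale**2)


def gpr_predict(X_train, Y_train, X_test, length_scale=1.0, noise=1e-3):
    """GP posterior mean at X_test.

    Each output column is treated as an independent GP sharing one kernel
    (an assumption of this sketch, not a detail stated in the abstract).
    """
    K = rbf_kernel(X_train, X_train, length_scale)
    K += noise * np.eye(len(X_train))           # observation-noise term
    K_star = rbf_kernel(X_test, X_train, length_scale)
    alpha = np.linalg.solve(K, Y_train)         # (K + noise * I)^-1 Y
    return K_star @ alpha


# Synthetic stand-ins: 40 "scanned" shots with 5-D visual-audio features
# and 3-D fMRI-derived features, plus 10 shots without fMRI data.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(40, 5))
Y_train = np.tanh(X_train @ rng.normal(size=(5, 3)))
X_test = rng.normal(size=(10, 5))

# Predicted fMRI-derived features for the shots that were never scanned.
Y_pred = gpr_predict(X_train, Y_train, X_test)
```

In a pipeline like the one described, `Y_pred` would play the role of the predicted fMRI-derived features that are later combined with the visual-audio features; the fusion step itself is not sketched here because the abstract does not describe its details.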