Cross-modal correlation learning for clustering on image-audio dataset.

Hong Zhang, Yueting Zhuang, Fei Wu
DOI: https://doi.org/10.1145/1291233.1291290
2007-01-01
Abstract: It is interesting and challenging to explore correlations between different datasets and to use such correlations to cluster those datasets. Cross-modal correlation between images and audio clips can help identify images (or audio clips) with particular semantics. However, the heterogeneity of visual and auditory features makes this cross-modal correlation difficult to learn. In this paper, we first analyze the canonical correlation between the feature matrices of images and audio clips during subspace mapping; second, we design a correlation-based similarity reinforcement for images and audio clips; third, we implement image clustering and audio clustering with affinity propagation. Experimental results on an image-audio dataset are encouraging and show that our approach is effective. We also present an application of querying images by audio examples.
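As a rough illustration of the final clustering step only, the following is a minimal pure-Python sketch of affinity propagation on a precomputed similarity matrix. It is not the authors' implementation: the toy 1-D points, the negative squared distance used as similarity (a stand-in for the paper's correlation-based similarity), the median preference, and the `damping` and `iterations` values are all assumptions made for the example.

```python
from statistics import median


def affinity_propagation(S, damping=0.8, iterations=300):
    """Cluster n items (n >= 2) from an n x n similarity matrix S.

    S[k][k] holds the "preference" of item k to act as an exemplar.
    Returns a list whose entry i is the index of item i's exemplar.
    """
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        # Responsibility: how well k suits i compared with i's best alternative.
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            for k in range(n):
                best_other = max(scores[j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - best_other)
        # Availability: accumulated evidence that k is a good exemplar.
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]  # positive support from all items except k
            for i in range(n):
                if i == k:
                    new = total
                else:
                    new = min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    # Each item picks the candidate exemplar maximizing a(i, k) + r(i, k).
    return [max(range(n), key=lambda k: A[i][k] + R[i][k]) for i in range(n)]


# Hypothetical toy data: two well-separated groups of 1-D points.
points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
S = [[-(a - b) ** 2 for b in points] for a in points]

# Assumed preference: the median off-diagonal similarity, a common default.
n = len(points)
pref = median(S[i][k] for i in range(n) for k in range(n) if i != k)
for k in range(n):
    S[k][k] = pref

labels = affinity_propagation(S)
print(labels)  # items sharing an exemplar index belong to the same cluster
```

In the setting described in the abstract, `S` would instead hold the reinforced image (or audio) similarities obtained after subspace mapping; the number of clusters is not fixed in advance but emerges from the preference values on the diagonal.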