Using Earth Mover's Distance for Audio Clip Retrieval

Yuxin Peng, Cuihua Fang, Xiaoou Chen
2006-01-01
Abstract: This paper presents a new approach to audio clip retrieval based on the Earth Mover's Distance (EMD). Most existing methods use frame-based or salient-based features. Our approach instead adopts a segment-based representation and allows many-to-many matching among audio segments when measuring clip similarity, which makes it tolerant of errors caused by audio segmentation and by various audio effects. We formulate audio clip retrieval as a graph matching problem in two stages. In the first stage, segment-based features are employed to represent the audio clips; these features not only capture how an audio clip changes but also preserve and represent the change relations and temporal order of the audio features. In the second stage, a weighted graph is constructed from the results of the segment similarity measure to model the similarity between two clips, and EMD is used to compute the minimum cost of this weighted graph, which serves as the similarity value between the two clips. Experimental results show that the proposed approach outperforms some existing methods in both retrieval and ranking capability.
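As a rough illustration of the second stage, the sketch below treats each clip as a set of weighted segments and computes EMD as a transportation problem: it finds flows f_ij between segments i and j that minimize the total cost sum_ij f_ij * d_ij, where no segment ships more than its weight and the total flow equals the smaller of the two clips' total weights, and then divides by that total flow (the standard EMD definition). Several details are assumptions rather than the paper's: each segment is a hypothetical (feature vector, weight) pair, where the weight might be, for example, the segment's share of the clip duration; the ground distance d_ij is plain Euclidean distance instead of the paper's segment similarity measure; and the hypothetical `emd` function solves the transport problem with a generic successive-shortest-path min-cost flow, not necessarily the authors' solver.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical segment representation: (feature vector, weight).
Segment = Tuple[Sequence[float], float]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed ground distance between two segment feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def emd(clip_a: List[Segment], clip_b: List[Segment],
        ground: Callable[[Sequence[float], Sequence[float]], float] = euclidean,
        eps: float = 1e-12) -> float:
    """EMD between two clips via min-cost flow (successive shortest paths)."""
    m, n = len(clip_a), len(clip_b)
    src, snk = m + n, m + n + 1
    size = m + n + 2
    # Residual graph; each edge is [to, capacity, cost, index of reverse edge].
    graph: List[List[list]] = [[] for _ in range(size)]

    def add_edge(u: int, v: int, cap: float, cost: float) -> None:
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0.0, -cost, len(graph[u]) - 1])

    for i, (_, w) in enumerate(clip_a):
        add_edge(src, i, w, 0.0)            # supply of segment i in clip A
    for j, (_, w) in enumerate(clip_b):
        add_edge(m + j, snk, w, 0.0)        # demand of segment j in clip B
    for i, (fa, _) in enumerate(clip_a):    # many-to-many segment matches
        for j, (fb, _) in enumerate(clip_b):
            add_edge(i, m + j, math.inf, ground(fa, fb))

    target = min(sum(w for _, w in clip_a), sum(w for _, w in clip_b))
    flow = cost = 0.0
    while target - flow > eps:
        # Bellman-Ford: cheapest augmenting path in the residual graph.
        dist = [math.inf] * size
        prev: List = [None] * size          # (previous node, edge index)
        dist[src] = 0.0
        for _ in range(size - 1):
            updated = False
            for u in range(size):
                if dist[u] == math.inf:
                    continue
                for k, (v, cap, c, _) in enumerate(graph[u]):
                    if cap > eps and dist[u] + c < dist[v] - eps:
                        dist[v] = dist[u] + c
                        prev[v] = (u, k)
                        updated = True
            if not updated:
                break
        if dist[snk] == math.inf:
            break
        # Push as much flow as the bottleneck edge on the path allows.
        push = target - flow
        v = snk
        while v != src:
            u, k = prev[v]
            push = min(push, graph[u][k][1])
            v = u
        v = snk
        while v != src:
            u, k = prev[v]
            edge = graph[u][k]
            edge[1] -= push
            graph[v][edge[3]][1] += push
            v = u
        flow += push
        cost += push * dist[snk]
    if flow <= eps:
        raise ValueError("clips must have positive total weight")
    return cost / flow                      # normalize by total flow


if __name__ == "__main__":
    # Toy clips: two segments each, 1-D features, weights summing to 1.
    clip_a = [([0.0], 0.6), ([4.0], 0.4)]
    clip_b = [([1.0], 0.5), ([3.0], 0.5)]
    print(round(emd(clip_a, clip_b), 6))    # 1.2
```

In the toy example the first segment of `clip_a` splits its weight across both segments of `clip_b`, which is the many-to-many matching the abstract describes. Because EMD here is a cost, a smaller value means more similar clips, so a retrieval loop would presumably rank candidate clips by ascending distance.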