Technical Report on A Generic Multi-Dimensional Data Generator for Earth Mover’s Distance Similarity Analysis

Rui Zhang, Jin Huang, Jin Huang
2014-01-01
Abstract: Earth Mover’s Distance based Similarity Analysis (EMDSA) is an important and effective tool in many multimedia retrieval and pattern recognition applications. Currently, there are no benchmarks or publicly available datasets for evaluating EMDSA techniques. To help fill this gap, we share a large-scale image feature generator that we have designed and implemented for evaluating EMDSA techniques. The generator supports a wide range of image features, with 16 commonly used features built in. It provides flexible execution options: (i) it may run on a single machine or on a Hadoop cluster; (ii) the source image data may reside locally on a single machine, in a Hadoop file system, or on the Internet, referenced by URLs. We have made the source code of our generator and dozens of pre-generated datasets available online.