Detecting Text Similarity over Chinese Research Papers Using MapReduce

Fan Xu, Qiaoming Zhu, Peifeng Li
DOI: https://doi.org/10.1109/snpd.2011.29
2011-01-01
Abstract: This paper proposes a novel method for detecting text similarity across Chinese research papers using the MapReduce paradigm. Our approach differs from state-of-the-art methods in two respects. First, we extract the key sentences from Chinese research papers using heuristic features and then generate 2-tuples of the form (document id, key phrase) as the representation of the documents. Second, we design a two-phase MapReduce algorithm to verify the effectiveness of the generated 2-tuples. For evaluation, we compare the proposed method with other approaches on a synthetic corpus. Experimental results reveal that our method substantially outperforms the state-of-the-art methods in running time while preserving the Jaccard similarity coefficient.
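As a rough illustration of how (document id, key phrase) 2-tuples can feed a two-phase similarity computation, the sketch below simulates both phases in memory with plain Python. It is not the authors' algorithm: the function names (`phase1_invert`, `phase2_count`, `jaccard_scores`), the sample phrases, and the choice to build an inverted index first and count co-occurring document pairs second are all assumptions made for this example, and key-sentence extraction is skipped entirely.

```python
from collections import defaultdict
from itertools import combinations


def phase1_invert(pairs):
    """Phase 1 (hypothetical): group document ids by key phrase,
    as a map step followed by a shuffle on the phrase key would."""
    index = defaultdict(set)
    for doc_id, phrase in pairs:
        index[phrase].add(doc_id)
    return index


def phase2_count(index):
    """Phase 2 (hypothetical): for each phrase, emit every pair of documents
    that contain it, then sum the emissions per document pair."""
    shared = defaultdict(int)
    for doc_ids in index.values():
        for a, b in combinations(sorted(doc_ids), 2):
            shared[(a, b)] += 1
    return shared


def jaccard_scores(pairs):
    """Return the Jaccard coefficient |A ∩ B| / |A ∪ B| for every pair of
    documents that share at least one key phrase."""
    pairs = set(pairs)  # drop duplicate (doc_id, phrase) tuples
    sizes = defaultdict(int)  # number of distinct phrases per document
    for doc_id, _ in pairs:
        sizes[doc_id] += 1
    shared = phase2_count(phase1_invert(pairs))
    return {
        (a, b): n / (sizes[a] + sizes[b] - n)
        for (a, b), n in shared.items()
    }


# Made-up sample data, for illustration only.
sample = [
    ("d1", "mapreduce"), ("d1", "text similarity"), ("d1", "jaccard"),
    ("d2", "mapreduce"), ("d2", "jaccard"),
    ("d3", "word segmentation"),
]
scores = jaccard_scores(sample)
```

Here `d1` and `d2` share two of their three distinct phrases, so their coefficient is 2/3, while `d3` shares nothing and produces no entry. Emitting only pairs that co-occur under some phrase avoids comparing every document with every other one, which is one plausible source of the kind of running-time savings the abstract reports.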