Plagiarism Detection Using the Levenshtein Distance and Smith-Waterman Algorithm

Zhan Su, Byung-Ryul Ahn, Ki-Yol Eom, Min-Koo Kang, Jin-Pyung Kim, Moon-Kyun Kim
DOI: https://doi.org/10.1109/icicic.2008.422
2008-01-01
Abstract: Plagiarism in texts is an issue of increasing concern to the academic community. Most text plagiarism today is committed through a variety of minor alterations, such as the insertion, deletion, or substitution of words. Detecting even such simple changes, however, requires an excessive number of string comparisons. In this paper, we present a hybrid plagiarism detection method. With a view to applying them to plagiarism detection, we investigate the use of a diagonal line derived from the Levenshtein distance and of a simplified Smith-Waterman algorithm, a classical tool for identifying and quantifying local similarities in biological sequences. Our approach avoids global string comparisons and takes psychological factors into account; the experimental results show that it can yield a significant speed-up. Based on these results, we demonstrate the practicality of this improvement using the Levenshtein distance and the Smith-Waterman algorithm and illustrate the efficiency gains. In the future, it would be interesting to explore appropriate heuristics for text comparison.
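The abstract does not spell out the paper's "diagonal line" or how its Smith-Waterman variant is simplified, so the following is only a minimal sketch of the textbook versions of the two underlying measures, applied to word tokens because the abstract describes alterations at the level of inserted, deleted, or substituted words. The function names `levenshtein` and `smith_waterman` and the scoring values (match = 2, mismatch = -1, gap = -1) are illustrative assumptions, not details taken from the paper.

```python
def levenshtein(a: list[str], b: list[str]) -> int:
    """Minimum number of word insertions, deletions, and substitutions turning a into b."""
    # prev[j] holds the distance between the current prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, wb in enumerate(b, start=1):
            cost = 0 if wa == wb else 1
            curr[j] = min(
                prev[j] + 1,         # delete wa
                curr[j - 1] + 1,     # insert wb
                prev[j - 1] + cost,  # substitute wa -> wb (free if equal)
            )
        prev = curr
    return prev[-1]


def smith_waterman(a: list[str], b: list[str],
                   match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Best local alignment score between word sequences a and b (assumed scoring)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 lets a new local alignment start anywhere
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


original = "the quick brown fox jumps over the lazy dog".split()
suspect = "a quick brown fox leaps over the lazy dog today".split()
print(levenshtein(original, suspect))     # 3 (two substitutions, one insertion)
print(smith_waterman(original, suspect))  # 13 ("quick ... dog" with one mismatch)
```

The Levenshtein distance scores how far two whole passages are from each other, while Smith-Waterman isolates the best-matching local region even when the surrounding text differs. Both fill an m × n table for passages of m and n words, so applying them exhaustively to every pair of passages is costly; that cost is what the paper's hybrid approach aims to reduce.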