An efficient and scalable plagiarism checking system using Bloom filters

Shahabeddin Geravand, Mahmood Ahmadi
DOI: https://doi.org/10.1016/j.compeleceng.2014.06.003
2014-08-01
Abstract: With easy access to the huge volume of articles available on the Internet, plagiarism is becoming an increasingly serious problem. Most recent approaches to this problem focus on improving the accuracy of the similarity detection process. However, there are some real applications where plagiarized content should be detected without revealing any information. Moreover, in such web-based applications, running time, memory consumption, and communication and computational complexity should also be taken into account. In this paper, we propose a similar document detection system based on the matrix Bloom filter, a new extension of the standard Bloom filter. Experimental results on a real dataset show that the system can achieve 98% accuracy. We also compare our approach with a method recently proposed for the same purpose. The comparison shows that the Bloom filter-based approach performs much better than the other method in terms of the aforementioned factors.
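The abstract does not describe how the matrix Bloom filter is built, so the following is only a minimal sketch of the standard Bloom filter it extends, applied to document similarity. Everything here is an illustrative assumption rather than the authors' method: the names `BloomFilter`, `shingles`, and `containment`, the sizes `m` and `k`, the use of word 3-grams, and the SHA-256 double-hashing scheme.

```python
import hashlib


class BloomFilter:
    """Standard Bloom filter: m bits, k hash positions per item (assumed sizes)."""

    def __init__(self, m=8192, k=4):
        self.m = m
        self.k = k
        self.bits = bytearray((m + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive k positions from two 64-bit slices of SHA-256.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # forced odd, so the step is never zero
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        # May return a false positive, never a false negative.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def shingles(text, n=3):
    """Set of overlapping word n-grams; empty if the text has fewer than n words."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def containment(query_text, reference_filter, n=3):
    """Fraction of the query's shingles that the reference filter reports as present."""
    grams = shingles(query_text, n)
    if not grams:
        return 0.0
    return sum(g in reference_filter for g in grams) / len(grams)


if __name__ == "__main__":
    reference = "the quick brown fox jumps over the lazy dog near the river bank"
    bf = BloomFilter()
    for gram in shingles(reference):
        bf.add(gram)
    # Only the bit array needs to be shared, not the reference text itself.
    print(containment("a quick brown fox jumps over a sleeping cat", bf))
```

In this sketch, one party could publish only the filter's bit array and another could score its own document against it, which hints at why a Bloom filter suits detection without exchanging the raw text. The score can be slightly inflated by false positives, whose rate depends on `m`, `k`, and the number of inserted shingles.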
engineering, electrical & electronic; computer science, interdisciplinary applications, hardware & architecture