A novel mixed approach for detecting overlap in document images

F. Jalali, A. Ebrahimi
DOI: https://doi.org/10.1109/IRANIANCEE.2017.7985324
2017-05-01
Abstract: Plagiarism detection is an important task in document processing. Because of the growing volume of electronic resources, checking documents for overlap by hand is no longer feasible. This paper proposes a new approach to overlap detection in document images. The proposed method divides a document image into three parts: text, background, and image. The text and image parts are then processed separately to find overlap. The text is retrieved with an OCR algorithm and then modeled using a signature extraction method. The Jaccard index, cosine similarity, and Euclidean distance are applied to find similar images. To segment the document image, a novel approach based on blocking and neighborhood analysis is designed. The proposed method can defeat the technical disguise problem that exists in most plagiarism detection systems.
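The three similarity measures named in the abstract can be sketched as follows. This is only a minimal illustration of the standard definitions, not the authors' implementation: the abstract does not say how image features are represented, so the set and vector inputs here, along with the function names, are assumptions.

```python
import math


def jaccard_index(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two collections treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        # Convention assumed here: two empty sets are considered identical.
        return 1.0
    return len(a & b) / len(a | b)


def _check_same_length(u, v):
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    _check_same_length(u, v)
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        # Convention assumed here: a zero vector is similar to nothing.
        return 0.0
    return dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Straight-line distance between two equal-length numeric vectors."""
    _check_same_length(u, v)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))
```

Note that the first two are similarities (higher means more alike) while the Euclidean distance is a dissimilarity (lower means more alike), so any thresholds used to flag overlap would need to be set in opposite directions.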