BERT-Enhanced Retrieval Tool for Homework Plagiarism Detection System

Jiarong Xian, Jibao Yuan, Peiwei Zheng, Dexian Chen, Nie Yuntao

2024-07-28

Abstract: Text plagiarism detection is a common natural language processing task that aims to determine whether a given text contains material plagiarized or copied from other texts. In existing research, detecting high-level plagiarism remains a challenge because high-quality datasets are lacking. In this paper, we propose a GPT-3.5-based method for generating plagiarized text, which produces a plagiarism detection dataset of 32,927 text pairs covering a wide range of plagiarism methods and fills this research gap. We also propose an efficient and accurate plagiarism identification method that combines Faiss with BERT. Our experiments show that this model outperforms other models on several metrics, achieving 98.86%, 98.90%, 98.86%, and 0.9888 for Accuracy, Precision, Recall, and F1 Score, respectively. Finally, we provide a user-friendly demo platform that allows users to upload a text library and take part in the plagiarism analysis intuitively.
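The reported Precision, Recall, and F1 Score can be checked for internal consistency. The sketch below assumes F1 is the harmonic mean of the aggregate precision and recall quoted in the abstract; the abstract does not state which averaging scheme the authors used, so this is an arithmetic sanity check rather than a reproduction of their evaluation. The function name `f1_score` is illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures quoted in the abstract, converted from percentages to fractions.
reported_precision = 0.9890
reported_recall = 0.9886

# Assumption: F1 is computed from these aggregate values.
f1 = f1_score(reported_precision, reported_recall)
print(round(f1, 4))
```

Rounded to four decimal places, the result agrees with the F1 Score of 0.9888 given in the abstract.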

Computation and Language, Artificial Intelligence, Information Retrieval

What problem does this paper attempt to address?

The paper primarily addresses two key issues in assignment plagiarism detection systems:

1. **Construction of a High-Quality Plagiarism Dataset**: In existing plagiarism detection research, detecting high-level plagiarism remains a challenge, mainly due to the lack of high-quality datasets. The authors propose a plagiarized text generation method based on GPT-3.5, resulting in a plagiarism detection dataset of 32,927 text pairs that covers various plagiarism methods and fills a research gap in this field.
2. **Efficient Plagiarism Identification Method**: For the dataset constructed above, the authors also propose a highly efficient and accurate plagiarism identification method combining Faiss and BERT. Experimental results show that the model achieves 98.86%, 98.90%, 98.86%, and 0.9888 for Accuracy, Precision, Recall, and F1 Score, respectively.

Additionally, the paper provides a user-friendly demonstration platform that allows users to upload text libraries and intuitively participate in the plagiarism analysis process. Overall, the study both constructs a high-quality dataset and proposes an efficient plagiarism detection solution, validating its effectiveness through experiments.
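The general idea behind combining an embedding model with a vector index can be sketched as follows. This is not the authors' implementation: to stay dependency-free, a hashed bag-of-words function stands in for the BERT encoder and a brute-force inner-product search stands in for a Faiss index. The names `embed`, `FlatIndex`, and `is_plagiarised`, the dimension `DIM`, and the `threshold` value are all hypothetical choices made for illustration.

```python
import hashlib
import math
from collections import Counter

DIM = 256  # hypothetical embedding size, chosen only for this sketch


def embed(text):
    """Stand-in encoder: hashed bag-of-words, L2-normalised.

    A real system would obtain this vector from a BERT encoder instead.
    """
    vec = [0.0] * DIM
    for token, count in Counter(text.lower().split()).items():
        bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % DIM
        vec[bucket] += count
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


class FlatIndex:
    """Brute-force inner-product search (stand-in for a Faiss index)."""

    def __init__(self):
        self.vectors = []
        self.texts = []

    def add(self, texts):
        for text in texts:
            self.vectors.append(embed(text))
            self.texts.append(text)

    def search(self, query, k=1):
        """Return the k most similar stored texts as (score, text) pairs."""
        q = embed(query)
        scored = [
            (sum(a * b for a, b in zip(q, v)), text)
            for v, text in zip(self.vectors, self.texts)
        ]
        # On unit-length vectors the inner product equals cosine similarity.
        return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


def is_plagiarised(index, query, threshold=0.8):
    """Flag the query if its nearest library text scores at or above threshold."""
    hits = index.search(query, k=1)
    if not hits:
        return False, 0.0, None
    score, source = hits[0]
    return score >= threshold, score, source


library = FlatIndex()
library.add([
    "the mitochondria is the powerhouse of the cell",
    "newton formulated three laws of motion",
])
flagged, score, source = is_plagiarised(
    library, "the mitochondria is the powerhouse of the cell"
)
print(flagged, source)
```

Here the uploaded text library is indexed once, and each submission is compared against it by nearest-neighbour search. An approximate index trades a little recall for much faster lookups on large libraries, which is the usual reason to pair an encoder with a library such as Faiss.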
