THE PLAGIARISM CHECKER USING MACHINE LEARNING
Dr. P Anuradha,
DOI: https://doi.org/10.55041/ijsrem18949
2023-04-14
INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT
Volume: 07, Issue: 04 | April 2023 | ISSN: 2582-3930 | Impact Factor: 8.176 | © 2023, IJSREM | www.ijsrem.com
Abstract: Plagiarism is a growing problem in many fields, academia among them. It takes many forms, from replacing a word with a synonym to modifying or restructuring entire sentences. Although people can work efficiently, they cannot detect plagiarism accurately in every scenario, and they cannot compare a text against more than a million online documents in seconds. Because plagiarism detection is tedious and time-consuming for a human, an automated plagiarism checker is desirable. In this project, a plagiarism checker application is developed with machine learning at its core; it searches a large database for plagiarized content. The application is built on vector embeddings, which convert data such as text into lists of numbers so that various operations can be performed on the converted data. Vectors are useful because, when real-world entities such as audio, images, and text are represented as vector embeddings, the semantic similarity between two entities can be quantified by how close their vectors lie to each other as points in a vector space. Models are trained to translate entities into vectors, and natural language processing (NLP) is commonly used for such training. These vector embeddings are added to the pre-processed database, which is then ready for the similarity check. The machine learning-based application takes text as input from the user, checks it against the database, and returns every article from which the input text may have been plagiarized, along with its match score.
Keywords: Plagiarism detection, machine learning, vector embeddings, NLP.
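The abstract does not name the embedding model or the similarity measure. The following is a minimal sketch, assuming cosine similarity as the match score and a simple bag-of-words count vector as a hypothetical stand-in for learned NLP embeddings. It illustrates the described pipeline: embed the database once, embed the user's text, score every article, and return the matches with their scores. The names `embed`, `cosine`, `build_index`, `check`, and the 0.5 threshold are illustrative choices, not taken from the paper.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for a trained NLP embedding model: a bag-of-words
# count vector, stored sparsely as {word: count}. Unlike learned embeddings,
# it cannot recognize synonyms; it only illustrates the vector pipeline.
def embed(text: str) -> Counter:
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity of two sparse vectors (0.0 means no shared words)."""
    dot = sum(count * v[word] for word, count in u.items())  # missing words count as 0
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def build_index(articles: dict[str, str]) -> dict[str, Counter]:
    """Pre-process the database once: store one vector per article."""
    return {title: embed(text) for title, text in articles.items()}

def check(query: str, index: dict[str, Counter],
          threshold: float = 0.5) -> list[tuple[str, float]]:
    """Return (article, match score) pairs scoring at least `threshold`, best first."""
    q = embed(query)
    scores = [(title, cosine(q, vec)) for title, vec in index.items()]
    hits = [(title, score) for title, score in scores if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)

if __name__ == "__main__":
    index = build_index({
        "A": "plagiarism is a growing problem in academia",
        "B": "vector embeddings map text to lists of numbers",
    })
    print(check("plagiarism is a growing problem in many fields", index))
```

Running the script reports only article A, with a score of about 0.80, because it shares six of the query's eight words while B shares none. A real system would replace `embed` with a trained model, since learned embeddings are what allow synonym substitutions and paraphrases to score as similar, and would likely replace the linear scan with a dedicated vector index for a large database.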