International Journal of Scientific Research in Engineering and Management (IJSREM), Volume: 07, Issue: 04, April 2023. Impact Factor: 8.176. ISSN: 2582-3930. © 2023, IJSREM | www.ijsrem.com | DOI: 10.55041/IJSREM18949

Abstract: Plagiarism is a growing problem in many fields, academia among them. It takes various forms, from replacing a word with a synonym to modifying or transforming whole sentences. Although humans are capable reviewers, they cannot detect plagiarism accurately in every scenario, and they cannot compare a text against more than one million online documents in seconds. Because manual plagiarism detection is tedious and time-consuming, an automated plagiarism checker is desirable. In this project, we develop a plagiarism checker application with machine learning at its core that searches a vast database for plagiarized content. The application is built on vector embeddings, which convert data such as text into a list of numbers so that various operations can be performed on the converted data. Vectors are helpful because when real-world entities such as audio, images, and text are represented as vector embeddings, the semantic similarity between these entities can be quantified by how close they are to each other as points in a vector space. Models are trained to translate entities into vectors; NLP is commonly used for such training. These vector embeddings are added to the pre-processed database, which is then ready to be used for the similarity check. The application takes text as input from the user, checks it against the database, and returns all the articles from which the input text could have been plagiarized, along with a match score.

Keywords: Plagiarism detection, machine learning, vector embeddings, NLP.
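The pipeline described in the abstract (embed the articles, store the vectors, embed the input text, and return matching articles with a score) can be illustrated with a minimal sketch. Everything named here is an assumption for illustration only: `embed` is a toy word-count vector standing in for a trained embedding model, cosine similarity is one common choice of closeness measure, and the article texts, the `check_plagiarism` helper, and the 0.5 threshold are hypothetical. A word-count vector captures word overlap only, so unlike a trained model it would not catch synonym substitution.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: a sparse word-count vector (stand-in for a trained model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def check_plagiarism(query, database, threshold=0.5):
    """Return (title, score) pairs scoring at least `threshold`, best match first."""
    query_vec = embed(query)
    matches = []
    for title, vector in database.items():
        score = cosine_similarity(query_vec, vector)
        if score >= threshold:
            matches.append((title, score))
    return sorted(matches, key=lambda m: m[1], reverse=True)


# Hypothetical articles, embedded once up front (the "pre-processed database").
articles = {
    "Article A": "Vector embeddings convert text into a list of numbers.",
    "Article B": "The weather today is sunny with a light breeze.",
}
database = {title: embed(body) for title, body in articles.items()}

results = check_plagiarism("Vector embeddings convert text into numbers.", database)
for title, score in results:
    print(f"{title}: match score {score:.2f}")
```

In this sketch only the article sharing most of its words with the input clears the threshold. With a trained embedding model, `embed` would return dense vectors and the rest of the flow would stay the same.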
