TextRefine: A Novel approach to improve the accuracy of LLM Models

Ekta Dalal, Parvinder Singh
DOI: https://doi.org/10.56294/dm2024331
2024-01-01
Abstract: Natural Language Processing (NLP) is an interdisciplinary field that studies human language with the goal of creating computational models and algorithms that can comprehend, produce, and analyze natural language in a human-like way. Despite their outstanding performance on NLP tasks, large language models (LLMs) still struggle with noisy and unpolished input text. To overcome this problem, TextRefine offers a thorough preprocessing pipeline that refines and cleans text data before it is passed to an LLM. The pipeline includes a number of steps: removing social tags, normalizing whitespace, changing all lowercase letters to uppercase, removing stopwords, fixing Unicode issues, unpacking contractions, removing punctuation and accents, and general text cleanup. Together, these steps strengthen the integrity and quality of the input data, which in turn improves the efficiency and precision of LLMs. Extensive testing and comparisons with standard techniques show TextRefine's effectiveness, with a reported accuracy of 99%.
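The steps listed in the abstract can be sketched as a small Python function. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the tiny `CONTRACTIONS` and `STOPWORDS` tables, the reading of "social tags" as `@mentions` and `#hashtags`, and the ordering of the steps are all hypothetical choices made here. The abstract does not specify any of them.

```python
import re
import string
import unicodedata

# Hypothetical, deliberately tiny lookup tables; the paper's actual lists are not given.
CONTRACTIONS = {"can't": "cannot", "won't": "will not", "it's": "it is",
                "i'm": "i am", "don't": "do not"}
STOPWORDS = {"a", "an", "the", "is", "it", "to", "of", "and", "this"}

# Assumption: "social tags" means @mentions and #hashtags.
SOCIAL_TAG_RE = re.compile(r"[@#]\w+")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def fix_unicode(text):
    """Normalize compatibility characters and map curly apostrophes to ASCII."""
    return unicodedata.normalize("NFKC", text).replace("\u2019", "'")


def strip_accents(text):
    """Decompose characters, then drop the combining marks (accents)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def expand_contractions(text):
    """Replace known contractions; unknown ones are left unchanged."""
    def repl(match):
        word = match.group(0)
        return CONTRACTIONS.get(word.lower(), word)
    return re.sub(r"\b\w+'\w+\b", repl, text)


def text_refine(text):
    """Apply the cleaning steps in one assumed order and return the result."""
    text = fix_unicode(text)
    text = SOCIAL_TAG_RE.sub(" ", text)   # remove social tags
    text = expand_contractions(text)      # must run before punctuation removal
    text = strip_accents(text)
    text = text.translate(PUNCT_TABLE)    # remove ASCII punctuation
    # split() with no arguments also collapses runs of whitespace.
    tokens = [t for t in text.split() if t.lower() not in STOPWORDS]
    # The abstract describes lowercase-to-uppercase conversion, so uppercase here.
    return " ".join(tokens).upper()


print(text_refine("@user I can't believe it\u2019s a   caf\u00e9!! #nlp"))
# prints: I CANNOT BELIEVE CAFE
```

Contractions are expanded before punctuation is stripped, because removing apostrophes first would turn "can't" into "cant" and defeat the lookup. If a downstream model expects lowercased text, replacing `.upper()` with `.lower()` is the only change needed.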