Implementation of the Indonesian Language Stemming Algorithm in Twitter Data Preprocessing. Case Study: Twitter Wargabanua and Instakalsel

Afian Syafaadi Rizki, Nina Mia Aristi, Najamudin Ridha, Aidil Fajar Zulfahri, Dwi Agung Wibowo
DOI: https://doi.org/10.52005/fidelity.v5i3.170
2023-09-30
Abstract: Stemming is a widely used technique in Natural Language Processing (NLP). Its primary purpose is to map words that share a meaning but differ in form to a common representation, namely their base or root form. Stemming is typically applied during data preprocessing to improve the performance of NLP systems. For Indonesian, the Nazief stemming algorithm is the most commonly used, and it has also been adapted for various regional languages of Indonesia. In this research, we assess the performance of the Nazief stemming algorithm on Twitter data from the accounts @wargabanua and @instakalsel. The goal is to evaluate how the algorithm handles text that mixes two languages: Indonesian and Banjar. The test results show an accuracy of 90.34%. This indicates that the Nazief stemming algorithm can process social media text effectively, even though it was not originally designed for the Banjar language.
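To make the affix-stripping idea concrete, the following is a minimal Python sketch of a dictionary-based stemmer in the spirit of the Nazief (Nazief-Adriani) approach: check a root dictionary, strip inflectional suffixes, then a derivational suffix, then up to three prefixes, and return the word unchanged if no root is found. It is not the authors' implementation. The root dictionary `ROOT_WORDS` is a hypothetical toy, the affix lists are abbreviated, recoding rules for nasal prefixes and disallowed prefix-suffix pairs are omitted, and `accuracy` assumes the metric is the share of words whose output matches a manually assigned root.

```python
# Simplified sketch of dictionary-based Indonesian affix stripping in the
# spirit of Nazief-Adriani. Recoding rules (e.g. "memukul" -> "pukul") and
# disallowed prefix-suffix pairs are intentionally omitted.

# Toy root-word dictionary (hypothetical; a real stemmer needs a full lexicon).
ROOT_WORDS = {"ajar", "baca", "buku", "main", "makan", "pukul"}

INFLECTION_PARTICLES = ("lah", "kah", "tah", "pun")
POSSESSIVE_PRONOUNS = ("ku", "mu", "nya")
DERIVATION_SUFFIXES = ("kan", "an", "i")  # "kan" is tried before "an"
# Longest prefixes first so "meng" is tried before "men" and "me".
PREFIXES = ("meng", "peng", "mem", "men", "pem", "pen", "ber", "ter",
            "di", "ke", "se", "be", "te", "me", "pe")


def strip_suffix(word: str, suffixes: tuple[str, ...]) -> str:
    """Remove the first matching suffix, keeping at least three letters."""
    for s in suffixes:
        if word.endswith(s) and len(word) > len(s) + 2:
            return word[: -len(s)]
    return word


def strip_prefix(word: str) -> str:
    """Remove the first (longest) matching prefix, keeping at least three letters."""
    for p in PREFIXES:
        if word.startswith(p) and len(word) > len(p) + 2:
            return word[len(p):]
    return word


def stem(word: str) -> str:
    word = word.lower()
    if word in ROOT_WORDS:
        return word
    # Step 1: inflectional suffixes (particle, then possessive pronoun).
    base = strip_suffix(strip_suffix(word, INFLECTION_PARTICLES), POSSESSIVE_PRONOUNS)
    if base in ROOT_WORDS:
        return base
    # Step 2: one derivational suffix.
    no_suffix = strip_suffix(base, DERIVATION_SUFFIXES)
    if no_suffix in ROOT_WORDS:
        return no_suffix
    # Step 3: up to three prefixes, first on the suffix-stripped form, then
    # (backtracking) on the form with the derivational suffix restored.
    for candidate in (no_suffix, base):
        current = candidate
        for _ in range(3):
            stripped = strip_prefix(current)
            if stripped == current:
                break
            current = stripped
            if current in ROOT_WORDS:
                return current
    # No root found: return the original word unchanged.
    return word


def accuracy(gold: list[tuple[str, str]]) -> float:
    """Share of words whose stem equals a manually assigned root (assumed metric)."""
    correct = sum(stem(word) == root for word, root in gold)
    return correct / len(gold)


if __name__ == "__main__":
    for w in ["bukunya", "dimakan", "bermain", "membacakan", "wargabanua"]:
        print(w, "->", stem(w))
```

On a toy gold list such as `[("bukunya", "buku"), ("dimakan", "makan"), ("wargabanua", "wargabanua"), ("memukul", "pukul")]` this sketch scores 0.75: `memukul` fails because the meN- recoding rule (restoring the dropped initial `p`) is left out, which is one place a full Nazief implementation does more work.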