Machine Learning Model for Language Classification: Bag-of-words and Multilayer Perceptron

Devi Hawana Lubis, Sawaluddin Sawaluddin, Ade Candra
DOI: https://doi.org/10.31289/jite.v7i1.10114
2023-07-28
JOURNAL OF INFORMATICS AND TELECOMMUNICATION ENGINEERING
Abstract: The data available today has become a great asset for research, where it serves many purposes such as machine learning. One of the basic machine learning methods for natural language processing is bag-of-words. This study addresses the difficulty of classifying texts, which arises because texts are still unstructured, by applying a model that classifies the language of a text. Texts are placed in four categories: English, Indonesian, German and French. The research uses Bag-of-words and a Multilayer Perceptron to solve this supervised machine learning problem. Bag-of-words is used for text representation because it captures simple patterns, is easy to process and performs well. A multilayer perceptron, on the other hand, is able to learn complex patterns in data such as images, text or videos. The data were collected with a text mining technique, namely crawling the social media platform Twitter, which yielded 4,000 data records. The resulting model achieves an accuracy of 98 percent with a loss of 0.14, which indicates good performance in classifying languages from text data.
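The abstract describes the pipeline only at a high level. Each text is turned into a bag-of-words count vector, and a multilayer perceptron maps that vector to one of four language labels. Below is a minimal sketch of that idea in dependency-free Python. It assumes one ReLU hidden layer, a softmax output and plain stochastic gradient descent on cross-entropy. The abstract does not state the paper's actual architecture, preprocessing, optimizer or hyperparameters. The toy sentences, the helper names (`tokenize`, `build_vocab`, `bag_of_words`, `MLP`, `fit`) and all settings are hypothetical and chosen only for illustration. They are not the study's Twitter data or code.

```python
import math
import random
import re

# Hypothetical toy corpus (NOT the paper's 4,000 crawled tweets); label index -> language
LANGS = ["English", "Indonesian", "German", "French"]
TRAIN = [
    ("the cat is on the table", 0), ("I am going to the market today", 0),
    ("this is a very good day", 0),
    ("saya pergi ke pasar hari ini", 1), ("kucing itu ada di atas meja", 1),
    ("ini adalah hari yang sangat baik", 1),
    ("die Katze ist auf dem Tisch", 2), ("ich gehe heute auf den Markt", 2),
    ("das ist ein sehr guter Tag", 2),
    ("le chat est sur la table", 3), ("je vais au marché aujourd'hui", 3),
    ("c'est une très bonne journée", 3),
]


def tokenize(text):
    # Lowercase, then keep runs of Unicode letters (so "marché" stays one token)
    return re.findall(r"[^\W\d_]+", text.lower())


def build_vocab(texts):
    # Assign each distinct training token a column index
    vocab = {}
    for text in texts:
        for tok in tokenize(text):
            vocab.setdefault(tok, len(vocab))
    return vocab


def bag_of_words(text, vocab):
    # Count vector: word order is discarded and out-of-vocabulary tokens are ignored
    vec = [0.0] * len(vocab)
    for tok in tokenize(text):
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    return vec


class MLP:
    """One ReLU hidden layer plus a softmax output, trained by SGD on cross-entropy."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def forward(self, x):
        h = [max(0.0, b + sum(w * v for w, v in zip(row, x))) for row, b in zip(self.w1, self.b1)]
        z = [b + sum(w * v for w, v in zip(row, h)) for row, b in zip(self.w2, self.b2)]
        m = max(z)  # subtract the max logit for a numerically stable softmax
        e = [math.exp(v - m) for v in z]
        total = sum(e)
        return h, [v / total for v in e]

    def train_step(self, x, y, lr):
        h, p = self.forward(x)
        dz = [pk - (1.0 if k == y else 0.0) for k, pk in enumerate(p)]  # dLoss/dlogits
        # Hidden-layer gradient (computed before w2 changes); ReLU passes it only where h > 0
        dh = [sum(dz[k] * self.w2[k][j] for k in range(len(dz))) if h[j] > 0 else 0.0
              for j in range(len(h))]
        for k, g in enumerate(dz):
            for j, hj in enumerate(h):
                self.w2[k][j] -= lr * g * hj
            self.b2[k] -= lr * g
        for j, g in enumerate(dh):
            for i, v in enumerate(x):
                if v:  # bag-of-words vectors are sparse; zero inputs give zero gradient
                    self.w1[j][i] -= lr * g * v
            self.b1[j] -= lr * g
        return -math.log(max(p[y], 1e-12))  # cross-entropy loss for this example

    def predict(self, x):
        _, p = self.forward(x)
        return max(range(len(p)), key=lambda k: p[k])


def fit(data, n_hidden=16, epochs=300, lr=0.1, seed=0):
    # Build the vocabulary from training texts only, then run shuffled SGD epochs
    vocab = build_vocab([text for text, _ in data])
    xs = [bag_of_words(text, vocab) for text, _ in data]
    model = MLP(len(vocab), n_hidden, len(LANGS), seed)
    order = list(range(len(data)))
    rng = random.Random(seed)
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            model.train_step(xs[i], data[i][1], lr)
    return vocab, model


vocab, model = fit(TRAIN)
for text in ["the market is good", "die Katze ist sehr gut"]:
    print(text, "->", LANGS[model.predict(bag_of_words(text, vocab))])
```

Writing the forward and backward passes by hand makes it visible what the perceptron learns from the count vectors. In practice, a library such as scikit-learn (`CountVectorizer` combined with `MLPClassifier`) would normally take the place of this code.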