Comparison of Transfer Learning and Traditional Machine Learning Approach for Text Classification

J. Vashishtha, K. Taneja
DOI: https://doi.org/10.23919/INDIACom54597.2022.9763279
2022-03-23
Abstract: In the era of big data, where data consist of images, text, audio, etc., manually analysing text is a big challenge. For a long time, research in Natural Language Processing (NLP) has relied on rule-based and machine learning approaches, but these approaches need domain experts and yield only task-specific NLP systems. However, recent research in NLP has given rise to pre-trained language models such as ELMo, GPT, BERT and DistilBERT, which have made transfer learning the state-of-the-art technique in NLP. In this paper, we compare transfer learning with the traditional machine learning approach for text classification. To achieve this, we fine-tuned BERT and DistilBERT for text classification and compared them against a traditional machine learning approach that uses the Term Frequency-Inverse Document Frequency (TF-IDF) algorithm for feature extraction. The performance of all models is empirically tested on a suite of different datasets. The results show that fine-tuning pre-trained language models gives a significant improvement over the traditional machine learning approach for text classification. This study also recommends DistilBERT over BERT as the default transfer learning model for various text classification applications, because it can be trained in less time and with fewer parameters while giving performance similar to BERT.
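The abstract describes the traditional baseline only as TF-IDF feature extraction followed by a machine learning classifier; it names neither the classifier nor the exact TF-IDF variant. The Python sketch below illustrates that kind of pipeline under loudly stated assumptions: it uses the smoothed IDF formula log((1 + N) / (1 + df)) + 1 (the default in scikit-learn's `TfidfVectorizer`) with L2-normalised vectors, and a Rocchio-style nearest-centroid classifier as a stand-in for whatever classifier the authors actually used. All function names and the toy documents are hypothetical and are not taken from the paper or its datasets.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical, deliberately simple tokenizer: lowercase alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())


def fit_idf(docs):
    # Assumed smoothed IDF: log((1 + N) / (1 + df)) + 1, computed on training docs.
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def normalize(vec):
    # Scale a sparse vector (dict term -> weight) to unit L2 length.
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def tfidf(doc, idf):
    # Raw term count times IDF; terms unseen during training are dropped.
    counts = Counter(t for t in tokenize(doc) if t in idf)
    return normalize({t: c * idf[t] for t, c in counts.items()})


def fit_centroids(docs, labels, idf):
    # Stand-in classifier (Rocchio / nearest centroid): one unit vector per class.
    sums = defaultdict(lambda: defaultdict(float))
    for doc, label in zip(docs, labels):
        for t, v in tfidf(doc, idf).items():
            sums[label][t] += v
    # Normalising the summed vector gives the same direction as the class mean.
    return {label: normalize(vec) for label, vec in sums.items()}


def dot(a, b):
    # Both inputs are unit length, so the dot product is the cosine similarity.
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(t, 0.0) for t, v in a.items())


def predict(doc, centroids, idf):
    vec = tfidf(doc, idf)
    return max(centroids, key=lambda label: dot(vec, centroids[label]))


# Toy usage with made-up documents (not the paper's data).
train_docs = [
    "the team won the football match",
    "a great goal in the final match",
    "the new phone has a fast processor",
    "software update improves battery life",
]
train_labels = ["sports", "sports", "tech", "tech"]
idf = fit_idf(train_docs)
centroids = fit_centroids(train_docs, train_labels, idf)
print(predict("who scored the winning goal in the match", centroids, idf))  # sports
print(predict("battery and processor benchmarks for the phone", centroids, idf))  # tech
```

A linear model such as logistic regression or a linear SVM over the same TF-IDF vectors is a common alternative baseline; the nearest-centroid choice here only keeps the sketch dependency-free.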
Computer Science