Turkish Medical Text Classification Using BERT

Azer Celikten, Hasan Bulut
DOI: https://doi.org/10.1109/siu53274.2021.9477847
2021-06-09
Abstract: Medical text classification is mostly carried out on English data sets. Studies in Turkish are limited for two reasons: the complex morphological structure of Turkish makes natural language processing difficult, and few data sets exist in the medical domain. In addition, domain-specific words and abbreviations make natural language processing even more challenging. In this study, a classification model is implemented to assign article abstracts to the appropriate disease categories using multilingual BERT and BERTurk models on a data set of Turkish medical article abstracts. In the experiments, F-scores of 0.82 and 0.93 are obtained for multilingual BERT and BERTurk, respectively. The results show that BERTurk is more successful than the other compared models for Turkish medical text classification.
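The abstract reports F-scores but does not state how they are averaged across disease categories. The following is a minimal sketch, assuming macro-averaged F1 (per-category F1, then an unweighted mean). It shows how such a score could be computed from gold and predicted labels. The function name `macro_f1` and the category labels are hypothetical and do not come from the paper.

```python
from collections import Counter


def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Macro-averaged F1 over every label seen in y_true or y_pred.

    Assumption: macro averaging. The source abstract does not specify
    which averaging scheme was used for the reported F-scores.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0  # no data, nothing to score

    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1  # predicted this label, but it was wrong
            fn[gold] += 1  # failed to predict the true label

    f1_scores = []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)

    # Unweighted mean: every disease category counts equally.
    return sum(f1_scores) / len(f1_scores)


# Hypothetical disease-category labels for four abstracts.
gold = ["cardiology", "oncology", "oncology", "neurology"]
pred = ["cardiology", "oncology", "neurology", "neurology"]
print(round(macro_f1(gold, pred), 4))  # prints 0.7778
```

Macro averaging weights rare and frequent categories equally. A weighted or micro average could give different numbers on an imbalanced medical data set, so the choice matters when comparing scores across studies.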