A Novel and Effective Multi-Class Classification Method for Imbalanced Medical Transcriptions

Priti Bhardwaj, Niyati Baliyan
DOI: https://doi.org/10.1080/03772063.2024.2304002
Impact factor: 1.8768
2024-01-20
IETE Journal of Research
Abstract: Natural Language Processing (NLP) techniques are used to extract information from Electronic Health Records (EHRs) to identify patients with unique clinical characteristics and to define phenotypes. Classifying imbalanced datasets is also a key concern in medical diagnosis. We built an improved framework that automates the multi-class classification of imbalanced medical transcriptions [ Citation 1 ] into 40 medical specialties by constructing a set of important phenotypes/features. We implemented and tested five machine learning models, of which the Random Forest Classifier achieved the highest performance on test data: an F1 score of 0.99 (precision 0.99, recall 0.99) and a ROC-AUC score of 0.99.
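The abstract reports a single F1 score for a 40-class, imbalanced problem but does not state how per-class scores were averaged. On imbalanced data, macro and support-weighted averages can diverge sharply, so the choice matters when reading a figure like 0.99. The sketch below is a minimal pure-Python illustration under that assumption. It is not the authors' evaluation code, and the two labels are hypothetical toy classes rather than data from the paper.

```python
from collections import Counter

def per_class_prf(y_true, y_pred):
    """Return {label: (precision, recall, f1, support)} for every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but the true label was different
            fn[t] += 1  # true label t was missed
    support = Counter(y_true)
    stats = {}
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        stats[c] = (prec, rec, f1, support[c])
    return stats

def averaged_f1(y_true, y_pred):
    """Return (macro F1, support-weighted F1) over all labels."""
    stats = per_class_prf(y_true, y_pred)
    macro = sum(s[2] for s in stats.values()) / len(stats)
    total = sum(s[3] for s in stats.values())
    weighted = sum(s[2] * s[3] for s in stats.values()) / total
    return macro, weighted

# Hypothetical imbalanced toy data: 8 majority samples, 2 minority samples,
# and a degenerate classifier that always predicts the majority class.
y_true = ["Surgery"] * 8 + ["Dentistry"] * 2
y_pred = ["Surgery"] * 10
macro, weighted = averaged_f1(y_true, y_pred)
print(f"macro F1 = {macro:.4f}, weighted F1 = {weighted:.4f}")
```

Here the classifier never recognises the minority class, yet the weighted F1 (about 0.71) looks far healthier than the macro F1 (about 0.44). Reporting both, or the per-class scores, makes it easier to judge how well minority specialties are handled.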
Telecommunications; Engineering, Electrical & Electronic