Abstract: The rapid growth of clinical text data requires accurate automated classification methods to support medical decision making and personalized healthcare. Multi-label classification of clinical texts assigns the most relevant set of labels to each clinical text. This task presents two significant challenges: (1) how to accurately extract fine-grained semantic features from complex clinical texts, and (2) how to effectively mitigate label class imbalance. To address these problems, we propose a Multi-label Classification of Imbalanced Clinical Text (MCICT) model. To obtain fine-grained semantic features from clinical texts, we use BioBERT, a pre-trained language model tailored for biomedical texts. To tackle label class imbalance, we present a Co-occurrence Based and Embeddings with Additional Information Enhanced Graph Convolutional Network (CoEAI-GCN) module. On the one hand, we enrich the label content with additional information to obtain more accurate word embeddings, which serve as the feature matrix. On the other hand, we use the co-occurrence relationships among labels to construct a correlation matrix. Label representations are then learned through a graph convolutional network. In multi-label classification experiments on two clinical text datasets extracted from real medical systems, our model improves F1 scores by 3.2% and 0.5%, respectively, over state-of-the-art deep learning models. We also conduct ablation studies to explore the behavior of the proposed model. Together, these results demonstrate that the proposed MCICT effectively improves classification performance on imbalanced clinical texts.
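The label-graph idea behind the CoEAI-GCN module described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not give the exact formulation, so everything here is an assumption made for illustration: the correlation matrix is taken to be the conditional co-occurrence probability P(label j | label i), the graph is normalized symmetrically, a single ReLU graph-convolution layer is used, and random vectors stand in for both the enriched label embeddings and the BioBERT text features. The function names are hypothetical and are not taken from the paper.

```python
import numpy as np


def cooccurrence_correlation(Y):
    """Build a label correlation matrix from a binary label matrix.

    Y has shape (n_samples, n_labels). Entry A[i, j] estimates
    P(label j | label i) from co-occurrence counts (an assumed definition).
    """
    Y = np.asarray(Y, dtype=float)
    counts = Y.T @ Y                      # counts[i, j]: samples with both i and j
    occurrences = np.diag(counts).copy()  # occurrences[i]: samples with label i
    occurrences[occurrences == 0] = 1.0   # guard against labels that never occur
    A = counts / occurrences[:, None]
    np.fill_diagonal(A, 1.0)              # self-loops so each label keeps its own features
    return A


def normalize_adjacency(A):
    """Symmetric normalization D^{-1/2} A D^{-1/2} (degrees are >= 1 due to self-loops)."""
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(A_hat, H, W):
    """One graph-convolution layer with ReLU: max(A_hat @ H @ W, 0)."""
    return np.maximum(A_hat @ H @ W, 0.0)


# Toy data: 4 clinical texts, 3 labels (purely illustrative).
Y = np.array([[1, 1, 0],
              [1, 0, 0],
              [0, 1, 1],
              [1, 1, 0]])

rng = np.random.default_rng(0)
label_embeddings = rng.normal(size=(3, 5))  # stand-in for enriched label embeddings
W = rng.normal(size=(5, 4))                 # weights that would be learned in training
text_features = rng.normal(size=(2, 4))     # stand-in for BioBERT text features

A = cooccurrence_correlation(Y)
label_repr = gcn_layer(normalize_adjacency(A), label_embeddings, W)

# Score each text against each learned label representation.
scores = text_features @ label_repr.T
probs = 1.0 / (1.0 + np.exp(-scores))       # independent sigmoid per label
```

In this sketch a rare label receives information from the frequent labels it co-occurs with through the correlation matrix, which is one plausible reading of how a co-occurrence graph could help with label imbalance.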

A Multi-Label Chinese Text Categorization System Based on Boosting Algorithm

Design and Implementation of a Multi-Label Chinese Text Categorization System

Multi-labeled Chinese Text Categorization Based on the Boosting Algorithms

AdaBoost-based Multi-attribute Classification Technology and Its Application

Cross-Domain Learning Based Traditional Chinese Medicine Medical Record Classification

Incorporating Prior Knowledge Into Multi-Label Boosting For Cross-Modal Image Annotation And Retrieval

Learning outliers to refine a corpus for Chinese webpage categorization

Stable Multi-Label Boosting for Image Annotation with Structural Feature Selection

Learning Effective Features for Chinese Text Categorization

Traditional Chinese Medicine Clinical Records Classification Using Knowledge-Powered Document Embedding

A syndrome differentiation model of TCM based on multi-label deep forest using biomedical text mining

An Improved Double Channel Long Short-Term Memory Model for Medical Text Classification

Multi-label Text Categorization with Hidden Components

Learning Semantic Similarity For Multi-Label Text Categorization

A High Performance Two-Class Chinese Text Categorization Method

MCICT: Graph convolutional network-based end-to-end model for multi-label classification of imbalanced clinical text

Traditional Chinese medicine clinical records classification with BERT and domain specific corpora

Chinese Text Categorization Based On The Binary Weighting Model With Non-Binary Smoothing

Hierarchical Multi-Label Text Categorization with Global Margin Maximization

Many-Class Text Classification with Matching

A class-feature-centroid classifier for text categorization