Clinical multi-label free text classification by exploiting disease label relation

Rui-Wei Zhao, Guo-Zheng Li, Jia-Ming Liu, Xiao Wang
DOI: https://doi.org/10.1109/BIBM.2013.6732508
2013-01-01
Abstract: Clinical data describing a patient's health status can carry multiple labels. For example, a clinical record describing a patient suffering from cough and fever should be tagged with both disease labels. Such co-occurring labels are often interrelated, and these relations can be exploited to improve disease classification. In this work, we treat the categorization of free clinical text as a multi-label learning problem. However, we find that some commonly used multi-label learning methods can suffer from severe side effects when exploiting complicated disease label relations, such as over-exploitation of label relations and error propagation in label prediction. Based on these findings, we propose a novel multi-label learning algorithm called Ensemble of Sampled Classifier Chains (ESCC) to improve clinical text classification. ESCC automatically learns to select the relevant disease information that helps improve classification performance when exploiting possible disease relations. In our experiments, ESCC shows strong advantages over other state-of-the-art multi-label algorithms on medical text data, with significant improvements in performance. The proposed algorithm is promising for mining knowledge from a wide range of multi-label medical text data.
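To make the setting concrete, the following is a minimal sketch of a plain ensemble of classifier chains with random label orders, written in pure Python. It is not the ESCC algorithm: the abstract does not describe ESCC's sampling or selection steps, so none are reproduced here. The toy records, the three labels, the perceptron base learner, and all function and class names are assumptions chosen for illustration only.

```python
import random

# Hypothetical label set and toy clinical notes (illustrative, not from the paper).
LABELS = ["cough", "fever", "headache"]
RECORDS = [
    ("persistent cough and fever since monday", [1, 1, 0]),
    ("dry cough at night", [1, 0, 0]),
    ("high fever with chills", [0, 1, 0]),
    ("headache and fever after travel", [0, 1, 1]),
    ("mild headache no other complaints", [0, 0, 1]),
    ("routine checkup no complaints", [0, 0, 0]),
]

def vectorize(text, vocab):
    """Binary bag-of-words vector over a fixed vocabulary."""
    tokens = set(text.lower().split())
    return [1.0 if word in tokens else 0.0 for word in vocab]

class Perceptron:
    """Tiny binary base learner; the last weight is the bias."""
    def __init__(self, n_features, max_epochs=200):
        self.w = [0.0] * (n_features + 1)
        self.max_epochs = max_epochs

    def predict(self, x):
        score = self.w[-1] + sum(wi * xi for wi, xi in zip(self.w, x))
        return 1 if score > 0 else 0

    def fit(self, X, y):
        for _ in range(self.max_epochs):
            mistakes = 0
            for x, target in zip(X, y):
                err = target - self.predict(x)
                if err:
                    mistakes += 1
                    for i, xi in enumerate(x):
                        self.w[i] += err * xi
                    self.w[-1] += err
            if mistakes == 0:  # stop once the training set is fit
                break
        return self

class ClassifierChain:
    """One binary model per label; each sees the labels earlier in the chain."""
    def __init__(self, order):
        self.order = order  # permutation of label indices
        self.models = []

    def fit(self, X, Y):
        self.models = []
        for pos, j in enumerate(self.order):
            earlier = self.order[:pos]
            # Training uses the true values of earlier labels as extra features.
            Xa = [x + [float(y[k]) for k in earlier] for x, y in zip(X, Y)]
            self.models.append(Perceptron(len(Xa[0])).fit(Xa, [y[j] for y in Y]))
        return self

    def predict(self, x):
        out = [0] * len(self.order)
        prev = []
        for model, j in zip(self.models, self.order):
            out[j] = model.predict(x + prev)
            # Prediction feeds predicted labels forward, so an early
            # mistake can propagate down the chain.
            prev.append(float(out[j]))
        return out

def fit_ensemble(X, Y, n_chains=5, seed=0):
    """Train several chains, each with its own random label order."""
    rng = random.Random(seed)
    chains = []
    for _ in range(n_chains):
        order = list(range(len(Y[0])))
        rng.shuffle(order)
        chains.append(ClassifierChain(order).fit(X, Y))
    return chains

def predict_ensemble(chains, x):
    """Per-label majority vote across chains."""
    votes = [sum(col) for col in zip(*(c.predict(x) for c in chains))]
    return [1 if 2 * v > len(chains) else 0 for v in votes]

vocab = sorted({tok for text, _ in RECORDS for tok in text.split()})
X = [vectorize(text, vocab) for text, _ in RECORDS]
Y = [labels for _, labels in RECORDS]
chains = fit_ensemble(X, Y)

new_note = vectorize("cough and fever", vocab)
predicted = predict_ensemble(chains, new_note)
print([name for name, flag in zip(LABELS, predicted) if flag])
```

Each chain conditions later labels on earlier ones, which is how label relations are exploited. It is also where the error propagation mentioned in the abstract can arise, since prediction feeds predicted rather than true labels forward. Averaging over several random label orders is one common way to soften that dependence on any single ordering.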