Learning Representations from Medical Text for Effective Diagnoses and Knowledge Discovery

Zhoujian Sun, Hanrui Shi, Zhengxing Huang, Nai Ding
DOI: https://doi.org/10.1109/embc40787.2023.10340797
2023-01-01
Abstract: Discovering knowledge and effectively predicting target events are two main goals of medical text mining. However, few models can achieve both simultaneously. In this study, we investigated the possibility of discovering knowledge and predicting diagnoses at the same time from raw medical text. We proposed the Enhanced Neural Topic Model (ENTM), a variant of the neural topic model, to learn interpretable representations. We introduced an auxiliary loss set to improve the effectiveness of the learned representations. We then used the learned representations to train a softmax regression model to predict target events. Because each element of the representations learned by the ENTM has an explicit semantic meaning, the weights of the softmax regression represent potential knowledge about whether an element is a significant factor in predicting a diagnosis. We adopted two independent medical text datasets to evaluate the ENTM. Results indicate that our model performed better than the latest pretrained neural language models. Meanwhile, analysis of the model parameters indicates that our model has the potential to discover knowledge from data.
Clinical relevance: This work provides a model that can effectively predict patient diagnosis and has the potential to discover knowledge from medical text.
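The second stage described in the abstract (softmax regression trained on interpretable representations, with the weights read as candidate knowledge) can be illustrated with a minimal sketch. This is not the authors' implementation: the ENTM and its auxiliary losses are not reproduced, the topic proportions are synthetic Dirichlet draws standing in for learned representations, and every function name and hyperparameter below is a hypothetical choice made for illustration.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def fit_softmax_regression(X, y, n_classes, lr=0.5, epochs=500):
    """Plain batch gradient descent on the cross-entropy loss.

    X: (n_docs, n_topics) representations; y: (n_docs,) integer labels.
    Returns the weight matrix W (n_topics, n_classes) and bias b.
    """
    n, d = X.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot targets
    for _ in range(epochs):
        P = softmax(X @ W + b)
        grad = (P - Y) / n  # gradient of the mean cross-entropy w.r.t. logits
        W -= lr * (X.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b


def top_topics_per_class(W, k=1):
    """Indices of the k largest-weight topics for each class, shape (n_classes, k)."""
    return np.argsort(-W, axis=0)[:k].T


# Synthetic stand-in for learned topic proportions (an assumption, not paper data):
# documents of class c put extra Dirichlet mass on topic c.
rng = np.random.default_rng(0)
n_topics, n_classes, n_docs = 6, 3, 300
y = rng.integers(0, n_classes, size=n_docs)
alpha = np.ones((n_docs, n_topics))
alpha[np.arange(n_docs), y] += 5.0
X = np.vstack([rng.dirichlet(a) for a in alpha])

W, b = fit_softmax_regression(X, y, n_classes)
pred = softmax(X @ W + b).argmax(axis=1)
accuracy = (pred == y).mean()
top = top_topics_per_class(W, k=1)

print("training accuracy:", accuracy)
print("highest-weight topic per class:", top.ravel())
```

Because each input dimension here corresponds to one topic, a large positive entry `W[j, c]` suggests that topic `j` is associated with class `c`. In this toy setup the association is planted by construction, so the sketch only shows how such weights could be inspected, not that they constitute validated clinical knowledge.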