Distributed representations of diseases based on co-occurrence relationship

Haoqing Wang, Huiyu Mai, Zhi-hong Deng, Chao Yang, Luxia Zhang, Huai-yu Wang
DOI: https://doi.org/10.1016/j.eswa.2021.115418
IF: 8.5
2021-11-01
Expert Systems with Applications
Abstract:<p>The co-occurrence relationship among diseases facilitates knowledge discovery in the medical field. However, due to limited data, previous research has relied mainly on clinician experience and simple statistics, which makes it difficult to discover deep associations among diseases. Treating the diagnoses in an electronic medical record (EMR) as interrelated random variables, we use Markov random fields to model the co-occurrence relationship among diseases and propose Di2Vec to learn distributed representations of diseases. Diseases with high co-occurrence frequency lie close to each other in the embedding space. Considering the hierarchical structure of each diagnosis code, we introduce subword embeddings and explore their impact on the quality of the embeddings, where the embedding of each diagnosis is expressed as the sum of its subword embeddings. Qualitative and quantitative experiments show that Di2Vec places the embeddings of diseases with high co-occurrence frequency close to each other, and that it outperforms Skip-gram and CBOW when these embeddings are used as feature representations for medical expense prediction. Using subword embeddings gives the disease embeddings better clustering properties but, to a certain extent, loses some of the co-occurrence information they contain.</p>
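The subword composition described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how a diagnosis code is split into subwords, so the prefix-based split below (for an ICD-10-style code such as `E11.9`) is an assumption, and the names `code_subwords` and `SubwordEmbedding` are hypothetical. The vectors are randomly initialised placeholders rather than embeddings trained with the Markov random field objective.

```python
import numpy as np


def code_subwords(code):
    """Split a diagnosis code into hierarchical prefixes.

    Assumption: subwords are the character prefixes of the code with the
    dot removed, e.g. 'E11.9' -> ['E', 'E1', 'E11', 'E119'].
    """
    chars = code.replace(".", "")
    return [chars[:i] for i in range(1, len(chars) + 1)]


class SubwordEmbedding:
    """Toy lookup table mapping each subword to a vector."""

    def __init__(self, dim, seed=0):
        self.dim = dim
        self.rng = np.random.default_rng(seed)
        self.table = {}

    def _vector(self, subword):
        # Lazily create a placeholder vector; a real model would learn these.
        if subword not in self.table:
            self.table[subword] = self.rng.normal(scale=0.1, size=self.dim)
        return self.table[subword]

    def embed(self, code):
        # The diagnosis embedding is the sum of its subword embeddings.
        return np.sum([self._vector(s) for s in code_subwords(code)], axis=0)


model = SubwordEmbedding(dim=8)
e119 = model.embed("E11.9")
e110 = model.embed("E11.0")
```

Under this assumed split, `E11.9` and `E11.0` share the subwords `E`, `E1`, and `E11`, so their embeddings differ only in the contribution of the final subword. That shared component is one plausible reason codes in the same category would cluster together, as the abstract reports.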
computer science, artificial intelligence; engineering, electrical & electronic; operations research & management science