Medical Record Text Analysis Based on Latent Semantic Analysis

Xinyu Jin, Wentao Ma, Yunze Li
DOI: https://doi.org/10.1109/iscid.2015.155
2015-01-01
Abstract: With the rapid development of the e-medical industry, a large amount of data has accumulated in various databases in the form of text. A considerable part of this data is manually recorded and therefore unstructured, so it requires semantic analysis. This paper proposes an improved Latent Dirichlet Allocation model based on a BM25 mixture-weights method, aiming to analyze the latent semantics of Chinese patient medical records. Tests show a conspicuous reduction in perplexity, demonstrating the model's effectiveness.
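The abstract does not spell out the BM25 mixture-weights scheme, so the following is only a minimal sketch of standard Okapi BM25 term weighting, the kind of per-term weight that could replace raw counts before topic modeling. It is not the authors' method. The function name `bm25_weights`, the parameter values `k1=1.5` and `b=0.75`, and the toy English tokens are all assumptions made for illustration; real Chinese medical records would first need word segmentation.

```python
import math
from collections import Counter

def bm25_weights(docs, k1=1.5, b=0.75):
    """Return one {term: weight} dict per tokenized document.

    Standard Okapi BM25 with a smoothed, always-positive IDF.
    Assumes `docs` is a non-empty list of non-empty token lists.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    weights = []
    for d in docs:
        tf = Counter(d)
        # Length normalization: longer-than-average documents are penalized.
        norm = k1 * (1 - b + b * len(d) / avg_len)
        weights.append({
            t: math.log(1 + (n_docs - df[t] + 0.5) / (df[t] + 0.5))
               * f * (k1 + 1) / (f + norm)
            for t, f in tf.items()
        })
    return weights

# Hypothetical, already-tokenized toy records (illustrative only).
docs = [
    ["fever", "cough", "fever"],
    ["cough", "headache"],
    ["fever", "rash", "nausea", "rash"],
]
w = bm25_weights(docs)
```

Within a document, a term that occurs more often or appears in fewer documents receives a larger weight, which is the property a weighted topic model would rely on to down-weight common, uninformative words.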