Improving Chinese Electronic Medical Record Retrieval by Field Weight Assignment, Negation Detection, and Re-ranking

Songchun Yang, Xiangwen Zheng, Yu Xiao, Xiangfei Yin, Jianfei Pang, Huajian Mao, Wei Wei, Wenqin Zhang, Yu Yang, Haifeng Xu, Mei Li, Dongsheng Zhao
DOI: https://doi.org/10.1016/j.jbi.2021.103836
IF: 8
2021-06-01
Journal of Biomedical Informatics
Abstract: Information retrieval techniques are widely used in electronic medical record (EMR) systems. However, most existing methods do not consider the structure and language features of Chinese EMRs, which degrades retrieval performance. To improve accuracy and comprehensiveness, we propose an improved algorithm for Chinese EMR retrieval. First, the weights of fields in Chinese EMRs are assigned based on their importance in clinical applications. Second, negative relations in EMRs are detected, and the retrieval scores of negated terms are adjusted accordingly. Third, the retrieval results are re-ranked using expansion terms and time information to enhance recall without decreasing precision. Experimental results show that the improved algorithm increases precision and recall significantly, which indicates that the algorithm takes full account of the characteristics of Chinese EMRs and fits the needs of clinical applications.
medical informatics, computer science, interdisciplinary applications
What problem does this paper attempt to address?
The paper aims to address several key issues in the retrieval of Chinese Electronic Medical Records (EMRs) to improve the accuracy and comprehensiveness of the retrieval. Specifically:

1. **Field Weight Allocation**: Chinese EMRs contain multiple fields, and the importance of different fields varies in clinical applications. Most existing retrieval algorithms do not fully consider this, resulting in less reasonable retrieval results.
2. **Negation Detection**: Negation relationships (e.g., "no headache") frequently appear in Chinese EMRs, but most retrieval algorithms cannot distinguish negations, leading to mismatches.
3. **Query Drift Elimination**: Although Query Expansion (QE) can increase the comprehensiveness of results, it may introduce irrelevant expansion terms in highly specialized and diverse EMR documents, thereby reducing retrieval precision.

To address the above issues, the authors propose an improved Chinese EMR retrieval algorithm, implemented through the following methods:

- **Field Weight Allocation**: Determine the importance of different fields based on expert consultation and the Delphi method, and apply these weights in the retrieval algorithm.
- **Negation Detection**: Design rules to identify negation relationships and adjust the retrieval scores of the corresponding items.
- **Re-ranking**: Re-rank the initial retrieval results using expansion terms and temporal information extracted from pseudo-relevance feedback (PRF) to improve recall without reducing precision.

Experimental results show that the improved algorithm significantly enhances retrieval precision and recall, indicating that the algorithm fully considers the characteristics of Chinese EMRs and meets the needs of clinical applications.
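
To make the three steps concrete, here is a minimal Python sketch of how field weighting, rule-based negation handling, and re-ranking could fit together. It is not the authors' implementation. The field names, weight values, negation cues, penalty, the `alpha` and `beta` mixing factors, and the recency bonus are all illustrative assumptions. The paper derives its field weights through the Delphi method and its expansion terms through PRF, neither of which is reproduced here.

```python
import re

# Hypothetical field weights (placeholders, not the paper's Delphi-derived values).
FIELD_WEIGHTS = {"chief_complaint": 3.0, "present_illness": 2.0, "past_history": 1.0}

# Illustrative negation cues; the paper's actual rule set is not given in this summary.
NEGATION_CUES = ("无", "否认", "未")
NEGATION_PENALTY = -1.0  # assumed score for a negated mention


def term_score(text: str, term: str) -> float:
    """Score the first mention of `term` in each clause: +1 if affirmed, penalty if negated."""
    score = 0.0
    for clause in re.split(r"[，。；,.;]", text):
        idx = clause.find(term)
        if idx == -1:
            continue
        # Simplified rule: a cue anywhere before the term in the same clause negates it.
        negated = any(cue in clause[:idx] for cue in NEGATION_CUES)
        score += NEGATION_PENALTY if negated else 1.0
    return score


def record_score(fields: dict[str, str], terms: list[str]) -> float:
    """Sum term scores over all fields, scaled by each field's weight."""
    return sum(
        FIELD_WEIGHTS.get(name, 1.0) * term_score(text, term)
        for name, text in fields.items()
        for term in terms
    )


def rerank(records, query_terms, expansion_terms, alpha=0.3, beta=0.1):
    """Retrieve with the original query only, then re-rank hits with expansion terms and recency."""
    scored = [(record_score(r["fields"], query_terms), r) for r in records]
    hits = [(s, r) for s, r in scored if s > 0]  # negated-only matches drop out
    if not hits:
        return []
    newest = max(r["year"] for _, r in hits)

    def final(item):
        base, r = item
        expansion = record_score(r["fields"], expansion_terms)
        recency = 1.0 / (1 + newest - r["year"])  # assumed use of time information
        return base + alpha * expansion + beta * recency

    return [r["id"] for _, r in sorted(hits, key=final, reverse=True)]


records = [
    {"id": "A", "year": 2019,
     "fields": {"chief_complaint": "头痛三天", "past_history": "无高血压"}},
    {"id": "B", "year": 2020,
     "fields": {"chief_complaint": "无头痛，发热两天"}},
    {"id": "C", "year": 2020,
     "fields": {"chief_complaint": "头痛伴恶心", "present_illness": "偏头痛病史"}},
]

print(rerank(records, ["头痛"], ["偏头痛"]))
```

In this toy data, record B mentions "头痛" (headache) only in negated form ("无头痛", no headache), so its score is negative and it is excluded. Because expansion terms only adjust the order of records already retrieved by the original query, they cannot pull in unrelated records, which is one simple way to limit query drift.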