Improved diagnosis-medication association mining to reduce pseudo-associations
Ching-Huan Wang, Phung Anh Nguyen, Yu Chuan (Jack) Li, Md. Mohaimenul Islam, Tahmina Nasrin Poly, Quoc-Viet Tran, Chih-Wei Huang, Hsuan-Chia Yang
DOI: https://doi.org/10.1016/j.cmpb.2021.106181
IF: 6.1
2021-08-01
Computer Methods and Programs in Biomedicine
Abstract:<h3 class="u-h4 u-margin-m-top u-margin-xs-bottom">Background and Objective</h3><p>Association rule mining has been applied in medical fields to discover prescribing patterns and relationships among diseases and/or medications; however, it has also generated unreasonable associations among these entities. This study aims to identify the real-world profile of disease-medication (DM) associations using a modified mining algorithm and to assess its performance in reducing DM pseudo-associations.</p><h3 class="u-h4 u-margin-m-top u-margin-xs-bottom">Methods</h3><p>We retrieved outpatient records from January 2011 to December 2015 from the claims databases maintained by the Health and Welfare Data Science Center, Ministry of Health and Welfare, Taiwan. The lift from association rule mining, referred to here as the Q-value, was used to quantify DM associations: Q<sub>1</sub> for the original algorithm and Q<sub>2</sub> for the modified algorithm. One thousand DM pairs with a positive Q<sub>1</sub>-value (<span class="math">Q<sub>1</sub><sup>+</sup></span>) and either a negative Q<sub>2</sub>-value or no Q<sub>2</sub>-value (<span class="math">Q<sub>2</sub><sup>−</sup></span> or <span class="math">Q<sub>2</sub><sup>∅</sup></span>) were selected as the validation dataset, in which two pharmacists assessed the DM associations.</p><h3 class="u-h4 u-margin-m-top u-margin-xs-bottom">Results</h3><p>A total of 3,120,449 unique DM pairs were identified, of which 333,347 were <span class="math">Q<sub>1</sub><sup>+</sup>Q<sub>2</sub><sup>−</sup></span> pairs and 429,931 were <span class="math">Q<sub>1</sub><sup>+</sup>Q<sub>2</sub><sup>∅</sup></span> pairs. <span class="math">Q<sub>1</sub><sup>+</sup>Q<sub>2</sub><sup>−</sup></span> rates were relatively high in ATC classes C (29.91%) and R (30.24%), whereas classes L (69.91%) and V (52.52%) had remarkably high <span class="math">Q<sub>1</sub><sup>+</sup>Q<sub>2</sub><sup>∅</sup></span> rates.
In the validation dataset of 1000 pairs, 93.7% of the <span class="math">Q<sub>1</sub><sup>+</sup>Q<sub>2</sub><sup>−</sup></span> or <span class="math">Q<sub>1</sub><sup>+</sup>Q<sub>2</sub><sup>∅</sup></span> DM pairs were assessed as pseudo-associations. However, ATC classes M (5.3%), H (4.5%), and B (4.1%) had the highest rates of plausible associations that the modified algorithm incorrectly assigned <span class="math">Q<sub>2</sub><sup>−</sup></span> or <span class="math">Q<sub>2</sub><sup>∅</sup></span>.</p><h3 class="u-h4 u-margin-m-top u-margin-xs-bottom">Conclusions</h3><p>The modified algorithm identified, with high accuracy, pseudo-associations that the original algorithm had regarded as positive associations. It could potentially be applied to improve secondary databases, facilitating research on real-world prescribing patterns and further enhancing drug safety.</p>
engineering, biomedical; computer science, interdisciplinary applications; medical informatics; computer science, theory & methods
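The abstract does not state how the Q-value or the modified algorithm is computed. The Python below is a minimal sketch under loudly stated assumptions: Q is taken to be the natural logarithm of the standard association-rule lift, support(D, M) / (support(D) × support(M)), so that Q > 0 means a pair co-occurs more often than expected by chance; and the "original" counting is assumed to pair every diagnosis with every medication recorded in the same visit. It then applies the selection criterion from the Methods (positive Q1, and a negative or absent Q2). The function names `q_values` and `select_candidates`, the toy visits, and the Q2 numbers are all hypothetical; the Q2 values are placeholders because the modified algorithm is not described here.

```python
import math
from collections import Counter
from itertools import product

# HYPOTHETICAL visit-level data: (diagnosis codes, ATC medication codes) per visit.
visits = [
    ({"I10"}, {"C08CA01"}),                      # hypertension; amlodipine
    ({"I10", "J06.9"}, {"C08CA01", "R05DA09"}),  # hypertension + URI; amlodipine + dextromethorphan
    ({"J06.9"}, {"R05DA09"}),                    # URI; dextromethorphan
    ({"E11"}, {"A10BA02"}),                      # type 2 diabetes; metformin
    ({"E11"}, {"A10BA02"}),
    ({"E11"}, {"A10BA02"}),
]


def q_values(visits):
    """Map each (diagnosis, medication) pair to Q = ln(lift).

    ASSUMPTION: every diagnosis in a visit is paired with every medication
    in that visit, and Q is the natural log of the classic lift
    support(D, M) / (support(D) * support(M)).
    """
    n = len(visits)
    d_count, m_count, dm_count = Counter(), Counter(), Counter()
    for diagnoses, medications in visits:
        d_count.update(diagnoses)
        m_count.update(medications)
        # Cross-product pairing: the source of candidate pseudo-associations.
        dm_count.update(product(diagnoses, medications))
    # lift = (n_dm / n) / ((n_d / n) * (n_m / n)) = n_dm * n / (n_d * n_m)
    return {
        (d, m): math.log(n_dm * n / (d_count[d] * m_count[m]))
        for (d, m), n_dm in dm_count.items()
    }


def select_candidates(q1, q2):
    """Split Q1+ pairs into Q1+Q2- (negative Q2) and Q1+Q2∅ (no Q2)."""
    q2_negative, q2_missing = [], []
    for pair, value in q1.items():
        if value <= 0:
            continue  # only pairs with a positive Q1 are candidates
        if pair not in q2:
            q2_missing.append(pair)
        elif q2[pair] < 0:
            q2_negative.append(pair)
    return q2_negative, q2_missing


q1 = q_values(visits)
# PLACEHOLDER Q2 values: made up purely to exercise the selection step,
# not produced by the study's modified algorithm.
q2 = {
    ("I10", "C08CA01"): 1.1,
    ("J06.9", "R05DA09"): 1.1,
    ("E11", "A10BA02"): 0.7,
    ("J06.9", "C08CA01"): -0.2,
}
q2_negative, q2_missing = select_candidates(q1, q2)
print("Q1+Q2-:", q2_negative)
print("Q1+Q2∅:", q2_missing)
```

In this toy data, the single visit that combines hypertension (I10) with an upper respiratory infection (J06.9) is enough to give both cross pairs, I10 with dextromethorphan (R05DA09) and J06.9 with amlodipine (C08CA01), a positive Q1 under the assumed cross-product counting; these are the kind of pseudo-associations the study targets. A pair whose Q2 is exactly zero falls into neither group.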