Privacy Protection of Sexually Transmitted Infections Information from Chinese Electronic Medical Records

Mengchun Gong, Yue Yu, Zihao OuYang, Wenzhao Shi, Chao Liu, Qilin Wang, Jiale Nan, Endi Cai, Fen Ding, Sheng Nie
DOI: https://doi.org/10.1101/2024.08.13.24311908
2024-08-27
Abstract: Objectives: To develop an effective approach for safeguarding private information in electronic medical records. Design: Text of Chinese patients' electronic medical records. Setting: The Chinese Renal Disease Data System database. Participants: 3,233,174 patients between 1 Jan. 2010 and 31 Dec. 2023. Main outcome measures: Annotated patient privacy fields and the effectiveness of privacy protection. Results: We developed an automated tool, EPSTII, to protect the privacy of patients' sexually transmitted infection (STI) information within medical records. Through keyword refinement and the integration of expert knowledge, EPSTII currently achieves 100% accuracy and recall. Our privacy protection measures reached a 99.5% success rate, ensuring the utmost protection of STI patients' privacy. As the first large-scale investigation into privacy leakage and STI identification in Chinese electronic medical records, our research paves the way for the future development of patient privacy protection laws in China and for more sophisticated tools. Conclusions: The EPSTII method demonstrates a feasible and effective approach to protecting privacy in electronic medical records from 19 hospitals. It offers comprehensive insights for infectious disease research using Chinese electronic medical records, with protocols tailored for accurate STI data extraction and enhanced protection compared with traditional methods.
Health Economics
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper addresses the privacy protection of sexually transmitted infection (STI) information in Chinese electronic medical records (EMRs). Specifically, the research goal is to develop an effective method for protecting the privacy of patients' STI information in medical records.

### Background and Motivation

1. **Proliferation of Electronic Medical Records**: The widespread adoption of electronic medical records has brought multiple benefits, including improved quality of care, fewer medical errors, and lower costs. However, because medical information is highly sensitive, any privacy breach can cause direct or indirect harm to patients.
2. **Risk of Privacy Breach**: As the use of electronic medical records grows, so does the risk of privacy breaches, including leakage of prescription records, diagnostic codes, and genomic data. Such breaches can cause losses for both hospitals and patients. For STI patients in particular, a breach can severely affect their social reputation and behavior.
3. **Impact of Cultural Background**: In certain cultural contexts, the leakage of STI information can lead to severe psychological harm and social exclusion. For example, in African countries, disclosure of HIV-related private information could result in divorce, exclusion, discrimination, and unemployment.
4. **Existing Issues**: Chinese electronic medical records currently lack comprehensive management policies and technical safeguards. Unauthorized use, leakage, and even illegal trading of medical data are becoming increasingly serious. At the same time, patients are increasingly aware of the need to protect themselves. As a result, medical disputes arising from inadequate protection of sensitive information have risen significantly.

### Research Methods

1. **Data Source**: The study used the China Renal Data System (CRDS) database, which contains data from 19 tertiary hospitals covering five geographical regions of China.
2. **Keyword Extraction**: Disease-related text was collected from online diagnosis-and-treatment platforms and online medical Q&A platforms. The natural language processing (NLP) method word2vec was then applied to this text to generate keywords.
3. **Regular Expression Search**: Based on the resulting keyword dictionary, a protocol for extracting STI information (EPSTII) was developed. Regular expressions were used to search for STI information in the various sub-datasets of the electronic medical records.
4. **Manual Verification and Validation**: 1,000 patients with STI information and 1,000 patients without STI information were randomly selected for manual review to calculate precision and recall.
5. **Privacy Protection Strategy**: Based on the identified information, a privacy protection strategy was developed. It de-identifies sensitive STI information by replacing each keyword together with the 10 characters before and after it.

### Main Results

1. **Performance of EPSTII**: EPSTII achieved 100% accuracy and recall in identifying STI information.
2. **Effectiveness of Privacy Protection**: The privacy protection measures achieved a 99.5% success rate, ensuring maximum protection of STI patients' privacy.
3. **Distribution of STI Information**: Among 3,233,174 patients in 19 hospitals, 148,856 patients were identified as having STI information, accounting for 4.8% of the total diagnostic records.

### Conclusion

The EPSTII method demonstrated feasibility and effectiveness in protecting patient privacy in electronic medical records, providing comprehensive insights for infectious disease research using Chinese electronic medical records.
This study also lays the foundation for the development of future Chinese patient privacy protection laws and the creation of more sophisticated tools.
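The keyword-extraction step under Research Methods generates candidate keywords with word2vec. The summary does not give the training corpus or parameters. This sketch therefore assumes word vectors have already been trained and shows only one plausible way to expand seed terms to their nearest neighbours by cosine similarity. The function names `cosine` and `expand_keywords` and the `threshold` and `topn` parameters are hypothetical. Libraries such as gensim expose a comparable `most_similar` query on trained vectors.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def expand_keywords(
    seeds: List[str],
    vectors: Dict[str, Vector],
    threshold: float = 0.7,  # hypothetical cut-off, not taken from the paper
    topn: int = 5,           # hypothetical number of neighbours kept per seed
) -> List[str]:
    """Return the seeds plus close neighbours in embedding space.

    `vectors` is assumed to map words to embeddings already trained (e.g. with
    word2vec) on collected medical text. The result is only a candidate list;
    it would still need expert review before use as a keyword dictionary.
    """
    expanded = list(seeds)
    for seed in seeds:
        if seed not in vectors:
            continue  # seed never seen in the corpus: nothing to expand
        # Score every word not already in the list against this seed.
        scored = sorted(
            ((cosine(vectors[seed], vec), word)
             for word, vec in vectors.items()
             if word not in expanded),
            reverse=True,
        )
        for score, word in scored[:topn]:
            if score >= threshold:
                expanded.append(word)
    return expanded
```

A similarity cut-off keeps only close neighbours. An expert pass, as the abstract describes, would still be needed to drop neighbours that are related but not STI-specific.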
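The regular-expression search and the masking rule from Research Methods (each keyword plus the 10 characters on either side) can be combined in a short pipeline. This is a minimal sketch, not the EPSTII implementation. The keyword list is a hypothetical placeholder for the expert-refined dictionary. Replacing each masked character with `*` is an assumption, since the summary does not say what the window is replaced with.

```python
import re
from typing import List, Tuple

# Hypothetical placeholder dictionary (syphilis, AIDS, HIV, gonorrhea,
# condyloma acuminatum); the real EPSTII dictionary is not reproduced here.
STI_KEYWORDS = ["梅毒", "艾滋病", "HIV", "淋病", "尖锐湿疣"]

WINDOW = 10      # characters masked on each side of a keyword, per the summary
MASK_CHAR = "*"  # assumed replacement character


def build_pattern(keywords: List[str]) -> "re.Pattern[str]":
    """Compile one alternation regex; longer keywords first so they win over prefixes."""
    ordered = sorted(set(keywords), key=len, reverse=True)
    return re.compile("|".join(re.escape(k) for k in ordered), flags=re.IGNORECASE)


def find_spans(text: str, pattern: "re.Pattern[str]", window: int = WINDOW) -> List[Tuple[int, int]]:
    """Return (start, end) spans covering each match plus `window` chars on each side."""
    spans: List[Tuple[int, int]] = []
    for m in pattern.finditer(text):
        start = max(0, m.start() - window)
        end = min(len(text), m.end() + window)
        if spans and start <= spans[-1][1]:
            # Overlapping or touching windows are merged into a single span.
            spans[-1] = (spans[-1][0], max(spans[-1][1], end))
        else:
            spans.append((start, end))
    return spans


def mask_text(text: str, pattern: "re.Pattern[str]", window: int = WINDOW,
              mask_char: str = MASK_CHAR) -> str:
    """Overwrite every character inside each span with mask_char, keeping the length."""
    chars = list(text)
    for start, end in find_spans(text, pattern, window):
        for i in range(start, end):
            chars[i] = mask_char
    return "".join(chars)
```

Merging overlapping windows means two nearby keywords produce one masked span. Because the masked record keeps its original length, character offsets elsewhere in the record stay valid. Context outside the fixed window is left untouched, so some residual disclosure remains possible.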
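The manual validation step compares EPSTII's flags with reviewers' judgements on 1,000 flagged and 1,000 unflagged patients. The sketch below shows how precision and recall follow from such a reviewed sample. It assumes each reviewed patient is recorded as a (flagged by tool, confirmed by reviewer) pair. That record format and the function names are hypothetical.

```python
from typing import Iterable, Tuple


def confusion_counts(pairs: Iterable[Tuple[bool, bool]]) -> Tuple[int, int, int, int]:
    """Tally (tp, fp, fn, tn) from (flagged_by_tool, confirmed_by_reviewer) pairs."""
    tp = fp = fn = tn = 0
    for flagged, confirmed in pairs:
        if flagged and confirmed:
            tp += 1  # tool and reviewer agree: STI information present
        elif flagged:
            fp += 1  # tool flagged, reviewer found no STI information
        elif confirmed:
            fn += 1  # tool missed STI information the reviewer found
        else:
            tn += 1  # both agree: no STI information
    return tp, fp, fn, tn


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision = TP/(TP+FP), recall = TP/(TP+FN); 0.0 when a denominator is zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

The two groups were sampled separately. Extrapolating recall to all 3,233,174 patients would therefore require weighting each group's counts by that group's size in the full database. Unweighted counts describe only the 2,000 reviewed patients.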