PMC-Patients: A Large-scale Dataset of Patient Notes and Relations Extracted from Case Reports in PubMed Central

Zhengyun Zhao, Qiao Jin, Sheng Yu
2022-01-01
Abstract: We present PMC-Patients, a dataset consisting of 167k patient notes with 3.1M relevant article annotations and 293k similar patient annotations. The patient notes are extracted by identifying certain sections of case reports in PubMed Central, and those released under at least a CC BY-NC-SA license are redistributed. Patient-article relevance and patient-patient similarity are defined by citation relationships in PubMed. We also perform four tasks with PMC-Patients to demonstrate its utility: Patient Note Recognition, Patient-Patient Similarity, Patient-Patient Retrieval, and Patient-Article Retrieval. In summary, PMC-Patients provides the largest-scale collection of patient notes, with high quality, diverse conditions, easy access, and rich annotations.