Uncovering Medical Insights from Vast Amounts of Biomedical Data in Clinical Case Reports
Yijiang Zhou, David A. Liem, Jessica M. Lee, Quan Cao, Brian Bleakley, J. Harry Caufield, Sanjana Murali, Wei Wang, Li Zhang, Alex Bui, Yizhou Sun, Karol E. Watson, Jiawei Han, Peipei Ping
DOI: https://doi.org/10.1101/172460
2017-01-01
Abstract: Clinical case reports (CCRs) have a time-honored tradition as an important means of sharing clinical experiences with patients who present with atypical disease phenotypes or receive new therapies. However, the large body of accumulated case reports consists of isolated, unstructured, and heterogeneous clinical data, which makes it difficult for clinicians and researchers to mine relevant information with existing indexing tools. In this investigation, to render CCRs more findable, accessible, interoperable, and reusable (FAIR) for the biomedical community, we created a resource platform comprising a test dataset of 1000 CCRs spanning 14 disease phenotypes, a standardized metadata template and metrics, and a set of computational tools. These tools automatically retrieve relevant medical information and analyze all published PubMed clinical case reports with respect to trends in publication journals, citation impact, MeSH terms, drug use, distributions of patient demographics, and relationships with other case reports and databases. Our standardized metadata template and CCR test dataset may be valuable resources for researchers who need a high-quality dataset to train and validate machine learning algorithms, and may thereby help advance medical science and improve patient care. In the future, our analytical tools may also be applied to other large clinical data sources.