Development and Validation of MicrobEx: an Open-Source Package for Microbiology Culture Concept Extraction

Garrett Eickelberg, Yuan Luo, L. Nelson Sanchez-Pinto
DOI: https://doi.org/10.48550/arXiv.2111.11518
2021-11-23
Abstract: Microbiology culture reports contain critical information for important clinical and public health applications. However, microbiology reports often have complex, semi-structured, free-text data that present a barrier for secondary use. Here we present the development and validation of an open-source package designed to ingest free-text microbiology reports, determine whether the culture is positive, and return a list of SNOMED-CT mapped bacteria. Our rule-based natural language processing algorithm was developed using microbiology reports from two different electronic health record systems in a large healthcare organization, and then externally validated on the reports of two other institutions with manually extracted results as a benchmark. Our algorithm achieved F-1 scores >0.95 on all classification tasks across both validation sets. Our concept extraction Python package, MicrobEx, is designed to be reused and adapted to individual institutions as an upstream process for other clinical applications, such as machine learning studies, clinical decision support, and disease surveillance systems.
Quantitative Methods
What problem does this paper attempt to address?
This paper addresses the difficulty of extracting key information from microbiology culture reports. These reports are crucial for clinical treatment decisions and public health applications, but they usually contain complex, semi-structured, free-text data, which is a barrier to secondary use. To overcome this, the authors developed and validated MicrobEx, an open-source Python package that ingests free-text microbiology reports, determines whether the culture is positive, and returns a list of bacteria mapped to SNOMED-CT. The rule-based algorithm was developed on reports from two electronic health record systems and externally validated on reports from two other institutions, where it achieved F-1 scores above 0.95 on all classification tasks. The tool is intended as an upstream step for other applications, such as machine learning studies, clinical decision support, and disease surveillance systems.
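To make the three steps described above concrete (ingest free text, decide whether the culture is positive, return mapped organisms), here is a minimal sketch of a rule-based extractor. It is not the MicrobEx API or the authors' rule set: the function name `extract_culture_concepts`, the `CultureResult` type, the negation patterns, and the lookup table are all hypothetical, and the concept identifiers are placeholders rather than real SNOMED-CT codes.

```python
import re
from dataclasses import dataclass, field

# Hypothetical lookup table. The identifiers are placeholders,
# NOT real SNOMED-CT concept IDs.
ORGANISM_CONCEPTS = {
    "escherichia coli": "PLACEHOLDER_SCTID_E_COLI",
    "staphylococcus aureus": "PLACEHOLDER_SCTID_S_AUREUS",
}

# Assumed, deliberately crude negation rules; a real rule set would
# need to handle far more phrasing and scope the negation properly.
NEGATIVE_PATTERNS = [
    re.compile(r"\bno growth\b"),
    re.compile(r"\bno (?:organisms?|bacteria) (?:seen|isolated|detected)\b"),
]


@dataclass
class CultureResult:
    positive: bool
    organisms: list = field(default_factory=list)  # (name, concept_id) pairs


def extract_culture_concepts(report: str) -> CultureResult:
    """Classify a free-text report and list the organisms it names."""
    # Lowercase and collapse whitespace so simple matching is stable.
    text = " ".join(report.lower().split())

    # Dictionary lookup of organism names in the normalized text.
    organisms = [
        (name, concept_id)
        for name, concept_id in ORGANISM_CONCEPTS.items()
        if name in text
    ]

    # Treat the culture as positive only if an organism is named
    # and no negation rule fires anywhere in the report.
    negated = any(p.search(text) for p in NEGATIVE_PATTERNS)
    positive = bool(organisms) and not negated
    return CultureResult(positive, organisms if positive else [])
```

Calling `extract_culture_concepts("Blood culture: No growth after 5 days")` yields a negative result with an empty organism list. The report-level negation check is a simplification: a report that names one organism and separately states "no growth" for another specimen would be misclassified, which is the kind of case a validated rule set has to handle.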