A general framework for developing computable clinical phenotype algorithms

David S Carrell, James S Floyd, Susan Gruber, Brian L Hazlehurst, Patrick J Heagerty, Jennifer C Nelson, Brian D Williamson, Robert Ball
DOI: https://doi.org/10.1093/jamia/ocae121
2024-08-01
Abstract: Objective: To present a general framework providing high-level guidance to developers of computable algorithms for identifying patients with specific clinical conditions (phenotypes) through a variety of approaches, including but not limited to machine learning and natural language processing methods to incorporate rich electronic health record data. Materials and methods: Drawing on extensive prior phenotyping experiences and insights derived from 3 algorithm development projects conducted specifically for this purpose, our team with expertise in clinical medicine, statistics, informatics, pharmacoepidemiology, and healthcare data science methods conceptualized stages of development and corresponding sets of principles, strategies, and practical guidelines for improving the algorithm development process. Results: We propose 5 stages of algorithm development and corresponding principles, strategies, and guidelines: (1) assessing fitness-for-purpose, (2) creating gold standard data, (3) feature engineering, (4) model development, and (5) model evaluation. Discussion and conclusion: This framework is intended to provide practical guidance and serve as a basis for future elaboration and extension.