Abstract: The objective of this study is to develop and evaluate natural language processing (NLP) and machine learning models to predict infant feeding status from clinical notes in the Epic electronic health records system. The primary outcome was the classification of infant feeding status from clinical notes using Medical Subject Headings (MeSH) terms. Annotation of notes was completed using TeamTat to uniquely classify clinical notes according to infant feeding status. We trained 6 machine learning models to classify infant feeding status: logistic regression, random forest, XGBoost gradient descent, k-nearest neighbors, and support-vector classifier. Model comparison was evaluated based on overall accuracy, precision, recall, and F1 score. Our modeling corpus included an even number of clinical notes that was a balanced sample across each class. We manually reviewed 999 notes that represented 746 mother-infant dyads with a mean gestational age of 38.9 weeks and a mean maternal age of 26.6 years. The most frequent feeding status classification present for this study was exclusive breastfeeding [n = 183 (18.3%)], followed by exclusive formula bottle feeding [n = 146 (14.6%)], and exclusive feeding of expressed mother's milk [n = 102 (10.2%)], with mixed feeding being the least frequent [n = 23 (2.3%)]. Our final analysis evaluated the classification of clinical notes as breast, formula/bottle, and missing. The machine learning models were trained on these three classes after performing balancing and down sampling. The XGBoost model outperformed all others by achieving an accuracy of 90.1%, a macro-averaged precision of 90.3%, a macro-averaged recall of 90.1%, and a macro-averaged F1 score of 90.1%. Our results demonstrate that natural language processing can be applied to clinical notes stored in the electronic health records to classify infant feeding status.
Early identification of breastfeeding status using NLP on unstructured electronic health records data can be used to inform precision public health interventions focused on improving lactation support for postpartum patients.
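
The abstract states that the models were trained on three classes after balancing and down sampling, but it does not describe the procedure. The following is a minimal sketch of one common approach (randomly down-sampling every class to the size of the smallest class), not the authors' implementation. The function name `downsample_balanced`, the seed, and the toy notes and counts are all hypothetical and chosen only for illustration.

```python
import random
from collections import Counter, defaultdict


def downsample_balanced(notes, labels, seed=0):
    """Randomly down-sample every class to the size of the smallest class.

    Assumption: this is a generic balancing strategy, not the procedure
    used in the study, which the abstract does not specify.
    """
    by_class = defaultdict(list)
    for note, label in zip(notes, labels):
        by_class[label].append(note)

    n_min = min(len(items) for items in by_class.values())
    rng = random.Random(seed)  # fixed seed so the sample is reproducible

    balanced = []
    for label in sorted(by_class):
        # Sample without replacement so no note is duplicated.
        for note in rng.sample(by_class[label], n_min):
            balanced.append((note, label))
    rng.shuffle(balanced)
    return balanced


# Hypothetical toy corpus: texts and class sizes are invented for illustration.
notes = [f"note {i}" for i in range(10)]
labels = ["breast"] * 5 + ["formula/bottle"] * 3 + ["missing"] * 2

balanced = downsample_balanced(notes, labels)
class_counts = Counter(label for _, label in balanced)
print(class_counts)
```

After this step every class contributes the same number of notes, so overall accuracy is not dominated by the most frequent class.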

What problem does this paper attempt to address?

This paper addresses the problem of predicting infant feeding status from clinical notes in an electronic health record (EHR) system using natural language processing (NLP) and machine learning. Specifically, the study aims to classify infant feeding status from clinical notes in order to inform lactation support interventions for postpartum patients. The paper describes how NLP and machine learning models can automatically identify and classify infant feeding methods, including exclusive breastfeeding, exclusive formula bottle feeding, and other feeding methods. Identifying in-hospital feeding patterns early in this way could help target efforts to increase the exclusive breastfeeding rate.

### Research Background

- **Importance of Breast Milk**: Human milk is considered the best source of nutrition for infant health and development. It can promote neurocognitive development and protect infants from conditions such as infections, gastroenteritis, respiratory infections, obesity, diabetes, childhood leukemia, and sudden infant death syndrome.
- **Recommendations of the World Health Organization**: The WHO recommends that infants be exclusively breastfed for the first six months of life and continue to be breastfed until two years of age or older.
- **Current Challenges**: Although most infants start breastfeeding at birth, the proportion who continue exclusive breastfeeding for six months is low. In addition, formula feeding during hospitalization is associated with a shorter duration of exclusive breastfeeding, so supporting lactation in the early postpartum period is crucial.

### Research Methods

- **Data Sources**: The study used electronic health records from the University of Florida Health System, including the clinical notes of mothers and infants.
- **Annotation Tools**: The TeamTat tool was used to annotate clinical notes and classify them according to infant feeding status.
- **Machine Learning Models**: Six machine learning models were trained to classify infant feeding status, including logistic regression, random forest, XGBoost gradient descent, k-nearest neighbors, and support-vector classifier.
- **Performance Evaluation**: The models were compared based on overall accuracy, precision, recall, and F1 score.

### Main Results

- **Model Performance**: The XGBoost model performed best, achieving an accuracy of 90.1%, a macro-averaged precision of 90.3%, a macro-averaged recall of 90.1%, and a macro-averaged F1 score of 90.1%.
- **Classification Results**: The most common feeding status classification was exclusive breastfeeding (18.3%), followed by exclusive formula bottle feeding (14.6%) and exclusive feeding of expressed mother's milk (10.2%). Mixed feeding was the least common (2.3%).

### Discussion

- **Main Contributions**:
  - Developed an NLP-based method that can extract infant feeding status from unstructured electronic health record data to enhance population-level breastfeeding estimates.
  - Provided multi-level tools for extracting social and behavioral determinants that affect the health of infants and mothers.
- **Advantages**:
  - Achieved high accuracy using conventional machine learning algorithms, which makes the approach feasible and interpretable.
  - Can characterize in-hospital infant feeding trends quickly and regularly, without waiting for annual survey results.
- **Limitations**:
  - The data come from a single medical system, and terminology at other institutions may differ.
  - The "bottle feeding" category is ambiguous because it may cover either breast milk or formula, so a unified definition is required.

### Future Directions

- **Consensus Definition**: Develop a consensus definition of early infant feeding so that hospital data are consistent with data collected at the national level.
- **Continuous Improvement**: Cooperate with EHR companies to ensure the accuracy of input information, thereby maximizing the use of NLP for meaningful clinical data analysis.

Overall, the study demonstrates that NLP can classify infant feeding status from clinical notes with high accuracy, providing a new tool to support efforts to increase the breastfeeding rate.
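
The evaluation metrics named above (overall accuracy and macro-averaged precision, recall, and F1 score) can be illustrated with a short sketch. This is not the study's evaluation code. It assumes the common convention that a macro average is the unweighted mean of the per-class scores, and the labels and predictions below are a hypothetical toy example, not results from the paper.

```python
def macro_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Assumption: "macro-averaged" means the unweighted mean of per-class
    scores, with a score of 0.0 when a denominator would be zero.
    """
    classes = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []

    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)  # true positives
        fp = sum(1 for t, p in pairs if t != c and p == c)  # false positives
        fn = sum(1 for t, p in pairs if t == c and p != c)  # false negatives

        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0

        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(classes)
    return {
        "accuracy": sum(1 for t, p in pairs if t == p) / len(pairs),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical gold labels and predictions for the three classes.
y_true = ["breast", "breast", "formula/bottle", "formula/bottle", "missing", "missing"]
y_pred = ["breast", "formula/bottle", "formula/bottle", "formula/bottle", "missing", "missing"]

scores = macro_metrics(y_true, y_pred)
print(scores)
```

Because each class carries equal weight in a macro average, these scores stay informative even when one feeding category is much rarer than the others.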

Classifying early infant feeding status from clinical notes using natural language processing and machine learning
