Abstract:
OBJECTIVE: Social determinants of health (SDOH) are non-medical factors that can profoundly impact patient health outcomes. However, SDOH are rarely available in structured electronic health record (EHR) data such as diagnosis codes, and are more commonly found in unstructured narrative clinical notes. Hence, identifying social context from unstructured EHR data has become increasingly important. Yet previous work on using natural language processing to automate extraction of SDOH from text (a) usually focuses on an ad hoc selection of SDOH, and (b) does not use the latest advances in deep learning. Our objective was to advance automatic extraction of SDOH from clinical text by (a) systematically creating a set of SDOH based on standard biomedical and psychiatric ontologies, and (b) training state-of-the-art deep neural networks to extract mentions of these SDOH from clinical notes.
DESIGN: A retrospective cohort study.
SETTING AND PARTICIPANTS: Data were extracted from the Medical Information Mart for Intensive Care (MIMIC-III) database. The corpus comprised 3,504 social-related sentences from 2,670 clinical notes.
METHODS: We developed a framework for automated classification of multiple SDOH categories. Our dataset comprised narrative clinical notes under the "Social Work" category in the MIMIC-III Clinical Database. Using the standard terminologies SNOMED-CT and DSM-IV, we systematically curated a set of 13 SDOH categories and created annotation guidelines for them. After manually annotating the 3,504 sentences, we developed and tested three deep neural network (DNN) architectures for automated detection of eight SDOH categories: a convolutional neural network (CNN), a long short-term memory (LSTM) network, and Bidirectional Encoder Representations from Transformers (BERT). We also compared these DNNs to three baseline models: (1) cTAKES, (2) L2-regularized logistic regression on bag-of-words features, and (3) random forests on bag-of-words features. Model evaluation metrics included micro- and macro-F1 and area under the receiver operating characteristic curve (AUC).
RESULTS: All three DNN models accurately classified all SDOH categories (minimum micro-F1 = 0.632, minimum macro-AUC = 0.854). Compared to the CNN and LSTM, BERT performed best on most key metrics (micro-F1 = 0.690, macro-AUC = 0.907). The BERT model most effectively identified the "occupational" category (F1 = 0.774, AUC = 0.965) and least effectively identified the "non-SDOH" category (F1 = 0.491, AUC = 0.788). BERT outperformed cTAKES in distinguishing social vs. non-social sentences (BERT F1 = 0.87 vs. cTAKES F1 = 0.06), and outperformed logistic regression (micro-F1 = 0.649, macro-AUC = 0.696) and random forest (micro-F1 = 0.502, macro-AUC = 0.523) trained on bag-of-words features.
CONCLUSIONS: Our study framework with DNN models demonstrated improved performance in efficiently identifying a systematic range of SDOH categories from clinical notes in the EHR. Improved identification of patient SDOH may further improve healthcare outcomes.
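
The abstract reports both micro- and macro-averaged F1, which can diverge sharply when categories are imbalanced. The sketch below illustrates how the two averages are conventionally computed for sentence-level, possibly multi-label predictions. It is not the authors' evaluation code: the function names and the four toy sentences are made up for illustration, and only the two category names ("occupational", "non-SDOH") come from the abstract.

```python
from typing import Dict, List, Set


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(
    gold: List[Set[str]], pred: List[Set[str]], labels: List[str]
) -> Dict[str, float]:
    """Compute micro- and macro-averaged F1 over per-sentence label sets."""
    # counts[label] = [true positives, false positives, false negatives]
    counts = {lab: [0, 0, 0] for lab in labels}
    for g, p in zip(gold, pred):
        for lab in labels:
            if lab in p and lab in g:
                counts[lab][0] += 1
            elif lab in p:
                counts[lab][1] += 1
            elif lab in g:
                counts[lab][2] += 1

    # Macro: average the per-category F1 scores, so every category counts equally.
    per_label = {lab: f1(*c) for lab, c in counts.items()}
    macro = sum(per_label.values()) / len(labels)

    # Micro: pool the counts across categories first, so frequent categories dominate.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return {"micro_f1": f1(tp, fp, fn), "macro_f1": macro}


# Hypothetical toy data: four sentences, two categories.
labels = ["occupational", "non-SDOH"]
gold = [{"occupational"}, {"occupational"}, {"non-SDOH"}, {"occupational"}]
pred = [{"occupational"}, {"occupational"}, {"occupational"}, {"non-SDOH"}]

scores = micro_macro_f1(gold, pred, labels)
print(scores)
```

On this toy data the frequent category is scored reasonably well while the rare one is never predicted correctly, so the macro average falls below the micro average. This is the kind of gap that motivates reporting both.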

1307-P: Natural Language Processing of Clinical Notes to Find Diabetes Type and Onset Year in Children and Young Adults

Galanin inhibits proinsulin gene expression stimulated by the insulinotropic hormone glucagon-like peptide-I(7-37) in mouse insulinoma beta TC-1 cells.

Experimental production of congenitally prolonged Q-T interval in neonatal mice.

Bioconversion of crude glycerol feedstocks into ethanol by Pachysolen tannophilus.

Identifying Diabetes Related-Complications in a Real-World Free-Text Electronic Medical Records in Hebrew Using Natural Language Processing Techniques

Developing an automated algorithm for identification of children and adolescents with diabetes using electronic health records from the OneFlorida+ clinical research network

Natural language processing of clinical notes enables early inborn error of immunity risk ascertainment

Identification of pancreatic cancer risk factors from clinical notes using natural language processing

Translating Subphenotypes of Newly Diagnosed Type 2 Diabetes from Cohort Studies to Electronic Health Records in the United States

[The role of lamellar (myelin) bodies in the metabolism of cells and tissues].

Natural Language Processing Improves Phenotypic Accuracy in an Electronic Medical Record Cohort of Type 2 Diabetes and Cardiovascular Disease

Identification of atypical pediatric diabetes mellitus cases using electronic medical records

1458-P: Challenges for the Diagnostic Classification of Type 1 Diabetes (T1D) among Older Adults in Electronic Health Record (EHR) Data

Detection of Diabetes Status and Type in Youth Using Electronic Health Records: The SEARCH for Diabetes in Youth Study

Classifying social determinants of health from unstructured electronic health records using deep learning-based natural language processing

Moving Biosurveillance Beyond Coded Data Using AI for Symptom Detection From Physician Notes: Retrospective Cohort Study

Using natural language processing to identify opioid use disorder in electronic health record data

Patient‐reported outcomes and treatment adherence in type 2 diabetes using natural language processing: Wave 8 of the Observational International Diabetes Management Practices Study

Structure and stability of transposon 5-mediated cointegrates.

Watch Where and How You Stick Pins When Playing With Voodoo Correlations

A machine learning tool for identifying patients with newly diagnosed diabetes in primary care