Abstract:

Background: Although electronic health records (EHR) provide useful insights into disease patterns and patient treatment optimisation, their reliance on unstructured data makes them difficult to analyse. Echocardiography reports, which provide extensive pathology information for cardiovascular patients, are particularly challenging to extract and analyse because of their narrative structure. Although natural language processing (NLP) has been used successfully in a variety of medical fields, it is not commonly used in echocardiography analysis.

Objectives: To develop an NLP-based approach for extracting and categorising data from echocardiography reports by accurately converting continuous (e.g., LVOT VTI, AV VTI and TR Vmax) and discrete (e.g., regurgitation severity) outcomes from a semi-structured narrative format into a structured and categorised format, allowing for future research or clinical use.

Methods: 135,062 Trans-Thoracic Echocardiogram (TTE) reports were derived from 146,967 baseline echocardiogram reports and split into three cohorts: Training and Validation (n = 1,075), Test Dataset (n = 98) and Application Dataset (n = 133,889). The NLP system was developed and iteratively refined using medical expert knowledge. The system was used to curate a moderate-fidelity database from extractions of 133,889 reports. A hold-out validation set of 98 reports was blindly annotated and extracted by two clinicians for comparison with the NLP extraction. Agreement, discrimination, accuracy and calibration of outcome measure extractions were evaluated.

Results: Continuous outcomes including LVOT VTI, AV VTI and TR Vmax exhibited perfect inter-rater reliability on intra-class correlation scores (ICC = 1.00, p < 0.05) alongside high R² values, demonstrating an ideal alignment between the NLP system and clinicians. A good level of inter-rater reliability (ICC = 0.75-0.9, p < 0.05) was observed for outcomes such as LVOT Diam, Lateral MAPSE, Peak E Velocity, Lateral E' Velocity, PV Vmax, Sinuses of Valsalva and Ascending Aorta diameters. Furthermore, the accuracy rate for discrete outcome measures was 91.38% in the confusion matrix analysis, indicating effective performance.

Conclusions: The NLP-based technique performed well in extracting and categorising data from echocardiography reports. The system demonstrated a high degree of agreement and concordance with clinician extractions. This study contributes to the effective use of semi-structured data by providing a useful tool for converting semi-structured text to a structured echo report that can be used for data management. Additional validation and implementation in healthcare settings can improve data availability and support research and clinical decision-making.
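To make the conversion described in the Objectives concrete, the following is a minimal illustrative sketch, not the authors' system, whose internals are not described in the abstract. It assumes a simple rule-based approach using regular expressions. The report wording, the units, the field names (`lvot_vti_cm`, `av_vti_cm`, `tr_vmax_m_s`) and the helper functions `extract_report` and `accuracy` are all hypothetical and chosen only for illustration.

```python
import re

# ASSUMPTION: measurements appear as "<label> [:|=] <number> <unit>".
# Real reports vary widely; these patterns are illustrative only.
CONTINUOUS_PATTERNS = {
    "lvot_vti_cm": re.compile(
        r"\bLVOT\s+VTI\s*[:=]?\s*(\d+(?:\.\d+)?)\s*cm", re.IGNORECASE),
    "av_vti_cm": re.compile(
        r"\bAV\s+VTI\s*[:=]?\s*(\d+(?:\.\d+)?)\s*cm", re.IGNORECASE),
    "tr_vmax_m_s": re.compile(
        r"\bTR\s+Vmax\s*[:=]?\s*(\d+(?:\.\d+)?)\s*m/s", re.IGNORECASE),
}

# ASSUMPTION: severity is written as "<grade> <valve> regurgitation".
SEVERITY_PATTERN = re.compile(
    r"\b(no|trivial|mild|moderate|severe)\s+"
    r"(mitral|aortic|tricuspid|pulmonary)\s+regurgitation",
    re.IGNORECASE,
)


def extract_report(text):
    """Turn one semi-structured report into a flat, structured record."""
    record = {}
    # Continuous outcomes: first match per field, or None if absent.
    for name, pattern in CONTINUOUS_PATTERNS.items():
        match = pattern.search(text)
        record[name] = float(match.group(1)) if match else None
    # Discrete outcomes: one categorical field per valve mentioned.
    for severity, valve in SEVERITY_PATTERN.findall(text):
        record[f"{valve.lower()}_regurgitation"] = severity.lower()
    return record


def accuracy(predicted, reference):
    """Fraction of discrete labels on which two extractions agree."""
    agreed = sum(p == r for p, r in zip(predicted, reference))
    return agreed / len(reference)


sample = ("LVOT VTI: 21.5 cm. AV VTI 45 cm. TR Vmax = 2.6 m/s. "
          "Mild mitral regurgitation. No aortic regurgitation.")
record = extract_report(sample)
print(record)
```

In this sketch, clinician annotations for the same reports could be compared against the extracted categorical labels with `accuracy`, which corresponds to the overall agreement rate read off the diagonal of a confusion matrix.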

Automated Transformation of Unstructured Cardiovascular Diagnostic Reports into Structured Datasets Using Sequentially Deployed Large Language Models

Local large language models for privacy-preserving accelerated review of historic echocardiogram reports

Critical Care Studies Using Large Language Models Based on Electronic Healthcare Records: A Technical Note

EchoGPT: A Large Language Model for Echocardiography Report Summarization

Language Models and Retrieval Augmented Generation for Automated Structured Data Extraction from Diagnostic Reports

Development and Evaluation of a Natural Language Processing System for Curating a Trans-Thoracic Echocardiogram (TTE) Database

Revolutionizing Cardiology With Words: Unveiling the Impact of Large Language Models in Medical Science Writing

Automated Cardiovascular Record Retrieval by Multimodal Learning between Electrocardiogram and Clinical Report

From Text to Tables: A Local Privacy Preserving Large Language Model for Structured Information Retrieval from Medical Documents

Abstract 17760: Conversion of Dicom ECG Images to Tabular Format for Building Large Language Model in Diagnoses and Disease Progression of Cardiovascular Conditions

Large language models for structured reporting in radiology: past, present, and future

Development and Validation of a Dynamic-Template-Constrained Large Language Model for Generating Fully-Structured Radiology Reports

Scalable information extraction from free text electronic health records using large language models

Privacy-preserving large language models for structured medical information retrieval

Large language models can effectively extract stroke and reperfusion audit data from medical free-text discharge summaries

Transfer Knowledge from Natural Language to Electrocardiography: Can We Detect Cardiovascular Disease Through Language Models?

Evaluating local open-source large language models for data extraction from unstructured reports on mechanical thrombectomy in patients with ischemic stroke

Large language models for extracting histopathologic diagnoses from electronic health records

ECG-Chat: A Large ECG-Language Model for Cardiac Disease Diagnosis

Redefining Health Care Data Interoperability: Empirical Exploration of Large Language Models in Information Exchange

Enhancing Clinical Data Extraction from Pathology Reports: A Comparative Analysis of Large Language Models