A Harmonised Approach to Curating Research-Ready Datasets for Asthma, Chronic Obstructive Pulmonary Disease (COPD) and Interstitial Lung Disease (ILD) in England, Wales and Scotland Using Clinical Practice Research Datalink (CPRD), Secure Anonymised Information Linkage (SAIL) Databank and DataLoch
Sara Hatam,Sean Scully,Sarah Cook,Hywel Evans,Alastair Hume,Constantinos Kallis,Ian Farr,Chris Orton,Aziz Sheikh,Jennifer Quint
DOI: https://doi.org/10.2147/clep.s437937
2024-04-05
Clinical Epidemiology
Authors and affiliations: Sara Hatam,1,* Sean Timothy Scully,2,* Sarah Cook,3,* Hywel T Evans,2,* Alastair Hume,4 Constantinos Kallis,3 Ian Farr,2 Chris Orton,2 Aziz Sheikh,1 Jennifer K Quint3
1 Usher Institute, The University of Edinburgh, Edinburgh, UK; 2 Population Data Science, Swansea University Medical School, Swansea, UK; 3 School of Public Health, Imperial College London, London, UK; 4 EPCC, The University of Edinburgh, Edinburgh, UK
*These authors contributed equally to this work
Correspondence: Jennifer K Quint
Abstract: Background: Electronic healthcare records (EHRs) are an important resource for health research that can be used to improve patient outcomes in chronic respiratory diseases. However, consistent approaches in the analysis of these datasets are needed for coherent messaging, and when undertaking comparative studies across different populations. Methods and Results: We developed a harmonised curation approach to generate comparable patient cohorts for asthma, chronic obstructive pulmonary disease (COPD) and interstitial lung disease (ILD) using datasets from within Clinical Practice Research Datalink (CPRD; for England), Secure Anonymised Information Linkage (SAIL; for Wales) and DataLoch (for Scotland) by defining commonly derived variables consistently between the datasets. By working in parallel on the curation methodology used for CPRD, SAIL and DataLoch for asthma, COPD and ILD, we were able to highlight key differences in coding and recording between the databases and identify solutions to enable valid comparisons. Conclusion: Codelists and metadata generated have been made available to help re-create the asthma, COPD and ILD cohorts in CPRD, SAIL and DataLoch for different time periods, and provide a starting point for the curation of respiratory datasets in other EHR databases, expediting further comparable respiratory research.
Keywords: COPD, asthma, ILD, EHR, harmonisation, data curation
Asthma, chronic obstructive pulmonary disease (COPD) and interstitial lung disease (ILD) are chronic respiratory diseases associated with substantial disability and mortality worldwide. [1] Asthma is a chronic inflammatory respiratory disease associated with airway inflammation and hyper-responsiveness, and is characterised by cough, wheeze and chest tightness. It is common across Europe, with ~30 million diagnosed cases among children and adults aged <45 years. In the United Kingdom (UK), over 5.4 million people have asthma, [2] accounting for over 65,000 hospital admissions and 1000 deaths annually. [3] In 2016/17, more than 75,000 people spanning all age groups experienced an asthma exacerbation that required hospitalisation.
COPD is a chronic condition characterised by progressive airflow obstruction, which is not completely reversible. [4,5] In 2020/21, the prevalence of COPD in England was estimated at 1.9%, which equated to approximately 1.17 million people. [6] COPD contributes to nearly 30,000 deaths each year in the UK, corresponding to 5.7% of adult male and 4.0% of adult female deaths, including a substantial number of premature deaths. [7]
ILD encompasses a heterogeneous group of disorders, ranging from conditions that completely resolve without requirement for pharmacological intervention through to fibrotic lung diseases, which inexorably progress to respiratory failure and death despite treatment. ILD is thus an umbrella term used to represent a diverse group of lung conditions with different aetiologies, unpredictable progression and varying survival times. There is a large variation in global prevalence and burden, due in part to varying ontologies and diagnostic accuracy. [8] Burden is greatest in fibrotic ILDs, in particular those that are progressive. [9] The most common ILD is idiopathic pulmonary fibrosis (IPF), which has attracted the most research interest in recent years. [8] This is not only because it is the most prevalent of the ILDs, but also because it has a universally progressive nature and a poor prognosis. Additionally, the UK has one of the highest incidences of IPF, making research into IPF of particular interest to UK health organisations. [10]
As the digitisation of health systems rapidly matures, there is an accompanying proliferation of research that is now capitalising on this digital ecosystem. EHRs are an increasingly important resource to help improve patient outcomes in chronic respiratory disease. [11] However, without appropriate data cleaning and curation, the potent -Abstract Truncated-
public, environmental & occupational health