Impact of inconsistent ethnicity recordings on estimates of inequality in child health and education data: a data linkage study of Child and Adolescent Mental Health Services in South London

Alice Wickersham, Jayati Das-Munshi, Tamsin Ford, Amelia Jewell, Robert Stewart, Johnny Downs
DOI: https://doi.org/10.1136/bmjopen-2023-078788
IF: 3.006
2024-03-07
BMJ Open
Abstract:

Objectives: Ethnicity data are critical for identifying inequalities, but previous studies suggest that ethnicity is not consistently recorded between different administrative datasets. With researchers increasingly leveraging cross-domain data linkages, we investigated the completeness and consistency of ethnicity data in two linked health and education datasets.

Design: Cohort study.

Setting: South London and Maudsley NHS Foundation Trust deidentified electronic health records, accessed via Clinical Record Interactive Search (CRIS) and the National Pupil Database (NPD) (2007–2013).

Participants: N=30 426 children and adolescents referred to local Child and Adolescent Mental Health Services.

Primary and secondary outcome measures: Ethnicity data were compared between CRIS and the NPD. Associations between ethnicity as recorded from each source and key educational and clinical outcomes were explored with risk ratios.

Results: Ethnicity data were available for 79.3% from the NPD, 87.0% from CRIS, 97.3% from either source and 69.0% from both sources. Among those who had ethnicity data from both, the two data sources agreed on 87.0% of aggregate ethnicity categorisations overall, but with high levels of disagreement in Mixed and Other ethnic groups. Strengths of associations between ethnicity, educational attainment and neurodevelopmental disorder varied according to which data source was used to code ethnicity. For example, as compared with White pupils, a significantly higher proportion of Asian pupils achieved expected educational attainment thresholds only if ethnicity was coded from the NPD (RR=1.46, 95% CI 1.29 to 1.64), not if ethnicity was coded from CRIS (RR=1.11, 0.98 to 1.26).

Conclusions: Data linkage has the potential to minimise missing ethnicity data, and overlap in ethnicity categorisations between CRIS and the NPD was generally high. However, choosing which data source to primarily code ethnicity from can have implications for analyses of ethnicity, mental health and educational outcomes. Users of linked data should exercise caution in combining and comparing ethnicity between different data sources.
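The four completeness figures in the Results are tied together by inclusion-exclusion: the share with ethnicity from either source equals the NPD share plus the CRIS share minus the share with both. The Python sketch below simply checks that arithmetic against the reported percentages. The function name `coverage_either` is illustrative and not taken from the paper.

```python
def coverage_either(pct_a: float, pct_b: float, pct_both: float) -> float:
    """Inclusion-exclusion: percentage covered by source A or source B."""
    return pct_a + pct_b - pct_both


# Percentages as reported in the abstract.
npd, cris, both = 79.3, 87.0, 69.0

either = coverage_either(npd, cris, both)
print(f"Either source: {either:.1f}%")  # Either source: 97.3%

# Percentage-point gain from linkage over each single source.
print(f"Gain over CRIS alone: {either - cris:.1f} points")  # 10.3 points
print(f"Gain over NPD alone: {either - npd:.1f} points")    # 18.0 points
```

The result matches the reported 97.3%, so the four figures are internally consistent.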
medicine, general & internal
What problem does this paper attempt to address?
The paper addresses inconsistent recording of ethnicity across administrative data sources and how this affects estimates of health and educational inequalities among children. The researchers compared ethnicity data from two linked datasets, the Clinical Record Interactive Search (CRIS) and the National Pupil Database (NPD), for 30 426 children and adolescents referred to Child and Adolescent Mental Health Services in South London, and examined the following points:

1. **Completeness of ethnicity data**: Ethnicity was available for 79.3% of participants from the NPD and 87.0% from CRIS. Combining the two sources raised this to 97.3%, and 69.0% had ethnicity recorded in both.
2. **Consistency of ethnicity classification**: Among participants with ethnicity recorded in both sources, the aggregate categories agreed in 87.0% of cases overall, but disagreement was high in the "Mixed" and "Other" ethnic groups.
3. **Associations with educational and clinical outcomes**: The strength of associations between ethnicity, educational attainment and neurodevelopmental disorder varied according to which source was used to code ethnicity. For example, compared with White pupils, a significantly higher proportion of Asian pupils achieved expected educational attainment thresholds when ethnicity was coded from the NPD (RR=1.46, 95% CI 1.29 to 1.64), but not when it was coded from CRIS (RR=1.11, 0.98 to 1.26).

In summary, the paper shows how inconsistencies in ethnicity recording between data sources can change the conclusions of research on health and educational inequalities, and it advises users of linked data to exercise caution when combining and comparing ethnicity across sources.
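To make the risk ratio comparison concrete, here is a minimal Python sketch of how a risk ratio and a Wald-type 95% confidence interval can be computed from a 2x2 table, together with a check of whether an interval excludes the null value of 1. This is a textbook method and an assumption on our part: the source does not say how the authors estimated their risk ratios. The function names and the counts passed to `risk_ratio_ci` are hypothetical and are not study data. Only the two intervals passed to `excludes_null` come from the abstract.

```python
import math


def risk_ratio_ci(events_exposed, n_exposed, events_ref, n_ref, z=1.96):
    """Risk ratio with a Wald confidence interval computed on the log scale."""
    rr = (events_exposed / n_exposed) / (events_ref / n_ref)
    # Standard error of log(RR) for two independent binomial proportions.
    se_log_rr = math.sqrt(
        1 / events_exposed - 1 / n_exposed + 1 / events_ref - 1 / n_ref
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


def excludes_null(lower, upper, null=1.0):
    """True if the interval lies entirely above or entirely below the null."""
    return lower > null or upper < null


# Intervals reported in the abstract (Asian vs White pupils, attainment).
print(excludes_null(1.29, 1.64))  # NPD-coded ethnicity: True
print(excludes_null(0.98, 1.26))  # CRIS-coded ethnicity: False

# Hypothetical counts, for illustration only (NOT taken from the study).
rr, lower, upper = risk_ratio_ci(60, 100, 40, 100)
print(f"RR={rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

The NPD-coded interval lies wholly above 1 while the CRIS-coded interval spans 1, which is why the abstract reports the difference as significant only under NPD coding.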