Evaluating the performance and potential bias of predictive models for the detection of transthyretin cardiac amyloidosis

Jonathan Hourmozdi, Nicholas Easton, Simon Benigeri, James D Thomas, Akhil Narang, David Ouyang, Grant Duffy, Ike Okwuosa, Adrienne Kline, Abel N Kho, Yuan Luo, Sanjiv J Shah, Faraz S Ahmad
DOI: https://doi.org/10.1101/2024.10.09.24315202
2024-10-10
Abstract:
Background: Delays in the diagnosis of transthyretin amyloid cardiomyopathy (ATTR-CM) contribute to the significant morbidity of the condition, especially in the era of novel disease-modifying therapies. Screening for ATTR-CM with AI and other algorithms may improve timely diagnosis, but these algorithms have not been directly compared with each other.
Methods: We identified patients treated at an integrated health system from 2010 to 2022 with biopsy- or PYP scan-confirmed ATTR-CM and age- and sex-matched them to controls with heart failure (HF) in a 19:1 ratio to target 5% prevalence. We compared the performance of three publicly available algorithms: a random forest model of claims data, the regression-based Mayo ATTR-CM Score, and a deep-learning echo model (EchoNet-LVH). Bias was measured in the best-performing models using standard fairness metrics.
Results: Among 3924 patients in the analytic cohort who had the necessary structured and imaging data for all three models, we identified 198 confirmed cases of ATTR-CM. In this cohort, 78.9% of patients self-identified as White, 8.7% as Black, 4.0% as Hispanic, and 8.4% as Other. ATTR-CM prevalence was highest in individuals who identified as Black. The claims-based model performed poorly, with an AUC of 0.48. EchoNet-LVH had a higher AUC (0.88 vs 0.78, DeLong test p < 0.0001) and average precision (0.61 vs 0.15) than the Mayo ATTR-CM Score. Bias auditing of the top two performing models demonstrated that both satisfied our fairness criteria for equal opportunity (1.05 for EchoNet-LVH and 0.91 for the Mayo ATTR-CM Score) among patients who identified as Black.
Conclusions: In external validation using a large, diverse cohort of patients with heart failure, a deep-learning echo-based model to detect ATTR-CM demonstrated the best overall performance compared with two other publicly available models. The results of a bias audit suggest that the regression- and echo-based models are unlikely to exacerbate existing health disparities through inequitable distribution of error with respect to self-identified Black race.
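The equal opportunity values reported in the abstract are not defined there. One common reading, assumed here, is the ratio of true positive rates (sensitivity) between a group of interest and a reference group, with values near 1 indicating that true cases are detected at similar rates in both. The Python sketch below illustrates that reading only: the function names, the toy labels, the choice of "White" as the reference group, and the 0.8 to 1.25 tolerance band are all assumptions for illustration and are not taken from the paper.

```python
from collections import defaultdict


def true_positive_rate(y_true, y_pred):
    """Fraction of actual positives that the model flagged as positive."""
    flagged = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not flagged:
        raise ValueError("no positive cases in this group")
    return sum(flagged) / len(flagged)


def equal_opportunity_ratio(y_true, y_pred, groups, group, reference):
    """TPR of `group` divided by TPR of `reference` (assumed definition)."""
    by_group = defaultdict(lambda: ([], []))
    for t, p, g in zip(y_true, y_pred, groups):
        by_group[g][0].append(t)
        by_group[g][1].append(p)
    tpr_group = true_positive_rate(*by_group[group])
    tpr_ref = true_positive_rate(*by_group[reference])
    if tpr_ref == 0:
        raise ValueError("reference group TPR is zero; ratio undefined")
    return tpr_group / tpr_ref


# Toy, made-up data: 1 = ATTR-CM case (y_true) or flagged by model (y_pred).
y_true = [1, 1, 1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1, 1, 0, 0, 0]
groups = ["White"] * 6 + ["Black"] * 4

ratio = equal_opportunity_ratio(y_true, y_pred, groups, "Black", "White")
# Hypothetical tolerance band; the paper's actual criterion may differ.
is_fair = 0.8 <= ratio <= 1.25
print(f"equal opportunity ratio = {ratio:.2f}, within band: {is_fair}")
```

Under this assumed definition, the reported values of 1.05 and 0.91 would both fall inside a 0.8 to 1.25 band, which is consistent with the abstract's statement that both models satisfied the fairness criteria.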
Cardiovascular Medicine