Fully automated volumetric measurement of malignant pleural mesothelioma by deep learning AI: validation and comparison with modified RECIST response criteria

Andrew C Kidd, Owen Anderson, Gordon W Cowell, Alexander J Weir, Jeremy P Voisey, Matthew Evison, Selina Tsim, Keith A Goatman, Kevin G Blyth
DOI: https://doi.org/10.1136/thoraxjnl-2021-217808
2022-02-02
Thorax
Abstract:
Background: In malignant pleural mesothelioma (MPM), complex tumour morphology results in inconsistent radiological response assessment. Promising volumetric methods require automation to be practical. We developed a fully automated Convolutional Neural Network (CNN) for this purpose, performed blinded validation and compared CNN and human response classification and survival prediction in patients treated with chemotherapy.
Methods: In a multicentre retrospective cohort study, 183 CT datasets were split into training and internal validation (123 datasets (80 fully annotated); 108 patients; 1 centre) and external validation (60 datasets (all fully annotated); 30 patients; 3 centres). Detailed manual annotations were used to train the CNN, which used a two-dimensional U-Net architecture. CNN performance was evaluated using correlation, Bland-Altman and Dice agreement. Volumetric response/progression were defined as ≤−30%/≥+20% change in tumour volume and compared with modified Response Evaluation Criteria In Solid Tumours (mRECIST) by Cohen’s kappa. Survival was assessed using Kaplan-Meier methodology.
Results: Human and artificial intelligence (AI) volumes were strongly correlated (validation set r=0.851, p<0.0001). Agreement was strong (validation set mean bias +31 cm³ (p=0.182), 95% limits −345 to +407 cm³). Infrequent AI segmentation errors (4/60 validation cases) were associated with fissural tumour, contralateral pleural thickening and adjacent atelectasis. Human and AI volumetric responses agreed in 20/30 (67%) validation cases, κ=0.439 (0.178 to 0.700). AI and mRECIST agreed in 16/30 (55%) validation cases, κ=0.284 (0.026 to 0.543). Higher baseline tumour volume was associated with shorter survival.
Conclusion: We have developed and validated the first fully automated CNN for volumetric MPM segmentation. CNN performance may be further improved by enriching future training sets with morphologically challenging features. Volumetric response thresholds require further calibration in future studies.
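The agreement measures named in the Methods can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the "stable" label and the representation of segmentations as sets of voxel indices are assumptions made for illustration. The sketch also assumes that response means a volume decrease of at least 30% and progression an increase of at least 20%, and that the Bland-Altman limits are the mean bias ± 1.96 standard deviations of the paired differences.

```python
from statistics import mean, stdev


def dice(mask_a, mask_b):
    """Dice overlap between two segmentations given as sets of voxel indices."""
    a, b = set(mask_a), set(mask_b)
    if not a and not b:
        return 1.0  # two empty masks agree perfectly by convention
    return 2 * len(a & b) / (len(a) + len(b))


def bland_altman(human, ai):
    """Return (mean bias, lower limit, upper limit) for paired volumes.

    Bias is AI minus human; limits are bias +/- 1.96 SD of the differences.
    """
    diffs = [y - x for x, y in zip(human, ai)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)
    return bias, bias - half_width, bias + half_width


def classify_response(baseline_volume, follow_up_volume):
    """Classify volumetric change with the assumed -30% / +20% thresholds."""
    change = 100 * (follow_up_volume - baseline_volume) / baseline_volume
    if change <= -30:
        return "response"
    if change >= 20:
        return "progression"
    return "stable"  # hypothetical label for everything in between


def cohen_kappa(ratings_1, ratings_2):
    """Unweighted Cohen's kappa for two equal-length lists of labels."""
    n = len(ratings_1)
    labels = set(ratings_1) | set(ratings_2)
    observed = sum(a == b for a, b in zip(ratings_1, ratings_2)) / n
    expected = sum(
        (ratings_1.count(lab) / n) * (ratings_2.count(lab) / n) for lab in labels
    )
    if expected == 1:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)
```

In use, each patient's baseline and follow-up volumes would be classified once from the human volumes and once from the AI volumes, and the two resulting label lists passed to `cohen_kappa`. For routine work, an established implementation such as scikit-learn's `cohen_kappa_score` is likely preferable to a hand-rolled one.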
respiratory system