Deep Learning Framework with Multi-Head Dilated Encoders for Enhanced Segmentation of Cervical Cancer on Multiparametric Magnetic Resonance Imaging

Reza Kalantar, Sebastian Curcean, Jessica M Winfield, Gigin Lin, Christina Messiou, Matthew D Blackledge, Dow-Mu Koh
2023-06-20
Abstract: T2-weighted (T2W) magnetic resonance imaging (MRI) and diffusion-weighted imaging (DWI) are essential components of cervical cancer diagnosis. However, combining these channels to train deep learning models is challenging because the images are misaligned. Here, we propose a novel multi-head framework that uses dilated convolutions and shared residual connections to encode multiparametric MRI images separately. We employ a residual U-Net model as a baseline and perform a series of architectural experiments to evaluate tumor segmentation performance across multiparametric input channels and feature encoding configurations. All experiments were performed using a cohort of 207 patients with locally advanced cervical cancer. Our proposed multi-head model, which uses separate dilated encoding for T2W MRI and for combined b1000 DWI and apparent diffusion coefficient (ADC) images, achieved the best median Dice similarity coefficient (DSC) score, 0.823 (95% confidence interval (CI), 0.595-0.797), outperforming the conventional multi-channel model, DSC 0.788 (95% CI, 0.568-0.776), although the difference was not statistically significant (p>0.05). We investigated channel sensitivity using 3D GRAD-CAM and channel dropout, which highlighted the critical importance of the T2W and ADC channels for accurate tumor segmentation. In contrast, b1000 DWI had only a minor impact on overall segmentation performance. We demonstrated that separate dilated feature extractors and independent contextual learning improved the model's ability to reduce the boundary effects and distortion of DWI, leading to improved segmentation performance. Our findings may have significant implications for the development of robust and generalizable models that extend to other multi-modal segmentation applications.
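For readers unfamiliar with the metric reported in the abstract, the Dice similarity coefficient between a predicted mask P and a reference mask T is 2|P ∩ T| / (|P| + |T|). The following is a minimal NumPy sketch of that definition, not the authors' evaluation code; the function name `dice_coefficient` and the convention of returning 1.0 when both masks are empty are assumptions made for illustration.

```python
import numpy as np


def dice_coefficient(pred, target):
    """Dice similarity coefficient between two binary masks of equal shape.

    Hypothetical helper for illustration; any array shape (2D or 3D) works.
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")

    # Number of voxels labelled as tumor in both masks.
    intersection = np.logical_and(pred, target).sum()
    denominator = pred.sum() + target.sum()

    # Assumed convention: two empty masks count as perfect agreement.
    if denominator == 0:
        return 1.0
    return float(2.0 * intersection / denominator)
```

A score of 1 indicates identical masks and 0 indicates no overlap, so the median DSC values quoted above summarize per-patient overlap between predicted and reference tumor contours.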
Image and Video Processing, Computer Vision and Pattern Recognition