Semi‐supervised learning framework with shape encoding for neonatal ventricular segmentation from 3D ultrasound

Zachary Szentimrey, Abdullah Al‐Hayali, Sandrine de Ribaupierre, Aaron Fenster, Eranga Ukwatta
DOI: https://doi.org/10.1002/mp.17242
IF: 4.506
2024-06-11
Medical Physics
Abstract:
Background: Three‐dimensional (3D) ultrasound (US) imaging has shown promise in non‐invasive monitoring of changes in the lateral brain ventricles of neonates suffering from intraventricular hemorrhaging. Due to the poorly defined anatomical boundaries and low signal‐to‐noise ratio, fully supervised methods for segmentation of the lateral ventricles in 3D US images require a large dataset of images annotated by trained physicians, which is tedious, time‐consuming, and expensive to produce. Training fully supervised segmentation methods on a small dataset may lead to overfitting and hence reduce their generalizability. Semi‐supervised learning (SSL) methods for 3D US segmentation may be able to address these challenges, but most existing SSL methods have been developed for magnetic resonance or computed tomography (CT) images.
Purpose: To develop a fast, lightweight, and accurate SSL method, specifically for 3D US images, that will use unlabeled data towards improving segmentation performance.
Methods: We propose an SSL framework that leverages the shape‐encoding ability of an autoencoder network to enforce complex shape and size constraints on a 3D U‐Net segmentation model. The autoencoder created pseudo‐labels, based on the 3D U‐Net predicted segmentations, that enforce shape constraints. An adversarial discriminator network then determined whether images came from the labeled or unlabeled data distributions. We used 887 3D US images, of which 87 had manually annotated labels and 800 were unlabeled. Training/validation/testing sets of 25/12/50, 25/12/25, and 50/12/25 images were used for model experimentation. The Dice similarity coefficient (DSC), mean absolute surface distance (MAD), and absolute volumetric difference (VD) were used as metrics for comparing to other benchmarks. The baseline benchmark was the fully supervised vanilla 3D U‐Net, while dual task consistency, shape‐aware semi‐supervised network, correlation‐aware mutual learning, and 3D U‐Net Ensemble models were used as state‐of‐the‐art benchmarks. The Wilcoxon signed‐rank test was used to test statistical significance between algorithms for DSC and VD with the threshold being p
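
Two of the metrics named in the abstract, DSC and absolute VD, can be illustrated on binary masks. The following is a minimal sketch, not the authors' implementation: it assumes masks are given as flattened sequences of 0/1 voxel values, and the function names, the `voxel_volume_cm3` parameter, and the empty-mask convention are all hypothetical choices made for illustration. MAD is left out because it additionally requires extracting the segmentation surfaces.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice similarity coefficient between two flattened binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    pred_count = sum(1 for p in pred if p)
    truth_count = sum(1 for t in truth if t)
    overlap = sum(1 for p, t in zip(pred, truth) if p and t)
    if pred_count + truth_count == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * overlap / (pred_count + truth_count)


def absolute_volume_difference(
    pred: Sequence[int],
    truth: Sequence[int],
    voxel_volume_cm3: float = 1.0,  # hypothetical parameter: volume of one voxel
) -> float:
    """Absolute difference between predicted and reference volumes."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    pred_count = sum(1 for p in pred if p)
    truth_count = sum(1 for t in truth if t)
    return abs(pred_count - truth_count) * voxel_volume_cm3


if __name__ == "__main__":
    # Toy masks: 3 predicted voxels, 4 reference voxels, 2 shared.
    pred = [1, 1, 1, 0, 0, 0, 0, 0]
    truth = [0, 1, 1, 1, 1, 0, 0, 0]
    print(dice_coefficient(pred, truth))
    print(absolute_volume_difference(pred, truth, voxel_volume_cm3=0.5))
```

For the toy masks, the overlap is 2 voxels out of 3 predicted and 4 reference voxels, so DSC is 2·2/(3+4) = 4/7, and the volumes differ by one voxel.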
radiology, nuclear medicine & medical imaging