CT-Less Whole-Body Bone Segmentation of PET Images Using a Multimodal Deep Learning Network
Nan Bao, Jiaxin Zhang, Zhikun Li, Shiyu Wei, Jiazhen Zhang, Stephen E. Greenwald, John A. Onofrey, Yihuan Lu, Lisheng Xu
DOI: https://doi.org/10.1109/jbhi.2024.3501386
IF: 7.7
2024-01-01
IEEE Journal of Biomedical and Health Informatics
Abstract: In bone cancer imaging, positron emission tomography (PET) is ideal for the diagnosis and staging of bone cancers due to its high sensitivity to malignant tumors. The diagnosis of bone cancer requires tumor analysis and localization, for which accurate and automated whole-body bone segmentation (WBBS) is often needed. Current WBBS for PET imaging is based on paired computed tomography (CT) images. However, mismatches between CT and PET images often occur due to patient motion, which leads to erroneous bone segmentation and thus to inaccurate tumor analysis. Furthermore, in some instances CT images are unavailable for WBBS. In this work, we propose a novel multimodal fusion network (MMF-Net) for WBBS of PET images, without the need for CT images. Specifically, the tracer activity ($\lambda$-MLAA), attenuation map ($\mu$-MLAA), and synthetic attenuation map ($\mu$-DL) images are introduced into the training data. We first design a multi-encoder structure that learns modality-specific encoding representations of the three PET modality images through independent encoding branches. Then, we propose a multimodal fusion module in the decoder to further integrate the complementary information across the three modalities. Additionally, we introduce revised convolution units, SE (Squeeze-and-Excitation) Normalization, and deep supervision to improve segmentation performance.
Extensive comparisons and ablation experiments, using 130 whole-body PET image datasets, show promising results, with a Dice similarity coefficient (DSC) of 78.88%, recall of 79.95%, and precision of 77.98%. We conclude that the proposed method can achieve WBBS with moderate to high accuracy using PET information only, which could potentially overcome the current limitations of CT-based approaches while minimizing exposure to ionizing radiation.
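
The three reported metrics are standard overlap measures between a predicted bone mask and a reference mask. The sketch below shows how they are conventionally computed from voxel-wise true positives, false positives, and false negatives. It is an illustration only, not the authors' evaluation code: the function name `overlap_metrics`, the flat-list mask representation, and the convention of returning 1.0 when a denominator is zero are all assumptions made here.

```python
def overlap_metrics(pred, truth):
    """Return (dsc, recall, precision) for two equal-length binary masks.

    `pred` and `truth` are flat sequences of 0/1 (or False/True) voxel labels.
    This is a hypothetical helper for illustration, not code from the paper.
    """
    pred = [bool(p) for p in pred]
    truth = [bool(t) for t in truth]
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")

    # Voxel-wise confusion counts for the foreground (bone) class.
    tp = sum(p and t for p, t in zip(pred, truth))
    fp = sum(p and not t for p, t in zip(pred, truth))
    fn = sum(t and not p for p, t in zip(pred, truth))

    # Assumed convention: an empty denominator counts as perfect agreement.
    dsc = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    return dsc, recall, precision


# Toy example: 8 voxels, 3 true positives, 1 false positive, 1 false negative.
pred_mask = [1, 1, 1, 0, 0, 0, 1, 0]
true_mask = [1, 1, 0, 0, 0, 1, 1, 0]
dsc, recall, precision = overlap_metrics(pred_mask, true_mask)
```

In practice, 3D masks would typically be held in an array library and flattened before counting; plain lists are used here only to keep the sketch dependency-free.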