Abstract: Background and objective: [18F]-fluorodeoxyglucose (FDG) positron emission tomography – computed tomography (PET-CT) is now the preferred imaging modality for staging many cancers. PET images characterize tumoral glucose metabolism while CT depicts the complementary anatomical localization of the tumor. Automatic tumor segmentation is an important step in image analysis in computer-aided diagnosis systems. Recently, fully convolutional networks (FCNs), with their ability to leverage annotated datasets and extract image feature representations, have become the state of the art in tumor segmentation. Few FCN-based methods support multi-modality images, and current methods have primarily focused on fusing multi-modality image features at various stages: early-fusion, where the multi-modality image features are fused prior to the FCN; late-fusion, where the resultant features are fused; and hyper-fusion, where multi-modality image features are fused across multiple image feature scales. Early- and late-fusion methods, however, have inherently limited freedom to fuse complementary multi-modality image features. Hyper-fusion methods learn different image features across different image feature scales, which can result in inaccurate segmentations, particularly where the tumors have heterogeneous textures. Methods: We propose a recurrent fusion network (RFN), which consists of multiple recurrent fusion phases to progressively fuse the complementary multi-modality image features with intermediary segmentation results derived at individual recurrent fusion phases: (1) the recurrent fusion phases iteratively learn the image features and then refine the subsequent segmentation results; and (2) the intermediary segmentation results allow our method to focus on learning the multi-modality image features around these intermediary segmentation results, which minimizes the risk of inconsistent feature learning.
Results: We evaluated our method on two pathologically proven non-small cell lung cancer PET-CT datasets. We compared our method to the commonly used fusion methods (early-fusion, late-fusion and hyper-fusion) and the state-of-the-art PET-CT tumor segmentation methods on various network backbones (ResNet, DenseNet and 3D-UNet). Our results show that the RFN provides more accurate segmentation than the existing methods and is generalizable to different datasets. Conclusions: We show that learning through multiple recurrent fusion phases allows the iterative re-use of multi-modality image features, which refines tumor segmentation results. We also find that our RFN produces consistent segmentation results across different network architectures.
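
As an illustration only, the difference between early-fusion, late-fusion and the recurrent fusion idea described above can be sketched on toy one-dimensional "images". This is not the authors' RFN: the learned FCN is replaced by a single hand-set logistic unit per voxel, and every function name, weight and bias below is a hypothetical choice made for the example. The only point it shows is structural: in the recurrent variant, each phase re-uses the PET and CT values together with the previous phase's intermediary segmentation.

```python
import math


def sigmoid(x):
    """Logistic function mapping a score to a (0, 1) tumor probability."""
    return 1.0 / (1.0 + math.exp(-x))


def early_fusion(pet, ct, w_pet=2.0, w_ct=2.0, bias=-2.0):
    """Toy early-fusion: combine the two modalities first, then segment once."""
    return [sigmoid(w_pet * p + w_ct * c + bias) for p, c in zip(pet, ct)]


def late_fusion(pet, ct, w=4.0, bias=-2.0):
    """Toy late-fusion: segment each modality separately, then average."""
    seg_pet = [sigmoid(w * p + bias) for p in pet]
    seg_ct = [sigmoid(w * c + bias) for c in ct]
    return [(a + b) / 2.0 for a, b in zip(seg_pet, seg_ct)]


def recurrent_fusion(pet, ct, phases=3, w_pet=2.0, w_ct=2.0, w_seg=2.0, bias=-3.0):
    """Toy recurrent fusion (hypothetical stand-in for the RFN phases).

    Each phase fuses PET, CT and the previous intermediary segmentation,
    so later phases refine the result of earlier ones.
    Returns the list of intermediary segmentations, one per phase.
    """
    seg = [0.5] * len(pet)  # uninformative starting segmentation
    history = []
    for _ in range(phases):
        seg = [
            sigmoid(w_pet * p + w_ct * c + w_seg * s + bias)
            for p, c, s in zip(pet, ct, seg)
        ]
        history.append(seg)
    return history


if __name__ == "__main__":
    # Four voxels: the two middle ones are "tumor-like" in both modalities.
    pet = [0.1, 0.9, 0.8, 0.1]
    ct = [0.2, 0.7, 0.9, 0.1]
    print("early:", [round(v, 3) for v in early_fusion(pet, ct)])
    print("late: ", [round(v, 3) for v in late_fusion(pet, ct)])
    for i, seg in enumerate(recurrent_fusion(pet, ct), start=1):
        print(f"phase {i}:", [round(v, 3) for v in seg])
```

With these hand-set weights, the tumor-like voxels gain probability and the background voxels lose probability from one phase to the next, which mirrors, in a very simplified way, the progressive refinement the abstract attributes to the recurrent fusion phases.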

Co-Learning Multi-Modality PET-CT Features Via a Cascaded CNN-Transformer Network

CasCRNN-GL-Net: Cascaded Convolutional and Recurrent Neural Networks with Global and Local Pathways for Classification of Focal Liver Lesions in Multi-Phase CT Images

Hyper-Connected Transformer Network for Multi-Modality PET-CT Segmentation

Co-Learning Feature Fusion Maps from PET-CT Images of Lung Cancer

Tumor co-segmentation in PET/CT using multi-modality fully convolutional neural network

Recurrent Feature Fusion Learning for Multi-Modality PET-CT Tumor Segmentation

Hybrid CNN-transformer Network for Interactive Learning of Challenging Musculoskeletal Images.

Multi-Modal Co-Learning for Liver Lesion Segmentation on PET-CT Images

Multimodal Spatial Attention Module for Targeting Multimodal PET-CT Lung Tumor Segmentation

CTCNet: A Bi-directional Cascaded Segmentation Network Combining Transformers with CNNs for Skin Lesions.

Automated Lung Tumor Delineation on Positron Emission Tomography/Computed Tomography Via a Hybrid Regional Network

Cross Modality Fusion for Modality-Specific Lung Tumor Segmentation in PET-CT Images.

Multi-modal co-learning with attention mechanism for head and neck tumor segmentation on 18FDG PET-CT

A Parallelly Contextual Convolutional Transformer for Medical Image Segmentation

MMCA-NET: A Multimodal Cross Attention Transformer Network for Nasopharyngeal Carcinoma Tumor Segmentation Based on a Total-Body PET/CT System

CAFCT-Net: A CNN-Transformer Hybrid Network with Contextual and Attentional Feature Fusion for Liver Tumor Segmentation

A Spatial Squeeze and Multimodal Feature Fusion Attention Network for Multiple Tumor Segmentation from PET–CT Volumes

HCT-Unet: multi-target medical image segmentation via a hybrid CNN-transformer Unet incorporating multi-axis gated multi-layer perceptron

Fully Convolutional Network with Sparse Feature-Maps Composition for Automatic Lung Tumor Segmentation from PET Images

CSU-Net: A CNN-Transformer Parallel Network for Multimodal Brain Tumour Segmentation

A Novel Fusion Framework Based on Adaptive PCNN in NSCT Domain for Whole-Body PET and CT Images.