Co-Learning Feature Fusion Maps from PET-CT Images of Lung Cancer

Ashnil Kumar, Michael Fulham, Dagan Feng, Jinman Kim
DOI: https://doi.org/10.1109/TMI.2019.2923601
2019-10-28
Abstract: The analysis of multi-modality positron emission tomography and computed tomography (PET-CT) images for computer aided diagnosis applications requires combining the sensitivity of PET to detect abnormal regions with anatomical localization from CT. Current methods for PET-CT image analysis either process the modalities separately or fuse information from each modality based on knowledge about the image analysis task. These methods generally do not consider the spatially varying visual characteristics that encode different information across the different modalities, which have different priorities at different locations. For example, a high abnormal PET uptake in the lungs is more meaningful for tumor detection than physiological PET uptake in the heart. Our aim is to improve fusion of the complementary information in multi-modality PET-CT with a new supervised convolutional neural network (CNN) that learns to fuse complementary information for multi-modality medical image analysis. Our CNN first encodes modality-specific features and then uses them to derive a spatially varying fusion map that quantifies the relative importance of each modality's features across different spatial locations. These fusion maps are then multiplied with the modality-specific feature maps to obtain a representation of the complementary multi-modality information at different locations, which can then be used for image analysis. We evaluated the ability of our CNN to detect and segment multiple regions with different fusion requirements using a dataset of PET-CT images of lung cancer. We compared our method to baseline techniques for multi-modality image fusion and segmentation. Our findings show that our CNN had a significantly higher foreground detection accuracy (99.29%, p < 0.05) than the fusion baselines and a significantly higher Dice score (63.85%) than recent PET-CT tumor segmentation methods.
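The Dice score reported in the abstract measures the overlap between a predicted segmentation mask and a reference mask. A minimal NumPy sketch of the standard definition follows; the function name `dice_score` and the convention of returning 1.0 when both masks are empty are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def dice_score(pred, target):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    if denom == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    return float(2.0 * intersection / denom)

# Toy masks: 2 predicted pixels, 1 reference pixel, 1 pixel in common.
pred = np.array([[1, 1], [0, 0]])
target = np.array([[1, 0], [0, 0]])
score = dice_score(pred, target)  # 2 * 1 / (2 + 1)
```

A score of 1 means the masks coincide exactly and 0 means they do not overlap at all.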
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses how to fuse the complementary information in multi-modality positron emission tomography and computed tomography (PET-CT) images more effectively, in order to improve lung cancer detection and segmentation. Current methods either process each modality separately or fuse the modalities based on prior knowledge about the image analysis task. Both approaches generally overlook that the visual characteristics of each modality vary spatially, and that their priority for detection and segmentation differs from one location to another. For example, high abnormal PET uptake in the lungs is more meaningful for tumor detection than physiological PET uptake in the heart. The paper proposes a new supervised convolutional neural network (CNN) that learns to fuse complementary information from PET and CT. Specifically, the CNN first encodes modality-specific features and then uses these features to derive a spatially varying fusion map, which quantifies the relative importance of each modality's features at different spatial locations. The fusion maps are then multiplied with the modality-specific feature maps to obtain a representation of the complementary multi-modality information at each location, which can be used for image analysis. The authors evaluated the CNN's ability to detect and segment multiple regions (such as the lungs, mediastinum, and tumors) on a dataset of lung cancer PET-CT images, comparing it with baseline techniques for multi-modality image fusion and segmentation. The CNN achieved a significantly higher foreground detection accuracy (99.29%, p < 0.05) than the fusion baselines and a significantly higher Dice score (63.85%) than recent PET-CT tumor segmentation methods.
This indicates that by learning spatially-varying fusion strategies, the complementary information in multi-modal PET-CT images can be more effectively utilized, thereby improving the performance of computer-aided diagnosis.
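The idea of a spatially varying fusion map can be illustrated with a small NumPy sketch. This is not the authors' network: the function name `fuse_features`, the 1x1-convolution-style weights `w`, the softmax normalization across modalities, and the final weighted sum are all assumptions chosen to keep the example short. The only step taken from the description above is that a per-location map is derived from the modality-specific features and multiplied with them.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def fuse_features(pet_feat, ct_feat, w):
    """Fuse two (C, H, W) feature maps with a per-pixel fusion map.

    w has shape (2, 2C) and stands in for a learned 1x1 convolution
    (hypothetical; in a real model it would be trained end to end).
    """
    stacked = np.concatenate([pet_feat, ct_feat], axis=0)   # (2C, H, W)
    logits = np.einsum("mc,chw->mhw", w, stacked)           # (2, H, W)
    # Assumed normalization: the two modality weights sum to 1 at each pixel.
    fusion_map = softmax(logits, axis=0)
    # Multiply each modality's features by its map; (H, W) broadcasts over C.
    fused = fusion_map[0] * pet_feat + fusion_map[1] * ct_feat
    return fused, fusion_map

# Toy inputs: 4 channels on an 8x8 grid, with random stand-in weights.
rng = np.random.default_rng(0)
pet = rng.normal(size=(4, 8, 8))
ct = rng.normal(size=(4, 8, 8))
w = rng.normal(size=(2, 8))
fused, fusion_map = fuse_features(pet, ct, w)
```

Because the map is computed per pixel, a location with informative PET features (for example, abnormal uptake in the lungs) can weight PET heavily while a neighboring location relies mostly on CT.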