HCLmNet: A Unified Hybrid Continual Learning Strategy Multimodal Network for Lung Cancer Survival Prediction

MD ILIAS BAPPI, David J Richter, Shivani Sanjay Kolekar, Kyungbaek Kim
DOI: https://doi.org/10.1101/2024.12.14.24319041
2024-12-16
Abstract: Lung cancer survival prediction is a critical task in healthcare, where accurate and timely predictions can significantly impact patient outcomes. In hospital settings, new patient data is constantly generated, requiring predictive models to adapt without forgetting previously learned knowledge. This challenge is intensified by the need to seamlessly integrate complex multimodal data, such as imaging, DNA, and patient records. Traditional Deep Learning (DL) models, while powerful, often suffer from catastrophic forgetting during incremental learning, further complicating the task of reliable survival prediction in dynamic environments. To address these challenges, we introduce a hybrid Continual Learning (CL) framework that integrates Elastic Weight Consolidation (EWC) with replay-based modules, including EWC Experience Replay (ER), Instance-Level Correlation Replay (EICR), and Class-Level Correlation Replay (ECCR). The ER module preserves knowledge by replaying representative samples from previous data, mitigating interference from new data. The EICR module ensures the retention of fine-grained feature patterns through inter-instance relationship modeling, while the ECCR module consolidates global knowledge across tasks using random triplet probabilities to preserve inter-class correlations. Together, these components create a robust framework, addressing catastrophic forgetting while enhancing adaptability for real-time survival prediction. Another critical challenge lies in the limitations of Convolutional Neural Networks (CNNs), which tend to miss ground-glass opacities or tiny tumor features in CT and PET images because they rely on datasets similar to their pretraining data. To overcome this, we propose a Swin Transformer (SwinT)-based method to extract critical features, addressing CNN shortcomings in such multimodal scenarios.
Additionally, XLNet-permutation enriches multimodal analysis by effectively handling small DNA datasets and capturing latent patterns, whereas a Fully Connected Network (FCN) processes clinical features. A cross-attention fusion mechanism integrates clinical, CT, PET, and DNA data, producing a robust survival prediction model. The final prediction, guided by FCN and Cox Proportional Hazards (CoxPH) techniques, achieves state-of-the-art performance with a 7.7% improvement in concordance index (C-Index), reaching 0.84, a reduction in mean absolute error (MAE) to 140 days, and forgetting minimized to 0.08. Ablation studies demonstrate the importance of the DNA modality, cross-attention mechanism, and CL strategies, advancing adaptive survival prediction and stability.
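
For readers unfamiliar with the headline metric, the following is a minimal sketch of Harrell's concordance index for right-censored survival data. It is an illustration, not the authors' evaluation code. The function name `concordance_index`, its arguments, and the toy data are hypothetical, and the sketch assumes that a higher risk score should correspond to a shorter survival time and that pairs with tied survival times are simply skipped.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell's C-index (simplified sketch; tied survival times are skipped).

    times  -- observed follow-up times (e.g. days)
    events -- 1 if death was observed, 0 if the record is censored
    risks  -- predicted risk scores; higher means shorter expected survival
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # assumption: ignore tied times in this sketch
        # Let `a` be the patient with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the pair is not comparable
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0  # model ranks the earlier death as higher risk
        elif risks[a] == risks[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): four patients, the third one censored.
times = [100, 200, 300, 400]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.6, 0.1]
print(concordance_index(times, events, risks))  # 0.8
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the reported 0.84 means that, under this definition, roughly 84% of comparable patient pairs are ordered correctly.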