Deep learning-based identification of patients at increased risk of cancer using routine laboratory markers

Vivek Singh, Shikha Chaganti, Matthias Siebert, Soumya Rajesh, Andrei Puiu, Raj Gopalan, Jamie Gramz, Dorin Comaniciu, Ali Kamen
2024-10-25
Abstract: Early screening for cancer has proven to improve the survival rate and spare patients from intensive and costly treatments due to late diagnosis. Cancer screening in the healthy population involves an initial risk stratification step to determine the screening method and frequency, primarily to optimize resource allocation by targeting screening towards the individuals who benefit most. For most screening programs, age and clinical risk factors such as family history are part of the initial risk stratification algorithm. In this paper, we focus on developing a blood marker-based risk stratification approach, which could be used to identify patients with elevated cancer risk who can then be encouraged to take a diagnostic test or participate in a screening program. We demonstrate that the combination of simple, widely available blood tests, such as complete blood count and complete metabolic panel, could potentially be used to identify patients at risk for colorectal, liver, and lung cancers with areas under the ROC curve of 0.76, 0.85, and 0.78, respectively. Furthermore, we hypothesize that such an approach could be used not only as a pre-screening risk assessment for individuals but also as a population health management tool, for example to better interrogate the cancer risk in certain sub-populations.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
The problem this paper addresses is how to use routine laboratory tests (such as the complete blood count and comprehensive metabolic panel) to identify patients at increased risk of cancer, so that they can be encouraged to undergo diagnostic testing or participate in screening programs. Specifically, the authors use a deep-learning model to flag, from common blood biomarkers, individuals at elevated risk of colorectal, liver, and lung cancer, with the aim of increasing early detection of these cancers, improving patient survival, and reducing the high treatment costs associated with late-stage diagnosis.

### Main problem decomposition

1. **The need for early cancer screening**:
   - Early screening has been shown to improve the survival rate of cancer patients and to avoid the expensive, intensive treatments that follow late-stage diagnosis.
   - Screening programs usually begin with a risk stratification step that determines the screening method and frequency, optimizing resource allocation by targeting the population that benefits most.
2. **Limitations of existing screening methods**:
   - Most screening programs currently rely on age and clinical risk factors (such as family history), but these factors do not capture all potentially high-risk individuals.
   - Participation in cancer screening is low, partly because the general public has insufficient awareness of its own risk.
3. **Proposing a new risk assessment tool**:
   - The paper proposes a risk stratification method based on blood biomarkers that aims to identify individuals at higher risk of cancer, so that they can be encouraged to undergo further diagnosis or screening.
   - By combining simple and widely available blood tests (such as the complete blood count and comprehensive metabolic panel), the method is intended to identify patients at high risk of colorectal, liver, and lung cancer.

### Method overview

- **Data source**: The study used data from patients with a malignant tumor diagnosis (colorectal, liver, or lung cancer) or without any malignant tumor diagnosis between 2017 and 2021.
- **Model development**: The authors developed a deep-learning model named Deep Profiler, which predicts the probability that a patient will develop cancer within the next 12 months from age, gender, and routine blood biomarkers (the individual measurements in the CBC and CMP).
- **Performance evaluation**: The model was evaluated on a validation cohort for the three cancers. The area under the receiver operating characteristic curve (AUC) was 0.76 for colorectal cancer, 0.85 for liver cancer, and 0.78 for lung cancer.

### Conclusion

The paper shows that routine blood test results combined with deep learning can identify individuals at higher risk of cancer at an early stage, creating opportunities for earlier intervention. The approach is useful not only for risk assessment at the individual level but also as a population health management tool, for example to better understand the cancer risk of certain subgroups.
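
To make the AUC figures reported in the performance evaluation easier to interpret, here is a minimal sketch of how an AUC can be computed from predicted risk scores and observed outcomes. It is not the authors' Deep Profiler code. The function name `roc_auc` and the toy labels and scores are hypothetical, chosen only for illustration. The sketch uses the pairwise formulation: the AUC is the probability that a randomly chosen patient who developed cancer received a higher risk score than a randomly chosen patient who did not, with ties counted as one half.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for a positive case, 0 for a negative case.
    scores: predicted risk, higher meaning more likely positive.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")

    # Count positive/negative pairs ranked correctly; a tie counts as half.
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data (not from the paper):
# 1 = cancer diagnosed within 12 months, 0 = no diagnosis.
toy_labels = [0, 0, 1, 0, 1, 1, 0, 1]
toy_scores = [0.10, 0.30, 0.35, 0.40, 0.60, 0.70, 0.20, 0.90]
print(f"AUC = {roc_auc(toy_labels, toy_scores):.4f}")  # AUC = 0.9375
```

Under this reading, an AUC of 0.85 means that in roughly 85% of such patient pairs the model ranks the patient who went on to develop cancer above the one who did not. An AUC of 0.5 corresponds to random ranking. The quadratic pairwise loop is used for clarity; rank-based implementations are preferable for large cohorts.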