Knowledge-Informed Machine Learning for Cancer Diagnosis and Prognosis: A review

Lingchao Mao, Hairong Wang, Leland S. Hu, Nhan L Tran, Peter D Canoll, Kristin R Swanson, Jing Li

2024-01-12

Abstract: Cancer remains one of the most challenging diseases to treat in the medical field. Machine learning has enabled in-depth analysis of rich multi-omics profiles and medical imaging for cancer diagnosis and prognosis. Despite these advancements, machine learning models face challenges stemming from limited labeled sample sizes, the intricate interplay of high-dimensionality data types, the inherent heterogeneity observed among patients and within tumors, and concerns about interpretability and consistency with existing biomedical knowledge. One approach to surmount these challenges is to integrate biomedical knowledge into data-driven models, which has proven potential to improve the accuracy, robustness, and interpretability of model results. Here, we review the state-of-the-art machine learning studies that adopted the fusion of biomedical knowledge and data, termed knowledge-informed machine learning, for cancer diagnosis and prognosis. Emphasizing the properties inherent in four primary data types including clinical, imaging, molecular, and treatment data, we highlight modeling considerations relevant to these contexts. We provide an overview of diverse forms of knowledge representation and current strategies of knowledge integration into machine learning pipelines with concrete examples. We conclude the review article by discussing future directions to advance cancer research through knowledge-informed machine learning.

Machine Learning, Artificial Intelligence

What problem does this paper attempt to address?

The paper addresses the following key issues:

1. **Tumor Heterogeneity and Individual Differences**: A major challenge in cancer treatment is the high heterogeneity of tumors, both between patients and within a single tumor. This heterogeneity limits the effectiveness of traditional "one-size-fits-all" treatment approaches, so models are needed that can accurately describe the spatial landscape of tumors and support personalized treatment.
2. **Data Annotation and Sample Size Limitations**: Machine learning models depend on large, high-quality training and testing data. In practice, annotated tumor samples are hard to obtain in large numbers because each patient's biopsy samples are limited in quantity and location. This limits the ability of machine learning models to learn the complete spatial landscape of tumors from data alone.
3. **Integration of Multimodal, High-Dimensional Data**: Cancer diagnosis and prognosis often require analyzing several types of data, including clinical, imaging, molecular, and treatment data. These data are usually high-dimensional while sample sizes are relatively small, so integrating them effectively to produce clinical predictions is a significant challenge.
4. **Model Interpretability and Consistency**: Although deep learning models perform well in many tasks, they are often considered "black box" models whose decision processes are difficult to understand and verify. This limits their credibility and practicality as clinical decision support tools, so improving model interpretability and consistency with existing biomedical knowledge is another important direction.

To address these challenges, the paper reviews approaches that incorporate biomedical knowledge into machine learning models, termed Knowledge-Informed Machine Learning (KIML). Using domain knowledge to regularize the model's learning process can improve the accuracy, robustness, and interpretability of the model.
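As a rough illustration of what "using domain knowledge to regularize the learning process" can look like, the sketch below adds a knowledge-based penalty to an ordinary data-fit loss. It is not taken from the paper: the linear model, the function names, the toy data, and the idea of encoding knowledge as pairs of related features (for example, two genes assumed to share a pathway) are all assumptions made for illustration. The penalty pushes related features toward similar weights, which is one common way such prior knowledge is expressed.

```python
from typing import List, Sequence, Tuple


def data_loss(weights: Sequence[float],
              X: Sequence[Sequence[float]],
              y: Sequence[float]) -> float:
    """Mean squared error of a linear model on the labeled samples."""
    total = 0.0
    for row, target in zip(X, y):
        pred = sum(w * x for w, x in zip(weights, row))
        total += (pred - target) ** 2
    return total / len(y)


def knowledge_penalty(weights: Sequence[float],
                      edges: Sequence[Tuple[int, int]]) -> float:
    """Sum of squared weight differences over pairs of related features.

    `edges` is a hypothetical encoding of prior knowledge: each (i, j)
    says features i and j are believed to behave similarly.
    """
    return sum((weights[i] - weights[j]) ** 2 for i, j in edges)


def knowledge_informed_loss(weights: Sequence[float],
                            X: Sequence[Sequence[float]],
                            y: Sequence[float],
                            edges: Sequence[Tuple[int, int]],
                            lam: float) -> float:
    """Data-fit term plus lam times the knowledge-based penalty."""
    return data_loss(weights, X, y) + lam * knowledge_penalty(weights, edges)


# Toy example: three features, two samples, one assumed relation (0, 1).
X: List[List[float]] = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
y: List[float] = [1.0, 0.0]
edges: List[Tuple[int, int]] = [(0, 1)]

# These weights fit the data exactly but violate the assumed relation,
# so the whole loss comes from the penalty term.
print(knowledge_informed_loss([1.0, 0.0, 0.0], X, y, edges, lam=0.5))  # 0.5
```

The coefficient `lam` controls how strongly the prior is enforced. With very few labeled samples, a larger value leans more on the knowledge term, while `lam = 0` recovers a purely data-driven fit.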


Applied machine learning in cancer research: A systematic review for patient diagnosis, classification and prognosis

Machine Learning Meets Cancer

Machine Learning Methods for Cancer Classification Using Gene Expression Data: A Review

Machine Learning and Computer Vision Based Methods for Cancer Classification: A Systematic Review

From patterns to patients: Advances in clinical machine learning for cancer diagnosis, prognosis, and treatment

A Systematic Review of Applications of Machine Learning in Cancer Prediction and Diagnosis

Machine Learning in Metastatic Cancer Research: Potentials, Possibilities, and Prospects

Artificial intelligence (AI) and machine learning (ML) in precision oncology: a review on enhancing discoverability through multiomics integration

A review of cancer data fusion methods based on deep learning

A review of machine learning approaches, challenges and prospects for computational tumor pathology

Computer-Aided Cancer Diagnosis via Machine Learning and Deep Learning: A comparative review

Machine Learning Applications in Lung Cancer Diagnosis, Treatment and Prognosis

Machine learning applications in cancer prognosis and prediction

Emerging research trends in artificial intelligence for cancer diagnostic systems: A comprehensive review

The Application of Deep Learning in Cancer Prognosis Prediction

Advances in Machine Learning for Tumour Classification in Cancer of Unknown Primary: A Mini-Review

Cancer Diagnosis Using Deep Learning: A Bibliographic Review

Integrative Machine Learning Approaches for Multi-Omics Data Analysis in Cancer Research

Review paper on research direction towards cancer prediction and prognosis using machine learning and deep learning models