Abstract:Background: Carcinoma of unknown primary (CUP) is a subset of metastatic cancers in which the primary tissue source of the cancer cells remains unidentified. CUP is the eighth most common malignancy worldwide, accounting for up to 5% of all malignancies. Representing an exceptionally aggressive metastatic cancer, the median survival is approximately 3 to 6 months. The tissue in which cancer arises plays a key role in our understanding of sensitivities to various forms of cell death. Thus, the lack of knowledge on the tissue of origin (TOO) makes it difficult to devise tailored and effective treatments for patients with CUP. Developing quick and clinically implementable methods to identify the TOO of the primary site is crucial in treating patients with CUP. Noncoding RNAs may hold potential for origin identification and provide a robust route to clinical implementation due to their resistance against chemical degradation. Objective: This study aims to investigate the potential of microRNAs, a subset of noncoding RNAs, as highly accurate biomarkers for detecting the TOO through data-driven, machine learning approaches for metastatic cancers. Methods: We used microRNA expression data from The Cancer Genome Atlas data set and assessed various machine learning approaches, from simple classifiers to deep learning approaches. As a test of our classifiers, we evaluated the accuracy on a separate set of 194 primary tumor samples from the Sequence Read Archive. We used permutation feature importance to determine the potential microRNA biomarkers and assessed them with principal component analysis and t-distributed stochastic neighbor embedding visualizations. Results: Our results show that it is possible to design robust classifiers to detect the TOO for metastatic samples on The Cancer Genome Atlas data set, with an accuracy of up to 97% (351/362), which may be used in situations of CUP. Our findings show that deep learning techniques enhance prediction accuracy. We progressed from an initial accuracy prediction of 62.5% (226/362) with decision trees to 93.2% (337/362) with logistic regression, finally achieving 97% (351/362) accuracy using deep learning on metastatic samples. On the Sequence Read Archive validation set, a lower accuracy of 41.2% (77/188) was achieved by the decision tree, while deep learning achieved a higher accuracy of 80.4% (151/188). Notably, our feature importance analysis showed the top 3 most important features for predicting TOO to be microRNA-10b, microRNA-205, and microRNA-196b, which aligns with previous work. Conclusions: Our findings highlight the potential of using machine learning techniques to devise accurate tests for detecting TOO for CUP. Since microRNAs are carried throughout the body via extracellular vesicles secreted from cells, they may serve as key biomarkers for liquid biopsy due to their presence in blood plasma. Our work serves as a foundation toward developing blood-based cancer detection tests based on the presence of microRNA.

Explainable Machine Learning Models Using Robust Cancer Biomarkers Identification from Paired Differential Gene Expression

Identification of Gene Expression in Different Stages of Breast Cancer with Machine Learning

Identification of Differentially Expressed Genes Between Original Breast Cancer and Xenograft Using Machine Learning Algorithms

Tumor origin identification through machine learning and gene expression profiling.

Using Machine Learning Methods to Study Colorectal Cancer Tumor Micro-Environment and Its Biomarkers.

Cancer prediction with gene expression profiling and differential evolution

Bimodal gene expression in cancer patients provides interpretable biomarkers for drug sensitivity

Feature Selection and Assessment of Lung Cancer Sub-types by Applying Predictive Models

A Comparative Analysis of Gene Expression Profiling by Statistical and Machine Learning Approaches

Application of Transcriptome-Based Gene Set Featurization for Machine Learning Model to Predict the Origin of Metastatic Cancer

Cancer Gene Profiling through Unsupervised Discovery

Elucidation of dynamic microRNA regulations in cancer progression using integrative machine learning

Machine learning-informed liquid-liquid phase separation for personalized breast cancer treatment assessment

miRNA in Machine-Learning-Based Diagnostics of Oral Cancer

SVM-DO: identification of tumor-discriminating mRNA signatures via support vector machines supported by Disease Ontology

Machine learning enabled prediction of digital biomarkers from whole slide histopathology images

A Robust Personalized Classification Method for Breast Cancer Metastasis Prediction

Identifying the Signatures and Rules of Circulating Extracellular MicroRNA for Distinguishing Cancer Subtypes

Stable feature selection utilizing Graph Convolutional Neural Network and Layer-wise Relevance Propagation for biomarker discovery in breast cancer

Deep Learning-Based Identification of Tissue of Origin for Carcinomas of Unknown Primary Using MicroRNA Expression: Algorithm Development and Validation

Predicting gene signature in breast cancer patients with multiple machine learning models