Abstract:
OBJECTIVE: To screen for potential pan-cancer biomarkers using The Cancer Genome Atlas (TCGA) database, and thereby to support the diagnosis and prognostic assessment of multiple cancer types.
METHODS: The GDC Data Transfer Tool and the GDCRNATools package were used to download data from the TCGA database. After data curation, 13 cancer types were selected for further analysis. Genes and miRNAs that were up- or down-regulated in all 13 cancers were screened using a false discovery rate (FDR) < 0.05 and a fold change (FC) > 1.5 as the differential expression criteria. Diagnostic value was assessed from the receiver operating characteristic (ROC) curve using the area under the curve (AUC), the optimal cut-off value, and the corresponding sensitivity and specificity. Survival probabilities were estimated with the Kaplan-Meier method and compared with the log-rank test, and hazard ratios (HRs) were calculated to assess prognostic value. The DAVID tool was used to perform Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses of the differentially expressed genes. STRING and TargetScan were used to analyze the regulatory network of the differentially expressed genes and miRNAs.
RESULTS: A total of 48 genes and 2 miRNAs were differentially expressed in all 13 cancers: 25 genes were up-regulated, while 23 genes and both miRNAs were down-regulated. Most of these genes and miRNAs discriminated well between cases and controls, with AUC, sensitivity, and specificity reaching 0.8-0.9. Survival analysis showed that the differentially expressed genes and miRNAs were significantly associated with patient survival in multiple cancers. Most up-regulated genes were risk factors for poorer survival (HR > 1), whereas most down-regulated genes were protective factors (0 < HR < 1). GO and KEGG enrichment analyses showed that the differentially expressed genes were mainly enriched in biological processes related to cell proliferation. In the regulatory network analysis, 13 differentially expressed genes and the 2 differentially expressed miRNAs had regulatory and interaction relationships.
CONCLUSION: The 48 genes and 2 miRNAs differentially expressed across 13 cancers may serve as potential pan-cancer biomarkers that support the diagnosis and prognostic evaluation of multiple cancers, and they may provide clues for developing broad-spectrum tumor therapeutic targets.
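The screening and evaluation criteria in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline or the GDCRNATools API. The abstract does not say how p-values were adjusted or how the "best" cut-off was chosen, so the sketch assumes Benjamini-Hochberg FDR correction, reads "FC > 1.5" as a change beyond 1.5-fold in either direction, and picks the cut-off that maximizes Youden's index. All function names are hypothetical. Labels use 1 for tumor cases and 0 for controls. Hazard ratios, which are usually estimated with a Cox model, are not shown.

```python
import math

# Hypothetical helpers; not the authors' code or any package's API.

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (FDR) in input order (assumed correction)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def is_differential(fdr, fold_change, fdr_cut=0.05, fc_cut=1.5):
    """Assumption: 'FC > 1.5' means a change beyond 1.5-fold in either direction."""
    return fdr < fdr_cut and (fold_change > fc_cut or fold_change < 1.0 / fc_cut)


def roc_auc(scores, labels):
    """AUC = P(random case scores higher than random control); ties count 0.5."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if c > k else 0.5 if c == k else 0.0 for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def youden_cutoff(scores, labels):
    """Assumption: best cut-off maximizes Youden's J = sensitivity + specificity - 1.

    A sample is called a case when score >= threshold. Returns (threshold, sens, spec).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        sens, spec = tp / n_pos, tn / n_neg
        if sens + spec - 1 > best_j:
            best, best_j = (t, sens, spec), sens + spec - 1
    return best


def logrank_test(times, events, groups):
    """Two-group log-rank test. events: 1 = death observed, 0 = censored; groups: 0 or 1.

    Returns (chi-square statistic, p-value from a chi-square distribution with 1 df).
    """
    o_minus_e, variance = 0.0, 0.0
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        n = sum(1 for ti in times if ti >= t)  # number at risk just before t
        n1 = sum(1 for ti, g in zip(times, groups) if ti >= t and g == 1)
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        d1 = sum(1 for ti, e, g in zip(times, events, groups) if ti == t and e and g == 1)
        o_minus_e += d1 - d * n1 / n  # observed minus expected deaths in group 1
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / variance
    # For 1 df, P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

For a down-regulated marker, pass the negated expression values so that a higher score still indicates a case. For example, `youden_cutoff([0.1, 0.4, 0.35, 0.8, 0.9, 0.7], [0, 0, 1, 1, 1, 0])` returns a threshold of 0.8 with sensitivity 2/3 and specificity 1.0.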

Identifying Cancer Biomarkers from High-Throughput RNA Sequencing Data by Machine Learning

Identification of potential biomarkers with colorectal cancer based on bioinformatics analysis and machine learning

Identifying and Analyzing Different Cancer Subtypes Using RNA-seq Data of Blood Platelets

Using Machine Learning Methods to Study Colorectal Cancer Tumor Micro-Environment and Its Biomarkers

Identification of Pan-Cancer Biomarkers Based on the Gene Expression Profiles of Cancer Cell Lines

Identification of Gene Expression in Different Stages of Breast Cancer with Machine Learning

Identification of miRNA-Mediated Subpathways as Prostate Cancer Biomarkers Based on Topological Inference in a Machine Learning Process Using Integrated Gene and miRNA Expression Data

Integrating Multi-scale Gene Features for Cancer Diagnosis

Exploring Prognostic Gene Factors in Breast Cancer via Machine Learning

Identification of Pan-Cancer Prognostic Biomarkers Through Integration of Multi-Omics Data

Identifying the Signatures and Rules of Circulating Extracellular MicroRNA for Distinguishing Cancer Subtypes

Gene selection and classification for cancer microarray data based on machine learning and similarity measures

Identification of Differentially Expressed Genes Between Original Breast Cancer and Xenograft Using Machine Learning Algorithms

Integrative Analysis of RNA Expression Data Unveils Distinct Cancer Types through Machine Learning Techniques

Early Diagnosis of Hepatocellular Carcinoma Using Machine Learning Method

Identification of gene signatures used to recognize biological characteristics of gastric cancer upon gene expression data

[Exploratory Screening of Potential Pan-Cancer Biomarkers Based on the Cancer Genome Atlas Database]

Tumor origin identification through machine learning and gene expression profiling

FS–GBDT: identification multicancer-risk module via a feature selection algorithm by integrating Fisher score and GBDT

Analysis of Expression Pattern of snoRNAs in Different Cancer Types with Machine Learning Algorithms