TNMplot.com: a web tool for the comparison of gene expression in normal, tumor and metastatic tissues

Áron Bartha, Balázs Győrffy
DOI: https://doi.org/10.1101/2020.11.10.376228
2020-11-11
Abstract: Genes showing higher expression in either tumor or metastatic tissues can help to better understand tumor formation and can serve as biomarkers of progression or as therapy targets with minimal off-target effects. Our goal was to establish an integrated database using available transcriptome-level datasets and to create a web platform enabling mining of this database by comparing normal, tumor and metastatic data across all genes in real time. We utilized data generated by either gene arrays or RNA-seq. Gene array data were manually selected from NCBI-GEO. RNA sequencing data were downloaded from the TCGA, TARGET, and GTEx repositories. TCGA and TARGET contain predominantly tumor and metastatic samples from adult and pediatric patients, while GTEx samples are from healthy tissues. Statistical significance was computed using Mann-Whitney or Kruskal-Wallis tests. The entire database contains 56,938 samples, including 33,520 samples from 3,180 gene chip-based studies (453 metastatic, 29,376 tumorous and 3,691 normal samples), 11,010 samples from TCGA (394 metastatic, 9,886 tumorous and 730 normal), 1,193 samples from TARGET (1 metastatic, 1,180 tumorous and 12 normal) and 11,215 normal samples from GTEx. The most consistently up-regulated genes across multiple tumor types were TOP2A (mean FC=7.8), SPP1 (FC=7.0) and CENPA (FC=6.03), and the most consistently down-regulated gene was ADH1B (mean FC=0.15). Validation of differential expression using equally sized training and test sets confirmed the reliability of the database in breast, colon, and lung cancer (p<0.0001). The online analysis platform enables unrestricted mining of the database and is accessible at www.tnmplot.com.
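The abstract names the Mann-Whitney test and reports fold changes (FC) between tumor and normal samples. The following is a minimal sketch of how such a two-group comparison could be computed for a single gene. It is not the authors' implementation: the function names and the expression values are hypothetical, the p-value uses a normal approximation with tie correction, and fold change is assumed here to be the ratio of group medians, which the abstract does not specify.

```python
import math
from statistics import median


def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic for x, approximate p-value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    combined = list(x) + list(y)
    ranks = average_ranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2

    # Tie correction: sum of (t^3 - t) over groups of tied values.
    counts = {}
    for v in combined:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:  # all values identical: no evidence of a difference
        return u1, 1.0

    z = (u1 - n1 * n2 / 2) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u1, p


def fold_change(tumor, normal):
    """Assumed definition: median tumor expression / median normal expression."""
    return median(tumor) / median(normal)


# Hypothetical expression values for one gene (not taken from the database).
normal = [1.0, 2.0, 3.0, 4.0]
tumor = [5.0, 6.0, 7.0, 8.0, 9.0]

u, p = mann_whitney_u(tumor, normal)
print(f"U = {u}, p = {p:.4f}, FC = {fold_change(tumor, normal):.2f}")
```

For a three-group comparison (normal, tumor, metastatic), the Kruskal-Wallis test named in the abstract would take the place of the two-group test. With small samples like these, an exact test is preferable to the normal approximation.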