Precise diagnosis of three top cancers using dbGaP data

Xu-Qing Liu, Xin-Sheng Liu, Jian-Ying Rong, Feng Gao, Yan-Dong Wu, Chun-Hua Deng, Hong-Yan Jiang, Xiao-Feng Li, Ye-Qin Chen, Zhi-Guo Zhao, Yu-Ting Liu, Hai-Wen Chen, Jun-Liang Li, Yu Huang, Cheng-Yao Ji, Wen-Wen Liu, Xiao-Hu Luo, Li-Li Xiao
DOI: https://doi.org/10.1038/s41598-020-80832-x
IF: 4.6
2021-01-12
Scientific Reports
Abstract: The challenge of decoding information about complex diseases hidden in huge number of single nucleotide polymorphism (SNP) genotypes is undertaken based on five dbGaP studies. Current genome-wide association studies have successfully identified many high-risk SNPs associated with diseases, but precise diagnostic models for complex diseases by these or more other SNP genotypes are still unavailable in the literature. We report that lung cancer, breast cancer and prostate cancer as the first three top cancers worldwide can be predicted precisely via 240–370 SNPs with accuracy up to 99% according to leave-one-out and 10-fold cross-validation. Our findings (1) confirm an early guess of Dr. Mitchell H. Gail that about 300 SNPs are needed to improve risk forecasts for breast cancer, (2) reveal an incredible fact that SNP genotypes may contain almost all information that one wants to know, and (3) show a hopeful possibility that complex diseases can be precisely diagnosed by means of SNP genotypes without using phenotypical features. In short words, information hidden in SNP genotypes can be extracted in efficient ways to make precise diagnoses for complex diseases.
multidisciplinary sciences
What problem does this paper attempt to address?
This paper addresses how to use single nucleotide polymorphism (SNP) genotype data to diagnose the three most common cancers worldwide: lung, breast, and prostate cancer. Genome-wide association studies (GWAS) have identified many high-risk SNPs associated with disease, but the literature still lacks precise diagnostic models built on these or additional SNP genotypes. Using data from five dbGaP studies, the authors report that each of the three cancers can be predicted from 240 to 370 SNPs, with accuracy of up to 99% under leave-one-out and 10-fold cross-validation. The study also suggests that SNP genotypes may contain almost all of the information relevant to complex diseases, which raises the prospect of diagnosing such diseases from genotypes alone, without phenotypic features.
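To make the evaluation terms concrete, here is a minimal sketch of how leave-one-out and 10-fold cross-validation accuracy can be computed for a genotype-based classifier. It is not the authors' method: the summary does not describe their model, so a simple nearest-centroid rule stands in for it, the genotype matrix is synthetic (coded 0/1/2 for the minor-allele count), and all function names are hypothetical.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def nearest_centroid_predict(train_X, train_y, x):
    """Predict the label whose class-mean genotype vector is closest to x.

    Stand-in classifier for illustration only (assumption, not from the paper).
    """
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, y in zip(train_X, train_y) if y == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def cross_val_accuracy(X, y, k, seed=0):
    """Fraction of samples classified correctly when each fold is held out once."""
    correct = 0
    for fold in k_fold_indices(len(X), k, seed):
        held_out = set(fold)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(X)) if i not in held_out]
        for i in fold:
            correct += nearest_centroid_predict(train_X, train_y, X[i]) == y[i]
    return correct / len(X)

def make_toy_genotypes(n_per_class=30, n_snps=20, seed=1):
    """Synthetic data: cases (label 1) carry more minor alleles at the first 5 SNPs."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = []
            for j in range(n_snps):
                freq = 0.8 if (label == 1 and j < 5) else 0.2
                # Genotype = number of minor alleles across two chromosomes.
                row.append(sum(rng.random() < freq for _ in range(2)))
            X.append(row)
            y.append(label)
    return X, y

X, y = make_toy_genotypes()
acc_10fold = cross_val_accuracy(X, y, k=10)
acc_loo = cross_val_accuracy(X, y, k=len(X))  # leave-one-out: one sample per fold
print(f"10-fold accuracy: {acc_10fold:.3f}")
print(f"leave-one-out accuracy: {acc_loo:.3f}")
```

Leave-one-out is the special case of k-fold cross-validation in which k equals the number of samples, so the same routine covers both. Accuracy on this toy data says nothing about the paper's reported figures, and in a real analysis any SNP selection step would also need to be repeated inside each training fold to avoid optimistic estimates.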