[Bioinformatics of Tumor Molecular Targets from Big Data].

Jinyan Huang, Yingyan Yu
DOI: https://doi.org/10.3760/cma.j.issn.1671-0274.2015.01.003
2015-01-01
Abstract: Big data from high-throughput research exhibit four "V" features: volume of data, variety of data, value for deep mining, and velocity of processing. For whole-genome sequencing of a human sample at an average coverage of 30x, about 100 GB of raw data (compressed FASTQ format) can be produced; in the binary BAM format, about 150 GB of data can be produced. Analysis of high-throughput data needs to combine clinical information with pathological features. In addition, the data sources of medical research involve ethical issues and patient privacy. Costs are gradually falling: for example, whole-genome sequencing on the Illumina X Ten at 30x coverage costs about 10,000 RMB, and RNA-seq costs 5,000 RMB for a single sample. Cancer genome research therefore provides opportunities for the discovery of molecular targets, but also brings enormous challenges for data integration and utilization. This article introduces methodologies for analyzing and processing high-throughput data and explains their possible application to molecular target discovery.
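To give a sense of scale, the per-sample figures quoted in the abstract can be extrapolated to a cohort. The following is only a rough sketch: the function name `cohort_estimate`, the cohort size of 100, the decision to store both FASTQ and BAM files, and the use of decimal terabytes (1 TB = 1000 GB) are all assumptions made for illustration, not details from the article.

```python
# Per-sample figures taken from the abstract (30x whole-genome sequencing).
FASTQ_GB = 100           # compressed FASTQ, raw data
BAM_GB = 150             # binary BAM format
WGS_COST_RMB = 10_000    # Illumina X Ten, 30x coverage (approximate)
RNASEQ_COST_RMB = 5_000  # RNA-seq, single sample


def cohort_estimate(n_samples, with_rnaseq=False):
    """Rough storage and sequencing cost for a hypothetical cohort.

    Assumes both FASTQ and BAM files are kept for every sample and
    ignores RNA-seq storage, for which the abstract gives no figure.
    """
    storage_gb = n_samples * (FASTQ_GB + BAM_GB)
    per_sample_cost = WGS_COST_RMB + (RNASEQ_COST_RMB if with_rnaseq else 0)
    return {
        "storage_tb": storage_gb / 1000,  # decimal terabytes (assumption)
        "cost_rmb": n_samples * per_sample_cost,
    }


# Hypothetical cohort of 100 patients with matched RNA-seq.
print(cohort_estimate(100, with_rnaseq=True))
```

Under these assumptions, 100 samples already require about 25 TB of storage and roughly 1.5 million RMB in sequencing costs, which illustrates the "volume" challenge the abstract describes.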