Modeling and Correct the GC Bias of Tumor and Normal WGS Data for SCNA Based Tumor Subclonal Population Inferring

Yanshuo Chu,Mingxiang Teng,Yadong Wang
DOI: https://doi.org/10.1186/s12859-018-2099-0
IF: 3.307
2018-01-01
BMC Bioinformatics
Abstract:Background Somatic copy number alternations (SCNAs) can be utilized to infer tumor subclonal populations in whole genome seuqncing studies, where usually their read count ratios between tumor-normal paired samples serve as the inferring proxy. Existing SCNA based subclonal population inferring tools consider the GC bias of tumor and normal sample is of the same fature, and could be fully offset by read count ratio. However, we found that, the read count ratio on SCNA segments presents a Log linear biased pattern, which influence existing read count ratios based subclonal inferring tools performance. Currently no correction tools take into account the read ratio bias. Results We present Pre-SCNAClonal, a tool that improving tumor subclonal population inferring by correcting GC-bias at SCNAs level. Pre-SCNAClonal first corrects GC bias using Markov chain Monte Carlo probability model, then accurately locates baseline DNA segments (not containing any SCNAs) with a hierarchy clustering model. We show Pre-SCNAClonal’s superiority to exsiting GC-bias correction methods at any level of subclonal population. Conclusions Pre-SCNAClonal could be run independently as well as serving as pre-processing/gc-correction step in conjuntion with exsiting SCNA-based subclonal inferring tools.
What problem does this paper attempt to address?