BMI-CNV: A Bayesian framework for multiple genotyping platforms detection of copy number variation

Xizhi Luo, Guoshuai Cai, Alexander C. McLain, Christopher I. Amos, Bo Cai, Feifei Xiao
DOI: https://doi.org/10.1101/2021.06.22.449433
2021-06-22
Abstract: Whole-exome sequencing (WES) enables detection of copy number variations (CNVs) with high resolution in protein-coding regions. However, variations in the intergenic or intragenic regions are excluded from such studies. Fortunately, samples have previously been genotyped on other platforms, such as SNP arrays. Moreover, conventional single-sample methods suffer from a high false discovery rate due to prominent data noise. Methods that integrate multiple genotyping platforms and multiple samples are therefore in high demand for improved CNV detection. We developed BMI-CNV, a Bayesian Multi-sample and Integrative CNV profiling method for data generated by both WES and microarray. For the multi-sample integration, we identify the CNV regions shared across samples using a Bayesian probit stick-breaking process model coupled with Gaussian mixture model estimation. In extensive simulations, BMI-CNV outperformed existing methods with remarkably improved accuracy. By applying it to matched 1000 Genomes Project and HapMap Project data, we showed that BMI-CNV accurately detected common variants. We further applied it to The Research of International Cancer of Lung (TRICL) consortium data with matched WES and OncoArray data and identified lung cancer risk-associated genes in the 17q11.2, 1p36.12, 8q23.1 and 5q22.2 regions, which may provide new insights into the etiology of lung cancer.