GATK-gCNV enables the discovery of rare copy number variants from exome sequencing data

Mehrtash Babadi, Jack M. Fu, Samuel K. Lee, Andrey N. Smirnov, Laura D. Gauthier, Mark Walker, David I. Benjamin, Xuefang Zhao, Konrad J. Karczewski, Isaac Wong, Ryan L. Collins, Alba Sanchis-Juan, Harrison Brand, Eric Banks, Michael E. Talkowski
DOI: https://doi.org/10.1038/s41588-023-01449-0
IF: 30.8
2023-08-22
Nature Genetics
Abstract: Copy number variants (CNVs) are major contributors to genetic diversity and disease. While standardized methods, such as the genome analysis toolkit (GATK), exist for detecting short variants, technical challenges have confounded uniform large-scale CNV analyses from whole-exome sequencing (WES) data. Given the profound impact of rare and de novo coding CNVs on genome organization and human disease, we developed GATK-gCNV, a flexible algorithm to discover rare CNVs from sequencing read-depth information, complete with open-source distribution via GATK. We benchmarked GATK-gCNV in 7,962 exomes from individuals in quartet families with matched genome sequencing and microarray data, finding up to 95% recall of rare coding CNVs at a resolution of more than two exons. We used GATK-gCNV to generate a reference catalog of rare coding CNVs in WES data from 197,306 individuals in the UK Biobank, and observed strong correlations between per-gene CNV rates and measures of mutational constraint, as well as rare CNV associations with multiple traits. In summary, GATK-gCNV is a tunable approach for sensitive and specific CNV discovery in WES data, with broad applications.
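The abstract says GATK-gCNV discovers CNVs from sequencing read-depth information but does not describe its model. The Python sketch below illustrates only the most basic read-depth idea. It is not GATK-gCNV's method, which is considerably more elaborate and is not reproduced here. The sketch normalizes a sample's per-exon read counts by the sample median, compares each exon against the median of a reference cohort, and scales the ratio to a diploid copy number. The function name `naive_copy_number` and the choice of median normalization are illustrative assumptions.

```python
import statistics


def naive_copy_number(sample_counts, cohort_counts, ploidy=2):
    """Hypothetical, simplified read-depth copy-number estimate per exon.

    sample_counts: read counts per exon for one sample.
    cohort_counts: list of per-sample count lists (same exon order) used as a reference.
    Returns one integer copy-number estimate per exon, or None where the cohort has no depth.
    """

    def normalize(counts):
        # Divide by the sample median to remove library-size differences;
        # the median is robust to a few exons altered by a CNV.
        scale = statistics.median(counts)
        if scale == 0:
            raise ValueError("sample median depth is zero")
        return [c / scale for c in counts]

    sample_norm = normalize(sample_counts)
    cohort_norm = [normalize(c) for c in cohort_counts]

    calls = []
    for i, value in enumerate(sample_norm):
        # The cohort median at this exon is taken as the expected diploid depth.
        expected = statistics.median(s[i] for s in cohort_norm)
        if expected == 0:
            calls.append(None)  # no information at this exon
            continue
        calls.append(round(value / expected * ploidy))
    return calls
```

For example, a sample with half the usual depth at its third exon, `naive_copy_number([100, 100, 50, 100], [[100] * 4, [200] * 4])`, yields `[2, 2, 1, 2]`, which points to a heterozygous deletion of that exon. Real exome data carry capture-efficiency and GC biases that such a naive ratio does not model. This is one reason dedicated callers use probabilistic models rather than simple ratios.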
genetics & heredity
What problem does this paper attempt to address?