Comprehensive analysis of the genetic variation in the LPA gene from short-read sequencing

Raphael O. Betschart, Georgios Koliopanos, Paras Garg, Linlin Guo, Massimiliano Rossi, Sebastian Schönherr, Stefan Blankenberg, Raphael Twerenbold, Tanja Zeller, Andreas Ziegler
DOI: https://doi.org/10.1101/2024.03.21.24304527
2024-03-22
Abstract: Lipoprotein(a) [Lp(a)] is a risk factor for cardiovascular diseases and is mainly regulated by the complex LPA gene. We investigated the types of variation in the LPA gene and their predictive performance on Lp(a) concentration. We determined the Kringle IV-type 2 (KIV-2) copy number (CN) using the DRAGEN LPA Caller (DLC) and a read-depth-based CN estimator in 8351 whole-genome sequencing samples from the GENESIS-HD study. The pentanucleotide repeat in the promoter region was genotyped with GangSTR and ExpansionHunter. Lp(a) concentration was available in 4861 population-based subjects. Predictive performance on Lp(a) concentration was investigated using random forests. The agreement of the KIV-2 CN between the two specialized callers was high (r=0.9966; 95% confidence interval [CI] 0.9965–0.9968). Allele-specific KIV-2 CN could be determined in 47.0% of the subjects using the DLC. Lp(a) concentration was predicted better from allele-specific KIV-2 CN than from total KIV-2 CN. Two single nucleotide variants, 4925G>A and rs41272114, further improved prediction. The genetically complex LPA gene can be analyzed with excellent agreement between different callers. The allele-specific KIV-2 CN is more important for predicting Lp(a) concentration than the total KIV-2 CN. It is therefore important that the allele-specific KIV-2 CN can be determined in all subjects.
Genetic and Genomic Medicine
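The abstract reports caller agreement as a correlation coefficient with a 95% confidence interval. As a minimal sketch of how such an interval can be obtained, the Python below computes a Pearson correlation and a confidence interval via the Fisher z-transformation. The abstract does not state which correlation or interval method the authors used, so the Fisher approach, the function names `pearson_r` and `fisher_ci`, and the toy copy-number values are all assumptions made for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 4:
        raise ValueError("need two sequences of equal length >= 4")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, z_crit=1.959964):
    """Approximate two-sided CI for a correlation (Fisher z-transformation).

    Assumption: this is one common method, not necessarily the one
    used in the paper. Requires |r| < 1 and n > 3.
    """
    z = math.atanh(r)              # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)    # standard error on the z scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


if __name__ == "__main__":
    # Hypothetical KIV-2 copy numbers from two callers (illustrative only).
    caller_a = [32.1, 40.5, 28.7, 45.0, 37.2, 30.9]
    caller_b = [32.4, 40.1, 29.0, 44.6, 37.5, 31.2]
    r = pearson_r(caller_a, caller_b)
    print(r, fisher_ci(r, len(caller_a)))

    # Values taken from the abstract: r = 0.9966 in 8351 samples.
    print(fisher_ci(0.9966, 8351))
```

With the abstract's r and sample size, this approach yields a very narrow interval close to the reported one, which is what a correlation near 1 in several thousand samples would be expected to give.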