Extracting and calibrating population evidence of variant pathogenicity using biobank data

Vineel Bhat, Tian Yu, Lara Brown, Vikas Pejaver, Matthew Lebo, Steven Harrison, Christopher A. Cassa
DOI: https://doi.org/10.1101/2024.08.14.24311911
2024-08-20
Abstract: Advancing genomic medicine relies on our ability to assess the phenotypic impacts of rare germline variants, which remains challenging even in highly sequenced monogenic disease genes. Here, we evaluate the use of population sequencing data from the UK Biobank to identify variants that alter disease risk, focusing on familial hypercholesterolemia (FH), hereditary breast and ovarian cancer syndrome (HBOC), and Lynch syndrome (CRC). We model evidence of pathogenicity from population data at the variant level, and demonstrate that odds ratios generated from population cohort data can significantly separate ClinVar pathogenic and benign variants in FH genes (p = 4.5 × 10⁻¹⁹), HBOC genes (p = 2.5 × 10⁻³⁹), and CRC genes (p = 7.6 × 10⁻¹⁶). Next, to make use of this information in variant assessment, we calibrate population-based odds ratios (ACMG/AMP PS4) at the gene level, and find that they reach 'strong' or 'very strong' evidence of pathogenicity in 8 of 11 genes, as well as in aggregate. Among participants with a rare variant in these 8 genes, 4.3% (N = 2,456) have a Variant of Uncertain Significance (VUS) or a variant not yet observed in ClinVar with strong population evidence of pathogenicity that could inform variant interpretation for a related disorder. In three genes with functional assays, we combine this population evidence with computational, contextual, and experimental evidence. Notably, 12.4% of LDLR VUS seen in participants have sufficient evidence to be classified as pathogenic. This method offers a scalable approach to integrate evidence of pathogenicity from population data.
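To make the central quantity concrete, the following is a minimal sketch of a variant-level odds ratio computed from a 2x2 table of carriers versus non-carriers among affected and unaffected participants. It is not the authors' pipeline: the function name, the choice to add a 0.5 continuity correction to every cell (Haldane-Anscombe style), the Wald confidence interval, and the counts in the usage lines are all assumptions made for illustration. The abstract does not specify how the odds ratios were estimated or which thresholds map them to PS4 evidence strengths, so no thresholds are shown.

```python
import math


def odds_ratio(case_carriers, case_noncarriers,
               control_carriers, control_noncarriers,
               correction=0.5):
    """Return (OR, lower, upper) for a 2x2 carrier-by-phenotype table.

    Assumption: `correction` is added to every cell so that variants with a
    zero cell (common for rare variants) still yield a finite estimate.
    The interval is a Wald 95% interval on the log-odds scale.
    """
    a = case_carriers + correction
    b = case_noncarriers + correction
    c = control_carriers + correction
    d = control_noncarriers + correction

    estimate = (a * d) / (b * c)
    # Standard error of log(OR) from the (corrected) cell counts.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - 1.96 * se)
    upper = math.exp(math.log(estimate) + 1.96 * se)
    return estimate, lower, upper


# Hypothetical counts, not taken from the paper.
estimate, lower, upper = odds_ratio(8, 9992, 2, 189998)
print(f"OR = {estimate:.1f} (95% CI {lower:.1f} to {upper:.1f})")
```

With counts this small the interval is wide, which is one reason a calibration step at the gene level, as the abstract describes, is needed before an odds ratio is treated as evidence of a given strength.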