Identifying deleterious noncoding variation through gain and loss of CTCF binding activity

Colby Tubbs, Mary Lauren Benton, Evonne McArthur, John A. Capra, Douglas M. Ruderfer
DOI: https://doi.org/10.1101/2024.09.04.609712
2024-09-08
Abstract: Noncoding single nucleotide variants (SNVs) are the predominant class of genetic variation in whole genome sequencing and are key drivers of phenotypic variation. However, their functional annotation remains challenging. To address this, we develop a hypothesis-driven functional annotation scheme for CTCF binding sites, given CTCF's critical roles in gene regulation and its extensive profiling in regulatory datasets. We synthesize CTCF's binding patterns at 1,063,879 genomic loci across 214 biological contexts into a summary metric, which we refer to as binding activity. We find that binding activity is significantly correlated with both conserved nucleotides (Pearson R = 0.31, p < 2.2 x 10^-16) and sequences that contain high-quality CTCF binding motifs (Pearson R = 0.63, p = 2.9 x 10^-12). We then integrate binding activity with high-confidence changes in position weight matrix scores. By applying this framework to 1,253,330 SNVs in gnomAD, we explore signatures of selection acting against the disruption of CTCF binding. We find a strong, positive relationship between the mutability-adjusted proportion of singletons (MAPS) metric and the loss of CTCF binding at loci with high in vitro activity (Pearson R = 0.67, p = 1.5 x 10^-14). To contextualize these findings, we apply MAPS to other functional classes of variation and find that a subset of 198,149 loss-of-CTCF-binding variants are observed as infrequently as missense variants. This work implicates these thousands of rare, noncoding variants that disrupt CTCF binding for further functional studies, while providing a blueprint for the interpretable annotation of noncoding variants.
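To make the idea of a change in position weight matrix (PWM) score concrete, the following is a minimal sketch of how a single nucleotide variant can be scored as a predicted gain or loss of motif match. It is an illustration under stated assumptions, not the authors' pipeline: the four-position `TOY_FREQS` matrix is invented for the example and is not the CTCF motif, and the function names, uniform background, and pseudocount are all hypothetical choices.

```python
import math

BASES = "ACGT"


def log_odds_pwm(freqs, background=0.25, pseudocount=0.01):
    """Convert per-position base frequencies into log2-odds scores.

    The uniform background and the pseudocount are illustrative assumptions.
    """
    pwm = []
    for column in freqs:
        total = sum(column[b] + pseudocount for b in BASES)
        pwm.append(
            {b: math.log2(((column[b] + pseudocount) / total) / background)
             for b in BASES}
        )
    return pwm


def score(pwm, seq):
    """Sum the log-odds score of each base in a motif-length sequence."""
    return sum(col[base] for col, base in zip(pwm, seq))


def delta_score(pwm, ref_seq, offset, alt_base):
    """Score change (alt minus ref) when the base at `offset` is substituted.

    Negative values suggest a weaker motif match (predicted loss of binding);
    positive values suggest a stronger match (predicted gain).
    """
    alt_seq = ref_seq[:offset] + alt_base + ref_seq[offset + 1:]
    return score(pwm, alt_seq) - score(pwm, ref_seq)


# Hypothetical 4-bp motif, NOT the real CTCF motif.
TOY_FREQS = [
    {"A": 0.70, "C": 0.10, "G": 0.10, "T": 0.10},
    {"A": 0.10, "C": 0.70, "G": 0.10, "T": 0.10},
    {"A": 0.10, "C": 0.10, "G": 0.70, "T": 0.10},
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
]
TOY_PWM = log_odds_pwm(TOY_FREQS)

loss = delta_score(TOY_PWM, "ACGT", 1, "T")  # consensus C replaced by T
gain = delta_score(TOY_PWM, "ATGT", 1, "C")  # T replaced by consensus C
print(f"loss: {loss:.2f}, gain: {gain:.2f}")
```

In a sketch like this, a substitution at an uninformative position (the fourth column here) leaves the score unchanged, which is why a change in PWM score on its own says nothing about whether the site is actually bound; the abstract pairs it with binding activity for that reason.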
Genetics