Improved diagnosis of rare disease patients through application of constrained coding region annotation and de novo status.

Hywel John Williams, Chris Odhams, Genomics England Research Consortium
DOI: https://doi.org/10.1101/2022.08.19.22278944
2024-06-03
Abstract: Identifying the pathogenic variant in a rare disease (RD) patient is the first step in ending their diagnostic odyssey. De novo (Dn) variants affecting protein-coding DNA are a well-established cause of Mendelian disorders in RD patients. Constrained coding regions (CCRs) are specific segments of coding DNA which are devoid of functional variants in healthy individuals. Furthermore, the most constrained regions, those in percentile bin >95 (CCR95), are significantly enriched for functional pathogenic variants and could therefore be useful for clinical variant prioritisation. We aimed to evaluate the diagnostic utility of incorporating Dn, CCR95 and Dn CCR95 status into the variant prioritisation cascade for RD patients who have undergone genomic sequencing. Using data from the Genomics England 100,000 Genomes Project v12, we selected 3,090 trios that have undergone diagnostic evaluation and been analysed with an advanced Dn identification pipeline. For this analysis we excluded all non-autosomal variants. Our analysis shows that the diagnostic rate increased from 71% in the full cohort to 81% when evaluating just the CCR95 variants, 84% for Dn variants and 87% for Dn CCR95 variants. Of note, manual evaluation of the Dn CCR95 variants from the undiagnosed patients revealed a putative diagnosis in 69% of patients (27 of 39), with clinical follow-up resulting in a diagnosis for a further 11 patients. This takes the overall diagnostic rate for Dn CCR95 variants to 90% and suggests that application of this metric can prioritise diagnostic variants in undiagnosed patients. We also identify a striking enrichment of signal in patients with a phenotype of neurology and neurodevelopmental disorders, whereby their diagnostic rate increases from 60% in the whole cohort to 71%, 73% and 74% in the Dn, CCR95 and Dn CCR95 categories respectively.
In summary, we demonstrate the potential clinical utility of performing bespoke Dn analyses of RD patients and of incorporating CCR information into the filtering cascade to prioritise pathogenic variants. We believe such a strategy will aid the identification of pathogenic variants and decrease the time taken to make a diagnosis, thus increasing the overall diagnostic rate by allowing more samples to be analysed over the same time period.
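The category-wise diagnostic rates reported above can be illustrated with a minimal sketch. It is not the pipeline used in the study. It assumes that a category's diagnostic rate is the fraction of diagnosed patients among those carrying at least one variant in that category, and that CCR95 means a CCR percentile strictly above 95. The `Variant` record, its field names, the `diagnostic_rate` helper and the toy data are all hypothetical.

```python
from dataclasses import dataclass

CCR95_CUTOFF = 95.0  # assumption: "percentile bin >95" read as strictly greater than 95


@dataclass(frozen=True)
class Variant:
    """Hypothetical per-variant record; field names are illustrative only."""
    patient_id: str
    is_de_novo: bool
    ccr_percentile: float


def in_category(variant: Variant, category: str) -> bool:
    """Return True if the variant falls in the Dn, CCR95 or Dn_CCR95 category."""
    is_ccr95 = variant.ccr_percentile > CCR95_CUTOFF
    if category == "Dn":
        return variant.is_de_novo
    if category == "CCR95":
        return is_ccr95
    if category == "Dn_CCR95":
        return variant.is_de_novo and is_ccr95
    raise ValueError(f"unknown category: {category}")


def diagnostic_rate(variants, diagnosed, category):
    """Fraction of diagnosed patients among carriers of a variant in `category`.

    `diagnosed` maps patient_id -> bool. Returns None if no patient qualifies.
    """
    carriers = {v.patient_id for v in variants if in_category(v, category)}
    if not carriers:
        return None
    return sum(diagnosed[p] for p in carriers) / len(carriers)


# Toy data, invented purely for illustration (not from the study).
TOY_VARIANTS = [
    Variant("P1", is_de_novo=True, ccr_percentile=97.0),
    Variant("P2", is_de_novo=True, ccr_percentile=40.0),
    Variant("P3", is_de_novo=False, ccr_percentile=99.0),
    Variant("P4", is_de_novo=False, ccr_percentile=10.0),
]
TOY_DIAGNOSED = {"P1": True, "P2": False, "P3": True, "P4": False}

if __name__ == "__main__":
    for category in ("Dn", "CCR95", "Dn_CCR95"):
        rate = diagnostic_rate(TOY_VARIANTS, TOY_DIAGNOSED, category)
        print(category, rate)
```

Restricting the denominator to carriers is what allows the rate to rise as the filter becomes more stringent: the subset shrinks, but a larger share of it is diagnosed.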