Automated Identification of Germline de novo Mutations in Family Trios: A Consensus-Based Informatic Approach
Mariya Shadrina,Özem Kalay,Sinem Demirkaya-Budak,Charles A. LeDuc,Wendy K. Chung,Deniz Turgut,Gungor Budak,Elif Arslan,Vladimir Semenyuk,Brandi Davis-Dusenbery,Christine E. Seidman,H. Joseph Yost,Amit Jain,Bruce D. Gelb
DOI: https://doi.org/10.1101/2024.03.08.584100
2024-03-13
Abstract: Accurate identification of germline de novo variants (DNVs) remains a challenging problem despite rapid advances in sequencing technologies and in methods for analyzing the data they generate, with putative solutions often involving filters and visual inspection of identified variants. Here, we present a purely informatic method for identifying DNVs by analyzing short-read genome sequencing data from proband-parent trios. Our method evaluates variant calls generated by three genome sequence analysis pipelines that use different algorithms (GATK HaplotypeCaller, DeepTrio and Velsera GRAF), exploring the assumption that requiring consensus among them can serve as an effective filter for high-quality DNVs. We assessed the efficacy of our method by testing DNVs identified using a previously established, highly accurate classification procedure that partially relied on manual inspection and used Sanger sequencing to validate a DNV subset comprising less confident calls. The results show that our method is highly precise and that applying a force-calling procedure to putative variants further removes false-positive calls, increasing the precision of the workflow to 99.6%. Our method also identified novel DNVs, 87% of which were validated, indicating that it offers a higher recall rate without compromising accuracy. We have implemented this method as an automated bioinformatics workflow suitable for large-scale analyses without the need for manual intervention.
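The consensus requirement described in the abstract can be illustrated with a minimal sketch. It is not the authors' workflow: it assumes each pipeline's candidate DNVs have already been reduced to a set of `(chrom, pos, ref, alt)` tuples, and the function names `consensus_dnvs` and `precision`, as well as the toy variants, are hypothetical. The force-calling step is not modeled.

```python
from typing import Dict, Set, Tuple

# A variant is identified by chromosome, position, reference and alternate allele.
Variant = Tuple[str, int, str, str]


def consensus_dnvs(calls_by_pipeline: Dict[str, Set[Variant]]) -> Set[Variant]:
    """Keep only candidate DNVs reported by every pipeline (hypothetical helper)."""
    if not calls_by_pipeline:
        return set()
    call_sets = list(calls_by_pipeline.values())
    return set.intersection(*call_sets)


def precision(called: Set[Variant], validated: Set[Variant]) -> float:
    """Fraction of called variants that appear in a validated truth set."""
    if not called:
        return 0.0
    return len(called & validated) / len(called)


# Toy candidate sets, invented for illustration only.
calls = {
    "GATK HaplotypeCaller": {
        ("chr1", 1000, "A", "G"),
        ("chr2", 2000, "C", "T"),
        ("chr3", 3000, "G", "A"),
    },
    "DeepTrio": {
        ("chr1", 1000, "A", "G"),
        ("chr2", 2000, "C", "T"),
    },
    "Velsera GRAF": {
        ("chr1", 1000, "A", "G"),
        ("chr2", 2000, "C", "T"),
        ("chr4", 4000, "T", "C"),
    },
}

consensus = consensus_dnvs(calls)
print(sorted(consensus))
```

Calls seen by only one or two pipelines are dropped, which is the sense in which consensus acts as a filter. A real implementation would also need to normalize variant representation across callers before comparing them, which this sketch assumes has been done.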
Bioinformatics