A generic pipeline for CADD score generation: chickenCADD and turkeyCADD

Kim Lensing, Job G.C. van Schipstal, Dick de Ridder, Martien Groenen, Martijn Derks
DOI: https://doi.org/10.1101/2024.11.01.621569
2024-11-03
Abstract: Combined Annotation Dependent Depletion (CADD) is a machine learning approach used to predict the deleteriousness of genetic variants across a genome. By integrating diverse genomic features, CADD assigns a PHRED-like rank score to each potential variant. Unlike other methods, CADD does not rely on limited datasets of known pathogenic or benign variants but uses larger and less biased training sets. The rapid increase in high-quality genomes and functional annotations across species highlights the need for an automated, non-species-specific pipeline to generate CADD scores. Here, we introduce such a pipeline, facilitating the generation of CADD scores for various species using only a high-quality genome with gene annotation and a multi-species alignment. Additionally, we present updated chickenCADD scores and newly generated turkeyCADD scores, both generated with the pipeline.
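To illustrate what a PHRED-like rank score means, the sketch below assumes the conventional scaling, -10 * log10(rank / total), where rank 1 is the variant with the highest raw (most deleterious) score. The function name and the toy input are hypothetical and are not taken from the pipeline itself. Tie handling is deliberately left out.

```python
import math


def phred_like_scores(raw_scores):
    """Convert raw scores (higher = more deleterious) to PHRED-like rank scores.

    Assumption: score = -10 * log10(rank / total), with rank 1 assigned to
    the highest raw score. Ties are broken by input order in this sketch.
    """
    total = len(raw_scores)
    # Indices of the variants, ordered from most to least deleterious.
    order = sorted(range(total), key=lambda i: raw_scores[i], reverse=True)
    scaled = [0.0] * total
    for rank, idx in enumerate(order, start=1):
        scaled[idx] = -10.0 * math.log10(rank / total)
    return scaled


if __name__ == "__main__":
    # Ten hypothetical raw scores; the top-ranked variant falls in the
    # top 10% of all variants and therefore scores about 10.
    toy = [0.2, -1.3, 4.1, 0.0, 2.7, -0.4, 1.1, 3.3, -2.0, 0.9]
    for raw, scaled in zip(toy, phred_like_scores(toy)):
        print(f"raw={raw:5.1f}  phred_like={scaled:5.2f}")
```

Under this scaling, a score of 10 or higher marks the top 10% of variants, 20 or higher the top 1%, and 30 or higher the top 0.1%.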
Genomics