GARFIELD-NGS: Genomic vARiants FIltering by dEep Learning moDels in NGS

Viola Ravasio, Marco Ritelli, Andrea Legati, Edoardo Giacopuzzi
DOI: https://doi.org/10.1101/149146
2017-06-14
Abstract: Summary: Exome sequencing is extensively used in research and diagnostic laboratories to discover pathological variants and to study the genetic architecture of human diseases. However, a significant proportion of identified genetic variants are actually false positive calls, and this poses serious challenges for variant interpretation. Here, we propose a new tool named GARFIELD-NGS (Genomic vARiants FIltering by dEep Learning moDels in NGS), which relies on deep learning models to distinguish false from true variants in exome sequencing experiments performed with Illumina or ION platforms. GARFIELD-NGS showed strong performance for both SNP and INDEL variants (AUC 0.71 - 0.98) and outperformed established hard filters. The method remains robust at coverage as low as 30X and can be applied to data generated with the recent Illumina two-colour chemistry. GARFIELD-NGS processes a standard VCF file and produces a regular VCF output. Thus, it can be easily integrated into existing analysis pipelines, allowing different thresholds to be applied according to the desired level of sensitivity and specificity. Availability: GARFIELD-NGS is available at https://github.com/gedoardo83/GARFIELD-NGS Contact: edoardo.giacopuzzi@unibs.it
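As an illustration of the thresholding step mentioned in the abstract, the following is a minimal sketch of filtering a scored VCF by a per-variant prediction value. It assumes the score is stored as a numeric INFO entry; the tag name `SCORE`, the function name `filter_vcf_lines`, and the default threshold are hypothetical and are not taken from the GARFIELD-NGS documentation.

```python
def filter_vcf_lines(lines, tag="SCORE", threshold=0.5):
    """Yield VCF header lines unchanged, plus variant lines whose INFO `tag` >= threshold.

    `tag` is a hypothetical INFO key used for illustration only; replace it
    with whatever key the scoring tool actually writes.
    """
    for line in lines:
        # Header and column lines start with '#': pass them through untouched.
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # A VCF data line has at least 8 tab-separated columns; INFO is the 8th.
        if len(fields) < 8:
            continue
        score = None
        for entry in fields[7].split(";"):
            key, sep, value = entry.partition("=")
            if key == tag and sep:
                try:
                    score = float(value)
                except ValueError:
                    score = None
        # Variants without a parsable score are dropped in this sketch.
        if score is not None and score >= threshold:
            yield line


if __name__ == "__main__":
    example = [
        "##fileformat=VCFv4.2\n",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
        "1\t100\t.\tA\tG\t50\tPASS\tDP=40;SCORE=0.9\n",
        "1\t200\t.\tC\tT\t50\tPASS\tDP=40;SCORE=0.2\n",
    ]
    for kept in filter_vcf_lines(example, threshold=0.5):
        print(kept, end="")
```

Raising the threshold trades sensitivity for specificity: fewer true variants are kept, but fewer false positives pass.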