Abstract: Many Next Generation Sequencing tools for variant annotation and classification are publicly available. However, as modern sequencing technology produces ever more data, more efficient analysis programs are needed, especially for variant analysis. In this study, we updated SNPAAMapper, a variant annotation pipeline, by converting its Perl code to Python, so that it generates annotation output with improved computational efficiency and updated information for broader applicability. The new pipeline, written in Python, can classify variants by region (Coding Sequence, Untranslated Regions, upstream, downstream, intron), predict amino acid change type (missense, nonsense, etc.), and prioritize mutation effects (e.g., synonymous > non-synonymous) while being faster and more efficient. Our new pipeline works in five steps. First, exon annotation files are generated. Next, the exon annotation files are processed, and gene mapping and feature information files are produced. Afterward, the Python scripts classify the variants based on genomic regions and predict the amino acid change category. Lastly, another Python script prioritizes and ranks the mutation effects of variants to output the result file. The Python version of SNPAAMapper achieved an overall speedup by running most annotation steps in substantially less time. In a test sample run on a Latitude 7480 Desktop computer with 8 GB RAM and an Intel Core i5-6300 CPU @ 2.4 GHz, the Python script classified variants by region in 53 s, compared to 166 s for the Perl script. The steps of predicting amino acid change type and prioritizing mutation effects of variants were executed within 1 s for both pipelines. SNPAAMapper-Python was developed and tested on the ClinVar database, an NCBI database of information on genomic variation and its relationship to human health.
We believe the Python version of the SNPAAMapper variant annotation pipeline will benefit the community by elucidating variant consequences and speeding up the discovery of causative genetic variants through whole genome/exome sequencing. Source code, test data files, instructions, and further explanations are available on the web at https://github.com/BaiLab/SNPAAMapper-Python.
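To make the two classification steps described in the abstract concrete, the following is a minimal Python sketch of classifying a variant by genomic region and labeling a codon change. It is not taken from the SNPAAMapper-Python source: the `Transcript` class, the function names, the 1000 bp flank size, the plus-strand-only handling, and the 0-based half-open coordinates are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


@dataclass
class Transcript:
    """Hypothetical plus-strand transcript; 0-based, half-open coordinates."""
    tx_start: int
    tx_end: int
    cds_start: int
    cds_end: int
    exons: List[Tuple[int, int]]


def classify_region(pos: int, t: Transcript, flank: int = 1000) -> str:
    """Return a region label for a position (flank size is an assumed value)."""
    if t.tx_start - flank <= pos < t.tx_start:
        return "upstream"
    if t.tx_end <= pos < t.tx_end + flank:
        return "downstream"
    if not (t.tx_start <= pos < t.tx_end):
        return "intergenic"
    if not any(start <= pos < end for start, end in t.exons):
        return "intron"
    if pos < t.cds_start:
        return "5'UTR"
    if pos >= t.cds_end:
        return "3'UTR"
    return "CDS"


def classify_aa_change(ref_codon: str, alt_codon: str) -> str:
    """Label the amino acid consequence of replacing ref_codon with alt_codon."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    if ref_aa == "*":
        return "stop-loss"
    return "missense"
```

For example, with a hypothetical transcript `Transcript(1000, 2000, 1100, 1900, [(1000, 1200), (1500, 2000)])`, `classify_region(1300, t)` falls between the two exons and is labeled an intron, and `classify_aa_change("GAG", "GTG")` is a missense change (glutamate to valine). A real pipeline would also need to handle minus-strand genes and multiple transcripts per gene, which this sketch omits.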

SNPAAMapper-Python: A highly efficient genome-wide SNP variant analysis pipeline for Next-Generation Sequencing data

SNPMap—An Integrated Visual SNP Interpretation Tool

Variant analysis pipeline for accurate detection of genomic variants from transcriptome sequencing data

Onkopipe: A Snakemake Based DNA-Sequencing Pipeline for Clinical Variant Analysis in Precision Medicine

DNAscan: a fast, computationally and memory efficient bioinformatics pipeline for the analysis of DNA next-generation-sequencing data

BnaSNPDB: an Interactive Web Portal for the Efficient Retrieval and Analysis of SNPs among 1,007 Rapeseed Accessions

DNAscan2: a versatile, scalable, and user-friendly analysis pipeline for human next-generation sequencing data

vSNP: a SNP pipeline for the generation of transparent SNP matrices and phylogenetic trees from whole genome sequencing data sets

NanoSNP: a progressive and haplotype-aware SNP caller on low-coverage nanopore sequencing data

Ingap: an Integrated Next-Generation Genome Analysis Pipeline

Metapipeline-DNA: A Comprehensive Germline & Somatic Genomics Nextflow Pipeline

An Integrated Framework for Analysis and Prediction of Impact of Single Nucleotide Polymorphism Associated with Human Diseases

SNPnotes: high-throughput tissue-specific functional annotation of single nucleotide variants

Snpdetector: A Software Tool for Sensitive and Accurate Snp Detection

Anaconda: AN Automated Pipeline for Somatic COpy Number Variation Detection and Annotation from Tumor Exome Sequencing Data

PhD-SNPg: a webserver and lightweight tool for scoring single nucleotide variants

The GenoPred Pipeline: A Comprehensive and Scalable Pipeline for Polygenic Scoring.

snpQT: flexible, reproducible, and comprehensive quality control and imputation of genomic data

MapNext: a software tool for spliced and unspliced alignments and SNP detection of short sequence reads

GenMPI: Cluster Scalable Variant Calling for Short/Long Reads Sequencing Data

Rare copy number variant analysis in case-control studies using snp array data: a scalable and automated data analysis pipeline