SAP: A Sequence Mapping and Analyzing Program for Long Sequence Reads Alignment and Accurate Variants Discovery

Zheng Sun, Weidong Tian
DOI: https://doi.org/10.1371/journal.pone.0042887
IF: 3.7
2012-01-01
PLoS ONE
Abstract: Third-generation sequencing technologies produce sequence reads of 1000 bp or more that may contain substantial polymorphism information. However, most currently available sequence analysis tools were developed specifically for analyzing short sequence reads. While the traditional Smith-Waterman (SW) algorithm can be used to map long sequence reads, a naive implementation is computationally infeasible. We have developed a new Sequence mapping and Analyzing Program (SAP) that implements a modified version of SW to speed up the alignment process. In benchmarks with simulated and real exon sequencing data, and with real E. coli genome sequencing data generated by third-generation sequencing technologies, SAP outperforms currently available tools for mapping short and long sequence reads in both speed and the proportion of reads captured. In addition, it achieves high accuracy in detecting SNPs and InDels in the simulated data. SAP is available at https://github.com/davidsun/SAP.
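The abstract states that SAP uses a modified Smith-Waterman (SW) algorithm but does not say what the modification is. For reference only, the sketch below shows the standard, unmodified SW local alignment with a linear gap penalty. The function name `smith_waterman` and the scoring values (match = 2, mismatch = -1, gap = -2) are illustrative assumptions, not SAP's code or parameters. The sketch fills a full (n+1) x (m+1) score table, so its cost grows as O(nm). That cost is why a naive implementation becomes infeasible when reads of 1000 bp or more are aligned against a whole genome.

```python
def smith_waterman(query: str, ref: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Textbook Smith-Waterman local alignment with a linear gap penalty.

    Illustrative sketch only (not SAP's modified algorithm); the scoring
    defaults are assumptions. Returns (best_score, aligned_query, aligned_ref),
    with '-' marking gap columns.
    """
    n, m = len(query), len(ref)
    # H[i][j] = best score of a local alignment ending at query[i-1] and ref[j-1]
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = H[i - 1][j - 1] + (match if query[i - 1] == ref[j - 1] else mismatch)
            up = H[i - 1][j] + gap    # query base aligned to a gap in ref (insertion)
            left = H[i][j - 1] + gap  # ref base aligned to a gap in query (deletion)
            # Clamping at 0 lets an alignment start anywhere: this makes it *local*
            H[i][j] = max(0, diag, up, left)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached
    aq, ar = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if query[i - 1] == ref[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            aq.append(query[i - 1]); ar.append(ref[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            aq.append(query[i - 1]); ar.append("-"); i -= 1
        else:
            aq.append("-"); ar.append(ref[j - 1]); j -= 1
    return best, "".join(reversed(aq)), "".join(reversed(ar))


if __name__ == "__main__":
    # A read carrying a one-base insertion relative to the reference
    print(smith_waterman("ACGTTACG", "ACGTACG"))
```

In the usage example, the best local alignment has seven matches and one gap (score 7 x 2 - 2 = 12). Variant discovery, such as the SNP and InDel detection the abstract mentions, works from alignment columns like these. Mismatched columns suggest SNPs, and gap columns suggest InDels.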