Title: SOAP3-dp: Fast, Accurate and Sensitive GPU-Based Short Read Aligner
Ruibang Luo, Thomas K. F. Wong, Jianqiao Zhu, Chi-Man Liu, Xiaoqian Zhu, E. Wu, Lap-Kei Lee, Haoxiang Lin, Wenjuan Zhu, David W. Cheung, H. Ting, S. Yiu, Shaoliang Peng, Chang Yu, Yingrui Li, Ruiqiang Li, T. Lam
2013-01-01
Abstract: To keep pace with the exponentially increasing throughput of Next-Generation Sequencing (NGS), most existing short-read aligners can be configured to favor speed at the expense of accuracy and sensitivity. By leveraging the computational power of both CPU and GPU with optimized algorithms, SOAP3-dp delivers high speed and high sensitivity simultaneously. Compared with widely adopted aligners including BWA, Bowtie2, SeqAlto, CUSHAW2, GEM and the GPU-based aligners BarraCUDA and CUSHAW, SOAP3-dp was found to be two to tens of times faster while maintaining the highest sensitivity and lowest false discovery rate (FDR) on Illumina reads of different lengths. Unlike its predecessor SOAP3, which does not allow gapped alignment, SOAP3-dp by default tolerates alignment similarity as low as 60%. Evaluation on real human genome data demonstrates that SOAP3-dp enables the discovery of more authentic variants and longer indels. Fosmid sequencing shows a 9.1% FDR on newly discovered deletions. SOAP3-dp natively supports the BAM file format and provides the same scoring scheme as BWA, which allows it to be integrated into existing analysis pipelines. SOAP3-dp has been deployed on Amazon-EC2, NIH-Biowulf and Tianhe-1A.
Citation: Luo R, Wong T, Zhu J, Liu C-M, Zhu X, et al. (2013) SOAP3-dp: Fast, Accurate and Sensitive GPU-Based Short Read Aligner. PLoS ONE 8(5): e65632. doi:10.1371/journal.pone.0065632
Editor: Frederick C. C. Leung, University of Hong Kong, China
Received February 19, 2013; Accepted April 25, 2013; Published May 31, 2013
Copyright: © 2013 Luo et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: TL was partially supported by RGC General Research Fund 10612042. R. Luo was partially supported by Hong Kong ITF Grant GHP/011/12. Both R. Luo and TL are partially supported by the GRF Grant HKU-713512E. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing Interests: The authors have declared that no competing interests exist.
* E-mail: liyr@genomics.cn (YL); lirq@pku.edu.cn (RL); twlam@cs.hku.hk (TL)
These authors contributed equally to this work.