Genome analysis
EPGA2: memory-efficient de novo assembler

Junwei Luo, Jianxin Wang, Weilong Li, Zhen Zhang, Fangjun Wu, Min Li, Yi Pan
2015-01-01
Abstract:
Motivation: In genome assembly, as sequencing coverage and genome size grow, most current software tools require a large amount of memory to handle the volume of sequence data. However, most researchers cannot meet these computing-resource requirements, which limits the practical application of most current software.
Results: In this article, we present an updated algorithm called EPGA2, which adopts several new modules and achieves improved assembly results with a small memory footprint. To reduce peak memory in genome assembly, EPGA2 adopts the memory-efficient DSK to count K-mers and a revised BCALM to construct the De Bruijn Graph. Moreover, EPGA2 parallelizes the Contigs Merging step and adds an Errors Correction step to its pipeline. Our experiments demonstrate that these changes make EPGA2 more useful for genome assembly.
Availability and implementation: EPGA2 is publicly available for download at https://github.com/bioinfomaticsCSU/EPGA2.
Contact: jxwang@csu.edu.cn
Supplementary information: Supplementary data are available at Bioinformatics online.
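The abstract names two memory-oriented steps: DSK for counting K-mers and a revised BCALM for building the De Bruijn Graph. Neither tool is reproduced here. The sketch below only illustrates the general ideas under simplifying assumptions: K-mers are hashed into partitions that are spilled to temporary files and counted one partition at a time, so only one partition's hash table sits in memory; K-mers below a hypothetical `min_abundance` threshold are dropped; and the surviving K-mers are linked by (K-1)-overlaps into an uncompacted graph (BCALM additionally compacts unitigs, which is omitted). Reverse complements are ignored, and all function names are hypothetical.

```python
import os
import tempfile
from collections import Counter
from zlib import crc32


def kmers(read, k):
    """Yield every length-k substring of a read (forward strand only; assumption)."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def count_solid_kmers(reads, k, min_abundance=2, num_partitions=4):
    """Hypothetical disk-partitioned K-mer counter (simplified idea, not DSK itself).

    Pass 1 writes each K-mer to one of `num_partitions` temporary files chosen by
    a deterministic hash. Pass 2 loads and counts one partition at a time, keeping
    only K-mers seen at least `min_abundance` times.
    """
    solid = {}
    with tempfile.TemporaryDirectory() as tmp:
        paths = [os.path.join(tmp, f"part{p}.txt") for p in range(num_partitions)]
        files = [open(path, "w") for path in paths]
        try:
            for read in reads:
                for km in kmers(read, k):
                    # The same K-mer always lands in the same partition.
                    files[crc32(km.encode()) % num_partitions].write(km + "\n")
        finally:
            for f in files:
                f.close()
        for path in paths:
            # Only this partition's K-mers are held in memory here.
            with open(path) as f:
                part_counts = Counter(line.rstrip("\n") for line in f)
            # Partitions are disjoint, so merging cannot double-count a K-mer.
            solid.update({km: c for km, c in part_counts.items() if c >= min_abundance})
    return solid


def de_bruijn_edges(solid_kmers):
    """Link K-mer u -> v when u's (K-1)-suffix equals v's (K-1)-prefix (uncompacted)."""
    by_prefix = {}
    for km in solid_kmers:
        by_prefix.setdefault(km[:-1], []).append(km)
    edges = []
    for km in sorted(solid_kmers):
        for nxt in sorted(by_prefix.get(km[1:], [])):
            edges.append((km, nxt))
    return edges


if __name__ == "__main__":
    reads = ["ACGTAC", "CGTACG", "ACGTAC"]
    solid = count_solid_kmers(reads, k=4, min_abundance=2)
    print(solid)
    print(de_bruijn_edges(solid))
```

With these toy reads, `TACG` occurs once and is filtered out, so the graph is the path `ACGT -> CGTA -> GTAC`. Increasing `num_partitions` trades more temporary files for a smaller in-memory table per pass; the result does not depend on the partition count.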