Efficient Graph Based Assembly of Short-Read Sequences on Hybrid Core Architecture

Alexander Sczyrba,Abhishek Pratap,Shane Canon,James Han,Alex Copeland,Zhong Wang,Tony Brewer,David E. Soper,Mike D'Jamoos,Kirby Collins,George Vacek
2011-01-01
Abstract:Advanced architectures can deliver dramatically increased throughput for genomics and proteomics applications, reducing time-to-completion in some cases from days to minutes. One such architecture, hybrid-core computing, marries a traditional x86 environment with a reconfigurable coprocessor, based on field programmable gate array (FPGA) technology. In addition to higher throughput, increased performance can fundamentally improve research quality by allowing more accurate, previously impractical approaches. We will discuss the approach used by Convey?s de Bruijn graph constructor for short-read, de-novo assembly. Bioinformatics applications that have random access patterns to large memory spaces, such as graph-based algorithms, experience memory performance limitations on cache-based x86 servers. Convey?s highly parallel memory subsystem allows application-specific logic to simultaneously access 8192 individual words in memory, significantly increasing effective memory bandwidth over cache-based memory systems. Many algorithms, such as Velvet and other de Bruijn graph based, short-read, de-novo assemblers, can greatly benefit from this type of memory architecture. Furthermore, small data type operations (four nucleotides can be represented in two bits) make more efficient use of logic gates than the data types dictated by conventional programming models. JGI is comparing the performance of Convey?s graph constructor and Velvet on both synthetic and real data. We will present preliminary results on memory usage and run time metrics for various data sets with different sizes, from small microbial and fungal genomes to very large cow rumen metagenome. For genomes with references we will also present assembly quality comparisons between the two assemblers.
What problem does this paper attempt to address?