Robinia-BLAST: an Extensible Parallel BLAST Based on Data-Intensive Distributed Computing

Yang Gu, Zhenchun Huang
DOI: https://doi.org/10.1109/dasc.2014.10
2014-01-01
Abstract: BLAST [1] (Basic Local Alignment Search Tool) is a suite of programs used to identify similarity between genetic sequences. It is one of the most widely used tools in bioinformatics. In recent years, as gene and protein sequence databases have grown exponentially, BLAST has become both a data-intensive and a computation-intensive application. How to run BLAST quickly and at low cost has long been a focus of research, and parallelization is one of the most important ways to address this problem. In this paper, we present a new approach to parallelizing BLAST based on a parallel processing framework called Robinia. Compared with earlier parallel versions of BLAST, Robinia-based BLAST is easily publicly accessible and scales well. Most importantly, it supports operation over a WAN, which makes it possible to integrate computation and storage resources across the Internet to serve very large-scale BLAST projects. We implemented Robinia-based BLAST and evaluated it on two different datasets. The results show that parallel BLAST based on Robinia achieves linear speedup in the number of nodes used, with good scalability and low cost.