Abstract:

BACKGROUND: Computing alignments between two or more sequences is a common operation in computational molecular biology. The continuing growth of biological sequence databases creates a need for efficient parallel implementations of alignment algorithms on modern accelerators.

RESULTS: This paper presents new approaches to high-performance biological sequence database scanning with the Smith-Waterman algorithm, and to the first stage of progressive multiple sequence alignment based on the ClustalW heuristic, on a Xeon Phi-based compute cluster. Our approach uses a three-level parallelization scheme to take full advantage of the compute power available on this type of architecture: cluster-level data parallelism, thread-level coarse-grained parallelism, and vector-level fine-grained parallelism. Furthermore, we re-organize the sequence datasets and use Xeon Phi shuffle operations to improve I/O efficiency.

CONCLUSIONS: Evaluations show that our method achieves a peak overall performance of up to 220 GCUPS for scanning real protein sequence databanks on a single node consisting of two Intel E5-2620 CPUs and two Intel Xeon Phi 7110P cards. It also exhibits good scalability in terms of sequence length and size, and in the number of compute nodes, for both database scanning and multiple sequence alignment. Furthermore, the achieved performance is highly competitive in comparison to optimized Xeon Phi and GPU implementations. Our implementation is available at https://github.com/turbo0628/LSDBS-mpi.
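To make the two quantities named in the abstract concrete, the following is a minimal sequential sketch of a Smith-Waterman local alignment score and of the GCUPS metric (billions of dynamic-programming cell updates per second). It is not the paper's implementation: the function names, the match/mismatch scores, and the linear gap penalty are illustrative assumptions, and none of the cluster-, thread-, or vector-level parallelism described above is shown.

```python
def smith_waterman_score(query, subject, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score of two sequences.

    Assumption: simple match/mismatch scores and a linear gap penalty,
    chosen for illustration only (not taken from the paper).
    """
    prev = [0] * (len(subject) + 1)  # previous row of the DP matrix
    best = 0
    for qc in query:
        curr = [0]  # column 0 of every row is 0
        for j, sc in enumerate(subject, start=1):
            diag = prev[j - 1] + (match if qc == sc else mismatch)
            up = prev[j] + gap        # gap in the subject
            left = curr[j - 1] + gap  # gap in the query
            cell = max(0, diag, up, left)  # local alignment never drops below 0
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


def gcups(query_len, db_residues, seconds):
    """Giga cell updates per second: DP cells computed / (time * 1e9)."""
    return query_len * db_residues / seconds / 1e9


if __name__ == "__main__":
    # Toy database scan: score one query against every subject sequence.
    database = ["TTACGTTT", "GGGG", "ACCGT"]
    scores = [smith_waterman_score("ACGT", s) for s in database]
    print(scores)
    print(gcups(1000, 220_000_000, 1.0))
```

Under this definition, scanning a database of 220 million residues with a query of length 1000 in one second corresponds to 220 GCUPS. Each database sequence is scored independently of the others, which is the property that data-parallel distribution across nodes and threads relies on.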

Parallel Local Alignment Algorithm for Multiple Sequences on Heterogeneous Cluster Systems

Parallel Algorithm for Pair-Wise Sequence Global Alignment on Heterogeneous Cluster Systems

Gene Sequence Alignment on a Public Computing Platform

Parallel Multiple Sequences Alignment in SMP Cluster

On-Line Scheduling of Parallel Jobs in Heterogeneous Multiple Clusters

A Data Parallel Strategy for Aligning Multiple Biological Sequences on Homogeneous Multiprocessor Platform

A data parallel strategy for aligning multiple biological sequences on multi-core computers

Parallel Algorithms for Large-Scale Biological Sequence Alignment on Xeon-Phi Based Clusters

An efficient parallel algorithm for multiple sequence similarities calculation using a low complexity method.

Parallel Algorithms for Approximate String Matching on Heterogeneous Cluster Computing Systems

Parallel Algorithm for Long Sequences Maximal Tandem Repeats on the Cluster Computing Systems

A high-throughput gene sequence alignment strategy using parallel computing

Cluster-Distribute-Align-Merge: A General Algorithm to Speed Up Multiple Sequence Alignment on Multi-Core Computers

Parallel sequence alignment in a P2P-based high performance computing platform.

Parallel Algorithm for Approximate Multiple Object Strings Matching on Heterogeneous Cluster Computing Systems with Limited Memory

A Parallel Clustering Algorithm Using Mapping and Sampling-Partitioning on the Cluster Computing Systems

Accelerating Alignment for Short Reads Allowing Insertion of Gaps on Multi-Core Cluster

Efficient and Scalable Parallel Algorithm for Motif Finding on Heterogeneous Cluster Systems

Parallel Algorithms for Approximate String Matching with Multi-Round Distribution Strategy on Heterogeneous Cluster Computing Systems

Parallel Accelerating Ultra-Long Read Alignment by Vertical Partitioning Data

A Novel Fast and Memory Efficient Parallel MLCS Algorithm for Long and Large-Scale Sequences Alignments