Abstract: Biological sequence alignment is a fundamental application in bioinformatics. It can be used to identify functionally conserved sequences and to infer evolutionary relationships between species. To compare entire genomes from different species, biologists increasingly need alignment methods that are efficient enough to handle long sequences and accurate enough to correctly align the conserved biological features of distant species. Global alignments are important because they reveal the shared order of biological features in the compared species and produce a more accurate alignment at the base-pair level when the features are in the same order. The best-known global alignment algorithm is Needleman-Wunsch. More recently, BitPAl, a bit-parallel algorithm for global alignment with general integer scoring, provided a new implementation of the Needleman-Wunsch algorithm (BitNW). Compared with the original Needleman-Wunsch algorithm, BitNW is significantly faster because it exploits bit parallelism. A number of parallel strategies have been proposed to accelerate exact alignment methods. However, most of them fail to align long biological sequences because of their quadratic time complexity. In this paper, we propose SLPal, a fast bit-parallel algorithm for accelerating long DNA sequence comparison on Intel many-core and multi-core architectures. To fully exploit the computing power of many cores and the 512-bit vector processing units (VPUs), we use a two-level parallelism scheme: a coarse-grained thread-level approach and a fine-grained VPU-level approach. At the thread level, the alignment scoring matrix is split into small tiles, and multiple threads process these tiles concurrently using the Intel TBB library. At the VPU level, the computing kernels are implemented using Single Instruction Multiple Data (SIMD) instructions, so that the 16 independent 32-bit integers residing in a 512-bit vector register can be processed simultaneously.
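The thread-level tiling idea can be illustrated with a minimal sketch. It is not the SLPal implementation: it uses plain (not bit-parallel) Needleman-Wunsch scoring, Python threads instead of Intel TBB, and stores the full matrix. The scoring values, the tile size, and the function names `nw_score`, `fill_tile`, and `nw_score_tiled` are assumptions chosen for illustration. The point it shows is the dependency structure: a tile needs only the tiles above it and to its left, so all tiles on one anti-diagonal of the tile grid can be processed concurrently.

```python
from concurrent.futures import ThreadPoolExecutor

# Illustrative linear-gap scores; these values are assumptions, not from the paper.
MATCH, MISMATCH, GAP = 1, -1, -2


def nw_score(a, b):
    """Reference Needleman-Wunsch global alignment score, computed row by row."""
    prev = [j * GAP for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * GAP]
        for j in range(1, len(b) + 1):
            sub = MATCH if a[i - 1] == b[j - 1] else MISMATCH
            cur.append(max(prev[j - 1] + sub, prev[j] + GAP, cur[j - 1] + GAP))
        prev = cur
    return prev[-1]


def fill_tile(H, a, b, r0, r1, c0, c1):
    """Fill cells H[r0:r1][c0:c1]; the tiles above and to the left must be done."""
    for i in range(r0, r1):
        for j in range(c0, c1):
            sub = MATCH if a[i - 1] == b[j - 1] else MISMATCH
            H[i][j] = max(H[i - 1][j - 1] + sub, H[i - 1][j] + GAP, H[i][j - 1] + GAP)


def nw_score_tiled(a, b, tile=4, workers=4):
    """Same score as nw_score, scheduled tile by tile along anti-diagonals."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        H[i][0] = i * GAP  # first column: gaps only
    for j in range(m + 1):
        H[0][j] = j * GAP  # first row: gaps only
    rows = list(range(1, n + 1, tile))  # first matrix row of each tile row
    cols = list(range(1, m + 1, tile))  # first matrix column of each tile column
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Wavefront d holds every tile (ti, tj) with ti + tj == d.
        for d in range(len(rows) + len(cols) - 1):
            futures = []
            for ti in range(len(rows)):
                tj = d - ti
                if 0 <= tj < len(cols):
                    r0, c0 = rows[ti], cols[tj]
                    r1, c1 = min(r0 + tile, n + 1), min(c0 + tile, m + 1)
                    futures.append(pool.submit(fill_tile, H, a, b, r0, r1, c0, c1))
            for f in futures:
                f.result()  # wait for the whole wavefront before starting the next
    return H[n][m]


if __name__ == "__main__":
    x, y = "GATTACAGATTACA", "GCATGCTTGACA"
    print(nw_score(x, y) == nw_score_tiled(x, y))  # prints True
```

Because of Python's global interpreter lock, this sketch gains no real speedup from the threads; it only demonstrates which tiles are independent. In a compiled implementation each tile would additionally be processed with vector instructions, which is where the VPU level comes in.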
The evaluation shows that our algorithm achieves stable performance on all benchmark data and yields up to 511.7 GCUPS on a server with a single Xeon Phi 7210 processor and up to 617.2 GCUPS on a server with dual Xeon Gold 6148 20-core processors. Furthermore, our tests show that SLPal can align two sequences of about 5 million bp in 50 seconds on the server equipped with dual Xeon Gold 6148 CPUs.
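The throughput figures can be related to the run time with a short calculation. GCUPS stands for giga cell updates per second, that is, the number of scoring-matrix cells computed per second divided by 10^9. The sketch below assumes that each of the two sequences is 5 million bp long (the abstract gives only an approximate length); the function name `gcups` is hypothetical.

```python
def gcups(len_a, len_b, seconds):
    """Giga cell updates per second for a len_a x len_b scoring matrix."""
    return (len_a * len_b) / seconds / 1e9


# Assumption: both sequences are 5,000,000 bp long and the run takes 50 s.
rate = gcups(5_000_000, 5_000_000, 50.0)
print(f"{rate:.1f} GCUPS")  # prints "500.0 GCUPS"
```

Under that assumption the 50-second run corresponds to about 500 GCUPS, which is in line with the reported peak of 617.2 GCUPS on the dual Xeon Gold 6148 server.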
