Abstract: Background: There is an increasing demand to assemble and align large-scale biological sequence data sets. The commonly used multiple sequence alignment (MSA) programs are still limited in their ability to handle very large numbers of sequences because they lack a scalable high-performance computing (HPC) environment with greatly extended data storage capacity. Results: We designed ClustalXeed, a software system for multiple sequence alignment with incremental improvements over previous versions of the ClustalX and ClustalW-MPI software. The primary advantage of ClustalXeed over other multiple sequence alignment software is its ability to align a large family of protein or nucleic acid sequences. To solve the conventional memory-dependency problem, ClustalXeed uses both physical random access memory (RAM) and a distributed file-allocation system for distance matrix construction and pairwise alignment computation. The computational efficiency of the disk-storage system was markedly improved by implementing an efficient load-balancing algorithm, called the "idle node-seeking task algorithm" (INSTA). The new editing option and the graphical user interface (GUI) provide ready access to a parallel-computing environment for users who seek fast and easy alignment of large DNA and protein sequence sets. Conclusions: ClustalXeed can now process large volumes of biological sequence data that were not tractable with any other parallel or single-machine MSA program. The main developments include: 1) the ability to tackle larger sequence alignment problems than was possible with previous systems, through markedly improved storage-handling capabilities; 2) an efficient task load-balancing algorithm, INSTA, which improves overall processing times for multiple sequence alignment with input sequences of non-uniform length; and 3) support for both single PC and distributed cluster systems.
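The abstract does not describe how INSTA works internally, so the following is only a minimal sketch of the general idea its name suggests: handing each pairwise-alignment task to whichever node becomes idle first, rather than splitting the task list into fixed equal-count blocks. The function names, the cost model (cost proportional to the product of the two sequence lengths), and the largest-task-first ordering are all assumptions made for illustration and are not taken from ClustalXeed.

```python
import heapq
from itertools import combinations


def pairwise_tasks(lengths):
    """Return (cost, i, j) for every sequence pair.

    Assumed cost model: a pairwise alignment of sequences i and j fills
    roughly len_i * len_j dynamic-programming cells.
    """
    return [(lengths[i] * lengths[j], i, j)
            for i, j in combinations(range(len(lengths)), 2)]


def static_makespan(tasks, n_workers):
    """Baseline: contiguous equal-count chunks, one chunk per worker."""
    chunk = max(1, -(-len(tasks) // n_workers))  # ceiling division
    loads = [sum(cost for cost, _, _ in tasks[k:k + chunk])
             for k in range(0, len(tasks), chunk)]
    return max(loads, default=0)


def idle_seeking_makespan(tasks, n_workers):
    """Hypothetical idle-node-seeking dispatch (not the published INSTA).

    Each task goes to the worker that becomes idle earliest; tasks are
    dispatched largest first so long alignments do not end up last.
    """
    idle_at = [(0, w) for w in range(n_workers)]  # (time worker is free, id)
    heapq.heapify(idle_at)
    for cost, _, _ in sorted(tasks, reverse=True):
        free_time, worker = heapq.heappop(idle_at)
        heapq.heappush(idle_at, (free_time + cost, worker))
    return max(t for t, _ in idle_at)
```

With one long sequence and three short ones on two workers, the static split puts every expensive pair on the same worker, while the idle-seeking dispatch spreads them out:

```python
tasks = pairwise_tasks([100, 10, 10, 10])
print(static_makespan(tasks, 2))        # 3000
print(idle_seeking_makespan(tasks, 2))  # 2000
```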

Distributed Sequence Alignment Applications for the Public Computing Architecture

Gene Sequence Alignment on a Public Computing Platform

ScalaBLAST: A Scalable Implementation of BLAST for High-Performance Data-Intensive Bioinformatics Analysis

A Distributed Parallel Computing Environment for Bioinformatics Problems

MUSIC: A Hybrid Computing Environment for Burrows-Wheeler Alignment for Massive Amount of Short Read Sequence Data

Massively Parallel Implementation of Sequence Alignment with Basic Local Alignment Search Tool Using Parallel Computing in Java Library

The accelerating implementation of BLAST with stream processor

Memory Efficient Pair-Wise Genome Alignment Algorithm - A Small-Scale Application With Grid Potential

GPU Accelerated Biological Sequence Alignment

A high-throughput gene sequence alignment strategy using parallel computing

Parallel Algorithm for Multiple Genome Alignment on the Grid Environment

Mega-base Biological Sequence Alignment Targeting OpenCL Architecture

Exploiting Parallelization of BLAST on Dawning 4000A

DSA: Scalable Distributed Sequence Alignment System Using SIMD Instructions

Parallel Multiple Sequences Alignment in SMP Cluster

An efficient parallel algorithm for multiple sequence similarities calculation using a low complexity method

diBELLA: Distributed Long Read to Long Read Alignment

Robinia-BLAST: an Extensible Parallel BLAST Based on Data-Intensive Distributed Computing

ClustalXeed: a GUI-based grid computation version for high performance and terabyte size multiple sequence alignment

DNA sequence alignment: An assignment for OpenMP, MPI, and CUDA/OpenCL

An algorithm for DNA read alignment on quantum accelerators