Abstract:

Background: Protein structure comparison is one of the most important problems in computational biology and plays a key role in protein structure prediction, fold family classification, motif finding, phylogenetic tree reconstruction and protein docking.

Results: We propose a novel method to compare protein structures accurately and efficiently. The method can be used not only to reveal divergent evolution, but also to identify circular permutations and to detect active-sites. Specifically, we define structure alignment as a multi-objective optimization problem, i.e., maximizing the number of aligned atoms while minimizing their root mean square distance. By controlling a single distance-related parameter, we can in theory obtain a variety of optimal alignments corresponding to different optimal matching patterns, ranging from a large matching portion to a small matching portion. The number of variables in our algorithm grows almost linearly with the number of atoms in the protein pair. In addition to a solid theoretical foundation, numerical experiments demonstrate a significant improvement of our approach over existing methods in terms of quality and efficiency. In particular, we show that divergent evolution, circular permutations and active-sites (or structural motifs) can be identified by our method. The software SAMO is available upon request from the authors, or from http://zhangroup.aporc.org/bioinfo/samo/ and http://intelligent.eic.osaka-sandai.ac.jp/chenen/samo.htm.

Conclusion: A novel formulation is proposed to accurately align protein structures in the framework of multi-objective optimization, based on a sequence order-independent strategy. A fast and accurate algorithm based on the bipartite matching algorithm is developed by exploiting the special features of this formulation. Convergence of the computation is shown in experiments and is also proven theoretically.
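The trade-off described in the abstract, between the number of aligned atoms and their root mean square distance, can be illustrated with a small sketch. This is not the SAMO algorithm: it assumes the two structures are already superposed, treats each atom as a 3D point, and uses a plain maximum-cardinality bipartite matching (Kuhn's augmenting-path method) restricted to atom pairs closer than a hypothetical `cutoff` parameter. All function names and the toy coordinates are illustrative assumptions. Note that the matching ignores the order of atoms, in the spirit of a sequence order-independent alignment.

```python
import math


def pairs_within(a, b, cutoff):
    """For each atom i in a, list the indices j in b with distance <= cutoff."""
    return [[j for j, q in enumerate(b) if math.dist(p, q) <= cutoff] for p in a]


def max_matching(adj, n_b):
    """Maximum bipartite matching via Kuhn's augmenting-path algorithm."""
    match_b = [-1] * n_b  # match_b[j] is the atom of a matched to j, or -1

    def try_augment(i, seen):
        for j in adj[i]:
            if j in seen:
                continue
            seen.add(j)
            # Take j if it is free, or if its current partner can move elsewhere.
            if match_b[j] == -1 or try_augment(match_b[j], seen):
                match_b[j] = i
                return True
        return False

    for i in range(len(adj)):
        try_augment(i, set())
    return [(i, j) for j, i in enumerate(match_b) if i != -1]


def rmsd(a, b, pairs):
    """Root mean square distance over the matched atom pairs."""
    if not pairs:
        return 0.0
    total = sum(math.dist(a[i], b[j]) ** 2 for i, j in pairs)
    return math.sqrt(total / len(pairs))


def align(a, b, cutoff):
    """Return (sorted matched pairs, RMSD) for a given distance cutoff."""
    pairs = max_matching(pairs_within(a, b, cutoff), len(b))
    return sorted(pairs), rmsd(a, b, pairs)


# Toy coordinates (assumed, not from the paper); b lists its atoms in a
# different order than a.
a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
b = [(2.1, 0.0, 0.0), (0.1, 0.0, 0.0), (1.0, 0.8, 0.0)]

for cutoff in (0.5, 1.0):
    pairs, r = align(a, b, cutoff)
    print(cutoff, len(pairs), round(r, 3))
```

With the tight cutoff only the two closest pairs are matched and the RMSD is small; relaxing the cutoff admits a third pair at the cost of a larger RMSD. A maximum-cardinality matching does not by itself minimize RMSD among matchings of the same size, so a weighted assignment would be needed to optimize both objectives jointly.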

Efficient Algorithms for Regular Expression Constrained Sequence Alignment

On the Complexity of Constrained Sequences Alignment Problems

Constrained Pairwise and Center-Star Sequences Alignment Problems

Efficient Algorithms for Finding a Longest Common Increasing Subsequence

Constrained Sequence Alignment: A Dedicated Version and Its Applications

A Fast Exact Pattern Matching Algorithm for Biological Sequences

Efficient Parallel Algorithm for Optimal Three-Sequences Alignment

Constrained Sequence Alignment: A General Model and the Hardness Results

Algorithms For Loosely Constrained Multiple Sequence Alignment

Parallel Three-sequence Alignment with Space-efficient

Parallel linear space algorithm for large-scale sequence alignment

Grouping of Amino Acids and Recognition of Protein Structurally Conserved Regions by Reduced Alphabets of Amino Acids

Constrained Multiple Sequence Alignment Tool Development and Its Application to RNase Family Alignment

An algorithm for rapid noncoding RNA sequence-structure alignment

Sequence alignment using large protein structure alphabets improves sensitivity to remote homologs

A Fast Longest Common Subsequence Algorithm for Biosequences Alignment

Revealing divergent evolution, identifying circular permutations and detecting active-sites by protein structure comparison

RE-MuSiC: a tool for multiple sequence alignment with regular expression constraints

Aligning biological sequences by exploiting residue conservation and coevolution

Some Parallel Approximation Algorithms for Multiple Sequence Alignment Problem

Maximum Match Subsequence Alignment Algorithm Finely Grained (MMSAA FG)