Abstract:

Background: Protein structure comparison is one of the most important problems in computational biology and plays a key role in protein structure prediction, fold family classification, motif finding, phylogenetic tree reconstruction and protein docking.

Results: We propose a novel method to compare protein structures accurately and efficiently. The method can be used not only to reveal divergent evolution, but also to identify circular permutations and to detect active-sites. Specifically, we define structure alignment as a multi-objective optimization problem, i.e., maximizing the number of aligned atoms while minimizing their root mean square distance. By controlling a single distance-related parameter, we can in theory obtain a variety of optimal alignments corresponding to different optimal matching patterns, ranging from a large matching portion to a small one. The number of variables in our algorithm grows almost linearly with the number of atoms in the protein pair. In addition to a solid theoretical background, numerical experiments demonstrated significant improvement of our approach over existing methods in terms of quality and efficiency. In particular, we show that divergent evolution, circular permutations and active-sites (or structural motifs) can be identified by our method. The software SAMO is available upon request from the authors, or from http://zhangroup.aporc.org/bioinfo/samo/ and http://intelligent.eic.osaka-sandai.ac.jp/chenen/samo.htm.

Conclusion: A novel formulation is proposed to accurately align protein structures in the framework of multi-objective optimization, based on a sequence order-independent strategy. A fast and accurate algorithm based on bipartite matching is developed by exploiting the special features of the problem. Convergence of the computation is shown in experiments and is also proven theoretically.
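The trade-off between the two objectives can be illustrated with a minimal sketch. It is not the SAMO algorithm: it assumes the two structures are already superimposed, omits any superposition step, and uses a plain augmenting-path bipartite matching with a hard distance cutoff. The function names, the cutoff `d_max` (a stand-in for the distance-related parameter) and the toy coordinates are all hypothetical.

```python
import math

def threshold_matching(coords_a, coords_b, d_max):
    """Largest one-to-one pairing of atoms whose distance is at most d_max.

    Sequence order is ignored: any atom of A may pair with any atom of B.
    """
    # candidates[i] lists the atoms of B within d_max of atom i of A.
    candidates = [
        [j for j, q in enumerate(coords_b) if math.dist(p, q) <= d_max]
        for p in coords_a
    ]
    match_b = {}  # atom index in B -> atom index in A

    def augment(i, seen):
        # Try to place atom i, re-routing earlier pairs if necessary.
        for j in candidates[i]:
            if j in seen:
                continue
            seen.add(j)
            if j not in match_b or augment(match_b[j], seen):
                match_b[j] = i
                return True
        return False

    for i in range(len(coords_a)):
        augment(i, set())
    return sorted((i, j) for j, i in match_b.items())

def rmsd(coords_a, coords_b, pairs):
    """Root mean square distance over the aligned pairs (0.0 if none)."""
    if not pairs:
        return 0.0
    total = sum(math.dist(coords_a[i], coords_b[j]) ** 2 for i, j in pairs)
    return math.sqrt(total / len(pairs))

# Toy data (hypothetical): B is a reordered, perturbed copy of A.
A = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
B = [(2.0, 0.1, 0.0), (3.0, 0.5, 0.0), (0.0, 0.0, 0.05), (1.0, 2.0, 0.0)]

for d_max in (0.2, 0.6, 2.5):
    pairs = threshold_matching(A, B, d_max)
    print(d_max, len(pairs), round(rmsd(A, B, pairs), 3))
```

In this sketch a larger cutoff admits more aligned atoms at the cost of a larger RMSD, which mirrors the range from small to large matching portions described in the abstract. A cardinality-only matching does not minimize RMSD among equally large alignments; a weighted matching would be needed for that.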

Revealing divergent evolution, identifying circular permutations and detecting active-sites by protein structure comparison
