Abstract: Advances in computational structure prediction will vastly augment the hundreds of thousands of currently available protein complex structures. Translating these into discoveries requires aligning them, which is computationally prohibitive. Foldseek-Multimer computes complex alignments from compatible chain-to-chain alignments, identified by efficiently clustering their superposition vectors. Foldseek-Multimer is 3-4 orders of magnitude faster than the gold standard while producing comparable alignments, allowing it to compare billions of complex pairs in 11 hours. Foldseek-Multimer is available as open-source software (https://github.com/steineggerlab/foldseek), as a webserver (https://search.foldseek.com), and together with the BFMD database.

What problem does this paper attempt to address?

The paper addresses how to compare large numbers of protein complex structures efficiently and sensitively. As computational structure prediction advances, the number of available protein complex structures is expected to grow far beyond the hundreds of thousands available today. Extracting new knowledge from these structures requires a fast and accurate method for large-scale structure alignment.

### Specific problems

1. **Computational complexity**: Traditional protein complex alignment methods (such as US-align) are accurate, but their computational cost is very high, which makes searching large-scale databases impractical.
2. **Structural diversity**: The structural diversity of protein complexes needs to be quantified, and structural similarities and changes between different conformations or homologues need to be identified.
3. **Functional understanding**: Many proteins function as part of a complex, so understanding complex structures is crucial for revealing protein function.
4. **Alignment under low sequence similarity**: Structurally similar protein complexes must still be discoverable when their sequence similarity is low.

### Solutions

To address these problems, the researchers developed Foldseek-Multimer. The tool achieves efficient and sensitive protein complex alignment in the following ways:

1. **Fast chain-to-chain alignment**: Foldseek is used for fast single-chain protein structure alignment, which greatly improves alignment speed.
2. **Superposition vector representation**: Chain-to-chain alignments are represented as superposition vectors, and an efficient clustering algorithm (such as DBSCAN) identifies sets of compatible chain-to-chain alignments.
3. **Clustered databases**: Clustered databases are used during the search to reduce redundant calculations and further accelerate it.

With these innovations, Foldseek-Multimer achieves a speedup of 3-4 orders of magnitude while maintaining accuracy comparable to the existing gold standard (such as US-align). This allows it to align billions of complex pairs in a short time.

### Experimental verification

The paper benchmarks Foldseek-Multimer on 931 pairs of protein complexes with known similar structures to demonstrate its efficiency and accuracy. It also demonstrates the tool's potential in practical applications, such as discovering new structural similarities between protein complexes in metagenomics research.

In conclusion, Foldseek-Multimer provides a fast, sensitive, and accurate solution for large-scale protein complex structure alignment, suited to the research needs of the AlphaFold era.
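The idea of clustering superposition vectors to find compatible chain-to-chain alignments can be illustrated with a small sketch. This is not Foldseek-Multimer's implementation: the vector encoding (a flattened 3x3 rotation followed by a 3-component translation), the plain Euclidean distance that mixes rotation and translation terms, the `eps` and `min_pts` values, and all function names and toy data are assumptions made for illustration. The sketch uses a textbook DBSCAN written with the Python standard library only.

```python
import math

def superposition_vector(rotation, translation):
    """Flatten a 3x3 rotation and a translation into one 12-D vector (assumed encoding)."""
    return [value for row in rotation for value in row] + list(translation)

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dbscan(points, eps, min_pts):
    """Textbook DBSCAN. Returns one label per point; -1 marks noise."""
    labels = [None] * len(points)

    def neighbors(i):
        # Brute-force neighborhood query; includes the point itself.
        return [j for j in range(len(points))
                if euclidean(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1          # provisionally noise
            continue
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise becomes a border point
            if labels[j] is not None:
                continue             # already assigned, do not expand
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:
                queue.extend(j_nbrs)  # core point: keep growing the cluster
        cluster += 1
    return labels

# Hypothetical toy data: (query chain, target chain, rotation, translation).
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
ROT_Z_90 = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
alignments = [
    ("A", "X", IDENTITY, [10.0, 0.0, 0.0]),
    ("B", "Y", IDENTITY, [10.2, 0.1, -0.1]),
    ("C", "Z", IDENTITY, [9.9, -0.1, 0.1]),
    ("A", "Y", ROT_Z_90, [-30.0, 5.0, 12.0]),  # implies a different superposition
]

vectors = [superposition_vector(rot, trans) for _, _, rot, trans in alignments]
labels = dbscan(vectors, eps=1.0, min_pts=2)

# Chain pairs whose superpositions agree end up in the same cluster.
compatible = [(q, t) for (q, t, _, _), label in zip(alignments, labels) if label == 0]
print(compatible)  # [('A', 'X'), ('B', 'Y'), ('C', 'Z')]
```

In this toy case the three chain pairs that imply nearly the same rigid-body superposition form one cluster, while the fourth pair is labeled as noise. A real pipeline would also need to ensure that each chain is used at most once per complex alignment, which this sketch does not attempt.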

Rapid and Sensitive Protein Complex Alignment with Foldseek-Multimer
