Rapid and Sensitive Protein Complex Alignment with Foldseek-Multimer

Woosub Kim, Milot Mirdita, Eli Levy Karin, Cameron L.M. Gilchrist, Hugo Schweke, Johannes Soeding, Emmanuel D. Levy, Martin Steinegger
DOI: https://doi.org/10.1101/2024.04.14.589414
2024-10-28
Abstract: Advances in computational structure prediction will vastly augment the hundreds of thousands of currently-available protein complex structures. Translating these into discoveries requires aligning them, which is computationally prohibitive. Foldseek-Multimer computes complex alignments from compatible chain-to-chain alignments, identified by efficiently clustering their superposition vectors. Foldseek-Multimer is 3-4 orders of magnitude faster than the gold standard while producing comparable alignments, allowing it to compare billions of complex-pairs in 11 hours. Foldseek-Multimer is open-source software: https://github.com/steineggerlab/foldseek, webserver: https://search.foldseek.com and the BFMD database.
Bioinformatics
What problem does this paper attempt to address?
This paper addresses how to compare large numbers of protein complex structures efficiently and sensitively. As computational structure prediction advances, the hundreds of thousands of protein complex structures available today will be vastly augmented. Extracting new knowledge from them requires a fast and accurate method for large-scale structure alignment.

### Specific problems

1. **Computational complexity**: Traditional protein complex alignment methods such as US-align are accurate, but their computational cost is too high for efficient searches of large databases.
2. **Structural diversity**: A method must quantify the structural diversity of protein complexes and identify similarities and changes between different conformations or homologues.
3. **Functional understanding**: Many proteins function as part of complexes, so understanding complex structures is crucial for revealing protein function.
4. **Alignment under low sequence similarity**: Structurally similar protein complexes must still be discoverable when their sequence similarity is low.

### Solutions

To address these problems, the researchers developed Foldseek-Multimer. The tool achieves efficient and sensitive protein complex alignment in three ways:

1. **Fast chain-to-chain alignment**: It uses Foldseek for fast single-chain protein structure alignment, which greatly improves alignment speed.
2. **Superposition vector representation**: It represents chain-to-chain alignments as superposition vectors and uses an efficient clustering algorithm (such as DBSCAN) to identify sets of compatible chain-to-chain alignments.
3. **Clustered databases**: It searches against clustered databases to reduce redundant calculations and further accelerate the search.

With these innovations, Foldseek-Multimer is 3-4 orders of magnitude faster than the existing gold standard (US-align) while maintaining comparable accuracy. This allows it to align billions of complex pairs in a short time.

### Experimental verification

The paper benchmarks Foldseek-Multimer on 931 pairs of protein complexes with known similar structures, demonstrating its efficiency and accuracy. It also shows the tool's potential in practical applications, such as discovering new structural similarities between protein complexes in metagenomics research.

In conclusion, Foldseek-Multimer provides a fast, sensitive, and accurate solution for large-scale protein complex structure alignment, suited to the research needs of the AlphaFold era.
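
The superposition-vector idea listed under Solutions can be illustrated with a small sketch. It is not Foldseek-Multimer's implementation: the 12-dimensional vector layout (flattened rotation plus translation), the helper names `rotation_z`, `superposition_vector`, and `dbscan`, the `eps` and `min_pts` values, and the example transforms are all assumptions made for illustration. The point is only that chain-to-chain alignments belonging to the same complex superposition share nearly the same rigid transform, so they fall into one dense cluster, while incompatible alignments end up as noise.

```python
import math
from typing import List, Sequence


def rotation_z(angle_deg: float) -> List[float]:
    """Row-major 3x3 rotation about the z-axis, flattened to 9 numbers."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [c, -s, 0.0, s, c, 0.0, 0.0, 0.0, 1.0]


def superposition_vector(rotation: Sequence[float],
                         translation: Sequence[float]) -> List[float]:
    """Hypothetical encoding: 9 rotation entries followed by 3 translation entries."""
    return list(rotation) + list(translation)


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def dbscan(points: List[List[float]], eps: float, min_pts: int) -> List[int]:
    """Textbook DBSCAN. Returns one label per point; -1 marks noise."""
    unvisited, noise = None, -1
    labels = [unvisited] * len(points)

    def neighbors(i: int) -> List[int]:
        # Brute-force region query; includes the point itself.
        return [j for j in range(len(points))
                if euclidean(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not unvisited:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == noise:
                labels[j] = cluster      # former noise becomes a border point
            if labels[j] is not unvisited:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # j is a core point: keep expanding
                queue.extend(reachable)
    return labels


# Made-up chain-to-chain alignments: (query chain, target chain, rotation, translation).
alignments = [
    ("A", "X", rotation_z(30.0), (10.0, 0.0, 5.0)),
    ("B", "Y", rotation_z(30.5), (10.2, 0.1, 5.1)),
    ("C", "Z", rotation_z(29.8), (9.9, -0.1, 4.9)),
    ("A", "Y", rotation_z(120.0), (-40.0, 22.0, 3.0)),   # incompatible transform
    ("B", "X", rotation_z(200.0), (15.0, -30.0, 8.0)),   # incompatible transform
]

vectors = [superposition_vector(rot, trans) for _, _, rot, trans in alignments]
labels = dbscan(vectors, eps=1.0, min_pts=2)

for (query, target, _, _), label in zip(alignments, labels):
    print(f"{query}->{target}: cluster {label}")
```

In this toy data the first three alignments share almost the same transform and are grouped together, while the last two are labeled as noise. Rotation entries and translations live on different scales, so a real system would need to choose the distance and `eps` with care; the values here are picked only to make the example work.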