Multiple Protein Structure Alignment at Scale with FoldMason

Cameron Laurence Mathison Gilchrist, Milot Mirdita, Martin Steinegger
DOI: https://doi.org/10.1101/2024.08.01.606130
2024-08-27
Abstract: Protein structure is conserved beyond sequence, making multiple structural alignment (MSTA) essential for analyzing distantly related proteins. Computational prediction methods have vastly extended our repository of available protein structures, requiring fast and accurate MSTA methods. Here, we introduce FoldMason, a progressive MSTA method that leverages the structural alphabet from Foldseek, a pairwise structural aligner, for multiple alignment of hundreds of thousands of protein structures. FoldMason computes confidence scores, offers interactive visualizations, and provides essential speed and accuracy for large-scale protein structure analysis in the era of accurate structure prediction. Using Flaviviridae glycoproteins, we demonstrate how FoldMason's MSTAs support phylogenetic analysis below the twilight zone. FoldMason is free open-source software: foldmason.foldseek.com and webserver: search.foldseek.com/foldmason.
Bioinformatics
What problem does this paper attempt to address?
The paper addresses how to compute multiple structural alignments (MSTAs) quickly and accurately at the scale of modern protein structure collections. Computational structure prediction has made very large numbers of protein structures available, so an MSTA method must now combine high accuracy with the capacity to process large datasets. Existing methods can be accurate, but they are slow and scale poorly to collections of this size. To close this gap, the authors introduce FoldMason, a progressive MSTA method built on Foldseek, a pairwise structural aligner. FoldMason represents protein structures with Foldseek's structural alphabet, which lets it align hundreds of thousands of structures. Compared with other MSTA methods, FoldMason matches or exceeds their alignment quality while running two orders of magnitude faster. It also computes confidence scores and offers interactive visualizations, and the authors use Flaviviridae glycoproteins to show that its MSTAs support phylogenetic analysis below the twilight zone.
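To make the idea of progressive alignment over a structural alphabet concrete, the following is a minimal toy sketch, not FoldMason's implementation. It assumes each structure has already been encoded as a string of structural-alphabet letters (the example strings are hypothetical), uses a plain Needleman-Wunsch aligner with made-up scores, and adds sequences in input order by aligning each one to a majority-vote consensus of the growing alignment. The function names, scoring values, and ordering strategy are all illustrative assumptions; the summary above does not describe how FoldMason scores or orders its alignments.

```python
from collections import Counter


def nw_align(a, b, match=2, mismatch=-1, gap=-2):
    """Global (Needleman-Wunsch) alignment of two strings.

    Scores are illustrative assumptions. Returns both strings padded with '-'.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner, preferring the diagonal.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append('-'); i -= 1
        else:
            out_a.append('-'); out_b.append(b[j - 1]); j -= 1
    return ''.join(reversed(out_a)), ''.join(reversed(out_b))


def consensus(rows):
    """Most common non-gap letter in each alignment column."""
    return ''.join(
        Counter(c for c in col if c != '-').most_common(1)[0][0]
        for col in zip(*rows)
    )


def add_to_msa(rows, seq):
    """Align seq to the consensus, opening gap columns in existing rows."""
    cons_aln, seq_aln = nw_align(consensus(rows), seq)
    widened = []
    for row in rows:
        letters = iter(row)
        # A '-' in the aligned consensus marks a newly inserted column.
        widened.append(''.join('-' if c == '-' else next(letters)
                               for c in cons_aln))
    return widened + [seq_aln]


def progressive_msa(seqs):
    """Build an alignment by adding sequences one at a time, in input order."""
    rows = [seqs[0]]
    for seq in seqs[1:]:
        rows = add_to_msa(rows, seq)
    return rows


if __name__ == "__main__":
    # Hypothetical structural-alphabet strings, one per structure.
    for row in progressive_msa(["DVQLA", "DVLA", "DPVQLA"]):
        print(row)
```

Each step costs one pairwise alignment against the current consensus, which is what keeps progressive schemes tractable as the number of structures grows. Real progressive aligners typically choose the merge order with a guide tree and align full profiles rather than a single consensus string; those refinements are omitted here for brevity.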