CombFold: predicting structures of large protein assemblies using a combinatorial assembly algorithm and AlphaFold2

Ben Shor, Dina Schneidman-Duhovny
DOI: https://doi.org/10.1038/s41592-024-02174-0
Impact factor (IF): 48
2024-02-08
Nature Methods
Abstract: Deep learning models, such as AlphaFold2 and RosettaFold, enable high-accuracy protein structure prediction. However, large protein complexes are still challenging to predict due to their size and the complexity of interactions between multiple subunits. Here we present CombFold, a combinatorial and hierarchical assembly algorithm for predicting structures of large protein complexes utilizing pairwise interactions between subunits predicted by AlphaFold2. CombFold accurately predicted (TM-score >0.7) 72% of the complexes among the top-10 predictions in two datasets of 60 large, asymmetric assemblies. Moreover, the structural coverage of predicted complexes was 20% higher compared to corresponding Protein Data Bank entries. We applied the method on complexes from Complex Portal with known stoichiometry but without known structure and obtained high-confidence predictions. CombFold supports the integration of distance restraints based on crosslinking mass spectrometry and fast enumeration of possible complex stoichiometries. CombFold's high accuracy makes it a promising tool for expanding structural coverage beyond monomeric proteins.
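The success criterion in the abstract (TM-score > 0.7) refers to the template modeling score, which ranges from 0 to 1 and is normalized by the length of the reference structure. The sketch below evaluates the standard TM-score formula for one fixed superposition and one fixed residue pairing; it is an illustration, not the evaluation code used in the paper. Tools such as TM-align, MM-align and US-align additionally search over superpositions (and, for complexes, chain mappings) to maximize the score, which this sketch does not do. The function names `d0` and `tm_score` and the short-chain guard are assumptions introduced here.

```python
def d0(target_length: int) -> float:
    """Distance scale (in angstroms) from the standard TM-score definition."""
    if target_length <= 21:
        # Assumption of this sketch: d0 becomes tiny or undefined for short chains.
        # Real tools clamp d0 instead; here such inputs are simply rejected.
        raise ValueError("sketch only supports targets longer than 21 residues")
    return 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8


def tm_score(distances: list[float], target_length: int) -> float:
    """TM-score for ONE fixed superposition and residue pairing.

    distances: distance (angstroms) between each aligned model/reference residue pair.
    target_length: number of residues in the reference structure (normalization length).
    Unaligned reference residues contribute nothing, which lowers the score.
    """
    scale = d0(target_length)
    return sum(1.0 / (1.0 + (d / scale) ** 2) for d in distances) / target_length


if __name__ == "__main__":
    print(tm_score([0.0] * 100, 100))  # identical coordinates → 1.0
    print(tm_score([0.0] * 50, 100))   # only half the reference aligned → 0.5
```

Because the sum is divided by the reference length rather than the number of aligned residues, a model that covers only part of the reference cannot exceed the fraction of residues it aligns.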
biochemical research methods
What problem does this paper attempt to address?
The paper addresses how to predict the structures of large protein complexes accurately and efficiently. Although deep-learning models such as AlphaFold2 and RosettaFold predict single-chain protein structures with high accuracy, large complexes remain difficult because of their size and the complex interactions among many subunits. To address this, the authors propose CombFold, which combines pairwise subunit interactions predicted by AlphaFold2 with a combinatorial, hierarchical assembly algorithm. CombFold works in three steps:

1. **Generate pairwise subunit interactions**: AlphaFold2 predicts the structures of all possible subunit pairings.
2. **Create a unified representation**: a representative structure is selected for each subunit from the predicted models, and the transformations between these representative structures are computed.
3. **Combinatorially assemble subunits**: the computed transformations are used to build the structure of the whole complex through combinatorial, hierarchical assembly.

On the benchmarks, CombFold produced an accurate model (TM-score > 0.7) among its top-10 predictions for 72% of the complexes in two datasets of 60 large, asymmetric (heteromeric) assemblies, and its predicted complexes covered 20% more of the structure than the corresponding Protein Data Bank entries. CombFold can also integrate distance restraints from crosslinking mass spectrometry, which can further increase the prediction success rate.
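The three steps above can be sketched as follows. This is a hypothetical, simplified illustration, not CombFold's implementation: it does not call AlphaFold2, it assumes each pairwise prediction has already been reduced to one rigid transformation plus one confidence score per subunit pair, and it assumes one copy per subunit (so self-pairs needed for homomeric copies are not enumerated). The names `pairs_to_predict`, `greedy_assemble` and the edge tuple format are invented here. Step 3 is replaced by a greedy merge that repeatedly joins the two subcomplexes linked by the highest-scoring remaining transformation; unlike the combinatorial search described in the paper, it keeps only a single candidate and performs no clash or restraint checks.

```python
from itertools import combinations

# A rigid transformation (R, t): R is a 3x3 rotation (list of rows), t a 3-vector.
# It maps a point x to R @ x + t.
Transform = tuple[list[list[float]], list[float]]
IDENTITY: Transform = ([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]], [0.0, 0.0, 0.0])


def pairs_to_predict(subunits: list[str]) -> list[tuple[str, str]]:
    """Step 1 (sketch): every pair of distinct subunits to submit to a pairwise predictor."""
    return list(combinations(subunits, 2))


def compose(a: Transform, b: Transform) -> Transform:
    """Return the transform that applies b first, then a."""
    ra, ta = a
    rb, tb = b
    r = [[sum(ra[i][k] * rb[k][j] for k in range(3)) for j in range(3)] for i in range(3)]
    t = [sum(ra[i][k] * tb[k] for k in range(3)) + ta[i] for i in range(3)]
    return r, t


def invert(a: Transform) -> Transform:
    """Inverse of a rigid transform: (R^T, -R^T t)."""
    r, t = a
    rt = [[r[j][i] for j in range(3)] for i in range(3)]
    return rt, [-sum(rt[i][k] * t[k] for k in range(3)) for i in range(3)]


def greedy_assemble(subunits, edges):
    """Step 3 (greedy simplification): merge subcomplexes along the best pairwise transforms.

    edges: (a, b, t_ab, score), where t_ab maps subunit b's representative frame into
    subunit a's frame (the output assumed from step 2). Returns a list of subcomplexes,
    each a dict {subunit: pose in that subcomplex's frame}.
    """
    group_of = {s: s for s in subunits}            # subunit -> id of its subcomplex
    groups = {s: {s: IDENTITY} for s in subunits}  # subcomplex id -> {subunit: pose}
    for a, b, t_ab, _score in sorted(edges, key=lambda e: e[3], reverse=True):
        ga, gb = group_of[a], group_of[b]
        if ga == gb:
            continue  # already placed relative to each other; no consistency check here
        # Maps b's subcomplex frame -> b's frame -> a's frame -> a's subcomplex frame.
        to_a = compose(groups[ga][a], compose(t_ab, invert(groups[gb][b])))
        for member, pose in groups.pop(gb).items():
            groups[ga][member] = compose(to_a, pose)
            group_of[member] = ga
    return list(groups.values())


if __name__ == "__main__":
    def translation(x, y, z):
        return ([row[:] for row in IDENTITY[0]], [x, y, z])

    subunits = ["A", "B", "C"]
    print(pairs_to_predict(subunits))  # [('A', 'B'), ('A', 'C'), ('B', 'C')]
    # Hypothetical pairwise results: B sits 10 Å along x from A, C sits 10 Å along y from B.
    edges = [
        ("A", "B", translation(10.0, 0.0, 0.0), 0.9),
        ("B", "C", translation(0.0, 10.0, 0.0), 0.8),
        ("A", "C", translation(3.0, 3.0, 0.0), 0.2),  # low score, conflicting; skipped
    ]
    (assembly,) = greedy_assemble(subunits, edges)
    print(assembly["C"][1])  # [10.0, 10.0, 0.0]
```

Because every subunit's pose is obtained by composing pairwise transformations, the whole assembly ends up in one shared frame even though no prediction ever contained more than two subunits. A combinatorial variant would keep several alternative subcomplexes per merge and rank them, instead of committing to the single best edge as this sketch does.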