Generation of a high confidence set of domain-domain interface types to guide protein complex structure predictions by AlphaFold

Johanna Lena Geist, Chop Yan Lee, Joelle Morgan Strom, José de Jesús Naveja, Katja Luck
DOI: https://doi.org/10.1093/bioinformatics/btae482
IF: 5.8
2024-08-22
Bioinformatics
Abstract:

Motivation: While the release of AlphaFold (AF) represented a breakthrough for the prediction of protein complex structures, its sensitivity remains limited, especially when using full-length protein sequences. Modeling success rates might increase if AF predictions were guided by likely interacting protein fragments. This approach requires sets of highly confident protein-protein interface types. Computational resources, such as 3did, infer interacting globular domain types from observed contacts in protein structures. Assessing the accuracy of these predicted interface types is difficult because we lack hand-curated reference sets of verified domain-domain interface (DDI) types.

Results: To improve protein complex modeling of DDIs by AF, we manually inspected 80 randomly selected DDI types from the 3did resource to generate a first reference set of DDI types. Identified cases of DDI type non-approval (40%) primarily resulted from inaccurate Pfam domain matches, crystal contacts, and synthetic protein constructs. Using logistic regression, we predicted a subset of 2411 out of 5724 considered DDI types in 3did to be of high confidence, which we subsequently applied to 53,000 human protein interactions to predict DDIs, followed by AF modeling. We obtained highly confident AF models for 604 out of 1129 predicted DDIs. Of note, for 47% of them no confident AF structural model could be obtained using full-length protein sequences.

Supplementary information: Supplementary data are available at Bioinformatics online.
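The counts reported in the Results can be turned into proportions with a few lines of arithmetic. The sketch below is only a reader's aid: the variable and function names are illustrative and do not come from the paper. The two derived counts are approximations, because the abstract reports the 40% and 47% figures as rounded percentages.

```python
# Counts quoted in the abstract; all names here are illustrative, not from the paper.
DDI_TYPES_CONSIDERED = 5724       # DDI types in 3did that were considered
DDI_TYPES_HIGH_CONFIDENCE = 2411  # predicted high confidence by logistic regression
DDIS_PREDICTED = 1129             # DDIs predicted in human protein interactions
DDIS_CONFIDENT_AF_MODEL = 604     # predicted DDIs with a highly confident AF model
DDI_TYPES_INSPECTED = 80          # manually inspected DDI types
NON_APPROVAL_RATE = 0.40          # rounded percentage from the abstract
NO_FULL_LENGTH_MODEL_RATE = 0.47  # rounded percentage from the abstract


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


high_conf_share = percent(DDI_TYPES_HIGH_CONFIDENCE, DDI_TYPES_CONSIDERED)
af_success_share = percent(DDIS_CONFIDENT_AF_MODEL, DDIS_PREDICTED)

# Approximate counts only: the underlying percentages are rounded in the abstract.
approx_non_approved_types = round(DDI_TYPES_INSPECTED * NON_APPROVAL_RATE)
approx_fragment_only_models = round(DDIS_CONFIDENT_AF_MODEL * NO_FULL_LENGTH_MODEL_RATE)

print(f"High-confidence DDI types: {high_conf_share:.1f}% of those considered")
print(f"Predicted DDIs with a confident AF model: {af_success_share:.1f}%")
print(f"Approx. non-approved DDI types among those inspected: {approx_non_approved_types}")
print(f"Approx. confident models lacking a full-length counterpart: {approx_fragment_only_models}")
```

Read this way, roughly two in five considered DDI types pass the confidence filter, and a little over half of the predicted DDIs yield a highly confident AF model.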