ProteinMPNN Recovers Complex Sequence Properties of Transmembrane β-barrels

Marissa Dolorfino, Rituparna Samanta, Anastassia Vorobieva
DOI: https://doi.org/10.1101/2024.01.16.575764
2024-02-01
Abstract: Recent deep-learning (DL) protein design methods have been successfully applied to a range of protein design problems, including the design of novel folds, protein binders, and enzymes. However, DL methods have yet to meet the challenges of membrane protein (MP) design and of designing complex β-sheet folds. We performed a comprehensive benchmark of one DL protein sequence design method, ProteinMPNN, using transmembrane β-barrel (TMB) and water-soluble β-barrel folds as models, and compared the performance of ProteinMPNN to that of the new membrane-specific Rosetta Franklin2023 energy function. We tested the effect of input backbone refinement on ProteinMPNN performance and found that, given refined and well-defined inputs, ProteinMPNN more accurately captures global sequence properties despite complex folding biophysics. It also generates more diverse TMB sequences than Franklin2023 at pore-facing positions. In addition, ProteinMPNN generated TMB sequences that passed state-of-the-art in silico filters for experimental validation, suggesting that the model could be used to design diverse nanopores for single-molecule sensing and sequencing. Lastly, our results indicate that the low success rate of ProteinMPNN for the design of β-sheet proteins stems from the accuracy of the input backbones rather than from limitations of the software.
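The abstract reports that ProteinMPNN produces more diverse TMB sequences than Franklin2023 at pore-facing positions but does not say how diversity is measured here. One common choice is per-position Shannon entropy over a set of designs for the same backbone. The sketch below assumes that metric; the function names, toy sequences, and pore-facing indices are hypothetical, and in practice the pore-facing positions would be identified from the backbone structure.

```python
import math
from collections import Counter
from typing import Sequence


def position_entropy(sequences: Sequence[str], position: int) -> float:
    """Shannon entropy (bits) of the residues observed at one sequence position.

    Assumes all sequences were designed on the same backbone, so they have equal
    length and index i refers to the same structural position in every design.
    """
    counts = Counter(seq[position] for seq in sequences)
    total = sum(counts.values())
    # H = -sum_a p(a) * log2 p(a); 0 bits when every design has the same residue.
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def mean_entropy(sequences: Sequence[str], positions: Sequence[int]) -> float:
    """Average per-position entropy over a chosen subset of positions."""
    return sum(position_entropy(sequences, p) for p in positions) / len(positions)


# Hypothetical toy designs and a hypothetical set of pore-facing indices.
designs = ["AKLV", "AELV", "ARLV", "ADLV"]
pore_facing = [0, 1]
print(mean_entropy(designs, pore_facing))  # 1.0
```

To compare two design methods under this assumption, one would compute `mean_entropy` over the same pore-facing indices for each method's set of designs; a higher value indicates more residue variation at those positions.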
Bioinformatics
What problem does this paper attempt to address?
The paper primarily explores the performance of the deep-learning method ProteinMPNN in designing transmembrane β-barrels (TMBs) and compares it with Rosetta-based design using the membrane-specific Franklin2023 energy function.

### Problems the Paper Attempts to Address:

1. **Evaluating ProteinMPNN's ability to design TMBs**: The study examines whether ProteinMPNN can accurately capture the complex sequence properties of TMBs, including how hydrophobic and hydrophilic amino acids must be distributed for the sequence to adopt the correct structure.
2. **Assessing the impact of input backbone precision**: The study supplies input backbones at different levels of refinement, from coarse-grained to fully refined, and evaluates how each level affects ProteinMPNN's performance.
3. **Comparison with the Rosetta Franklin2023 energy function**: The study compares ProteinMPNN with Franklin2023, the latest membrane-specific Rosetta energy function, for TMB design, particularly in terms of sequence diversity and folding quality.

### Key Findings:

- Given refined, accurate input backbones, ProteinMPNN captures the global sequence properties of TMBs more accurately, in particular outperforming Franklin2023 in how it distributes hydrophobic and hydrophilic amino acids.
- TMB sequences generated by ProteinMPNN are more diverse than those generated by Franklin2023, with more amino-acid variation at pore-facing positions.
- The refinement level of the input backbone strongly affects design quality: with more refined input backbones, the TMB sequences designed by ProteinMPNN are more likely to pass the state-of-the-art in silico filters used to select designs for experimental validation.

In summary, the paper evaluates the potential and limitations of ProteinMPNN for TMB design through a series of benchmarks and highlights that accurate input backbones are crucial for successfully designing such complex protein structures.
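In a TMB, side chains along each β-strand alternate between pointing out of the barrel (toward the lipid bilayer) and into the pore, so a plausible design places mostly apolar residues on the lipid-facing side. The sketch below illustrates one way to check that pattern for a designed sequence; it is not the paper's analysis pipeline. It assumes the barrel axis is the z-axis through the origin, classifies a residue as pore-facing when its Cα→Cβ vector points toward that axis, and uses a deliberately simple hydrophobic residue set. It also ignores membrane boundaries, so every outward-facing residue counts as lipid-facing. All names and coordinates are hypothetical.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]

# Simplification (assumption): treat these residues as hydrophobic.
HYDROPHOBIC = set("AVILMFWY")


def is_pore_facing(ca: Vec3, cb: Vec3) -> bool:
    """Return True if the CA->CB vector points toward the barrel axis.

    Assumes the barrel axis is the z-axis through the origin, so the outward
    radial direction at a residue is the (x, y) part of its CA position.
    """
    side_x, side_y = cb[0] - ca[0], cb[1] - ca[1]
    return ca[0] * side_x + ca[1] * side_y < 0.0


def _fraction(flags: List[bool]) -> float:
    """Fraction of True values; 0.0 for an empty list."""
    return sum(flags) / len(flags) if flags else 0.0


def hydrophobic_fractions(
    sequence: str, ca_coords: Sequence[Vec3], cb_coords: Sequence[Vec3]
) -> Tuple[float, float]:
    """Return (hydrophobic fraction of lipid-facing, hydrophobic fraction of pore-facing)."""
    lipid: List[bool] = []
    pore: List[bool] = []
    for aa, ca, cb in zip(sequence, ca_coords, cb_coords):
        # Simplification: every outward-facing residue is counted as lipid-facing.
        target = pore if is_pore_facing(ca, cb) else lipid
        target.append(aa in HYDROPHOBIC)
    return _fraction(lipid), _fraction(pore)


# Hypothetical 4-residue strand segment whose side chains alternate out/in.
ca = [(10.0, 0.0, 0.0), (10.0, 0.0, 3.3), (10.0, 0.0, 6.6), (10.0, 0.0, 9.9)]
cb = [(11.0, 0.0, 0.0), (9.0, 0.0, 3.3), (11.0, 0.0, 6.6), (9.0, 0.0, 9.9)]
print(hydrophobic_fractions("LSVT", ca, cb))  # (1.0, 0.0)
```

With real designs, the Cα and Cβ coordinates would come from the input backbone, and the barrel would first need to be aligned so its axis lies along z. A high lipid-facing fraction paired with a lower pore-facing fraction is the qualitative pattern this kind of check looks for.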