ProteusAI: An Open-Source and User-Friendly Platform for Machine Learning-Guided Protein Design and Engineering

Jonathan Funk, Laura Machado, Samuel A. Bradley, Marta Napiorkowska, Rodrigo Gallegos-Dextre, Liubov Pashkova, Niklas G. Madsen, Henry Webel, Patrick Victor Phaneuf, Timothy P. Jenkins, Carlos G. Acevedo-Rocha Sr.
DOI: https://doi.org/10.1101/2024.10.01.616114
2024-10-03
Abstract: Protein design and engineering are crucial for advancements in biotechnology, medicine, and sustainability. Machine learning (ML) models are used to design or enhance protein properties such as stability, catalytic activity, and selectivity. However, many existing ML tools require specialized expertise or lack open-source availability, limiting broader use and further development. To address this, we developed ProteusAI, a user-friendly and open-source ML platform to streamline protein engineering and design tasks. ProteusAI offers modules to support researchers in various stages of the design-build-test-learn (DBTL) cycle, including protein discovery, structure-based design, zero-shot predictions, and ML-guided directed evolution (MLDE). Our benchmarking results demonstrate ProteusAI's efficiency in improving proteins and enzymes within a few DBTL-cycle iterations. ProteusAI democratizes access to ML-guided protein engineering and is freely available for academic and commercial use. Future work aims to expand and integrate novel methods in computational protein and enzyme design to further develop ProteusAI.
Bioinformatics
What problem does this paper attempt to address?
The paper addresses challenges in protein design and engineering. Specifically, the authors focus on the following aspects:

1. **High cost and low success rate of protein design and optimization**: Traditional directed evolution (DE) methods, although effective, require extensive experimental screening, which is costly and has a low success rate. Because the sequence space is vast and complex, it is difficult to predict which mutations will enhance protein properties.
2. **Limitations of existing machine learning tools**: Many existing machine learning (ML) tools either require specialized computational skills and expertise or lack open-source availability, limiting their widespread application and development.
3. **Complexity of multi-attribute optimization**: A single mutation often affects multiple attributes, such as stability, selectivity, and activity, which increases the complexity of optimization.

To address these issues, the authors developed **ProteusAI**, a user-friendly and open-source machine learning platform designed to simplify protein design and engineering tasks. ProteusAI provides multiple modules that support researchers at various stages of the Design-Build-Test-Learn (DBTL) cycle, including protein discovery, structure-based design, zero-shot prediction, and Machine Learning-Guided Directed Evolution (MLDE). Through these modules, ProteusAI aims to improve the performance of proteins and enzymes, reduce experimental costs, and promote community-driven development and customization.
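
To make the Design-Build-Test-Learn cycle behind MLDE more concrete, here is a minimal, self-contained Python sketch of such a loop. It is not ProteusAI's implementation or API. Every name in it (`toy_assay`, `fit_additive_model`, `predict`, `single_mutants`, `run_mlde`) is hypothetical, the "assay" is a toy function standing in for a wet-lab measurement, and the surrogate is a deliberately simple additive per-site model rather than any model the paper describes.

```python
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 canonical amino acids


def toy_assay(seq, target):
    """Hypothetical stand-in for a lab measurement (Build + Test):
    the fraction of positions that match a hidden target sequence."""
    return sum(a == b for a, b in zip(seq, target)) / len(target)


def fit_additive_model(measured):
    """Learn: average the observed fitness for each (position, residue) pair."""
    sums, counts = {}, {}
    for seq, fitness in measured.items():
        for key in enumerate(seq):  # key is (position, residue)
            sums[key] = sums.get(key, 0.0) + fitness
            counts[key] = counts.get(key, 0) + 1
    return {key: sums[key] / counts[key] for key in sums}


def predict(model, seq, default):
    """Score a sequence as the mean of its per-site effects.
    Pairs never observed so far fall back to `default`."""
    return sum(model.get(key, default) for key in enumerate(seq)) / len(seq)


def single_mutants(seq):
    """Design: enumerate every single-residue substitution of `seq`."""
    for i, wild_type in enumerate(seq):
        for aa in ALPHABET:
            if aa != wild_type:
                yield seq[:i] + aa + seq[i + 1:]


def run_mlde(start, target, rounds=5, batch=8, seed=0):
    """Run a toy DBTL loop and return (all measurements, best fitness per round)."""
    rng = random.Random(seed)
    measured = {start: toy_assay(start, target)}
    history = [measured[start]]
    for _ in range(rounds):
        # Learn from everything measured so far.
        model = fit_additive_model(measured)
        default = sum(measured.values()) / len(measured)
        # Design: rank unmeasured single mutants of the current best variant.
        best = max(measured, key=measured.get)
        candidates = [s for s in single_mutants(best) if s not in measured]
        rng.shuffle(candidates)  # break ties between equal scores at random
        candidates.sort(key=lambda s: predict(model, s, default), reverse=True)
        # Build + Test: "measure" only the top-ranked batch.
        for seq in candidates[:batch]:
            measured[seq] = toy_assay(seq, target)
        history.append(max(measured.values()))
    return measured, history


if __name__ == "__main__":
    _, history = run_mlde("AAAAAA", "ACDEFG")
    print(history)  # best fitness found after each round
```

The point of the sketch is the budget: each round measures only `batch` variants chosen by the surrogate instead of screening every single mutant, which is the cost reduction the summary attributes to ML-guided approaches. A real workflow would replace the toy assay with experimental data and the additive model with a learned sequence or structure representation.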