Results of the Protein Engineering Tournament: An Open Science Benchmark for Protein Modeling and Design

Chase Armer, Hassan Kane, Dana L Cortade, Henning Redestig, David A Estell, Adil Yusuf, Nathan J Rollins, Hansen Spinner, Debora Marks, TJ Brunette, Peter J Kelly, Erika DeBenedictis
DOI: https://doi.org/10.1101/2024.08.12.606135
2024-08-12
Abstract: The grand challenge of protein engineering is the development of computational models to characterize and generate protein sequences for arbitrary functions. Progress is limited by lack of 1) benchmarking opportunities, 2) large protein function datasets, and 3) access to experimental protein characterization. We introduce the Protein Engineering Tournament, a fully-remote competition designed to foster the development and evaluation of computational approaches in protein engineering. The tournament consists of an in silico round, predicting biophysical properties from protein sequences, followed by an in vitro round where novel protein sequences are designed, expressed and characterized using automated methods. Upon completion, all datasets, experimental protocols, and methods are made publicly available. We detail the structure and outcomes of a pilot Tournament involving seven protein design teams, powered by six multi-objective datasets, with experimental characterization by our partner, International Flavors and Fragrances. Forthcoming Protein Engineering Tournaments aim to mobilize the scientific community towards transparent evaluation of progress in the field.
Bioengineering
What problem does this paper attempt to address?
The paper introduces the "Protein Engineering Tournament" (PET), a scientific competition that aims to advance computational methods and experimental validation in protein engineering by providing an open benchmark platform. The tournament targets three main issues:

1. **Lack of benchmarking opportunities**: Researchers need appropriate benchmark datasets to evaluate newly developed computational models. Currently available datasets are often small in scale and low in complexity, limiting the development of predictive models.
2. **Absence of large protein function datasets**: Current datasets typically cover only simple sequence-function relationships, such as the effects of single-point mutations, which limits the models' ability to accurately characterize a wide range of protein functions.
3. **Limited experimental validation capability**: After computational scientists design new protein sequences, it is often difficult to conduct systematic experimental validation, hindering the development of generative design methods and the establishment of standardized evaluation criteria.

To address these issues, the PET has two stages. The first is an in silico round, in which participants predict the biophysical properties of given protein sequences using computational methods. The second is an in vitro round, in which participants design new protein sequences that are then expressed and characterized using automated experimental methods. After the tournament, all datasets, experimental protocols, and methods are publicly released to promote transparent evaluation and technological advancement within the community.
In summary, the paper aims to address these challenges in protein engineering by establishing an open tournament, thereby accelerating progress in the field.