Engineering of highly active and diverse nuclease enzymes by combining machine learning and ultra-high-throughput screening
Neil Thomas, David Belanger, Chenling Xu, Hanson Lee, Kat Hirano, Kosuke Iwai, Vanja Polic, Kendra D Nyberg, Kevin G Hoff, Lucas Frenz, Charlie A Emrich, Jun W Kim, Mariya Chavarha, Abi Ramanan, Jeremy J Agresti, Lucy J Colwell
DOI: https://doi.org/10.1101/2024.03.21.585615
2024-06-05
Abstract: Optimizing enzymes to function in novel chemical environments is a central goal of synthetic biology, but optimization is often hindered by a rugged, expansive protein search space and costly experiments. In this work, we present TeleProt, an ML framework that blends evolutionary and experimental data to design diverse protein variant libraries, and employ it to improve the catalytic activity of a nuclease enzyme that degrades biofilms that accumulate on chronic wounds. After multiple rounds of high-throughput experiments using both TeleProt and standard directed evolution (DE) approaches in parallel, we find that TeleProt discovered a significantly better top-performing enzyme variant than DE, achieved a better hit rate at finding diverse, high-activity variants, and was even able to design a high-performance initial library using no prior experimental data. We have released a dataset of 55K nuclease variants, one of the most extensive genotype-phenotype enzyme activity landscapes to date, to drive further progress in ML-guided design.
Bioinformatics