Fine-tuning of conditional Transformers for the generation of functionally characterized enzymes

Marco Nicolini, Emanuele Saitto, Ruben Emilio Jimenez Franco, Emanuele Cavalleri, Marco Mesiti, Aldo Javier Galeano Alfonso, Dario Malchiodi, Alberto Paccanaro, Peter N. Robinson, Elena Casiraghi, Giorgio Valentini
DOI: https://doi.org/10.1101/2024.08.10.607430
2024-08-10
Abstract: We introduce , a Protein Language Model (PLM) that combines three learning strategies: transfer learning from a decoder-based Transformer, conditional learning on specific functional keywords, and fine-tuning to model specific Enzyme Commission (EC) categories. Using this model, we investigate the conditions under which fine-tuning improves the prediction and generation of EC categories, and we show a two-fold perplexity improvement on EC-specific categories compared with a generalist model. Our extensive experiments show that generated sequences can differ substantially from natural ones while retaining tertiary structures, functions, and chemical kinetics similar to those of their natural counterparts. Importantly, the embedded representations of the generated enzymes closely resemble those of natural ones, which makes them suitable for downstream tasks. Finally, we illustrate how the model can be used in practice to generate enzymes with specific functions through in silico directed evolution, a computationally inexpensive PLM fine-tuning procedure that can significantly enhance and support targeted enzyme engineering tasks.
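The perplexity comparison above can be made concrete. For an autoregressive PLM, the perplexity of a sequence is the exponential of the mean per-token negative log-likelihood, PPL = exp(-(1/N) Σ log p(x_i | x_<i)). The minimal Python sketch below shows why a "two-fold" improvement means that the fine-tuned model assigns, on geometric average, twice the probability to each residue. It assumes that natural-log token probabilities can be read from the model. The function name and the per-residue probabilities are hypothetical illustrative values, not numbers from the paper.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood), using natural logs.

    `token_log_probs` holds log p(x_i | x_<i) for each residue of one
    sequence, as an autoregressive (decoder-only) PLM would produce them.
    """
    if not token_log_probs:
        raise ValueError("need at least one token")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Hypothetical per-residue probabilities for the same 4-residue sequence.
base_probs = (0.05, 0.10, 0.08, 0.05)
generalist = [math.log(p) for p in base_probs]
# Hypothetical EC-specific model that doubles every residue's probability.
specialist = [math.log(2 * p) for p in base_probs]

ppl_g = perplexity(generalist)
ppl_s = perplexity(specialist)
print(f"generalist PPL = {ppl_g:.2f}, specialist PPL = {ppl_s:.2f}, "
      f"ratio = {ppl_g / ppl_s:.2f}")
```

Because perplexity depends only on the geometric mean of the token probabilities, halving it does not require every residue to become twice as likely. Any change that doubles the geometric-mean probability across the sequence gives the same two-fold reduction.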
Bioinformatics