Abstract: Quantification of enzymatic activities still relies heavily on experimental assays, which can be expensive and time-consuming. Methods that accurately predict enzyme activity can therefore serve as effective digital twins. A few recent studies have shown that machine learning (ML) models trained on in vitro measurements can predict enzyme turnover numbers (kcat) and Michaelis constants (Km) using only features derived from enzyme sequences and substrate chemical topologies. However, several challenges remain, such as the lack of standardized training datasets, the evaluation of predictive performance on out-of-distribution examples, and model uncertainty quantification. Here, we introduce CatPred, a comprehensive framework for ML prediction of enzyme kinetics. We explored different learning architectures and feature representations for enzymes, including pretrained protein language model features and pretrained three-dimensional structural features. We systematically evaluate the performance of trained models for predicting kcat, Km, and inhibition constants (Ki) of enzymatic reactions on held-out test sets, with special emphasis on out-of-distribution test samples (enzyme sequences dissimilar from those encountered during training). CatPred adopts a probabilistic regression approach that returns a query-specific mean and standard deviation for each prediction. Results on unseen data confirm that lower predicted variances correlate with higher accuracy of the enzyme parameter predictions made by CatPred. Incorporating pretrained language model features is found to enable robust performance on out-of-distribution samples. Evaluations on both held-out and out-of-distribution test datasets confirm that CatPred performs at least competitively with existing methods while also offering robust uncertainty quantification.
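To make the probabilistic regression idea concrete, here is a minimal sketch of how a prediction given as a mean and a standard deviation can be scored. It assumes a Gaussian likelihood over log10-transformed kinetic values; the abstract does not state the likelihood or the target scaling, so both are assumptions. The function names and all numbers are hypothetical illustrations, not CatPred code or data.

```python
import math


def gaussian_nll(y_true, mean, std):
    """Negative log-likelihood of y_true under N(mean, std**2).

    Assumption: a Gaussian likelihood; the source only says the model
    predicts a query-specific mean and standard deviation.
    """
    if std <= 0:
        raise ValueError("std must be positive")
    var = std * std
    return 0.5 * (math.log(2.0 * math.pi * var) + (y_true - mean) ** 2 / var)


def mean_nll(targets, means, stds):
    """Average Gaussian NLL over a batch of predictions."""
    losses = [gaussian_nll(y, m, s) for y, m, s in zip(targets, means, stds)]
    return sum(losses) / len(losses)


# Hypothetical log10(kcat) targets and predicted means (illustrative only).
targets = [1.0, 2.0, -0.5]
means = [1.1, 1.5, -0.4]

# Same means, two choices of predicted standard deviation.
overconfident = [0.2, 0.2, 0.2]   # small std everywhere
calibrated = [0.2, 0.8, 0.2]      # larger std on the poorly predicted query

nll_overconfident = mean_nll(targets, means, overconfident)
nll_calibrated = mean_nll(targets, means, calibrated)
print(nll_overconfident, nll_calibrated)
```

Under this loss, a model is rewarded for reporting a larger standard deviation on queries where its mean is further from the measured value, which is one way predicted variance can come to track prediction error.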
CatPred offers wider scope and larger data coverage (∼23k, 41k, and 12k data points for kcat, Km, and Ki, respectively). A web resource for using the trained models is made available at:

DLKcat cannot predict meaningful kcat values for mutants and unfamiliar enzymes

Deep Learning Based kcat Prediction Enables Improved Enzyme Constrained Model Reconstruction

DLTKcat: deep learning-based prediction of temperature-dependent enzyme turnover rates

DeepEnzyme: a robust deep learning model for improved enzyme turnover number prediction by utilizing features of protein 3D-structures

CatPred: A comprehensive framework for deep learning in vitro enzyme kinetic parameters kcat, Km and Ki

Turnover number predictions for kinetically uncharacterized enzymes using machine and deep learning

Modeling of Transition State by Molecular Dynamics. Prediction of Catalytic Efficiency of the Mutants of Mandelate Racemase

A Regressor-Guided Graph Diffusion Model for Predicting Enzyme Mutations to Enhance Turnover Number

Evolutionary-Scale Enzymology Enables Biochemical Constant Prediction Across a Multi-Peaked Catalytic Landscape

A Computational Method to Predict Effects of Residue Mutations on the Catalytic Efficiency of Hydrolases

Deep learning allows genome-scale prediction of Michaelis constants from structural features

Kinetic profiling of metabolic specialists demonstrates stability and consistency of in vivo enzyme turnover numbers

MPEK: a multitask deep learning framework based on pretrained language models for enzymatic reaction kinetic parameters prediction

Changes in relative fitness with temperature among second chromosome arrangements in Drosophila melanogaster.

GraphKM: machine and deep learning for KM prediction of wildtype and mutant enzymes

Limitations of Current Machine-Learning Models in Predicting Enzymatic Functions for Uncharacterized Proteins

Machine Learning Identifies Chemical Characteristics That Promote Enzyme Catalysis

Supplementary Dataset for Deep Learning Based Kcat Prediction Enables Improved Enzyme Constrained Model Reconstruction

Enhancement of incorporation of 131Iododeoxyuridine into tumors after application of Clostridium oncolyticum s. butyricum (M 55)

Enhancing Machine-Learning Prediction of Enzyme Catalytic Temperature Optima through Amino Acid Conservation Analysis