Evaluating molecular modeling tools for thermal stability using an independently generated dataset

Peishan Huang, Simon K. S. Chu, Henrique N. Frizzo, Morgan P. Connolly, Ryan W. Caster, Justin B. Siegel
DOI: https://doi.org/10.1101/856732
2019-11-28
Abstract: Engineering proteins to enhance thermal stability is a widely utilized approach for creating industrially relevant biocatalysts. Computational tools that guide these engineering efforts remain an active area of research, with new data sets and algorithms being developed. To aid in these efforts, we report an expansion of our previously published data set of mutants for a β-glucosidase to include measures of both TM and ΔΔG, complementing the previously reported measures of T50 and kinetic constants (kcat and KM). For a set of 51 mutants, we found that T50 and TM are moderately correlated, with a Pearson correlation coefficient (PCC) of 0.58, indicating that the two methods capture different physical features. The performance of five computational tools in predicting stability is also evaluated on the 51-mutant data set; none is found to be a strong predictor of the observed changes in T50, TM, or ΔΔG. Furthermore, the ability of the five algorithms to predict the production of isolatable soluble protein is examined, which reveals that Rosetta ΔΔG, ELASPIC, and DeepDDG are capable of predicting whether a mutant can be produced and isolated as a soluble protein. These results further highlight the need for new algorithms for predicting modest, yet important, changes in thermal stability, as well as a new utility for current algorithms in prescreening designs for the production of soluble mutants.
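As an illustration of the correlation statistic quoted in the abstract, the following is a minimal sketch of how a Pearson correlation coefficient between two sets of paired stability measurements could be computed. The function name `pearson` and the values in `t50` and `tm` are hypothetical placeholders chosen for this example; they are not data from the paper, and the authors' actual analysis pipeline is not described here.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two paired samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


# Hypothetical placeholder measurements (degrees Celsius), NOT from the paper.
t50 = [39.9, 41.2, 38.5, 40.7, 42.0]
tm = [45.1, 45.9, 44.0, 47.2, 45.5]

print(f"PCC between T50 and TM: {pearson(t50, tm):.2f}")
```

A coefficient near 1 would suggest the two measurements rank mutants almost identically, while a moderate value such as the reported 0.58 is consistent with the abstract's reading that the two methods capture partly different physical features.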