Using machine learning to predict the effects and consequences of mutations in proteins

Daniel J Diaz, Anastasiya V Kulikova, Andrew D Ellington, Claus O Wilke
DOI: https://doi.org/10.1016/j.sbi.2022.102518
Abstract: Machine and deep learning approaches can leverage the increasingly available, massive datasets of protein sequences, structures, and mutational effects to predict variants with improved fitness. Many different approaches are being developed, but systematic benchmarking studies indicate that although the specifics of the machine learning algorithm matter, the more important constraint is the availability and quality of the data used for training. When little experimental data are available, models pre-trained in an unsupervised or self-supervised manner on generic protein datasets can still perform well after subsequent refinement via hybrid or transfer learning approaches. Overall, recent progress in this field has been staggering, and machine learning approaches will likely play a major role in future breakthroughs in protein biochemistry and engineering.