Interpretable and explainable predictive machine learning models for data-driven protein engineering

David Medina-Ortiz, Ashkan Khalifeh, Hoda Anvari-Kazemabad, Mehdi D. Davari
DOI: https://doi.org/10.1101/2024.02.18.580860
2024-03-03
Abstract: Protein engineering using directed evolution and (semi)rational design has emerged as a powerful strategy for optimizing and enhancing enzymes or proteins with desired properties. Integrating artificial intelligence methods has further enhanced and accelerated protein engineering through predictive models developed in data-driven strategies. However, the lack of explainability and interpretability in these models poses challenges. Explainable Artificial Intelligence addresses the interpretability and explainability of machine learning models, providing transparency and insights into predictive processes. Nonetheless, there is a growing need to incorporate explainable techniques in predicting protein properties in machine learning-assisted protein engineering. This work explores incorporating explainable artificial intelligence in predicting protein properties, emphasizing its role in trustworthiness and interpretability. It assesses different machine learning approaches, introduces diverse explainable methodologies, and proposes strategies for seamless integration, improving trustworthiness. Practical cases demonstrate the explainable model’s effectiveness in identifying DNA binding proteins and optimizing Green Fluorescent Protein brightness. The study highlights the utility of explainable artificial intelligence in advancing computationally assisted protein design, fostering confidence in model reliability.
Bioinformatics
What problem does this paper attempt to address?
This paper addresses the lack of interpretability and explainability in the predictive machine learning models used for data-driven protein engineering. Although artificial intelligence methods have made protein engineering faster and more effective, the models behind them often remain opaque, which makes their predictions hard to trust. The paper argues that Explainable Artificial Intelligence (XAI) should be incorporated into protein property prediction to make models more trustworthy and interpretable. It assesses different machine learning approaches, introduces a range of explainability methods, and proposes strategies for integrating them into predictive models for protein engineering. Two practical cases, identifying DNA-binding proteins and optimizing green fluorescent protein brightness, demonstrate the effectiveness of explainable models and highlight the role of XAI in advancing computationally assisted protein design and building confidence in model reliability.
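To make the idea of a model-agnostic explanation concrete, here is a minimal Python sketch of permutation importance applied to a toy sequence classifier. It is not taken from the paper and does not reproduce its methods. The amino acid composition features, the nearest-centroid model, the synthetic sequences (one class artificially enriched in lysine and arginine), and all function names are assumptions chosen for illustration only.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Fraction of each of the 20 standard amino acids in a sequence."""
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def make_toy_data(n_per_class=40, length=80, seed=0):
    """Synthetic sequences (assumption): class 1 is enriched in K and R."""
    rng = random.Random(seed)
    enriched = [5 if a in "KR" else 1 for a in AMINO_ACIDS]
    seqs, labels = [], []
    for _ in range(n_per_class):
        seqs.append("".join(rng.choices(AMINO_ACIDS, k=length)))
        labels.append(0)
        seqs.append("".join(rng.choices(AMINO_ACIDS, weights=enriched, k=length)))
        labels.append(1)
    return seqs, labels


def fit_centroids(X, y):
    """Toy nearest-centroid model: one mean feature vector per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Return the label of the closest centroid (squared Euclidean distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist(centroids[label]))


def accuracy(centroids, X, y):
    return sum(predict(centroids, x) == t for x, t in zip(X, y)) / len(y)


def permutation_importance(centroids, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy when one feature column is shuffled."""
    rng = random.Random(seed)
    base = accuracy(centroids, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            col = [x[j] for x in X]
            rng.shuffle(col)
            X_perm = [x[:j] + [v] + x[j + 1:] for x, v in zip(X, col)]
            drops.append(base - accuracy(centroids, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


seqs, labels = make_toy_data()
X = [composition(s) for s in seqs]
model = fit_centroids(X, labels)
scores = permutation_importance(model, X, labels)

# Rank residues by how much shuffling their frequency hurts the model.
ranked = sorted(zip(AMINO_ACIDS, scores), key=lambda p: p[1], reverse=True)
top_two = {aa for aa, _ in ranked[:2]}
print(ranked[:5])
```

Because the toy data were built so that only lysine (K) and arginine (R) frequencies separate the two classes, those two features should rank highest. This shows how an explanation can be checked against known structure in the data. With real protein data, the features, the model, and the explanation method would all need to be chosen and validated for the task at hand.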