A multimodal Transformer Network for protein-small molecule interactions enhances predictions of kinase inhibition and enzyme-substrate relationships

Alexander Kroll, Sahasra Ranjan, Martin J. Lercher
DOI: https://doi.org/10.1371/journal.pcbi.1012100
2024-05-22
PLoS Computational Biology
Abstract: The activities of most enzymes and drugs depend on interactions between proteins and small molecules. Accurate prediction of these interactions could greatly accelerate pharmaceutical and biotechnological research. Current machine learning models designed for this task have a limited ability to generalize beyond the proteins used for training. This limitation is likely due to a lack of information exchange between the protein and the small molecule during the generation of the required numerical representations. Here, we introduce ProSmith, a machine learning framework that employs a multimodal Transformer Network to process protein amino acid sequences and small molecule strings simultaneously, in the same input. This approach allows all relevant information to be exchanged between the two molecule types while their numerical representations are computed, so that the model can account for their structural and functional interactions. Our final model combines gradient boosting predictions based on the resulting multimodal Transformer Network with independent predictions based on separate deep learning representations of the proteins and small molecules. The resulting predictions outperform recently published state-of-the-art models for predicting protein-small molecule interactions across three diverse tasks: predicting kinase inhibition; inferring potential substrates for enzymes; and predicting Michaelis constants (K_M). The provided Python code can be used to implement and improve machine learning predictions involving arbitrary protein-small molecule interactions.
Author summary: Understanding how proteins interact with small molecules, such as drugs, is critical to advancing medical, biological, and biotechnological research. Our work introduces ProSmith, a machine learning framework that improves the prediction of protein-small molecule interactions. Such interactions can be predicted by using numerical representations of proteins and small molecules as input to machine learning models. Previous methods typically generated separate numerical representations for the proteins and small molecules without considering their interactions. ProSmith, in contrast, combines protein sequence and small molecule structural information in the input of a single multimodal Transformer Network to generate a joint numerical representation. Unlike previous methods, this allows a comprehensive exchange of information between protein and small molecule, capturing the complex relationships and interactions between these two types of molecules. ProSmith successfully predicts kinase inhibition, potential enzyme-substrate pairs, and the enzyme kinetic parameter K_M. We provide Python code that can be easily adapted to improve predictions for any protein-small molecule interaction.
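The central idea, placing protein and small-molecule tokens in one input sequence so that attention can mix information across both, can be illustrated with a minimal sketch. Everything below is a hypothetical simplification and not ProSmith's implementation: the character-level tokenization, the `[CLS]`/`[SEP]` markers, the random untrained embeddings, the single attention head with identity projections, and `combine_predictions`, a fixed-weight average standing in for the final combination of the joint-representation and separate-representation predictions (the exact combination scheme is not described here).

```python
import math
import random

# Hypothetical special tokens; the real vocabulary and tokenizers are not specified here.
CLS, SEP = "[CLS]", "[SEP]"


def build_joint_input(protein_seq: str, smiles: str) -> tuple[list[str], list[int]]:
    """Place protein and SMILES tokens in ONE sequence, tagged with segment ids.

    Character-level tokenization is an illustrative assumption.
    """
    tokens = [CLS] + list(protein_seq) + [SEP] + list(smiles) + [SEP]
    # Segment 0: [CLS], amino acids and first [SEP]; segment 1: SMILES characters and final [SEP].
    segments = [0] * (len(protein_seq) + 2) + [1] * (len(smiles) + 1)
    return tokens, segments


def embed(tokens: list[str], segments: list[int], dim: int = 8, seed: int = 0) -> list[list[float]]:
    """Map each token to a random (untrained) vector plus a segment offset."""
    rng = random.Random(seed)
    table: dict[str, list[float]] = {}
    vectors = []
    for tok, seg in zip(tokens, segments):
        if tok not in table:
            table[tok] = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        offset = 0.5 if seg == 1 else -0.5
        vectors.append([v + offset for v in table[tok]])
    return vectors


def self_attention(x: list[list[float]]) -> tuple[list[list[float]], list[list[float]]]:
    """Single-head scaled dot-product self-attention with identity Q/K/V projections."""
    d = len(x[0])
    outputs, weights = [], []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        w = [e / total for e in exps]
        weights.append(w)
        # Each output vector mixes ALL positions, protein and small molecule alike.
        outputs.append([sum(w[j] * x[j][i] for j in range(len(x))) for i in range(d)])
    return outputs, weights


def combine_predictions(pred_joint: list[float], pred_separate: list[float], weight: float) -> list[float]:
    """Hypothetical weighted average of two models' predictions."""
    return [weight * a + (1.0 - weight) * b for a, b in zip(pred_joint, pred_separate)]


# Toy example: a short peptide and ethanol (SMILES "CCO").
tokens, segments = build_joint_input("MKTAYIAK", "CCO")
hidden, attn = self_attention(embed(tokens, segments))
cls_attn = attn[0]  # how much the [CLS] position draws from each token
protein_share = sum(w for w, s in zip(cls_attn, segments) if s == 0)
molecule_share = sum(w for w, s in zip(cls_attn, segments) if s == 1)
print(f"[CLS] attention: protein segment {protein_share:.2f}, molecule segment {molecule_share:.2f}")
print(combine_predictions([1.0, 0.0], [0.0, 1.0], weight=0.25))  # prints [0.25, 0.75]
```

Because every position attends over the whole joint sequence and softmax weights are strictly positive, the `[CLS]` output always receives a contribution from both the amino acids and the SMILES characters. With separately computed representations, such cross-molecule mixing could only happen after each representation had already been fixed.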
biochemical research methods, mathematical & computational biology