Pre-trained molecular representations enable antimicrobial discovery

Roberto Olayo-Alarcon, Martin K. Amstalden, Annamaria Zannoni, Medina Bajramovic, Cynthia M. Sharma, Ana Rita Brochado, Mina Rezaei, Christian L Müller
DOI: https://doi.org/10.1101/2024.03.11.584456
2024-06-24
Abstract: The rise in antimicrobial resistance poses a worldwide threat, reducing the efficacy of common antibiotics. Determining the antimicrobial activity of new chemical compounds through experimental methods is still a time-consuming and costly endeavor. Compound-centric deep learning models hold the promise to speed up this search and prioritization process. Here, we introduce a lightweight computational strategy for antimicrobial discovery that builds on MolE (Molecular representation through redundancy reduced Embedding), a deep learning framework that leverages unlabeled chemical structures to learn task-independent molecular representations. By combining MolE representation learning with experimentally validated compound-bacteria activity data, we design a general predictive model that enables assessing compounds with respect to their antimicrobial potential. The model correctly identified recent growth-inhibitory compounds that are structurally distinct from current antibiotics and discovered de novo three human-targeted drugs as growth inhibitors which we experimentally confirmed. Our framework offers a viable cost-effective strategy to accelerate antibiotics discovery.
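The name "redundancy reduced Embedding" points to a redundancy-reduction training objective. The following is a minimal sketch that assumes MolE uses a Barlow Twins-style loss, which the abstract does not spell out. Two augmented views of the same batch of molecules are embedded. Each embedding dimension is standardized across the batch, and the loss pushes the cross-correlation matrix of the two views toward the identity. The diagonal term rewards invariance to the augmentation, and the off-diagonal term penalizes redundant dimensions. The function names, the plain-list representation, and the weight `lambda_offdiag` are illustrative assumptions and not MolE's code. The weight is set to 5e-3, the value reported for Barlow Twins. A real implementation would compute this loss on the outputs of a neural encoder and minimize it by gradient descent, which is omitted here.

```python
import math
from typing import List

# Rows are molecules in a batch; columns are embedding dimensions.
Matrix = List[List[float]]


def standardize_columns(z: Matrix, eps: float = 1e-8) -> Matrix:
    """Center each embedding dimension and scale it to unit variance across the batch."""
    n, d = len(z), len(z[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        col = [z[i][j] for i in range(n)]
        mean = sum(col) / n
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / n)
        for i in range(n):
            out[i][j] = (z[i][j] - mean) / (std + eps)
    return out


def redundancy_reduction_loss(z_a: Matrix, z_b: Matrix, lambda_offdiag: float = 5e-3) -> float:
    """Barlow Twins-style loss between embeddings of two augmented views of the same molecules.

    Hypothetical helper for illustration; MolE's actual objective may differ.
    """
    a, b = standardize_columns(z_a), standardize_columns(z_b)
    n, d = len(a), len(a[0])
    loss = 0.0
    for i in range(d):
        for j in range(d):
            # Cross-correlation between dimension i of view A and dimension j of view B.
            c_ij = sum(a[k][i] * b[k][j] for k in range(n)) / n
            if i == j:
                loss += (1.0 - c_ij) ** 2  # invariance term: push the diagonal toward 1
            else:
                loss += lambda_offdiag * c_ij ** 2  # redundancy term: push off-diagonals toward 0
    return loss
```

Two identical views whose dimensions are uncorrelated give a loss near 0. Copying one dimension into another adds `lambda_offdiag` for each of the two affected off-diagonal entries.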
Microbiology
What problem does this paper attempt to address?
The paper addresses the global threat posed by rising antimicrobial resistance, which reduces the efficacy of common antibiotics. Determining the antibacterial activity of new chemical compounds experimentally remains time-consuming and costly. To speed up this search, the paper introduces a lightweight computational strategy for antibiotic discovery. The strategy is built on MolE (Molecular representation through redundancy reduced Embedding), a deep learning framework that learns task-independent molecular representations from unlabeled chemical structures. By combining these representations with experimentally validated compound-bacteria activity data, the authors build a general predictive model that assesses the antimicrobial potential of compounds. The main contributions are:

1. **Development of the MolE framework**: a self-supervised pre-training strategy that learns molecular representations from a large number of unlabeled chemical structures and transfers them to downstream prediction tasks.
2. **Construction of the predictive model**: MolE representations are combined with publicly available growth inhibition data to train a machine learning model, referred to as MolE-XGBoost, that predicts the inhibitory effects of compounds on various bacteria.
3. **Validation of the model's predictions**: experiments confirmed that several compounds prioritized by the model do have antibacterial activity. These include three human-targeted drugs newly identified as inhibitors of Staphylococcus aureus growth.
4. **Application to large-scale chemical library screening**: MolE-XGBoost was used to predict antibacterial compounds in an independent library of 2,327 compounds, and the activity of some high-scoring compounds was validated experimentally.

Overall, the paper proposes an efficient, cost-effective method to accelerate the discovery of new antibiotics.
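The prediction and screening steps in points 2 to 4 can be sketched as follows. This sketch rests on several assumptions that the summary does not confirm:

* Each (compound, strain) pair is encoded by concatenating the compound's MolE embedding with a one-hot vector for the bacterial strain.
* A trained binary classifier returns the probability of growth inhibition for each pair. The XGBoost model named above would fill this role, represented here by a generic `predict_proba` callable.
* Compounds are ranked by the geometric mean of those probabilities across strains, together with the number of strains predicted to be inhibited.

The one-hot strain encoding, the geometric-mean aggregation, the 0.5 threshold, and the names `pair_features`, `antimicrobial_score`, `rank_library`, and `toy_model` are all hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]


def pair_features(compound_emb: Sequence[float], strain: str, strains: Sequence[str]) -> Vector:
    """Concatenate a compound embedding with a one-hot strain encoding (assumed design)."""
    one_hot = [1.0 if s == strain else 0.0 for s in strains]
    return list(compound_emb) + one_hot


def antimicrobial_score(
    compound_emb: Sequence[float],
    strains: Sequence[str],
    predict_proba: Callable[[Vector], float],
    threshold: float = 0.5,
) -> Tuple[float, int]:
    """Return (geometric mean of P(inhibition) over strains, count of strains predicted inhibited)."""
    probs = [predict_proba(pair_features(compound_emb, s, strains)) for s in strains]
    # Geometric mean computed in log space; clamp probabilities to avoid log(0).
    geo_mean = math.exp(sum(math.log(max(p, 1e-12)) for p in probs) / len(probs))
    n_inhibited = sum(p >= threshold for p in probs)
    return geo_mean, n_inhibited


def rank_library(
    library: Dict[str, Sequence[float]],
    strains: Sequence[str],
    predict_proba: Callable[[Vector], float],
) -> List[Tuple[str, float, int]]:
    """Score every compound in a library and sort from highest to lowest aggregated score."""
    scored = [
        (name, *antimicrobial_score(emb, strains, predict_proba))
        for name, emb in library.items()
    ]
    return sorted(scored, key=lambda row: row[1], reverse=True)


# Toy stand-in for a trained classifier: a sigmoid of the first feature only,
# so it ignores the strain. A real model would use every feature.
def toy_model(features: Vector) -> float:
    return 1.0 / (1.0 + math.exp(-features[0]))


strains = ["strain_A", "strain_B"]
library = {"cmpd_1": [2.0, 0.1], "cmpd_2": [-1.0, 0.3]}
ranking = rank_library(library, strains, toy_model)
print(ranking)
```

Keeping the classifier behind a plain callable separates the scoring and ranking logic from any particular model library. It also lets the same code rank a screening library once a trained model is available.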