Lessons learnt from machine learning in early stages of drug discovery

Claudio N. Cavasotto, Juan I. Di Filippo, Valeria Scardino
DOI: https://doi.org/10.1080/17460441.2024.2354279
2024-05-12
Expert Opinion on Drug Discovery
Abstract: With the promise of a big leap, the field of Drug Discovery (DD) seems to have been permeated by Machine Learning (ML); it is not unreasonable to think that for every single 'classical' computational method within DD, there exists an ML-based counterpart; namely, for docking, Molecular Dynamics (MD), protein modeling, etc. Furthermore, the amount of money being invested for ML in DD is growing steadily. Evidently, ML methods have come to stay, and, in our opinion, they will be a valuable aid in accelerating the drug discovery pipeline.
pharmacology & pharmacy
What problem does this paper attempt to address?
### Problems Attempted to Solve by the Paper

The paper "Lessons learnt from machine learning in early stages of drug discovery" addresses three major issues:

1. **Apparently Accurate Results Obtained for the Wrong Reasons**:
   - Some machine learning (ML)-based scoring functions appear to perform well in virtual screening, but they were validated on inappropriate datasets (e.g., DUD-E), so the reported performance is misleading.
   - Some deep learning (DL) methods predict molecular docking poses with low RMSD values, yet the poses are physically implausible, for example with incorrect stereochemistry or non-planar aromatic rings.
   - One study claimed to have found a kinase inhibitor with ML methods within 21 days, but the inhibitor was later found to be very similar to known drugs in the training set.
2. **Availability of High-Quality Data and Development of ML Methods**:
   - In drug discovery, collecting high-quality data is a complex and challenging task. Biological data is highly complex, and many factors may interact to produce the effects observed in vivo.
   - Toxicity prediction, for example, needs large amounts of data covering many different toxicity endpoints, and such data is currently often difficult to obtain.
   - Techniques such as multi-task learning or self-supervised learning can alleviate data scarcity, but they provide only a partial solution.
3. **The Rise of AlphaFold as a Protein Modeling Tool**:
   - AlphaFold (AF) is an AI-based method that predicts the 3D structure of a protein from its amino acid sequence. Despite its great success, AF does not truly understand the physical mechanisms of protein folding.
   - Extreme caution is needed when inferring biological behavior from AF-predicted structures. For example, some predicted protein regions visibly do not conform to any secondary structure, which may be due to insufficient representation in the training data.
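
Two of the checks behind the first issue, pose RMSD and similarity of a reported hit to the training set, can be illustrated with a minimal Python sketch. It is not the authors' method: the function names are hypothetical, the fingerprints are toy sets of on-bit indices rather than output from a cheminformatics toolkit, and the coordinates are invented. Note that the RMSD function measures only geometric deviation from a reference pose; it says nothing about stereochemistry or ring planarity, which is why a low value alone does not make a pose physically plausible.

```python
import math


def rmsd(pose_a, pose_b):
    """Root-mean-square deviation between two poses.

    Assumes both poses list the same atoms in the same order as (x, y, z) tuples.
    """
    if len(pose_a) != len(pose_b) or not pose_a:
        raise ValueError("poses must be non-empty and contain the same number of atoms")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(pose_a, pose_b)
    )
    return math.sqrt(squared / len(pose_a))


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def max_training_similarity(candidate_fp, training_fps):
    """Highest Tanimoto similarity between a candidate and any training compound."""
    return max((tanimoto(candidate_fp, fp) for fp in training_fps), default=0.0)


# Toy data (invented for illustration only).
reference_pose = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
predicted_pose = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(reference_pose, predicted_pose))  # 1.0

candidate = {1, 2, 3}
training_set = [{2, 3, 4}, {1, 2, 3, 5}]
print(max_training_similarity(candidate, training_set))  # 0.75
```

A high maximum similarity suggests that a "novel" hit may largely reproduce a training compound. Any cutoff for what counts as too similar is a project-specific choice that this sketch does not prescribe.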
### Summary

By analyzing the lessons learned from the above three aspects, the paper aims to highlight the advantages, disadvantages, and risks of using machine learning methods in the early stages of drug discovery. The authors believe that while ML methods have great potential in drug discovery, they must be used cautiously and in conjunction with traditional methods to ensure their effectiveness and reliability. In particular, the collection of high-quality data and the application of interpretable AI methods will be the focus of future research.