Innovative Virtual Screening of PD-L1 Inhibitors: The Synergy of Molecular Similarity, Neural Networks, and GNINA Docking

Tieu-Long Phan, Van-Thinh To, Bao-Vy Ngoc Doan, Phuoc-Chung Van Nguyen, Dong-Nghi Hoang Nguyen, Quang-Huy Nguyen Le, Hoang-Huy Nguyen, The-Chuong Trinh, Tuyen Ngoc Truong
DOI: https://doi.org/10.26434/chemrxiv-2024-zf1k8
2024-01-09
Abstract: Immune checkpoint inhibitors have garnered significant attention in oncological research over recent years. A plethora of studies have elucidated that inhibitors targeting the Programmed Death-Ligand 1 (PD-L1) play a pivotal role in circumventing the evasion mechanisms of cancer cells against the immune system. This study aimed to develop an integrated screening model combining an Artificial Neural Network (ANN), Molecular Similarity (MS) assessments, and GNINA 1.0 molecular docking, targeting PD-L1 inhibitors. A database of 2044 substances with known PD-L1 inhibitory activity was compiled from Google Patents and used to enhance molecular similarity evaluations and train the machine learning model. For retrospective validation of the docking procedure, the human PD-L1 protein, with the Protein Data Bank (PDB) ID: 5N2F, was employed as a control. In this phase of the study, 15,235 compounds from the DrugBank database were subjected to a series of screening processes: initially through medicinal chemistry filters, followed by MS assessments, the ANN model, and culminating with molecular docking using GNINA 1.0. The decoy generation yielded promising outcomes, evidenced by an AUC-ROC 1NN value of 0.52 and Doppelganger scores with a mean of 0.24 and a maximum of 0.346, indicating a high resemblance of the decoys to the active set. For MS, AVALON emerged as the most effective fingerprint for similarity searching, demonstrating an Enrichment Factor at 1% (EF 1%) of 10.96%, an AUC-ROC of 0.963, and an optimal similarity threshold of 0.32. The ANN model demonstrated superior performance in cross-validation, achieving an average precision of 0.863±0.032 and an F1 score of 0.745±0.039, outperforming both the Support Vector Classifier (SVC) and Random Forest (RF) models, albeit not significantly. In external validation, the ANN model maintained its superiority with an average precision of 0.851 and an F1 score of 0.790.
GNINA 1.0, employed for molecular docking, was validated through redocking and retrospective control, achieving an AUC of 0.975, with a critical cnn_pose_score threshold of 0.73. From the initial 15,235 compounds, 128 were shortlisted using the MS and ANN models. Further screening through GNINA 1.0 identified 22 potential candidates, among which (3S)-1-(4-acetylphenyl)-5-oxopyrrolidine-3-carboxylic acid emerged as the most promising, with a cnn_pose_score of 0.79, a PD-L1 inhibitory probability of 70.5%, and a Tanimoto coefficient of 0.35.
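The cascade described in the abstract can be sketched as a sequence of filters. This is only an illustration, not the authors' code: the `Compound` record and the `screen` function are hypothetical, the scores are assumed to be precomputed, the ANN probability cutoff of 0.5 is an assumption (the abstract does not state one), and treating each threshold as inclusive is also an assumption. Only the similarity threshold (0.32) and the cnn_pose_score threshold (0.73) come from the abstract.

```python
from dataclasses import dataclass

# Thresholds reported in the abstract.
SIMILARITY_THRESHOLD = 0.32   # optimal Tanimoto threshold for AVALON fingerprints
POSE_SCORE_THRESHOLD = 0.73   # critical GNINA cnn_pose_score
# Assumption: the abstract gives no ANN cutoff; 0.5 is purely illustrative.
ANN_PROBABILITY_CUTOFF = 0.5


@dataclass
class Compound:
    """Hypothetical record holding precomputed scores for one molecule."""
    name: str
    passes_medchem_filters: bool
    max_tanimoto: float      # highest similarity to any known active
    ann_probability: float   # predicted probability of PD-L1 inhibition
    cnn_pose_score: float    # GNINA score of the best docked pose


def screen(compounds):
    """Apply the four stages in the order given in the abstract."""
    stage1 = [c for c in compounds if c.passes_medchem_filters]
    stage2 = [c for c in stage1 if c.max_tanimoto >= SIMILARITY_THRESHOLD]
    stage3 = [c for c in stage2 if c.ann_probability >= ANN_PROBABILITY_CUTOFF]
    # In a real run, docking would be performed only on the stage3 survivors.
    return [c for c in stage3 if c.cnn_pose_score >= POSE_SCORE_THRESHOLD]


library = [
    # Scores of the top candidate as reported in the abstract.
    Compound("top_candidate", True, 0.35, 0.705, 0.79),
    # Made-up examples that fail one stage each.
    Compound("low_similarity", True, 0.20, 0.90, 0.85),
    Compound("poor_pose", True, 0.40, 0.80, 0.60),
]
hits = screen(library)
print([c.name for c in hits])  # ['top_candidate']
```

Ordering the cheap filters first means the expensive docking step is needed only for the compounds that survive the similarity and ANN stages, which matches the reported reduction from 15,235 compounds to 128 before docking.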
Chemistry
What problem does this paper attempt to address?
The paper aims to identify effective PD-L1 inhibitors through virtual screening. The researchers developed an integrated screening model that combines molecular similarity assessment, an artificial neural network (ANN), and GNINA 1.0 molecular docking to efficiently screen large compound libraries for candidates with potential PD-L1 inhibitory activity. The broader goal is to improve the efficiency of drug discovery in cancer immunotherapy, specifically for the immune checkpoint protein PD-L1.

The key steps in the paper are:

1. **Dataset construction**: 2,044 substances with known PD-L1 inhibitory activity were collected from Google Patents to support the molecular similarity assessment and to train the machine-learning model.
2. **Molecular similarity model**: Multiple molecular fingerprints (such as AVALON, MACCS, ECFP4, RDK5, and MAP4) were compared to determine the most effective fingerprint type and its optimal similarity threshold.
3. **Artificial neural network model**: An ANN model was developed using SECFP fingerprints as input, and its performance was evaluated through cross-validation and external validation.
4. **Molecular docking**: GNINA 1.0 was used for molecular docking. The docking procedure was validated through redocking and a retrospective control, and 22 potential candidate compounds were ultimately selected.

Through these steps, the researchers aim to identify new compounds, or known drugs that could be repurposed, as PD-L1 inhibitors.
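The similarity step and its enrichment metric can be illustrated with a minimal sketch. It assumes fingerprints are represented as sets of on-bit indices and uses the standard definition of the enrichment factor; the paper's exact implementation is not given here, and the function names `tanimoto` and `enrichment_factor` are hypothetical.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient between two fingerprints given as on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def enrichment_factor(scores, labels, fraction=0.01):
    """Enrichment factor in the top `fraction` of a ranked list.

    EF = (actives in top / size of top) / (total actives / total compounds).
    `labels` holds 1 for actives and 0 for decoys.
    """
    n = len(scores)
    n_top = max(1, int(round(n * fraction)))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    actives_top = sum(label for _, label in ranked[:n_top])
    actives_total = sum(labels)
    if actives_total == 0:
        return 0.0
    return (actives_top / n_top) / (actives_total / n)


# Toy example: 200 compounds, 10 actives, the two best-scored are both active.
toy_labels = [1] * 10 + [0] * 190
toy_scores = [1.0 - i / 200 for i in range(200)]
print(tanimoto({1, 2, 3}, {2, 3, 4}))             # 0.5
print(enrichment_factor(toy_scores, toy_labels))  # 20.0
```

In a screening setting, each library compound would be scored by its highest Tanimoto similarity to any known active, and the enrichment factor then measures how strongly actives concentrate at the top of that ranking relative to random selection.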