Bioptic -- A Target-Agnostic Potency-Based Small Molecules Search Engine

Vlad Vinogradov, Ivan Izmailov, Simon Steshin, Kong T. Nguyen

2024-07-01

Abstract: Recent successes in virtual screening have been made possible by large models and extensive chemical libraries. However, combining these elements is challenging: the larger the model, the more expensive it is to run, making screening of ultra-large libraries infeasible. To address this, we developed a target-agnostic, efficacy-based molecule search model, which allows us to find structurally dissimilar molecules with similar biological activities. We followed best practices to design a fast retrieval system based on processor-optimized SIMD instructions, enabling us to screen the ultra-large 40B Enamine REAL library with a 100% recall rate. We extensively benchmarked our model and several state-of-the-art models for both speed and retrieval quality of novel molecules.

Quantitative Methods, Artificial Intelligence, Information Retrieval

What problem does this paper attempt to address?

The paper addresses the inefficiency of virtual screening over very large molecular libraries. Traditional drug discovery proceeds in multiple stages, and virtual screening is the stage that uses statistical algorithms to pick potentially active molecules out of a large candidate pool. Advances in chemical synthesis and automation have produced libraries containing billions of molecules, but the large models that give the best screening results are expensive to run, which makes screening ultra-large libraries with them impractical.

To address this, the research team developed a target-agnostic, efficacy-based molecular search model that finds molecules that are structurally different but have similar biological activity. They designed a fast retrieval system based on processor-optimized SIMD instructions, which screens the 40B-molecule Enamine REAL library with a 100% recall rate. The paper compares the model with other state-of-the-art models such as Deep Docking, DrugClip, and Chemprop in terms of speed and retrieval quality. It highlights the global, target-agnostic nature of the model: it can search for activity-similar molecules across all possible targets at once, without retraining for each target.

The paper also discusses how the query selection strategy affects model performance and reports speed results on ultra-large molecular libraries. By using GPUs for preprocessing and CPUs for search, the system can search libraries containing billions of molecules in seconds. Overall, the paper aims to make virtual screening more efficient by applying best practices from recommendation systems and search engines, and thereby to accelerate drug discovery on large-scale molecular libraries.
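The summary does not spell out the retrieval code, so the following is only a minimal sketch of one way a 100% recall search over precomputed molecule embeddings can work: every library vector is scored against the query, and a small heap keeps the best `k`. The function names (`normalize`, `exhaustive_top_k`, `demo`), the cosine-similarity scoring, the chunk size, and the random vectors standing in for learned embeddings are all assumptions made for illustration, not details taken from the paper. A production system would replace the inner loop with SIMD-vectorized dot products.

```python
import heapq
import math
import random


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def exhaustive_top_k(query, library, k, chunk_size=1024):
    """Return the k best (score, index) pairs, highest cosine similarity first.

    Every library vector is scored, so no true neighbour in embedding space
    can be missed. `library` is assumed to hold unit-length vectors.
    """
    q = normalize(query)
    best = []  # min-heap of (score, index) with at most k entries
    for start in range(0, len(library), chunk_size):
        # Chunking mirrors how a large library would be streamed from storage.
        chunk = library[start:start + chunk_size]
        for offset, vec in enumerate(chunk):
            score = sum(a * b for a, b in zip(q, vec))
            item = (score, start + offset)
            if len(best) < k:
                heapq.heappush(best, item)
            elif item > best[0]:
                heapq.heapreplace(best, item)
    return sorted(best, reverse=True)


def demo():
    """Search a toy library of random unit vectors (stand-ins for embeddings)."""
    rng = random.Random(0)
    dim = 16
    library = [
        normalize([rng.gauss(0, 1) for _ in range(dim)]) for _ in range(5000)
    ]
    query = library[42]  # hypothetical query: a molecule already in the library
    return exhaustive_top_k(query, library, k=5)


if __name__ == "__main__":
    for score, index in demo():
        print(f"index={index} cosine={score:.4f}")
```

Because the scan is exhaustive, the result is exact with respect to the embedding similarity. The cost grows linearly with library size, which is why fast per-vector scoring matters at the billion-molecule scale described above.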


A Novel Search Engine for Virtual Screening of Very Large Databases

A Mechanism to Open Academic Chemistry to High-Throughput Virtual Screening

TargetHunter: an in Silico Target Identification Tool for Predicting Therapeutic Potential of Small Organic Molecules Based on Chemogenomic Database

Efficient Exploration of Chemical Space with Docking and Deep Learning

Thompson Sampling: An Efficient Method for Searching Ultralarge Synthesis on Demand Databases

Enhanced Sampling of Chemical Space for High Throughput Screening Applications using Machine Learning

Customizable Generation of Synthetically Accessible, Local Chemical Subspaces

Abstract Wrk2-04: Virtual screening of ultra-large chemical spaces for novel chemotype discovery

S-MolSearch: 3D Semi-supervised Contrastive Learning for Bioactive Molecule Search

An artificial intelligence accelerated virtual screening platform for drug discovery

The Pan-Canadian Chemical Library: A Mechanism to Open Academic Chemistry to High-Throughput Virtual Screening

Pareto Optimization to Accelerate Multi-Objective Virtual Screening

Virtual Screening on Natural Products for Discovering Active Compounds and Target Information

Pharmit: interactive exploration of chemical space

SPRINT Enables Interpretable and Ultra-Fast Virtual Screening against Thousands of Proteomes

An open-source drug discovery platform enables ultra-large virtual screens

Virtual Screening Expands the Non-Natural Amino Acid Palette for Peptide Optimization

Improved genome-scale multi-target virtual screening via a novel collaborative filtering approach to cold-start problem

The impact of Library Size and Scale of Testing on Virtual Screening

Do molecular fingerprints identify diverse active drugs in large-scale virtual screening? (no)