Abstract: Matched Molecular Pairs (MMP) analysis is a well-established technique for Structure Activity and Property Analysis (SAR and SPR). Summarizing multiple MMPs that describe the same structural change into a single chemical transform (termed a Transform from here on) can be a powerful tool for prediction. This is particularly useful in the area of Absorption, Distribution, Metabolism, and Elimination (ADME) analysis, which is less influenced by 3D structural binding effects. The creation of a knowledge database containing many of these Transforms across typical ADME assays promises to be a powerful approach to aid multidimensional optimization. We present a detailed workflow for the derivation of such a database. We include details of an MMP fragmentation algorithm with associated statistical summarization methods for the derivation of Transforms. This is made freely available as part of the LillyMol software package. We describe the application of this method to several ADME/Tox (Toxicity) assay data sets and highlight multiple cases where the impact of traditional medicinal chemistry Transforms is contradicted by MMP data. We also describe the internal software interface used by medicinal chemists to aid the design of new compounds via automated suggestion. This approach uses the matched pairs database to "suggest" improved compounds in an automated design scenario. A nonvisual script-based version of the automated suggestions code with an associated set of described chemical Transforms is also made freely available along with this paper and as part of the LillyMol software package. Finally, we contrast this knowledge database against a larger database of all MMPs derived from a 2 million compound diversity set and a subset of MMPs seen in historical discovery projects.
The comparison against all transforms in the diversity collection highlights the very low coverage of the transform database as compared to all possible transforms involving 15 atom fragments. The comparison against a smaller subset of Transforms seen on internal Medicinal Chemistry projects shows better coverage of the transform database for a small set of common medicinal chemistry strategies. Within the context of all possible transforms available to a medicinal chemistry project team, the challenge remains to move beyond mere idea generation from past projects toward high quality prediction for novel ADME/Tox modulating Transforms.

The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jcim.0c00583.

Top 500 med chem transforms in SMILES format (XLSX, ci0c00583_si_001.xlsx)

Additional data tables used in the discussion and derivation of DIFFN metric (PDF, ci0c00583_si_002.pdf)

Data table used for preparing Figure 3 (XLSX, ci0c00583_si_003.xlsx)
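The summarization step described in the abstract, in which multiple MMPs describing the same structural change are collapsed into a single Transform, can be illustrated with a minimal Python sketch. This is not the LillyMol implementation and it does not reproduce the paper's statistics (for example, the DIFFN metric). The fragment strings, property deltas, the `summarize_transforms` function, and the `min_pairs` threshold are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical matched pairs: (fragment_from, fragment_to, property_delta).
# Fragments and delta values are invented for illustration, not taken from the paper.
pairs = [
    ("[*]Cl", "[*]F", -0.30),
    ("[*]Cl", "[*]F", -0.10),
    ("[*]Cl", "[*]F", -0.20),
    ("[*]C", "[*]OC", 0.40),
    ("[*]C", "[*]OC", 0.60),
    ("[*]N", "[*]O", 0.05),  # only one supporting pair; filtered out below
]


def summarize_transforms(pairs, min_pairs=2):
    """Group pairs by structural change and report simple summary statistics."""
    grouped = defaultdict(list)
    for frag_from, frag_to, delta in pairs:
        grouped[(frag_from, frag_to)].append(delta)

    summary = {}
    for key, deltas in grouped.items():
        # Skip changes supported by too few pairs (assumed threshold).
        if len(deltas) < min_pairs:
            continue
        summary[key] = {
            "n": len(deltas),
            "mean_delta": mean(deltas),
            # The sample standard deviation needs at least two observations.
            "sd_delta": stdev(deltas) if len(deltas) > 1 else 0.0,
        }
    return summary


transforms = summarize_transforms(pairs)
for (frag_from, frag_to), stats in transforms.items():
    print(f"{frag_from} >> {frag_to}: n={stats['n']}, mean={stats['mean_delta']:.2f}")
```

Grouping on the (from, to) fragment pair treats every context as equivalent. A real workflow would likely also account for the local chemical environment and use more robust statistics than a plain mean.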

The Derivation of a Matched Molecular Pairs Based ADME/Tox Knowledge Base for Compound Optimization
