Modeling Tanimoto Similarity Value Distributions and Predicting Search Results

Martin Vogt, Jürgen Bajorath
DOI: https://doi.org/10.1002/minf.201600131
IF: 4.05
2016-12-29
Molecular Informatics
Abstract: Similarity searching using molecular fingerprints has a long history in chemoinformatics and remains a popular approach for virtual screening. Typically, known active reference molecules are used to search databases for new active compounds. However, such searches have a black-box character because similarity value distributions depend on both the fingerprint and the compound class. Consequently, no generally applicable similarity threshold values are available as reliable indicators of activity relationships between reference and database compounds. It is therefore generally uncertain where new active compounds might appear in database rankings, if at all. In this contribution, methods are discussed for modeling similarity value distributions of fingerprint search calculations using Tanimoto coefficients and for estimating the rank positions of active compounds. To our knowledge, these are the first approaches for predicting the results of fingerprint-based similarity searching.
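To make the two ingredients of the abstract concrete, the sketch below computes the Tanimoto coefficient Tc = c / (a + b - c) for binary fingerprints (a and b are the numbers of bits set in each fingerprint, c the number of shared bits), ranks a database against a reference, and reads an expected rank off a distribution of similarity values. It is an illustration under stated assumptions, not the authors' method: fingerprints are assumed to be Python sets of "on" bit positions, the similarity value distribution is stood in for by a plain empirical sample of background Tc values rather than the models discussed in the paper, and all function names (`tanimoto`, `rank_database`, `estimate_rank`) are hypothetical.

```python
from bisect import bisect_right


def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto coefficient Tc = c / (a + b - c) for fingerprints given as
    sets of 'on' bit positions (assumed representation)."""
    if not fp_a and not fp_b:
        return 0.0  # assumed convention for two empty fingerprints
    common = len(fp_a & fp_b)
    return common / (len(fp_a) + len(fp_b) - common)


def rank_database(reference: set[int],
                  database: dict[str, set[int]]) -> list[tuple[str, float]]:
    """Return (compound_id, Tc) pairs sorted by decreasing similarity to the reference."""
    scores = [(cid, tanimoto(reference, fp)) for cid, fp in database.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


def estimate_rank(tc_value: float, background_tcs: list[float],
                  database_size: int) -> float:
    """Expected rank of a compound scoring tc_value: 1 + the expected number of
    database compounds scoring strictly higher, where the empirical sample
    background_tcs is used as a stand-in model of the similarity distribution."""
    if not background_tcs:
        raise ValueError("background_tcs must not be empty")
    ordered = sorted(background_tcs)
    n_not_higher = bisect_right(ordered, tc_value)  # values <= tc_value
    frac_higher = (len(ordered) - n_not_higher) / len(ordered)
    return 1 + frac_higher * database_size


if __name__ == "__main__":
    reference = {1, 2, 3, 4}
    database = {"A": {1, 2, 3, 4}, "B": {1, 2, 3, 5}, "C": {7, 8}, "D": {1, 9}}
    for cid, tc in rank_database(reference, database):
        print(cid, round(tc, 2))
    # A compound with Tc = 0.6, judged against a 4-value background sample,
    # in a hypothetical database of 1000 compounds.
    print(estimate_rank(0.6, [0.0, 0.2, 0.6, 1.0], 1000))
```

With the toy data, compound B (Tc = 0.6) would be expected at rank 251 in a 1000-compound database, because a quarter of the background values exceed 0.6. The point the abstract makes is visible here: the predicted rank depends entirely on the shape of the similarity value distribution, which varies with fingerprint and compound class, so a fixed Tc threshold does not translate into a fixed rank.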
Subject categories: Chemistry, Medicinal; Mathematical & Computational Biology; Computer Science, Interdisciplinary Applications