Evaluation of Different Methods for Identification of Structural Alerts Using Chemical Ames Mutagenicity Data Set As a Benchmark.

Hongbin Yang,Jie Li,Zengrui Wu,Weihua Li,Guixia Liu,Yun Tang
DOI: https://doi.org/10.1021/acs.chemrestox.7b00083
2017-01-01
Chemical Research in Toxicology
Abstract:Identification of structural alerts for toxicity is useful in drug discovery and other fields such as environmental protection. With structural alerts, researchers can quickly identify potential toxic compounds and learn how to modify them. Hence, it is important to determine structural alerts from a large number of compounds quickly and accurately. There are already many methods reported for identification of structural alerts. However, how to evaluate those methods is a problem. In this paper, we tried to evaluate four of the methods for monosubstructure identification with three indices including accuracy rate, coverage rate, and information gain to compare their advantages and disadvantages. The Kazius' Ames mutagenicity data set was used as the benchmark, and the four methods were MoSS (graph-based), SARpy (fragment-based), and two fingerprint-based methods including Bioalerts and the fingerprint (FP) method we previously used. The results showed that Bioalerts and FP could detect key substructures with high accuracy and coverage rates because they allowed unclosed rings and wildcard atom or bond types. However, they also resulted in redundancy so that their predictive performance was not as good as that of SARpy. SARpy was competitive in predictive performance in both training set and external validation set. These results might be helpful for users to select appropriate methods and further development of methods for identification of structural alerts.
What problem does this paper attempt to address?