Machine Learning Model for Screening Thyroid Stimulating Hormone Receptor Agonists Based on Updated Datasets and Improved Applicability Domain Metrics

Wenjia Liu, Zhongyu Wang, Jingwen Chen, Weihao Tang, Haobo Wang
DOI: https://doi.org/10.1021/acs.chemrestox.3c00074
2023-01-01
Chemical Research in Toxicology
Abstract: Machine learning (ML) models for screening endocrine-disrupting chemicals (EDCs), such as thyroid stimulating hormone receptor (TSHR) agonists, are essential for sound management of chemicals. Previous models for screening TSHR agonists were built on imbalanced datasets and lacked applicability domain (AD) characterization essential for regulatory application. Herein, an updated TSHR agonist dataset was built, for which the ratio of active to inactive compounds greatly increased to 1:2.6, and chemical spaces of structure-activity landscapes (SALs) were enhanced. Resulting models based on 7 molecular representations and 4 ML algorithms were proven to outperform previous ones. Weighted similarity density (ρ_s) and weighted inconsistency of activities (I_A) were proposed to characterize the SALs, and a state-of-the-art AD characterization methodology AD_SAL{ρ_s, I_A} was established. An optimal classifier developed with PubChem fingerprints and the random forest algorithm, coupled with AD_SAL{ρ_s ≥ 0.15, I_A ≤ 0.65}, exhibited good performance on the validation set, with the area under the receiver operating characteristic curve being 0.984 and balanced accuracy being 0.941, and identified 90 TSHR agonist classes that could not be found previously. The classifier together with the AD_SAL{ρ_s, I_A} may serve as efficient tools for screening EDCs, and the AD characterization methodology may be applied to other ML models.
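To make the role of the applicability domain thresholds concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that the weighted similarity density and the weighted inconsistency of activities have already been computed for each compound (the abstract does not give their formulas, so none are attempted here). It applies the two reported cutoffs (density at least 0.15, inconsistency at most 0.65) as a filter and computes balanced accuracy, taken here as the mean of sensitivity and specificity, on the compounds that pass. The `Prediction` record, the helper names, and the toy values are all hypothetical.

```python
from dataclasses import dataclass

# Thresholds as reported in the abstract.
RHO_S_MIN = 0.15  # minimum weighted similarity density
I_A_MAX = 0.65    # maximum weighted inconsistency of activities


@dataclass
class Prediction:
    """Hypothetical record: AD metrics are assumed to be precomputed."""
    rho_s: float  # weighted similarity density of the query compound
    i_a: float    # weighted inconsistency of activities around it
    y_true: int   # 1 = active (agonist), 0 = inactive
    y_pred: int   # classifier output, same encoding


def in_domain(p: Prediction) -> bool:
    """A compound is inside the AD only if it satisfies both cutoffs."""
    return p.rho_s >= RHO_S_MIN and p.i_a <= I_A_MAX


def balanced_accuracy(preds: list[Prediction]) -> float:
    """Mean of sensitivity and specificity for binary labels."""
    tp = sum(1 for p in preds if p.y_true == 1 and p.y_pred == 1)
    fn = sum(1 for p in preds if p.y_true == 1 and p.y_pred == 0)
    tn = sum(1 for p in preds if p.y_true == 0 and p.y_pred == 0)
    fp = sum(1 for p in preds if p.y_true == 0 and p.y_pred == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one active and one inactive compound")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Toy values for illustration only; they are not data from the paper.
preds = [
    Prediction(0.40, 0.10, 1, 1),
    Prediction(0.30, 0.20, 0, 0),
    Prediction(0.22, 0.50, 1, 1),
    Prediction(0.18, 0.60, 0, 0),
    Prediction(0.05, 0.30, 1, 0),  # excluded: similarity density too low
    Prediction(0.35, 0.90, 0, 1),  # excluded: activities too inconsistent
]

inside = [p for p in preds if in_domain(p)]
print(f"in domain: {len(inside)} of {len(preds)}")
print(f"balanced accuracy, all compounds: {balanced_accuracy(preds):.3f}")
print(f"balanced accuracy, in domain:     {balanced_accuracy(inside):.3f}")
```

In this toy set the two misclassified compounds are the ones that fall outside the domain, so the score on the retained compounds is higher than on the full set. This illustrates the intended trade-off of an AD filter: coverage is given up in exchange for more reliable predictions on what remains.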