SepPCNET: Deeping Learning on a 3D Surface Electrostatic Potential Point Cloud for Enhanced Toxicity Classification and Its Application to Suspected Environmental Estrogens

Liguo Wang, Lu Zhao, Xian Liu, Jianjie Fu, Aiqian Zhang
DOI: https://doi.org/10.1021/acs.est.1c01228
2021-07-09
Abstract: Deep learning (DL) offers an unprecedented opportunity to revolutionize the landscape of toxicity prediction based on quantitative structure–activity relationship (QSAR) studies in the big data era. However, the structural description in the reported DL-QSAR models is still restricted to the two-dimensional level. Inspired by point clouds, a type of geometric data structure, a novel three-dimensional (3D) molecular surface point cloud with electrostatic potential (SepPC) was proposed to describe chemical structures. Each surface point of a chemical is assigned its 3D coordinates and molecular electrostatic potential. A novel DL architecture, SepPCNET, was then introduced to directly consume unordered SepPC data for toxicity classification. The SepPCNET model was trained on 1317 chemicals tested in a battery of 18 estrogen receptor-related assays of the ToxCast program. The resulting model recognized the active and inactive chemicals with accuracies of 82.8% and 88.9%, respectively, with a total accuracy of 88.3% on the internal test set and 92.5% on the external test set. It outperformed other up-to-date machine learning models and succeeded in recognizing the difference in the activity of isomers.
Additional insights into the toxicity mechanism were also gained by visualizing critical points and extracting data-driven point features of active chemicals.
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.est.1c01228.
Original code for the SepPCNET model construction, README file for installation and instruction, and two examples (ZIP, es1c01228_si_001.zip)
Visualized structural feature distribution of different datasets using t-SNE (Figure S1); effect of the number of surface points and global features on the DL model performance (Figure S2); SepPCNET learning curves (Figure S3); histogram and cumulative frequency graph of the numbers of points in the point clouds of the internal dataset (Figure S4); histograms and graphs of the cumulative percentage of both the distances and the ESP difference between the removed points and their nearest neighbor remaining in place for 17β-estradiol and emamectin (Figure S5); relations between the global features aggregated from the point clouds before and after the undersampling process for 17β-estradiol and emamectin (Figure S6); PRC for the SepPCNET model (Figure S7); critical points, the SepPC skeleton shapes, and distribution maps of the global features obtained for three chemicals (Figure S8); distribution of ESP values on the molecular surface of 17β-estradiol, 17α-estradiol, and the designed molecule (Figure S9); datasets used for the DL-QSAR modeling study (Table S1); model architecture adjustment for fully connected layers (Table S2); hyperparameters tuned during the model training process (Table S3); 5-fold nested cross-validation result for the SepPCNET and other machine learning models on the same dataset (Table S4); ER agonist potencies and predicted labels for the chemicals in the external dataset (Table S5); internal validation and 5-fold nested cross-validation result for the multi-task model (Table S6); ER agonist potencies and predicted labels for the chemicals in the internal dataset (Table S7); details of application datasets, data quality control, and SepPC calculation (Text S1); effect of the number of surface points used to describe a molecule and global features on the DL model performance (Text S2); details of model training and evaluation metrics (Text S3); rapid learning of SepPCNET for toxicity classification (Text S4); and preliminary study of the SepPCNET-based multi-task model (Text S5) (PDF, es1c01228_si_002.pdf)
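The abstract says that each surface point carries its 3D coordinates plus an electrostatic potential (ESP) value, and that the network consumes these points as an unordered set. It also mentions critical points and aggregated global features. The abstract does not specify the network layers, so the sketch below is only an illustration of one common way to obtain order invariance, assuming a PointNet-style design: a shared per-point transform followed by max pooling. The function name `encode_point_cloud`, the layer sizes, and the random toy data are all hypothetical and are not taken from the authors' code.

```python
import numpy as np


def encode_point_cloud(points, w1, b1, w2, b2):
    """Order-invariant encoding of a point set (illustrative sketch only).

    points: array of shape (n_points, 4) with rows [x, y, z, esp].
    Returns the pooled global feature vector and the indices of the
    points that supply at least one maximum ("critical points").
    """
    # The same two-layer transform is applied to every point independently.
    hidden = np.maximum(points @ w1 + b1, 0.0)
    per_point = np.maximum(hidden @ w2 + b2, 0.0)
    # Max pooling over points is symmetric, so point order cannot matter.
    global_feature = per_point.max(axis=0)
    # Points that win the max for some feature channel.
    critical_idx = np.unique(per_point.argmax(axis=0))
    return global_feature, critical_idx


rng = np.random.default_rng(0)
# Toy stand-in for one molecule: 256 random points, NOT real surface/ESP data.
cloud = rng.normal(size=(256, 4))
# Hypothetical layer sizes (4 -> 16 -> 32), chosen only for the demo.
w1, b1 = rng.normal(size=(4, 16)), np.zeros(16)
w2, b2 = rng.normal(size=(16, 32)), np.zeros(32)

feat, critical = encode_point_cloud(cloud, w1, b1, w2, b2)

# Shuffling the points leaves the global feature unchanged.
perm = rng.permutation(len(cloud))
feat_shuffled, _ = encode_point_cloud(cloud[perm], w1, b1, w2, b2)

# The critical points alone reproduce the same global feature.
feat_critical, _ = encode_point_cloud(cloud[critical], w1, b1, w2, b2)
```

In this sketch the global feature depends only on the small subset of points that win the max in some channel, which is one plausible reading of why visualizing critical points can show which surface regions drive a prediction.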
environmental sciences; engineering, environmental