Prediction of molecular-specific mutagenic alerts and related mechanisms of chemicals by a convolutional neural network (CNN) model based on SMILES split

Chao Chen, Zhengliang Huang, Xuyan Zou, Sheng Li, Di Zhang, Shou-Lin Wang
DOI: https://doi.org/10.1016/j.scitotenv.2024.170435
IF: 9.8
2024-02-01
The Science of The Total Environment
Abstract: Structural alerts (SAs) are essential for identifying chemicals for toxicity evaluation and health risk assessment. We constructed a novel SMILES split-based deep learning model (SSDL), trained and verified with 5850 chemicals from the ISSSTY database and 384 external test chemicals from published papers. The training accuracy exceeded 0.90, and the evaluation metrics (precision, recall and F1-score) all reached 0.78 or above on both the internal and external test chemicals. In this model, the molecular-specific importance of each fragment within a chemical was first quantified independently. An SA identification method based on these fragment importances was then statistically analyzed and verified with ISSSTY test chemicals and external test chemicals containing one of 28 typical SAs; in most cases it outperformed expert rules. Furthermore, using 237 chemicals with four known mutagenic mechanisms, a mutagenicity mechanism prediction method was developed based on molecular similarity calibrated by the SSDL method and fragment importance. Compared with traditional methods, it significantly improved accuracy for three mechanisms and achieved comparable accuracy for the fourth. Overall, by quantifying fragment toxicity within molecules, the SSDL model could serve as a novel and potentially powerful tool for determining and visualizing molecular-specific SAs and for predicting the mutagenicity mechanisms of environmental or industrial compounds and drugs.
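The abstract does not say exactly how SMILES strings are split or how fragment importance is computed. As a rough illustration only, the sketch below assumes a widely used regex-based, atom-level SMILES tokenization and an occlusion (masking) estimate of per-token importance. `split_smiles`, `occlusion_importance`, `toy_score` and the weights in `TOY_WEIGHTS` are hypothetical stand-ins, not the SSDL model or the authors' code.

```python
import re
from typing import Callable, Dict, List, Tuple

# Common atom-level SMILES tokenization regex (assumption: the paper's own
# "SMILES split" may use a different scheme).
SMILES_PATTERN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)
MASK = "<mask>"  # hypothetical placeholder token used for occlusion


def split_smiles(smiles: str) -> List[str]:
    """Split a SMILES string into atom, bond, branch and ring-closure tokens."""
    tokens = SMILES_PATTERN.findall(smiles)
    # Reject strings containing characters the regex could not tokenize.
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable characters in {smiles!r}")
    return tokens


def occlusion_importance(
    tokens: List[str], score: Callable[[List[str]], float]
) -> List[Tuple[str, float]]:
    """Importance of each token = drop in the score when that token is masked."""
    base = score(tokens)
    result = []
    for i, tok in enumerate(tokens):
        masked = tokens[:i] + [MASK] + tokens[i + 1:]
        result.append((tok, base - score(masked)))
    return result


# Hypothetical toy scorer standing in for a trained CNN's predicted
# mutagenicity probability: a simple weighted count of tokens.
TOY_WEIGHTS: Dict[str, float] = {"[N+]": 0.6, "[O-]": 0.3, "=": 0.05}


def toy_score(tokens: List[str]) -> float:
    return sum(TOY_WEIGHTS.get(t, 0.0) for t in tokens)


if __name__ == "__main__":
    toks = split_smiles("c1ccccc1[N+](=O)[O-]")  # nitrobenzene
    ranked = sorted(occlusion_importance(toks, toy_score), key=lambda p: p[1], reverse=True)
    for tok, imp in ranked[:3]:
        print(f"{tok:6s} {imp:.2f}")
```

In this toy setup the nitro-group tokens `[N+]` and `[O-]` receive the largest importance, loosely mirroring how per-fragment scores from a trained model could be used to highlight a candidate structural alert. With a real model, `toy_score` would be replaced by the model's prediction on the masked token sequence.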
environmental sciences
What problem does this paper attempt to address?