Interpretable attention-based multi-encoder transformer based QSPR model for assessing toxicity and environmental impact of chemicals

SangYoun Kim, Shahzeb Tariq, SungKu Heo, ChangKyoo Yoo
DOI: https://doi.org/10.1016/j.chemosphere.2023.141086
IF: 8.8
2024-01-13
Chemosphere
Abstract: The rising demand from the consumer goods and pharmaceutical industries is driving a rapid expansion in the number of newly developed chemicals. Conventional toxicity testing of unknown chemicals is expensive and time-consuming, and it raises ethical concerns. Quantitative structure–property relationship (QSPR) modeling is an efficient computational alternative because it saves time and resources and reduces animal experimentation. Advances in machine learning have improved chemical analysis in QSPR studies, but the real-world application of machine learning-based QSPR models has been limited by their unexplainable 'black box' nature. In this study, a multi-encoder structure-to-toxicity (S2T)-transformer-based QSPR model was developed to estimate the properties of polychlorinated biphenyls (PCBs) and endocrine disrupting chemicals (EDCs). Simplified molecular input line entry system (SMILES) strings and molecular descriptors calculated with the Dragon 6 software were used simultaneously as inputs to the QSPR model. Furthermore, an attention-based framework is proposed to describe the relationship between the molecular structure and the toxicity of hazardous chemicals. The S2T-transformer model achieved the highest R² scores, 0.918, 0.856, and 0.907, for estimating the logarithm of the octanol-water partition coefficient (Log K_OW), the octanol-air partition coefficient (Log K_OA), and the bioconcentration factor (Log BCF) of PCBs, respectively. Moreover, the attention weights properly reflected the lateral (meta, para) chlorination associated with PCB toxicity and environmental impact.
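The abstract rests on two ideas: attention weights over SMILES tokens serve as an interpretability signal, and R² serves as the accuracy metric. A small pure-Python sketch can illustrate both. It is not the authors' S2T-transformer. It assumes a single attention head and a hypothetical regex SMILES tokenizer. It also uses hand-picked toy token embeddings (`EMBED`, hypothetical) and a two-element vector standing in for Dragon 6 descriptors. A hypothetical projection `W_Q` maps that vector to the attention query. Concatenating the attention-pooled SMILES vector with the descriptor vector is only one simple way to combine two encoder outputs; the paper's actual fusion scheme may differ.

```python
import math
import re

# Hypothetical SMILES tokenizer: two-letter halogens first, then bracket atoms,
# then any single character (atoms, bonds, ring-closure digits, branches).
SMILES_TOKEN = re.compile(r"Cl|Br|\[[^\]]+\]|.")

# Toy, hand-picked 4-d token embeddings (a real model learns these).
EMBED = {
    "Cl": [1.0, 0.0, 0.5, 0.0],
    "c": [0.0, 1.0, 0.0, 0.2],
    "C": [0.0, 0.8, 0.0, 0.1],
    "1": [0.0, 0.0, 0.1, 0.0],
}
DEFAULT = [0.1, 0.1, 0.1, 0.1]  # embedding for tokens missing from EMBED

# Hypothetical projection from a 2-element descriptor vector to a 4-d query.
W_Q = [[1.0, 0.0], [0.0, 0.1], [0.5, 0.0], [0.0, 0.0]]


def tokenize(smiles):
    return SMILES_TOKEN.findall(smiles)


def project(matrix, vec):
    # Matrix-vector product: one output element per matrix row.
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Single-query scaled dot-product attention; returns (output, weights)."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    out = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]
    return out, weights


def encode(smiles, descriptors):
    """Fuse a SMILES 'encoder' and a descriptor 'encoder' by concatenation."""
    tokens = tokenize(smiles)
    embs = [EMBED.get(t, DEFAULT) for t in tokens]
    query = project(W_Q, descriptors)
    pooled, weights = attention(query, embs, embs)
    fused = pooled + list(descriptors)
    return tokens, weights, fused


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    tokens, weights, fused = encode("Clc1ccccc1", [1.0, 0.5])  # chlorobenzene
    for tok, w in zip(tokens, weights):
        print(f"{tok:>3} {w:.3f}")
    print("fused length:", len(fused))
    print("R2:", r2_score([1.0, 2.0, 3.0], [1.1, 1.9, 3.2]))
```

In a trained model the embeddings and projections are learned. Tokens that receive large attention weights (here, the `Cl` token) are the ones a reader would inspect when relating substituent positions to predicted properties.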
environmental sciences
What problem does this paper attempt to address?