What is the ecotoxicity of a given chemical for a given aquatic species? Predicting interactions between species and chemicals using recommender system techniques

M. Viljanen, J. Minnema, P.N.H. Wassenaar, E. Rorije, W. Peijnenburg
DOI: https://doi.org/10.1080/1062936X.2023.2254225
IF: 3.681
2023-09-07
SAR and QSAR in Environmental Research
Abstract: Ecotoxicological safety assessment of chemicals requires toxicity data on multiple species, despite the general desire to minimize animal testing. Predictive models, specifically machine learning (ML) methods, are one of the tools capable of resolving this apparent contradiction, as they allow toxicity patterns to be generalized across chemicals and species. However, despite the availability of large public toxicity datasets, the data are highly sparse, which complicates model development. The aim of this study is to provide insight into how ML can predict toxicity using a large but sparse dataset. We developed models to predict LC50 values, based on experimental LC50 data covering 2431 organic chemicals and 1506 aquatic species from the ECOTOX database. Several well-known ML techniques were evaluated and a new ML model was developed, inspired by recommender systems. This new model is a simple linear model that learns low-rank interactions between species and chemicals using factorization machines. We evaluated the predictive performance of the developed models in two validation settings: 1) predicting unseen chemical-species pairs, and 2) predicting unseen chemicals. The results show that ML models can accurately predict LC50 values in both validation settings. Moreover, we show that the novel factorization machine approach can match well-tuned, complex ML approaches.
environmental sciences; toxicology; computer science, interdisciplinary applications; chemistry, multidisciplinary; mathematical & computational biology
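
To make the modelling idea in the abstract concrete, the following is a minimal sketch of a generic second-order factorization machine applied to one-hot species and chemical indicators, together with the two validation splits the abstract names. It is not the authors' implementation: the abstract does not specify the features, regularization, or training procedure, so every function name, parameter, and data layout here is a hypothetical chosen for illustration, and training of the weights is left out entirely.

```python
import random


def fm_predict(x, w0, w, V):
    """Second-order factorization machine prediction for one feature vector.

    x  : list of n feature values
    w0 : global bias
    w  : list of n linear weights
    V  : n lists of k latent factors; the interaction weight of features
         i and j is the dot product of V[i] and V[j] (a low-rank model).
    """
    n, k = len(x), len(V[0])
    linear = w0 + sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for f in range(k):
        # Identity: sum_{i<j} v_if v_jf x_i x_j
        #         = 0.5 * ((sum_i v_if x_i)^2 - sum_i (v_if x_i)^2)
        s = sum(V[i][f] * x[i] for i in range(n))
        s2 = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s2)
    return linear + pairwise


def one_hot_pair(species_idx, chemical_idx, n_species, n_chemicals):
    """Assumed encoding: species indicators first, then chemical indicators."""
    x = [0.0] * (n_species + n_chemicals)
    x[species_idx] = 1.0
    x[n_species + chemical_idx] = 1.0
    return x


def split_unseen_pairs(n_rows, test_fraction, seed=0):
    """Setting 1: hold out random rows (chemical-species pairs).

    A held-out pair's chemical and species may still occur in training.
    """
    rows = list(range(n_rows))
    random.Random(seed).shuffle(rows)
    n_test = round(test_fraction * n_rows)
    return sorted(rows[n_test:]), sorted(rows[:n_test])


def split_unseen_chemicals(chemical_ids, test_fraction, seed=0):
    """Setting 2: hold out whole chemicals, so no test chemical is in training."""
    chems = sorted(set(chemical_ids))
    random.Random(seed).shuffle(chems)
    n_test = round(test_fraction * len(chems))
    held_out = set(chems[:n_test])
    train = [i for i, c in enumerate(chemical_ids) if c not in held_out]
    test = [i for i, c in enumerate(chemical_ids) if c in held_out]
    return train, test
```

With this one-hot encoding the prediction reduces to a bias, a species effect, a chemical effect, and the dot product of the species and chemical latent vectors, which is the matrix-factorization form familiar from recommender systems. In the second split a one-hot chemical indicator carries no information for a held-out chemical, so predicting unseen chemicals presumably relies on additional chemical descriptors as features; that is an inference from the setting, not a detail stated in the abstract.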