Toxicity Classification of Oxide Nanomaterials: Effects of Data Gap Filling and PChem Score-based Screening Approaches

My Kieu Ha, Tung Xuan Trinh, Jang Sik Choi, Desy Maulina, Hyung Gi Byun, Tae Hyun Yoon
DOI: https://doi.org/10.1038/s41598-018-21431-9
IF: 4.6
2018-02-16
Scientific Reports
Abstract: Development of nanotoxicity prediction models is becoming increasingly important in the risk assessment of engineered nanomaterials. However, it has significant obstacles caused by the wide heterogeneities of published literature in terms of data completeness and quality. Here, we performed a meta-analysis of 216 published articles on oxide nanoparticles using 14 attributes of physicochemical, toxicological and quantum-mechanical properties. Particularly, to improve completeness and quality of the extracted dataset, we adapted two preprocessing approaches: data gap-filling and physicochemical property based scoring. Performances of nano-SAR classification models revealed that the dataset with the highest score value resulted in the best predictivity with compromise in its applicability domain. The combination of physicochemical and toxicological attributes was proved to be more relevant to toxicity classification than quantum-mechanical attributes. Overall, by adapting these two preprocessing methods, we demonstrated that meta-analysis of nanotoxicity literatures could provide an effective alternative for the risk assessment of engineered nanomaterials.
multidisciplinary sciences
What problem does this paper attempt to address?
The paper addresses the wide heterogeneity in data completeness and data quality that hampers the development of toxicity prediction models for nanomaterials, especially metal oxide nanoparticles. Two issues stand out:

1. **Data completeness**: Data in the published literature often have missing values, which weaken both model training and prediction. For example, many entries for the physicochemical, toxicological, and quantum-mechanical properties of nanoparticles may be blank.
2. **Data quality**: Test protocols are not standardized across laboratories, so data quality is uneven. The reliability of data sources is also a concern: some values come from manufacturers' specifications, whose accuracy cannot be guaranteed.

To address these problems, the authors adopted two preprocessing methods:

- **Data gap filling**: Fill in missing values using manufacturers' specifications or other reference data to improve data completeness.
- **Physicochemical property scoring**: Evaluate the quality of the physicochemical data through a scoring framework and select high-quality data for model training.

With these methods, the authors aim to improve the quality and completeness of the dataset and thereby the performance of the nanomaterial toxicity prediction model. Specifically, their goals are to:

- Improve the predictive ability of the model once data quality and completeness have been enhanced.
- Determine which properties (physicochemical, toxicological, or quantum-mechanical) are most important for toxicity classification.
- Analyze how the different preprocessing methods, gap filling and quality screening, affect model performance.
- Define the Applicability Domain (AD) of the model to ensure that its predictions on new data are reliable.

Through these efforts, the authors hope to provide an effective alternative method for the risk assessment of nanomaterials.
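
The two preprocessing steps summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the attribute names, the reference-lookup structure, the scoring weights (1 point for a measured value, 0.5 for a gap-filled one), and the sample values are all assumptions chosen only to show how gap filling and score-based screening fit together.

```python
# Hypothetical attribute names; the paper's actual attribute list is not reproduced here.
PCHEM_ATTRIBUTES = [
    "core_size_nm",
    "hydrodynamic_size_nm",
    "surface_charge_mv",
    "surface_area_m2_g",
]


def fill_gaps(record, reference):
    """Copy `record`, filling missing (None) attributes from `reference`.

    Returns the filled record and the set of attribute names that were filled.
    """
    filled = dict(record)
    filled_names = set()
    for name in PCHEM_ATTRIBUTES:
        if filled.get(name) is None and reference.get(name) is not None:
            filled[name] = reference[name]
            filled_names.add(name)
    return filled, filled_names


def pchem_score(record, filled_names):
    """Toy quality score (assumed weights): 1.0 per measured attribute,
    0.5 per gap-filled attribute, 0 when the value is still missing."""
    score = 0.0
    for name in PCHEM_ATTRIBUTES:
        if record.get(name) is None:
            continue
        score += 0.5 if name in filled_names else 1.0
    return score


def screen(records, min_score):
    """Keep only records whose 'pchem_score' meets the threshold."""
    return [r for r in records if r["pchem_score"] >= min_score]


# Illustrative values only; they are not taken from the paper's dataset.
raw = [
    {"id": "A", "core_size_nm": 20.0, "hydrodynamic_size_nm": None,
     "surface_charge_mv": -15.0, "surface_area_m2_g": None},
    {"id": "B", "core_size_nm": 35.0, "hydrodynamic_size_nm": 120.0,
     "surface_charge_mv": 22.0, "surface_area_m2_g": 40.0},
]
# Stand-in for manufacturer specifications, keyed by record id.
references = {"A": {"surface_area_m2_g": 50.0}, "B": {}}

scored = []
for rec in raw:
    filled, filled_names = fill_gaps(rec, references[rec["id"]])
    filled["pchem_score"] = pchem_score(filled, filled_names)
    scored.append(filled)

kept = screen(scored, min_score=3.0)
```

Raising `min_score` keeps fewer but better-characterized records, which mirrors the trade-off reported in the abstract: the highest-scoring dataset gave the best predictivity at the cost of a narrower applicability domain.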