Mining the Critical Conditions for New Hypotheses of Materials from Historical Reaction Data

Zhenchao Ouyang, Yu Liu, Jianwei Niu
DOI: https://doi.org/10.1109/smartworld.2018.00087
2018-01-01
Abstract: New findings in materials science often come at a high research cost, for two reasons. First, the reaction process needs continuous optimization, which can consume large amounts of valuable reactants and tie up apparatus during day-to-day experiments. Second, the success of a designed experiment relies heavily on the researcher's experience. With the launch of the Materials Genome Initiative (MGI), researchers have begun to record historical reaction data and to seek new solutions through computational techniques such as data mining and machine learning. In this paper, we study reaction data for inorganic-organic hybrid materials from the Dark Reaction Project at Haverford College using simple machine learning algorithms (Bayes Net, SVM, and C4.5), ensemble learning models (Random Forest, Stacking, Gradient Boosting Decision Tree (GBDT), and XGBoost), and deep neural network models. Beyond the accuracy of the prediction models, we also use different ranking algorithms to identify the reaction conditions that are most significant chemically. Through a series of evaluations, we find that a well-designed stacking-based ensemble learning model reaches the highest prediction accuracy, 61% (8% higher than GBDT and 5% higher than XGBoost), on the top-50 feature subset selected by symmetrical uncertainty ranking, evaluated on a standalone data set that was not previously used in the Dark Reaction Project.
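The abstract names symmetrical uncertainty ranking as the filter behind the top-50 subset but does not state the formula. A common definition (used, for example, by Weka's SymmetricalUncertAttributeEval) normalizes the mutual information between a feature X and the class Y: SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), which lies in [0, 1]. The following is a minimal sketch under stated assumptions, not the authors' pipeline: it assumes every reaction condition has already been discretized (the paper's discretization step is not described here), treats the reaction outcome as a discrete label, and uses hypothetical names (`rank_features`, `temp_bin`, `ph_bin`, `constant`) purely for illustration.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy, in bits, of a sequence of discrete values."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def symmetrical_uncertainty(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), with I(X; Y) = H(X) + H(Y) - H(X, Y)."""
    hx, hy = entropy(xs), entropy(ys)
    if hx + hy == 0:
        return 0.0  # both variables are constant, so they share no information
    joint = entropy(list(zip(xs, ys)))  # joint entropy H(X, Y) over value pairs
    mutual_info = hx + hy - joint
    return 2.0 * mutual_info / (hx + hy)


def rank_features(
    features: Dict[str, Sequence[Hashable]],
    labels: Sequence[Hashable],
    top_k: int,
) -> List[Tuple[str, float]]:
    """Hypothetical helper: score each feature against the labels and keep the top_k."""
    scores = [(name, symmetrical_uncertainty(col, labels)) for name, col in features.items()]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:top_k]


if __name__ == "__main__":
    # Toy, made-up data: 1 = successful reaction, 0 = failed reaction.
    outcome = [1, 1, 0, 0, 1, 0]
    conditions = {
        "temp_bin": ["hi", "hi", "lo", "lo", "hi", "lo"],  # tracks the outcome exactly
        "ph_bin": ["a", "b", "a", "b", "a", "b"],          # weakly related
        "constant": ["x"] * 6,                              # carries no information
    }
    for name, score in rank_features(conditions, outcome, top_k=2):
        print(f"{name}: {score:.3f}")
```

In this toy data, `temp_bin` scores 1.0 because it determines the outcome, `constant` scores 0.0, and `ph_bin` falls in between, so a top-2 cut keeps `temp_bin` and `ph_bin`. Applying the same cut with `top_k=50` to a real feature table mirrors, in spirit, the top-50 subset the abstract describes; the normalization by H(X) + H(Y) is what keeps many-valued features from being favored the way raw information gain would favor them.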
What problem does this paper attempt to address?