Accelerated Neural Network Training through Dimensionality Reduction for High-Throughput Screening of Topological Materials

Ruman Moulik, Ankita Phutela, Sajjan Sheoran, Saswata Bhattacharya
2023-08-24
Abstract: Machine learning supports a wide variety of models, from elementary linear regression to very complex neural networks. Neural networks are currently limited by the amount of data available and by the high computational cost of training. This is especially problematic when dealing with a large set of features and little prior knowledge of how informative each individual feature is. We tackle this problem by using dimensionality reduction algorithms to construct more meaningful features. We also compare the accuracy and training times obtained with the raw data and with the data transformed by dimensionality reduction, in order to deduce a number of dimensions that is sufficient without sacrificing accuracy. This estimate is made with a lighter decision-tree-based algorithm, AdaBoost, because it trains faster than neural networks. We take our data from Materiae, an online database of topological materials. Our final goal is to construct a model that predicts the topological properties of new materials from elementary properties.
Materials Science, Strongly Correlated Electrons
What problem does this paper attempt to address?
The main objective of this paper is to predict the topological properties of new materials using machine learning methods, specifically to predict new Topological Insulators (TI) and Topological Crystalline Insulators (TCI). To address this problem, the authors took the following steps:

1. **Data Acquisition**: Obtained data on known topological materials from the Materiae database and structural and electronic structure data from the Materials Project database.
2. **Feature Space Construction**: Constructed the feature space based on the physicochemical properties of atoms, including electronegativity, atomic number, ionization energy, and other attributes.
3. **Dimensionality Reduction**: Used Principal Component Analysis (PCA) to reduce the dimensionality of the features, decreasing the number of features required to train the neural network to improve training efficiency.
4. **Dimensionality Selection with Decision Tree Model**: Employed the AdaBoost algorithm to determine the minimum number of feature dimensions that can be used while maintaining prediction accuracy.
5. **Neural Network Prediction Model**: Built a Multi-Layer Perceptron (MLP) classifier as the final prediction model to predict whether new substances have non-trivial topological properties.
6. **Validation of Prediction Results**: Used the SymTopo software package to validate the accuracy of the prediction results and further confirmed the validity of the predictions through first-principles calculations.

In summary, this study aims to develop an efficient method to predict the topological properties of new materials using machine learning techniques, particularly topological insulators and topological crystalline insulators. This method can significantly reduce computational costs and help accelerate the discovery process of novel topological materials.
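
Steps 3 to 5 can be sketched with scikit-learn roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the synthetic data from `make_classification` stands in for the real feature table, and the candidate dimension list, the 0.02 accuracy tolerance, and the MLP layer sizes are hypothetical values chosen only for the example.

```python
import time

from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the atomic-property feature table (NOT Materiae data).
X, y = make_classification(
    n_samples=600, n_features=40, n_informative=8, n_redundant=20, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)


def adaboost_score(n_components=None):
    """Fit AdaBoost on raw (None) or PCA-reduced features.

    Returns (test accuracy, fit time in seconds).
    """
    steps = [StandardScaler()]
    if n_components is not None:
        steps.append(PCA(n_components=n_components, random_state=0))
    steps.append(AdaBoostClassifier(n_estimators=50, random_state=0))
    model = make_pipeline(*steps)
    start = time.perf_counter()
    model.fit(X_train, y_train)
    elapsed = time.perf_counter() - start
    return model.score(X_test, y_test), elapsed


# Baseline: the cheap AdaBoost model on the raw (scaled) features.
baseline_acc, baseline_time = adaboost_score()

# Hypothetical search grid and tolerance; both are assumptions of this sketch.
candidates = [2, 5, 10, 20, 30]
tolerance = 0.02
results = {k: adaboost_score(k) for k in candidates}

# Smallest dimension whose accuracy stays within the tolerance of the baseline;
# fall back to the full feature count if no candidate qualifies.
chosen = next(
    (k for k in candidates if results[k][0] >= baseline_acc - tolerance),
    X.shape[1],
)

# Train the more expensive MLP only once, on the selected number of components.
mlp = make_pipeline(
    StandardScaler(),
    PCA(n_components=chosen, random_state=0),
    MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=0),
)
mlp.fit(X_train, y_train)
mlp_acc = mlp.score(X_test, y_test)

print(f"raw features: accuracy={baseline_acc:.3f}, fit time={baseline_time:.2f}s")
for k, (acc, secs) in results.items():
    print(f"PCA k={k}: accuracy={acc:.3f}, fit time={secs:.2f}s")
print(f"chosen k={chosen}, MLP accuracy={mlp_acc:.3f}")
```

The point of the design is that the dimension search runs entirely on the fast AdaBoost model, so the neural network is fitted a single time. Placing the scaler and PCA inside the pipeline means both are fitted on the training split only, which avoids leaking test data into the projection.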