Optimizing Training Data Set for the Machine Learning Potential of Li-Si Alloys via Structural Similarity-based Screening

Nan Xu, Chen Li, Yao Shi, Qing Shao, Yi He
2021-01-01
Abstract: Machine learning potentials enable molecular dynamics simulations of systems beyond the capability of traditional force fields. One challenge in developing a machine learning potential is how to construct a training data set with low sample redundancy. This work investigates a method for optimizing the training data set with a structural similarity algorithm while maintaining the desired accuracy of the machine learning potential. We construct several subsets of 200 to 1500 sample configurations by selecting representative configurations from a 6183-sample data set with the farthest point sampling method, and we examine how well the machine learning potentials trained on these subsets predict the energies, atomic forces, and structural properties of Li-Si systems. The simulation results show that the potential developed from 400 configurations can be as accurate as the one developed from the full 6183-sample data set. In addition, our computational results highlight that structure-comparison algorithms can not only effectively remove redundant samples from training sets, but also achieve an appropriate distribution of samples in training data sets.
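The subset selection described in the abstract can be illustrated with a minimal sketch of greedy farthest point sampling. The abstract does not specify the structural descriptor or the similarity measure, so this sketch assumes each configuration is already encoded as a fixed-length descriptor vector and that dissimilarity is the Euclidean distance between those vectors. The function name `farthest_point_sampling` and its parameters are hypothetical and are not taken from the paper.

```python
import numpy as np


def farthest_point_sampling(descriptors, n_select, start=0):
    """Greedy farthest point sampling over row vectors.

    descriptors: (n_samples, n_features) array, one row per configuration.
                 Assumed here to be fixed-length structural fingerprints.
    n_select:    number of configurations to keep.
    start:       index of the first selected configuration (arbitrary choice).
    Returns the selected row indices in selection order.
    """
    X = np.asarray(descriptors, dtype=float)
    n_samples = X.shape[0]
    if not 1 <= n_select <= n_samples:
        raise ValueError("n_select must be between 1 and n_samples")

    selected = [start]
    # Distance from every sample to its nearest already-selected sample.
    min_dist = np.linalg.norm(X - X[start], axis=1)

    for _ in range(n_select - 1):
        # Pick the sample that is farthest from the current selection.
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        # Update nearest-selected distances with the newly added sample.
        dist_to_new = np.linalg.norm(X - X[nxt], axis=1)
        min_dist = np.minimum(min_dist, dist_to_new)

    return selected


# Toy example: four one-dimensional "descriptors", two of them nearly identical.
points = np.array([[0.0], [0.1], [10.0], [5.0]])
print(farthest_point_sampling(points, 3))  # → [0, 2, 3]
```

In the toy example the near-duplicate point at 0.1 is left out, which is the redundancy-removal behavior the abstract attributes to similarity-based screening. Each iteration costs one pass over the data, so the sketch scales as the number of samples times the subset size.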