ichor: A Python library for computational chemistry data management and machine learning force field development

Paul Popelier, Yulian Manchev
DOI: https://doi.org/10.26434/chemrxiv-2024-f8h7n
2024-06-10
Abstract: We present ichor, an open-source Python library that simplifies data management in computational chemistry and streamlines machine learning force field development. Ichor implements many easily extendable file management tools, in addition to a lazy file reading system, allowing efficient management of hundreds of thousands of computational chemistry files. Data from calculations can be readily stored into databases for easy sharing and post-processing. Raw data can be directly processed by ichor to create machine learning-ready datasets. In addition to powerful data-related capabilities, ichor provides interfaces to popular workload management software employed by High Performance Computing clusters, making for effortless submission of thousands of separate calculations with only a single line of Python code. Furthermore, a simple-to-use command line interface has been implemented through a series of menu systems to further increase accessibility and efficiency of common important ichor tasks. Finally, ichor implements general tools for visualization and analysis of datasets and tools for measuring machine-learning model quality both on test set data and in simulations. With the current functionalities, ichor can serve as an end-to-end data procurement, data management, and analysis solution for machine-learning force-field development.
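The lazy file reading mentioned in the abstract can be illustrated with a minimal sketch. This is not ichor's API: the `LazyFile` class, its `data` property, and the one-number-per-line file format are all hypothetical, chosen only to show the general idea that creating a file object costs nothing until its contents are actually requested.

```python
from pathlib import Path
import tempfile


class LazyFile:
    """Hypothetical wrapper (not part of ichor): parses on first access only."""

    def __init__(self, path):
        self.path = Path(path)
        self._data = None      # nothing has been read yet
        self.read_count = 0    # how many times the file was actually parsed

    def _parse(self):
        # Toy parser: one floating-point number per non-empty line.
        self.read_count += 1
        with self.path.open() as handle:
            return [float(line) for line in handle if line.strip()]

    @property
    def data(self):
        # Read from disk the first time, then reuse the cached result.
        if self._data is None:
            self._data = self._parse()
        return self._data


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "energies.txt"
        path.write_text("-76.4\n-76.5\n")
        lazy = LazyFile(path)            # constructing does not touch the file
        reads_before = lazy.read_count
        values = lazy.data               # first access triggers the parse
        _ = lazy.data                    # second access uses the cache
        return reads_before, values, lazy.read_count
```

With this pattern, a directory of many thousands of output files could be wrapped in objects up front, while disk reads happen only for the files whose data are actually used.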
Chemistry
What problem does this paper attempt to address?
The paper introduces an open-source Python library called ichor, which aims to simplify computational chemistry data management and machine learning force field development. The library provides easily extendable file management tools, including a lazy file reading system that efficiently handles hundreds of thousands of computational chemistry files. It also supports storing calculation results in databases for sharing and post-processing, and it processes raw data directly into machine learning-ready datasets. In addition, ichor interfaces with popular workload management software, so that thousands of separate calculations can be submitted to high-performance computing clusters with a single line of Python code. A menu-based command-line interface makes common tasks more accessible and efficient. The library also includes tools for dataset visualization and analysis, and for measuring machine learning model quality both on test set data and in simulations. The main goal of ichor is to provide an end-to-end solution for data procurement, management, and analysis, particularly for the development of machine learning force fields, while its functionalities can also be applied to other areas of computational chemistry.
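To make the idea of submitting many calculations through a workload manager concrete, here is a small sketch that builds a job-array batch script. It assumes SLURM as the workload manager (the text above does not name one) and does not reproduce ichor's interface; the function name `slurm_array_script` and the command `./run_calculation.sh` are hypothetical.

```python
def slurm_array_script(job_name, n_tasks, command):
    """Return a SLURM batch script that runs `command` once per array task.

    Illustrative helper only; not part of ichor.
    """
    if n_tasks < 1:
        raise ValueError("n_tasks must be at least 1")
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        # One array task per calculation, indexed from 0.
        f"#SBATCH --array=0-{n_tasks - 1}",
        # SLURM sets SLURM_ARRAY_TASK_ID to the index of the running task.
        f"{command} $SLURM_ARRAY_TASK_ID",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical usage: 1000 calculations described by a single script.
script = slurm_array_script("gaussian", 1000, "./run_calculation.sh")
```

The resulting text could be written to a file and passed to `sbatch`, so that one submission covers the whole batch of calculations.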