Representation of molecular structures with persistent homology for machine learning applications in chemistry

Jacob Townsend, Cassie Putman Micucci, John H. Hymel, Vasileios Maroulas, Konstantinos D. Vogiatzis
DOI: https://doi.org/10.1038/s41467-020-17035-5
IF: 16.6
2020-06-26
Nature Communications
Abstract: Machine learning and high-throughput computational screening have been valuable tools in accelerated first-principles screening for the discovery of the next generation of functionalized molecules and materials. The application of machine learning for chemical applications requires the conversion of molecular structures to a machine-readable format known as a molecular representation. The choice of such representations impacts the performance and outcomes of chemical machine learning methods. Herein, we present a new concise molecular representation derived from persistent homology, an applied branch of mathematics. We have demonstrated its applicability in a high-throughput computational screening of a large molecular database (GDB-9) with more than 133,000 organic molecules. Our target is to identify novel molecules that selectively interact with CO₂. The methodology and performance of the novel molecular fingerprinting method are presented, and the new chemically-driven persistence image representation is used to screen the GDB-9 database to suggest molecules and/or functional groups with enhanced properties.
multidisciplinary sciences