Algebraic Machine Learning with an Application to Chemistry

Ezzeddine El Sai, Parker Gara, Markus J. Pflaum
DOI: https://doi.org/10.3934/fods.2024004
2024-02-22
Abstract: As datasets used in scientific applications become more complex, studying the geometry and topology of data has become an increasingly prevalent part of the data analysis process. This can be seen for example with the growing interest in topological tools such as persistent homology. However, on the one hand, topological tools are inherently limited to providing only coarse information about the underlying space of the data. On the other hand, more geometric approaches rely predominately on the manifold hypothesis, which asserts that the underlying space is a smooth manifold. This assumption fails for many physical models where the underlying space contains singularities. In this paper we develop a machine learning pipeline that captures fine-grain geometric information without having to rely on any smoothness assumptions. Our approach involves working within the scope of algebraic geometry and algebraic varieties instead of differential geometry and smooth manifolds. In the setting of the variety hypothesis, the learning problem becomes to find the underlying variety using sample data. We cast this learning problem into a Maximum A Posteriori optimization problem which we solve in terms of an eigenvalue computation. Having found the underlying variety, we explore the use of Gröbner bases and numerical methods to reveal information about its geometry. In particular, we propose a heuristic for numerically detecting points lying near the singular locus of the underlying variety.
Algebraic Geometry, Computational Geometry, Machine Learning, Mathematical Physics
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper addresses how to capture fine-grained geometric information in scientific datasets, particularly in chemistry, using machine learning methods that do not rely on a smoothness assumption (the manifold hypothesis). Specifically, it proposes a machine learning pipeline based on algebraic geometry that tackles the following problems:

1. **How to handle datasets containing singularities**:
   - Traditional geometric methods often rely on the manifold hypothesis, which assumes that the data samples lie on a smooth submanifold. However, the underlying space in many physical models contains singularities, so the hypothesis does not apply.
   - The paper handles these singularities by working with algebraic varieties from algebraic geometry instead of smooth manifolds.
2. **How to find the underlying algebraic variety from sample data**:
   - The authors cast the learning problem as a Maximum A Posteriori (MAP) optimization problem and solve it through an eigenvalue computation.
   - This identifies polynomials that vanish on the data and thereby define the underlying variety.
3. **How to use Gröbner bases and numerical methods to analyze the geometry of the learned variety**:
   - Once the variety has been found, the authors explore Gröbner basis computations and numerical methods to extract geometric information about it, in particular a heuristic for detecting points that lie near its singular locus.
4. **How to validate the method in practical applications**:
   - The authors test the algebraic machine learning pipeline on synthetic data and on chemical data sampled from the conformational space of cyclooctane.
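The eigenvalue step in item 2 can be illustrated with a minimal sketch. It is not the paper's MAP formulation: it assumes no prior and simply takes the unit-norm coefficient vector that minimizes the mean squared value of a polynomial on the samples, which is the eigenvector for the smallest eigenvalue of the monomial moment matrix. The function names, the degree bound, and the noisy circle data are all assumptions made for illustration.

```python
import itertools
import numpy as np

def monomial_exponents(n_vars, max_degree):
    """All exponent tuples of total degree <= max_degree."""
    return [e for e in itertools.product(range(max_degree + 1), repeat=n_vars)
            if sum(e) <= max_degree]

def feature_matrix(points, exponents):
    """Evaluate every monomial at every sample (rows: samples, columns: monomials)."""
    return np.stack([np.prod(points ** np.array(e), axis=1) for e in exponents],
                    axis=1)

def fit_vanishing_polynomial(points, max_degree):
    """Return exponents, unit-norm coefficients, and the smallest eigenvalue.

    The coefficients minimize the mean of p(x)^2 over the samples subject to
    ||coefficients|| = 1 (an assumed, prior-free stand-in for the MAP problem).
    """
    exponents = monomial_exponents(points.shape[1], max_degree)
    V = feature_matrix(points, exponents)
    moment = V.T @ V / len(points)
    eigvals, eigvecs = np.linalg.eigh(moment)  # eigenvalues in ascending order
    return exponents, eigvecs[:, 0], eigvals[0]

# Hypothetical data: noisy samples from the unit circle x^2 + y^2 - 1 = 0.
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, 200)
circle = np.stack([np.cos(theta), np.sin(theta)], axis=1)
circle += 0.01 * rng.normal(size=circle.shape)

exponents, coeffs, residual = fit_vanishing_polynomial(circle, max_degree=2)
for e, c in zip(exponents, coeffs):
    print(f"x^{e[0]} y^{e[1]}: {c:+.3f}")
```

Up to sign and scale, the printed coefficients should be close to those of x^2 + y^2 - 1, and `residual` measures how well the recovered polynomial vanishes on the samples.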
### Summary

The main contribution of the paper is the development of a new machine learning framework that can effectively capture and analyze fine-grained geometric information in datasets without relying on the smoothness assumption. This method is particularly suitable for handling datasets containing singularities, such as molecular conformation data in the field of chemistry. By combining algebraic geometry and machine learning techniques, the paper provides new tools and perspectives for scientific research.
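To make the notion of "points near singularities" concrete, here is a small sketch of one standard criterion: on a hypersurface f = 0, singular points are those where the gradient of f also vanishes, so a small gradient norm at a sample suggests it lies near the singular locus. This is not claimed to be the authors' heuristic. The nodal cubic, its parametrization, and the threshold value are assumptions chosen for illustration.

```python
import numpy as np

def f(x, y):
    """Nodal cubic y^2 = x^2 (x + 1); its only singular point is the origin."""
    return y**2 - x**2 * (x + 1)

def grad_f(x, y):
    """Gradient of f, computed by hand: (-3x^2 - 2x, 2y)."""
    return np.stack([-3 * x**2 - 2 * x, 2 * y], axis=-1)

def near_singular(points, threshold):
    """Flag samples whose gradient norm falls below an assumed threshold."""
    norms = np.linalg.norm(grad_f(points[:, 0], points[:, 1]), axis=1)
    return norms < threshold

# Sample the curve through its rational parametrization (x, y) = (t^2 - 1, t (t^2 - 1)).
t = np.linspace(-1.5, 1.5, 301)
curve = np.stack([t**2 - 1, t * (t**2 - 1)], axis=1)

flags = near_singular(curve, threshold=0.2)
flagged = curve[flags]
print(f"{flags.sum()} of {len(curve)} samples flagged as near-singular")
```

In a learned setting, `f` would be replaced by the fitted polynomial, and the threshold would need to be chosen relative to the noise level and the scale of the coefficients.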