Abstract: Data-driven Riemannian geometry has emerged as a powerful tool for interpretable representation learning, offering improved efficiency in downstream tasks. Moving forward, it is crucial to balance cheap manifold mappings with efficient training algorithms. In this work, we integrate concepts from pullback Riemannian geometry and generative models to propose a framework for data-driven Riemannian geometry that is scalable in both geometry and learning: score-based pullback Riemannian geometry. Focusing on unimodal distributions as a first step, we propose a score-based Riemannian structure with closed-form geodesics that pass through the data probability density. With this structure, we construct a Riemannian autoencoder (RAE) with error bounds for discovering the correct data manifold dimension. This framework can naturally be used with anisotropic normalizing flows by adopting isometry regularization during training. Through numerical experiments on various datasets, we demonstrate that our framework not only produces high-quality geodesics through the data support, but also reliably estimates the intrinsic dimension of the data manifold and provides a global chart of the manifold, even in high-dimensional ambient spaces.

What problem does this paper attempt to address?

The paper addresses how to balance two kinds of scalability in data-driven Riemannian geometry: the cost of training the geometric structure and the cost of evaluating its manifold mappings. Specifically, the authors propose a score-based pullback Riemannian geometry framework that aims to overcome the scalability challenges existing methods face on high-dimensional data while keeping the training algorithm efficient. By combining Riemannian geometry with generative models, they aim to improve the efficiency and accuracy of data representation learning.

### Paper Background

Data-driven Riemannian geometry has become a powerful tool for interpretable representation learning and has shown higher efficiency in downstream tasks. However, existing methods face scalability challenges on high-dimensional data, especially when computing manifold mappings. In addition, although generative models can be trained efficiently, the Riemannian geometry they induce may be numerically intractable.

### Paper Contributions

1. **Theoretical Contributions**:
   - A score-based pullback Riemannian metric is proposed, so that manifold mappings respect the data distribution.
   - It is proved that this density-based Riemannian structure naturally leads to a Riemannian autoencoder (RAE), with a bound on the expected reconstruction error.
   - An adapted learning scheme based on normalizing flows is introduced to learn the density that is integrated into the Riemannian framework.
2. **Practical Contributions**:
   - It is shown how to implement data-driven Riemannian geometry through two simple adaptations of the normalizing flow framework, significantly expanding its potential in downstream applications.
   - Numerical experiments verify that the framework not only generates high-quality manifold geodesics, but also reliably estimates the intrinsic dimension of the data manifold and provides a global chart of the manifold, even in high-dimensional ambient spaces.

### Main Results

- **Accuracy of Manifold Mappings**: Experiments on different datasets show that the proposed framework outperforms the baseline methods in generating accurate and stable manifold mappings.
- **Performance of Riemannian Autoencoders**: Experiments show that the method can effectively learn Riemannian autoencoders and accurately capture the intrinsic dimension of the data manifold, performing well on both low-dimensional and high-dimensional manifolds.

### Conclusion

In this work, the authors take a first step in combining generative models with Riemannian geometry and propose a score-based pullback Riemannian geometry framework that balances the scalability of training and evaluation. The framework is theoretically grounded and shows strong potential in practical applications, especially on high-dimensional data. Future work will extend the framework to multimodal densities and other more complex scenarios.
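
The closed-form geodesics mentioned above follow from a general property of pullback geometry: when a metric is the pullback of the Euclidean metric through a diffeomorphism, geodesics are straight lines in the mapped coordinates, carried back through the inverse map. The sketch below illustrates only this mechanism. The map `phi`, its inverse `phi_inv`, and the helper names are hypothetical stand-ins written by hand for a 2D toy case; they are not the paper's learned normalizing flow or its score-based construction.

```python
import numpy as np

# Hypothetical toy diffeomorphism of R^2 (an assumption for illustration,
# not the paper's learned flow): it flattens the parabola x2 = x1**2
# onto the horizontal axis.
def phi(x):
    x = np.asarray(x, dtype=float)
    return np.array([x[0], x[1] - x[0] ** 2])

def phi_inv(z):
    z = np.asarray(z, dtype=float)
    return np.array([z[0], z[1] + z[0] ** 2])

def pullback_geodesic(x, y, t):
    """Point at time t on the geodesic from x to y under the pullback of
    the Euclidean metric through phi: interpolate linearly in
    phi-coordinates, then map back with phi_inv."""
    return phi_inv((1.0 - t) * phi(x) + t * phi(y))

def pullback_distance(x, y):
    """Geodesic distance under the same pullback metric."""
    return float(np.linalg.norm(phi(x) - phi(y)))

# Two points on the parabola; the geodesic between them stays on it.
x = np.array([-1.0, 1.0])
y = np.array([1.0, 1.0])
path = np.stack([pullback_geodesic(x, y, t) for t in np.linspace(0.0, 1.0, 5)])
print(path)
print(pullback_distance(x, y))
```

In this toy case the geodesic midpoint is the vertex of the parabola rather than the Euclidean midpoint `(0, 1)`, which is the sense in which pullback geodesics can follow curved data instead of cutting across empty space. No numerical ODE solver is needed, since each evaluation costs one forward and one inverse pass of the map.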

Score-based pullback Riemannian geometry

Pulling back symmetric Riemannian geometry for data analysis

Dynamically Stable Poincaré Embeddings for Neural Manifolds

Geometry-Aware Generative Autoencoders for Warped Riemannian Metric Learning and Generative Modeling on Data Manifolds

Isometric Immersion Learning with Riemannian Geometry

Riemannian manifold learning

Algorithm for Riemannian Manifold Learning

(Deep) Generative Geodesics

Dimensionality Reduction on Grassmannian via Riemannian Optimization: A Generalized Perspective

Riemannian Residual Neural Networks

RMLR: Extending Multinomial Logistic Regression into General Geometries

Learning to Optimize on Riemannian Manifolds

Short and Straight: Geodesics on Differentiable Manifolds

Geometry Flow-Based Deep Riemannian Metric Learning

Joint Normalization and Dimensionality Reduction on Grassmannian: A Generalized Perspective

Pullback Flow Matching on Data Manifolds

Riemannian Metric Learning Based on Curvature Flow

Riemannian Optimization for Non-convex Euclidean Distance Geometry with Global Recovery Guarantees

Geometry of Score Based Generative Models

Flow Matching on General Geometries

Robust Geodesic Regression