Clustering, coding, and the concept of similarity

L. Thorne McCarty
DOI: https://doi.org/10.1007/s10472-024-09929-7
IF: 1.019
2024-03-20
Annals of Mathematics and Artificial Intelligence
Abstract: This paper develops a theory of clustering and coding that combines a geometric model with a probabilistic model in a principled way. The geometric model is a Riemannian manifold with a Riemannian metric, which we interpret as a measure of dissimilarity. The probabilistic model consists of a stochastic process with an invariant probability measure that matches the density of the sample input data. The link between the two models is a potential function and its gradient. We use the gradient to define the dissimilarity metric, which guarantees that our measure of dissimilarity will depend on the probability measure. Finally, we use the dissimilarity metric to define a coordinate system on the embedded Riemannian manifold, which gives us a low-dimensional encoding of our original data.
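The abstract links a potential function to a stochastic process whose invariant probability measure matches the data density. A standard way to realize that kind of link, offered here only as an illustration and not as the paper's own construction, is overdamped Langevin dynamics, dX = -∇U(X) dt + √2 dW, whose stationary density is proportional to exp(-U). Taking U = -log p therefore gives a process whose long-run distribution is p. The sketch below assumes a one-dimensional Gaussian target (mean 1.0, standard deviation 0.5) and uses a simple Euler-Maruyama discretization. The names grad_U and langevin_samples and all parameter values are hypothetical choices made for this example.

```python
import math
import random


def grad_U(x, mu=1.0, sigma=0.5):
    """Gradient of the (assumed) potential U(x) = (x - mu)^2 / (2 sigma^2).

    Up to an additive constant, U = -log p for a Gaussian density p with
    mean mu and standard deviation sigma. This target is a hypothetical
    stand-in for a density estimated from sample input data.
    """
    return (x - mu) / sigma**2


def langevin_samples(grad, x0=0.0, step=1e-3, n_steps=200_000,
                     burn_in=20_000, seed=0):
    """Euler-Maruyama discretization of dX = -grad U(X) dt + sqrt(2) dW.

    The continuous-time process has invariant density proportional to
    exp(-U). The discretized chain matches it only approximately, with a
    bias that shrinks as `step` decreases. Samples from the first
    `burn_in` steps are discarded.
    """
    rng = random.Random(seed)
    x = x0
    noise_scale = math.sqrt(2.0 * step)
    samples = []
    for k in range(n_steps):
        # Drift down the potential gradient, plus Gaussian noise.
        x = x - step * grad(x) + noise_scale * rng.gauss(0.0, 1.0)
        if k >= burn_in:
            samples.append(x)
    return samples


samples = langevin_samples(grad_U)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
# The empirical mean and std should be near the target's 1.0 and 0.5.
print(f"mean ~ {mean:.2f}, std ~ {math.sqrt(var):.2f}")
```

Because the drift is -∇U, trajectories spend most of their time where U is low, that is, where the data are dense. This is the general sense in which a gradient can tie a geometric quantity to a probability measure. How the paper builds its dissimilarity metric from the gradient is not reproduced here.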
Computer Science, Artificial Intelligence; Mathematics, Applied