Replica analysis of Bayesian data clustering

Alexander Mozeika, Anthony C C Coolen
DOI: https://doi.org/10.1088/1751-8121/ab59af
2019-12-23
Abstract: We use statistical mechanics to study model-based Bayesian data clustering. In this approach, each partition of the data into clusters is regarded as a microscopic system state, the negative data log-likelihood gives the energy of each state, and the data set realisation acts as disorder. Optimal clustering corresponds to the ground state of the system, and is hence obtained from the free energy via a low 'temperature' limit. We assume that for large sample sizes the free energy density is self-averaging, and we use the replica method to compute the asymptotic free energy density. The main order parameter in the resulting (replica symmetric) theory, the distribution of the data over the clusters, satisfies a self-consistent equation which can be solved by a population dynamics algorithm. From this order parameter one computes the average free energy, and all relevant macroscopic characteristics of the problem. The theory describes numerical experiments perfectly, and gives a sig...
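The mapping described in the abstract (partition as state, negative log-likelihood as energy, optimal clustering as ground state) can be illustrated on a toy problem small enough to enumerate exhaustively. The sketch below is not the paper's model or its replica calculation: it assumes one-dimensional data, a fixed number of clusters, and a Gaussian likelihood with fixed width `sigma` and cluster means set to the sample means. The function names `energy`, `free_energy`, and `ground_state` are hypothetical.

```python
import itertools
import math


def energy(data, assignment, n_clusters, sigma=1.0):
    """Negative data log-likelihood of one assignment (toy Gaussian model, an assumption)."""
    total = 0.0
    for k in range(n_clusters):
        members = [x for x, c in zip(data, assignment) if c == k]
        if not members:
            continue  # empty clusters contribute nothing
        mean = sum(members) / len(members)  # cluster mean fitted to its members
        for x in members:
            total += 0.5 * ((x - mean) / sigma) ** 2
            total += 0.5 * math.log(2.0 * math.pi * sigma ** 2)
    return total


def all_assignments(n_points, n_clusters):
    """Every labelled assignment of points to clusters (the microscopic states)."""
    return itertools.product(range(n_clusters), repeat=n_points)


def free_energy(data, n_clusters, beta):
    """F(beta) = -(1/beta) * log sum_states exp(-beta * E), by brute-force enumeration."""
    energies = [energy(data, a, n_clusters)
                for a in all_assignments(len(data), n_clusters)]
    e_min = min(energies)
    # Subtract e_min before exponentiating to keep the sum numerically stable.
    partition_sum = sum(math.exp(-beta * (e - e_min)) for e in energies)
    return e_min - math.log(partition_sum) / beta


def ground_state(data, n_clusters):
    """Assignment with the lowest energy, i.e. the optimal clustering in this toy model."""
    return min(all_assignments(len(data), n_clusters),
               key=lambda a: energy(data, a, n_clusters))


if __name__ == "__main__":
    sample = [0.0, 0.2, 0.1, 5.0, 5.1, 4.9]  # illustrative data, not from the paper
    best = ground_state(sample, 2)
    print("ground state:", best, "energy:", energy(sample, best, 2))
    for beta in (0.5, 1.0, 10.0, 100.0):
        print("beta =", beta, "free energy =", free_energy(sample, 2, beta))
```

As `beta` grows (low 'temperature'), the free energy approaches the ground-state energy from below, which is the sense in which the optimal clustering is recovered from the free energy. Exhaustive enumeration scales exponentially in the number of data points, so this only works for tiny samples.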
physics, multidisciplinary, mathematical