Sampling Approximately Low-Rank Ising Models: MCMC meets Variational Methods

Frederic Koehler, Holden Lee, Andrej Risteski
DOI: https://doi.org/10.48550/arXiv.2202.08907
2022-02-18
Abstract: We consider Ising models on the hypercube with a general interaction matrix $J$, and give a polynomial time sampling algorithm when all but $O(1)$ eigenvalues of $J$ lie in an interval of length one, a situation which occurs in many models of interest. This was previously known for the Glauber dynamics when *all* eigenvalues fit in an interval of length one; however, a single outlier can force the Glauber dynamics to mix torpidly. Our general result implies the first polynomial time sampling algorithms for low-rank Ising models such as Hopfield networks with a fixed number of patterns and Bayesian clustering models with low-dimensional contexts, and greatly improves the polynomial time sampling regime for the antiferromagnetic/ferromagnetic Ising model with inconsistent field on expander graphs. It also improves on previous approximation algorithm results based on the naive mean-field approximation in variational methods and statistical physics. Our approach is based on a new fusion of ideas from the MCMC and variational inference worlds. As part of our algorithm, we define a new nonconvex variational problem which allows us to sample from an exponential reweighting of a distribution by a negative definite quadratic form, and show how to make this procedure provably efficient using stochastic gradient descent. On top of this, we construct a new simulated tempering chain (on an extended state space arising from the Hubbard-Stratonovich transform) which overcomes the obstacle posed by large positive eigenvalues, and combine it with the SGD-based sampler to solve the full problem.
Data Structures and Algorithms, Machine Learning, Probability
What problem does this paper attempt to address?
This paper addresses how to sample in polynomial time from an Ising model on the hypercube when most of the eigenvalues of the interaction matrix \(J\) lie in an interval of length 1. Specifically, it proposes a new algorithm that samples from the Ising model in polynomial time when only a constant number of outlier eigenvalues (i.e., eigenvalues outside that interval) are present.

### Problem Background

The Ising model is a probability distribution on the hypercube \(\{\pm 1\}^n\) of the form

\[p_{J,h}(\sigma)=\frac{1}{Z}\exp\left(\frac{1}{2}\langle\sigma,J\sigma\rangle+\langle h,\sigma\rangle\right)\]

where \(Z\) is the normalization constant, called the partition function. Estimating \(Z\) and sampling from the Ising model are fundamental problems in both theory and applications. However, computing the partition function exactly is #P-hard, and even approximating it is NP-hard in general.

### Previous Methods and Their Limitations

1. **Glauber Dynamics**: When all eigenvalues of \(J\) lie in an interval of length 1, Glauber dynamics mixes rapidly. A single outlier eigenvalue, however, can force Glauber dynamics to mix very slowly.
2. **Variational Methods**: Variational methods approximate the partition function by solving an optimization problem (for example, the naive mean-field approximation), but in some cases they give only a rough approximation.

### Contributions of the Paper

The paper combines ideas from MCMC (Markov Chain Monte Carlo) and variational inference and proposes a new algorithm that samples in polynomial time in the following situation:

- Most of the eigenvalues of the interaction matrix \(J\) lie in an interval of length 1, and only a constant number of outlier eigenvalues fall outside it.

### Main Results

1. **Partition Function Estimation**: For an Ising model whose interaction matrix has \(d_+\) eigenvalues greater than \(1-\frac{1}{c}\) and \(d_-\) negative eigenvalues \(-\lambda_1,\dots,-\lambda_{d_-}\), the algorithm gives, with high probability, an \(e^\varepsilon\)-multiplicative approximation of the partition function \(Z_{J,h}\) in time \(O((\|J\|_{\text{op}}n)^{O(d_++ 1)}e^{O(c(\lambda_1+\cdots+\lambda_{d_-}))})\).
2. **Sampling Algorithm**: It can sample from the distribution \(p_{J,h}\) in time \(O((\|J\|_{\text{op}}n\log(1/\varepsilon))^{O(1 + d_+)})e^{O(c(\lambda_1+\cdots+\lambda_{d_-}))}\).

### Applications

The algorithm applies to several settings, including:

- Sampling antiferromagnetic or ferromagnetic Ising models on expander graphs (even in the presence of inconsistent external fields).
- Sampling from Hopfield networks (with a fixed number of patterns).
- Sampling from Bayesian clustering models with low-dimensional contexts, such as the Contextual Stochastic Block Model.

With these results, the paper addresses limitations of existing methods and provides new tools and ideas for efficient sampling.
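To make the definitions above concrete, here is a minimal Python sketch, assuming NumPy and a symmetric \(J\). It is not the paper's algorithm (the simulated tempering chain and SGD-based sampler are not reproduced here). It only evaluates \(p_{J,h}\) by brute-force enumeration for small \(n\), counts how many eigenvalues of \(J\) fall outside the best interval of length 1, and performs one step of the baseline Glauber dynamics. The function names `ising_pmf`, `count_outliers`, and `glauber_step` are hypothetical and chosen for illustration.

```python
import itertools
import numpy as np

def ising_pmf(J, h):
    """Exact p_{J,h} on {-1,+1}^n by enumeration (feasible only for small n)."""
    n = len(h)
    states = np.array(list(itertools.product([-1, 1], repeat=n)))
    # Log-weight of each state: (1/2) <sigma, J sigma> + <h, sigma>
    logw = 0.5 * np.einsum("si,ij,sj->s", states, J, states) + states @ h
    logw -= logw.max()  # shift for numerical stability before exponentiating
    w = np.exp(logw)
    return states, w / w.sum()

def count_outliers(J, length=1.0):
    """Fewest eigenvalues of symmetric J left outside any window of this length."""
    eig = np.sort(np.linalg.eigvalsh(J))
    best = 0
    for i, lo in enumerate(eig):
        # Number of eigenvalues inside the window [lo, lo + length]
        inside = np.searchsorted(eig, lo + length, side="right") - i
        best = max(best, inside)
    return len(eig) - best

def glauber_step(sigma, J, h, rng):
    """One Glauber update: resample a uniformly random spin from its conditional."""
    i = rng.integers(len(sigma))
    # Local field on spin i, excluding the self-interaction term J_ii
    field = J[i] @ sigma - J[i, i] * sigma[i] + h[i]
    p_plus = 1.0 / (1.0 + np.exp(-2.0 * field))
    sigma[i] = 1 if rng.random() < p_plus else -1  # modifies sigma in place
    return sigma
```

For example, `count_outliers(np.diag([0.0, 0.5, 5.0]))` reports one outlier, which is the regime where, per the summary above, Glauber dynamics alone may mix slowly and the paper's combined approach is needed.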