Abstract:Deep neural networks can learn powerful prior probability models for images, as evidenced by the high-quality generations obtained with recent score-based diffusion methods. But the means by which these networks capture complex global statistical structure, apparently without suffering from the curse of dimensionality, remain a mystery. To study this, we incorporate diffusion methods into a multi-scale decomposition, reducing dimensionality by assuming a stationary local Markov model for wavelet coefficients conditioned on coarser-scale coefficients. We instantiate this model using convolutional neural networks (CNNs) with local receptive fields, which enforce both the stationarity and Markov properties. Global structures are captured using a CNN with receptive fields covering the entire (but small) low-pass image. We test this model on a dataset of face images, which are highly non-stationary and contain large-scale geometric structures. Remarkably, denoising, super-resolution, and image synthesis results all demonstrate that these structures can be captured with significantly smaller conditioning neighborhoods than required by a Markov model implemented in the pixel domain. Our results show that score estimation for large complex images can be reduced to low-dimensional Markov conditional models across scales, alleviating the curse of dimensionality.

What problem does this paper attempt to address?

The paper primarily focuses on addressing the issue of how deep neural networks can avoid the curse of dimensionality when processing high-dimensional image data and proposes a method based on a multi-scale local conditional probability model. The core issue of the paper is to understand how deep neural networks can learn complex global statistical structures without being affected by the curse of dimensionality. To investigate this issue, the authors propose a method that integrates diffusion methods into a multi-scale decomposition framework by assuming a local Markov model of wavelet coefficients conditioned on coarser scale coefficients to reduce dimensionality. This method utilizes convolutional neural networks (CNNs) and local receptive fields to enforce this Markov property while maintaining translation invariance. Specifically, the authors' work can be summarized as follows: 1. **Problem Background**: Deep neural networks have shown excellent performance in generating high-quality images, especially in recent score diffusion methods. However, the specific mechanisms by which these networks capture complex global statistical structures remain unclear. 2. **Solution Strategy**: The paper proposes a multi-scale decomposition method, where the conditional probability distribution of wavelet coefficients is assumed to have a local Markov property to reduce dimensionality. This method leverages the local receptive field properties of CNNs to enforce translation invariance and the Markov property. 3. **Model Characteristics**: The model employs multi-scale wavelet decomposition, where wavelet coefficients at each scale are conditionally dependent on coarser scale wavelet coefficients. This conditional dependency is modeled as a local Markov random field, allowing the estimation of scores through a low-dimensional Markov conditional model, thereby alleviating the curse of dimensionality. 4. **Experimental Validation**: The paper conducts experiments on a face image dataset, demonstrating that this method can effectively capture long-range dependencies even with very small receptive fields, achieving good results in denoising, super-resolution, and image synthesis tasks. In summary, this study introduces a novel multi-scale local conditional probability model that not only addresses the curse of dimensionality faced by deep learning models when processing high-dimensional images but also demonstrates how to effectively capture complex global structures in images.

Learning multi-scale local conditional probability models of images

Local Conditional Neural Fields for Versatile and Generalizable Large-Scale Reconstructions in Computational Imaging

Image Denoising via Multi-scale Nonlinear Diffusion Models

Image Denoising Via Multiscale Nonlinear Diffusion Models.

Probabilistic Graph Attention Network with Conditional Kernels for Pixel-Wise Prediction

Image Steganalysis with Multi-Scale Residual Network

Learning Spatio-Temporal Representation with Local and Global Diffusion

Learning Depth Super-Resolution by Using Multi-Scale Convolutional Neural Network.

Coarse-to-fine Trained Multi-Scale Convolutional Neural Networks for Image Classification.

Image Neural Field Diffusion Models

Multi-scale Conditional Generative Modeling for Microscopic Image Restoration

Multiscale Conditional Regularization for Convolutional Neural Networks

Lossless Image Compression Using a Multi-Scale Progressive Statistical Model

Mixtures of conditional Gaussian scale mixtures applied to multiscale image representations

A Convolutional Neural Network-Based Conditional Random Field Model for Structured Multi-Focus Image Fusion Robust to Noise

Multi-Domain Multi-Scale Diffusion Model for Low-Light Image Enhancement

Tackling Structural Hallucination in Image Translation with Local Diffusion

A Multiscale Image Denoising Algorithm Based On Dilated Residual Convolution Network

Diffusion Models Learn Low-Dimensional Distributions via Subspace Clustering

Multi-scale Unified Network for Image Classification

Multi-scale Image Analysis Based on Non-Classical Receptive Field Mechanism.