denmarf: a Python package for density estimation using masked autoregressive flow

Rico K. L. Lo
2023-05-22
Abstract: Masked autoregressive flow (MAF) is a state-of-the-art non-parametric density estimation technique. It is based on the idea (known as a normalizing flow) that a simple base probability distribution can be mapped into a complicated target distribution that one wishes to approximate, using a sequence of bijective transformations. The denmarf package provides a scikit-learn-like interface in Python that lets researchers use MAF for density estimation in their applications: evaluating probability densities of the distribution underlying a set of data and generating new samples from that distribution, on either a CPU or a GPU, with usage as simple as "from denmarf import DensityEstimate; de = DensityEstimate().fit(X)". The package also implements logistic transformations to facilitate the fitting of bounded distributions.
Instrumentation and Methods for Astrophysics
What problem does this paper attempt to address?
The paper addresses the computational cost of non-parametric density estimation on large datasets. Traditional kernel density estimation (KDE) is expensive in this regime: evaluating the density at \( M \) points, given a dataset of \( N \) samples in \( D \) dimensions, costs \( O(MND) \), which is linear in both the dataset size \( N \) and the number of evaluations \( M \). KDE therefore becomes very time-consuming whenever the probability density must be evaluated many times (e.g., when computing probability densities for a large number of astronomical lens images).

To solve this problem, the authors propose a Python package named **denmarf**, which is based on the **Masked Autoregressive Flow (MAF)** technique. MAF is an advanced non-parametric density estimation technique that maps a simple base distribution to a complex target distribution through a series of bijective transformations. Unlike KDE, the cost of evaluating the density with a trained MAF does not depend on the dataset size \( N \), which is a significant advantage for large datasets.

The main features of the **denmarf** package include:

1. **User-friendly**: Provides an interface similar to `scikit-learn`, allowing researchers to use MAF for density estimation without needing to delve into the details of deep learning libraries.
2. **Efficiency**: Runs on both CPU and GPU, for density estimation and for generating new samples from large datasets.
3. **Support for bounded distributions**: Handles bounded data by applying a logit transformation, so that the transformed data is unbounded and easier for the flow to fit.

In summary, the **denmarf** package offers a user-friendly tool that avoids the \( O(MND) \) evaluation cost of traditional KDE on large datasets.
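The scaling argument can be illustrated with a minimal, standard-library-only sketch. It is not the denmarf API and not an actual MAF: the "flow" here is a single per-dimension affine bijection of a standard normal, chosen only as a stand-in so that the change-of-variables evaluation fits in a few lines. All function names (`kde_logpdf`, `fit_affine_flow`, `flow_logpdf`) and the bandwidth value are assumptions made for illustration. The point is that the naive KDE loops over all \( N \) training points for every evaluation, whereas the fitted bijection is evaluated from its parameters alone.

```python
import math
import random


def kde_logpdf(points, data, bandwidth):
    """Naive Gaussian KDE (isotropic kernel): log-density at each point.

    Each evaluation visits all N training points and all D dimensions,
    so M evaluations cost O(M * N * D).
    """
    n, d = len(data), len(data[0])
    log_norm = (-math.log(n) - d * math.log(bandwidth)
                - 0.5 * d * math.log(2.0 * math.pi))
    out = []
    for x in points:                                  # M evaluations
        exponents = []
        for y in data:                                # N training points
            sq = sum((xi - yi) ** 2 for xi, yi in zip(x, y))  # D dimensions
            exponents.append(-0.5 * sq / bandwidth ** 2)
        top = max(exponents)                          # log-sum-exp for stability
        out.append(log_norm + top
                   + math.log(sum(math.exp(e - top) for e in exponents)))
    return out


def fit_affine_flow(data):
    """Stand-in for training: fit x = mu + sigma * z per dimension.

    A real MAF learns a stack of autoregressive bijections instead.
    """
    n, d = len(data), len(data[0])
    mu = [sum(row[j] for row in data) / n for j in range(d)]
    sigma = [math.sqrt(sum((row[j] - mu[j]) ** 2 for row in data) / n)
             for j in range(d)]
    return mu, sigma


def flow_logpdf(points, mu, sigma):
    """Change of variables: log p(x) = log N(z; 0, 1) - log|sigma| per dimension.

    The training data is not touched, so M evaluations cost O(M * D).
    """
    out = []
    for x in points:
        logp = 0.0
        for xi, m, s in zip(x, mu, sigma):
            z = (xi - m) / s
            logp += -0.5 * z * z - 0.5 * math.log(2.0 * math.pi) - math.log(s)
        out.append(logp)
    return out


# Small demonstration on synthetic 2-D standard-normal data.
rng = random.Random(0)
train = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(500)]
query = [[0.0, 0.0], [1.0, -1.0], [2.0, 0.5]]

mu, sigma = fit_affine_flow(train)
print("KDE :", kde_logpdf(query, train, bandwidth=0.3))
print("flow:", flow_logpdf(query, mu, sigma))
```

Doubling the size of `train` doubles the work done inside `kde_logpdf` for the same `query`, while `flow_logpdf` is unaffected because it only reads `mu` and `sigma`. A trained MAF has the same property, with a more expressive bijection than the affine map used here.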