Transformer-based Parameter Estimation in Statistics

Xiaoxin Yin, David S. Yin
2024-02-28
Abstract: Parameter estimation is one of the most important tasks in statistics and is key to understanding the distribution behind a sample of observations. Traditionally, parameter estimation is done either by closed-form solutions (e.g., maximum likelihood estimation for the Gaussian distribution) or, when a closed-form solution does not exist (e.g., for the Beta distribution), by iterative numerical methods such as the Newton-Raphson method.
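The two traditional routes mentioned in the abstract can be illustrated with a minimal sketch that uses only the Python standard library. It is not code from the paper: the function names, the hand-rolled `digamma` and `trigamma` approximations, the method-of-moments starting point, and the step-halving safeguard are all assumptions made for illustration. The Gaussian case has a one-line closed form, while the Beta case has to be solved iteratively, here with Newton-Raphson on the average log-likelihood.

```python
import math
import random


def gaussian_mle(xs):
    """Closed-form MLE for a Gaussian: sample mean and (biased) sample variance."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    return mu, var


def digamma(x):
    """psi(x) for x > 0: shift x upward with psi(x) = psi(x+1) - 1/x, then use an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))


def trigamma(x):
    """psi'(x) for x > 0: shift with psi'(x) = psi'(x+1) + 1/x^2, then use an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    return result + inv + 0.5 * inv2 + inv * inv2 * (1 / 6 - inv2 * (1 / 30 - inv2 * (1 / 42 - inv2 / 30)))


def beta_score(a, b, s1, s2):
    """Gradient of the average Beta log-likelihood; s1 = mean(log x), s2 = mean(log(1 - x))."""
    psi_ab = digamma(a + b)
    return s1 - digamma(a) + psi_ab, s2 - digamma(b) + psi_ab


def beta_mle_newton(xs, tol=1e-10, max_iter=100):
    """Beta MLE via Newton-Raphson; there is no closed form, so we iterate."""
    n = len(xs)
    s1 = sum(math.log(x) for x in xs) / n
    s2 = sum(math.log(1.0 - x) for x in xs) / n
    # Method-of-moments starting point (an illustrative choice).
    m = sum(xs) / n
    v = sum((x - m) ** 2 for x in xs) / n
    common = m * (1.0 - m) / v - 1.0
    a, b = m * common, (1.0 - m) * common
    for _ in range(max_iter):
        g_a, g_b = beta_score(a, b, s1, s2)
        t_ab = trigamma(a + b)
        # Hessian entries of the average log-likelihood.
        h11, h22, h12 = t_ab - trigamma(a), t_ab - trigamma(b), t_ab
        det = h11 * h22 - h12 * h12
        # Newton step: solve H * d = g for the 2x2 system.
        da = (h22 * g_a - h12 * g_b) / det
        db = (h11 * g_b - h12 * g_a) / det
        step = 1.0
        while a - step * da <= 0.0 or b - step * db <= 0.0:
            step *= 0.5  # keep both parameters positive
        a -= step * da
        b -= step * db
        if max(abs(da), abs(db)) < tol:
            break
    return a, b


rng = random.Random(0)
normal_sample = [rng.gauss(1.0, 2.0) for _ in range(5000)]
mu_hat, var_hat = gaussian_mle(normal_sample)

beta_sample = [rng.betavariate(2.0, 5.0) for _ in range(5000)]
alpha_hat, beta_hat = beta_mle_newton(beta_sample)

print("Gaussian MLE (mean, variance):", mu_hat, var_hat)
print("Beta MLE (alpha, beta):", alpha_hat, beta_hat)
```

The contrast is the point of the sketch: the Gaussian estimator needs no iteration, whereas the Beta estimator needs special functions, a Hessian, and a convergence loop. In practice a library routine would normally replace the hand-written special functions.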
Machine Learning
What problem does this paper attempt to address?
The paper proposes a Transformer-based approach to parameter estimation in statistics. Traditionally, parameters are estimated either through closed-form solutions (e.g., maximum likelihood estimation for the Gaussian distribution) or, for distributions such as the Beta distribution whose maximum likelihood estimate has no closed form, through iterative numerical optimization such as the Newton-Raphson method.

The proposed method converts a sample into a sequence of embeddings and feeds that sequence into a Transformer model. It requires no closed-form solution and no mathematical derivation, and it does not even require knowledge of the probability density function. Its main advantage is that it can handle mathematically complex probability density functions, including those without closed-form estimators: once trained, the Transformer estimates the distribution parameters in a single inference pass.

In the experiments, the authors compare their method with traditional maximum likelihood estimation on several common distributions (normal, exponential, and Beta) and report that it achieves similar or better mean squared error (MSE) in most cases. The method performs particularly well when the range of the parameters is known. Overall, the paper contributes a novel parameter estimation method that is most useful for complex distributions lacking simple closed-form solutions. Training a model takes approximately 60 to 70 hours of GPU time, which the summary describes as significantly less than the time required for traditional mathematical derivation or algorithm design.
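The data side of such an approach can be sketched as follows, under stated assumptions. This is not the authors' pipeline: the summary does not say how observations are embedded, so `embed_value` (raw value plus a few sinusoidal features), the sorting of each sample, the parameter ranges, and all function names are hypothetical. The sketch only shows how synthetic (sequence, target) training pairs could be generated when the parameter range is known, and how the MSE of a maximum likelihood baseline could be computed on the same tasks for comparison. The Transformer itself is omitted.

```python
import math
import random


def embed_value(x, n_freqs=3):
    """Hypothetical per-observation embedding: the raw value plus sinusoidal features."""
    feats = [x]
    for k in range(n_freqs):
        feats.append(math.sin(x * 2 ** k))
        feats.append(math.cos(x * 2 ** k))
    return feats


def make_training_pair(rng, sample_size, mean_range=(-1.0, 1.0), std_range=(0.5, 2.0)):
    """Draw parameters from an assumed known range, then a sample from that normal distribution."""
    mu = rng.uniform(*mean_range)
    sigma = rng.uniform(*std_range)
    # Sorting is an illustrative choice: i.i.d. observations carry no order information.
    sample = sorted(rng.gauss(mu, sigma) for _ in range(sample_size))
    sequence = [embed_value(x) for x in sample]  # what a sequence model would consume
    return sequence, (mu, sigma)                 # target the model would be trained to predict


def mle_normal(values):
    """Closed-form MLE baseline for the normal distribution: mean and standard deviation."""
    n = len(values)
    mu = sum(values) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / n)
    return mu, sigma


def mse(predictions, targets):
    """Mean squared error averaged over all tasks and all parameters."""
    total, count = 0.0, 0
    for pred, target in zip(predictions, targets):
        for p, t in zip(pred, target):
            total += (p - t) ** 2
            count += 1
    return total / count


rng = random.Random(0)
pairs = [make_training_pair(rng, sample_size=100) for _ in range(200)]

# Baseline: recover the raw values (feature 0 of each embedding) and apply the closed-form MLE.
targets = [target for _, target in pairs]
baseline_preds = [mle_normal([token[0] for token in seq]) for seq, _ in pairs]
baseline_mse = mse(baseline_preds, targets)
print("MLE baseline MSE over 200 synthetic tasks:", baseline_mse)
```

A trained model would be scored with the same `mse` function on held-out pairs, so its error could be compared directly with `baseline_mse`.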