Bayesian inference: More than Bayes's theorem

Thomas J. Loredo, Robert L. Wolpert
2024-06-29
Abstract: Bayesian inference gets its name from *Bayes's theorem*, expressing posterior probabilities for hypotheses about a data generating process as the (normalized) product of prior probabilities and a likelihood function. But Bayesian inference uses all of probability theory, not just Bayes's theorem. Many hypotheses of scientific interest are *composite hypotheses*, with the strength of evidence for the hypothesis dependent on knowledge about auxiliary factors, such as the values of nuisance parameters (e.g., uncertain background rates or calibration factors). Many important capabilities of Bayesian methods arise from use of the law of total probability, which instructs analysts to compute probabilities for composite hypotheses by *marginalization* over auxiliary factors. This tutorial targets relative newcomers to Bayesian inference, aiming to complement tutorials that focus on Bayes's theorem and how priors modulate likelihoods. The emphasis here is on marginalization over parameter spaces -- both how it is the foundation for important capabilities, and how it may motivate caution when parameter spaces are large. Topics covered include the difference between likelihood and probability, understanding the impact of priors beyond merely shifting the maximum likelihood estimate, and the role of marginalization in accounting for uncertainty in nuisance parameters, systematic error, and model misspecification.
Methodology, Instrumentation and Methods for Astrophysics
What problem does this paper attempt to address?
The paper explores the core concepts and methods of Bayesian inference, with particular emphasis on the Law of Total Probability (LTP) and the marginalization it prescribes for complex problems. Specifically, the paper addresses the following aspects:

1. **Limitations of Bayes's Theorem**: Although Bayes's theorem (BT) is central to Bayesian inference, the paper points out that focusing solely on Bayes's theorem may overlook a more fundamental and critical ingredient, namely the Law of Total Probability.
2. **Role of the Law of Total Probability**: The paper emphasizes the importance of the Law of Total Probability in handling composite hypotheses. Composite hypotheses are those whose strength of evidence depends on auxiliary factors (such as nuisance parameters). The paper illustrates through examples how to use the Law of Total Probability to calculate the probabilities of such hypotheses.
3. **Distinction between Probability and Likelihood**: The paper clarifies the difference between probability and likelihood and explains how Bayes's theorem converts a likelihood into a posterior distribution that can be meaningfully integrated.
4. **Role of Prior Distribution**: The paper argues that the prior distribution does more than shift the peak of the likelihood function: it converts the likelihood into an integrable probability quantity, which is particularly important for understanding probability distributions in high-dimensional spaces.
5. **Handling Nuisance Parameters**: The paper compares marginalization over nuisance parameters in Bayesian methods with the optimization commonly used in non-Bayesian approaches, and emphasizes the importance of marginalization for correctly handling the uncertainty in nuisance parameters.
6. **Handling Systematic Errors**: The paper briefly introduces the use of marginalization to describe and propagate systematic errors, especially in cases where standard error propagation methods fail or are not applicable.
7. **Model Comparison and Model Averaging**: The paper also discusses how to use marginalization for model comparison and model averaging to account for model uncertainty.

In summary, this paper aims to highlight the importance of the Law of Total Probability and marginalization in modern statistics, machine learning, and scientific data analysis, especially when dealing with high-dimensional models. According to the paper, marginalization accounts for uncertainty in auxiliary factors that optimization-based approaches can understate.
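
The contrast between marginalizing and optimizing over a nuisance parameter can be illustrated with a small numerical sketch. The example below is not taken from the paper: it assumes a simple on/off Poisson counting setup with an uncertain background rate (the kind of nuisance parameter the abstract mentions), hypothetical count values, flat priors over a finite grid, and plain grid integration. The names `n_on`, `n_off`, `t_on`, `t_off`, `marginal`, and `profile` are illustrative choices only.

```python
import numpy as np

# Hypothetical on/off counting data (illustrative values, not from the paper).
n_on, t_on = 16, 1.0    # counts and exposure with source + background
n_off, t_off = 9, 1.0   # counts and exposure with background only

# Grids over the signal rate s (parameter of interest) and the
# background rate b (nuisance parameter).
s = np.linspace(0.0, 30.0, 601)
b = np.linspace(1e-3, 30.0, 600)
ds, db = s[1] - s[0], b[1] - b[0]
S, B = np.meshgrid(s, b, indexing="ij")

# Poisson log-likelihood up to an additive constant (factorial terms dropped).
log_like = (n_on * np.log((S + B) * t_on) - (S + B) * t_on
            + n_off * np.log(B * t_off) - B * t_off)

# Assumption: flat priors on s and b over the grid ranges, so the joint
# posterior is proportional to the likelihood on the grid.
joint = np.exp(log_like - log_like.max())

# Marginalization (law of total probability): integrate the joint over b,
# then normalize so the result integrates to 1 over s.
marginal = joint.sum(axis=1) * db
marginal /= marginal.sum() * ds

# Optimization alternative: profile likelihood, maximizing over b at each s.
# It is rescaled to unit area only so the two curves can be compared.
profile = joint.max(axis=1)
profile /= profile.sum() * ds

post_mean = (s * marginal).sum() * ds
print("profile-likelihood peak at s =", s[np.argmax(profile)])
print("marginal posterior mode at s =", s[np.argmax(marginal)])
print("marginal posterior mean of s =", post_mean)
```

The marginal curve is a probability density for `s` and can be integrated to give credible regions; the profile curve is a slice through the likelihood and carries no such interpretation without further calibration. A finer grid, a different prior, or a sampling method could replace the simple Riemann sums used here.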