A Survey of Bayesian Statistical Approaches for Big Data

Farzana Jahan, Insha Ullah, Kerrie L Mengersen
DOI: https://doi.org/10.1007/978-3-030-42553-1
2020-06-08
Abstract: The modern era is characterised as an era of information or Big Data. This has motivated a huge literature on new methods for extracting information and insights from these data. A natural question is how these approaches differ from those that were available prior to the advent of Big Data. We present a review of published studies that present Bayesian statistical approaches specifically for Big Data and discuss the reported and perceived benefits of these approaches. We conclude by addressing the question of whether focusing only on improving computational algorithms and infrastructure will be enough to face the challenges of Big Data.
Computation, Other Statistics
What problem does this paper attempt to address?
The problem this paper attempts to address is: **how to use Bayesian statistical methods to meet the challenges posed by big data**. Specifically, the paper reviews the application and advantages of Bayesian statistical methods in the context of big data, and asks whether improving computational algorithms and infrastructure alone is sufficient to meet those challenges.

### Main problems and objectives of the paper

1. **Application of Bayesian statistical methods in big data**:
   - The paper reviews published studies that propose Bayesian statistical approaches specifically for big data and discusses the reported and perceived benefits of these approaches.
   - The authors summarize where the Bayesian methods innovate in handling big data, including contributions to modeling and to algorithms.
2. **Challenges of big data**:
   - Big data is described by the "4V" characteristics: Volume (amount of data), Variety (diversity of data types and sources), Velocity (speed at which data arrive), and Veracity (trustworthiness of the data). Veracity, which covers the reliability and noise problems of data, may be one of the biggest challenges in big data analysis.
   - The paper explores the complexity of big data management, modeling, analysis, and interpretation, and points out that traditional analysis tools are often inadequate when applied to big data.
3. **Is the improvement of computational algorithms and infrastructure sufficient?**:
   - The paper closes with a key question: is improving computational algorithms and infrastructure alone enough to meet the challenges of big data? The authors argue that these improvements are necessary but may not be enough, and that they need to be combined with new statistical methods and theory.

### Formula examples

Bayesian statistical methods rest on Bayes' theorem, which can be expressed as:

\[ P(\theta | D)=\frac{P(D | \theta)P(\theta)}{P(D)} \]

where:

- \(P(\theta | D)\) is the posterior, that is, the probability distribution of the parameter \(\theta\) after observing the data \(D\).
- \(P(D | \theta)\) is the likelihood, that is, the probability of observing the data \(D\) given the parameter \(\theta\).
- \(P(\theta)\) is the prior, that is, the belief about the parameter \(\theta\) before observing the data.
- \(P(D)\) is the marginal likelihood or evidence, that is, the total probability of observing the data \(D\), averaged over all values of \(\theta\).

Through this formula, Bayesian methods combine prior knowledge with observed data, which is what makes them a flexible inference tool in the context of big data.

### Summary

The core problem of this paper is to explore the application and advantages of Bayesian statistical methods in big data analysis, and to evaluate whether relying solely on improved computational algorithms and infrastructure is sufficient to meet the challenges of big data.
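
### Worked example of Bayes' theorem

As a concrete illustration of the Bayes' theorem formula given under "Formula examples", the following is a minimal sketch, not a method taken from the paper. It assumes a Bernoulli likelihood with a conjugate Beta prior, in which case the posterior \(P(\theta | D)\) has a closed form and the evidence \(P(D)\) never has to be computed explicitly. The class `BetaPosterior`, the helper `posterior_from_batches`, and the toy data are all hypothetical names and values introduced for illustration. The sketch also shows that, for this model, updating batch by batch gives the same posterior as updating on all the data at once, which is convenient when the data are too large to process in one pass.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaPosterior:
    """Beta(alpha, beta) distribution over a success probability theta.

    Illustrative assumption: Bernoulli likelihood with a conjugate Beta prior.
    """

    alpha: float
    beta: float

    def update(self, batch):
        """Apply Bayes' theorem to a batch of 0/1 observations.

        With a Beta prior and Bernoulli likelihood, the posterior is again
        a Beta distribution whose parameters add the observed counts.
        """
        successes = sum(batch)
        failures = len(batch) - successes
        return BetaPosterior(self.alpha + successes, self.beta + failures)

    @property
    def mean(self):
        """Posterior mean of theta."""
        return self.alpha / (self.alpha + self.beta)


def posterior_from_batches(prior, batches):
    """Update the prior one batch at a time (hypothetical helper)."""
    posterior = prior
    for batch in batches:
        posterior = posterior.update(batch)
    return posterior


# Toy data, made up for illustration: 7 successes and 3 failures.
data = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]
prior = BetaPosterior(1.0, 1.0)  # uniform prior on theta

full = prior.update(data)
streamed = posterior_from_batches(prior, [data[:4], data[4:7], data[7:]])

print(full)
print(streamed == full)
```

The batch-wise equivalence holds here because the observations are assumed independent given \(\theta\), so the posterior after one batch can serve as the prior for the next.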