Bayesian calibration of stochastic agent based model via random forest

Connor Robertson, Cosmin Safta, Nicholson Collier, Jonathan Ozik, Jaideep Ray
2024-06-28
Abstract: Agent-based models (ABM) provide an excellent framework for modeling outbreaks and interventions in epidemiology by explicitly accounting for diverse individual interactions and environments. However, these models are usually stochastic and highly parametrized, requiring precise calibration for predictive performance. When considering realistic numbers of agents and properly accounting for stochasticity, this high dimensional calibration can be computationally prohibitive. This paper presents a random forest based surrogate modeling technique to accelerate the evaluation of ABMs and demonstrates its use to calibrate an epidemiological ABM named CityCOVID via Markov chain Monte Carlo (MCMC). The technique is first outlined in the context of CityCOVID's quantities of interest, namely hospitalizations and deaths, by exploring dimensionality reduction via temporal decomposition with principal component analysis (PCA) and via sensitivity analysis. The calibration problem is then presented and samples are generated to best match COVID-19 hospitalization and death numbers in Chicago from March to June in 2020. These results are compared with previous approximate Bayesian calibration (IMABC) results and their predictive performance is analyzed showing improved performance with a reduction in computation.
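As a rough illustration of the workflow the abstract outlines (simulator runs, PCA on the output time series, a random forest mapping parameters to PCA coefficients, then MCMC on the surrogate), the following is a minimal sketch under stated assumptions. It is not the authors' implementation: `toy_simulator`, its two parameters, the prior bounds, the noise level `SIGMA`, and the choice of a random-walk Metropolis sampler are all hypothetical stand-ins, and CityCOVID itself is not involved.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
N_STEPS = 30                      # time steps per output series (assumed)
LOW = np.array([0.05, 1.0])       # hypothetical uniform prior bounds
HIGH = np.array([0.5, 5.0])

def toy_simulator(theta, rng):
    """Hypothetical stochastic stand-in for an ABM: two noisy curves."""
    rate, scale = theta
    t = np.arange(N_STEPS)
    hosp = scale * (1.0 - np.exp(-rate * t))
    deaths = 0.3 * scale * (1.0 - np.exp(-0.5 * rate * t))
    noise = rng.normal(0.0, 0.05, 2 * N_STEPS)
    return np.concatenate([hosp, deaths]) + noise

# 1. Run the simulator at parameter sets drawn from the prior.
thetas = rng.uniform(LOW, HIGH, size=(400, 2))
outputs = np.array([toy_simulator(th, rng) for th in thetas])

# 2. Compress each concatenated output series to a few PCA coefficients.
pca = PCA(n_components=3).fit(outputs)
coeffs = pca.transform(outputs)

# 3. Train a random forest to map parameters to PCA coefficients.
forest = RandomForestRegressor(n_estimators=50, random_state=0)
forest.fit(thetas, coeffs)

def surrogate(theta):
    """Predict the full output series for one parameter vector."""
    predicted = forest.predict(theta.reshape(1, -1))
    return pca.inverse_transform(predicted)[0]

# 4. Calibrate against synthetic "observations" with random-walk Metropolis.
theta_true = np.array([0.2, 3.0])
observed = toy_simulator(theta_true, rng)
SIGMA = 0.2                       # assumed observation noise scale

def log_posterior(theta):
    # Uniform prior on the box, Gaussian likelihood on the residuals.
    if np.any(theta < LOW) or np.any(theta > HIGH):
        return -np.inf
    resid = observed - surrogate(theta)
    return -0.5 * np.sum(resid ** 2) / SIGMA ** 2

def metropolis(n_samples, step, rng):
    current = 0.5 * (LOW + HIGH)
    current_lp = log_posterior(current)
    chain = np.empty((n_samples, 2))
    for i in range(n_samples):
        proposal = current + step * rng.normal(size=2)
        proposal_lp = log_posterior(proposal)
        # Accept with probability min(1, posterior ratio).
        if np.log(rng.uniform()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        chain[i] = current
    return chain

chain = metropolis(1000, np.array([0.02, 0.2]), rng)
posterior_mean = chain[300:].mean(axis=0)   # discard burn-in
print("posterior mean (rate, scale):", posterior_mean)
```

Every likelihood evaluation here calls the surrogate rather than the simulator, which is where the computational saving would come from. A random forest prediction is piecewise constant in the parameters, so the sampled posterior is only as fine-grained as the training design allows.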
Machine Learning, Applications
What problem does this paper attempt to address?
The main problem this paper addresses is how to calibrate agent-based models (ABMs) efficiently and accurately, especially epidemiological models that are stochastic and highly parametrized. Specifically, the paper focuses on the following aspects:

1. **Computational efficiency**: Agent-based models such as CityCOVID are high-dimensional and stochastic, so parameter calibration usually requires a large amount of computation. This makes accurate calibration time-consuming and expensive.
2. **Accuracy of parameter calibration**: Good predictive performance requires accurate parameter calibration, but the stochasticity and complexity of ABMs make this difficult to achieve by running the model directly inside traditional calibration methods.
3. **Handling randomness**: The stochasticity of the ABM makes calibration harder. A key challenge is to obtain accurate calibration results while still accounting for that stochasticity.

To solve these problems, the paper proposes a global surrogate model based on random forests, combined with Bayesian calibration via Markov chain Monte Carlo (MCMC), to accelerate the evaluation and calibration of ABMs. The specific steps are as follows:

- **Dimensionality reduction**: Decompose the quantities of interest (the time series of hospitalizations and deaths) with principal component analysis (PCA) to reduce the output dimension.
- **Sensitivity analysis**: Use sensitivity measures computed with the random forest (such as Gini importance, permutation importance, and Sobol indices) to identify the parameters that have the greatest impact on the output, thereby further reducing the dimension of the parameter space.
- **Surrogate model construction**: Train a random forest regression model to map the reduced set of parameters to the PCA coefficients of the output, giving a surrogate model that is cheap to evaluate.
- **Bayesian calibration**: Use MCMC to estimate the posterior distribution of the ABM parameters, evaluating the surrogate model in place of the ABM.

With this method, the paper reports a reduced computational burden when calibrating to COVID-19 hospitalization and death data for Chicago from March to June 2020. Compared with the previous approximate Bayesian calibration (IMABC) results, the new method improves predictive performance while reducing computation.

### Formula summary

1. **PCA decomposition**:

\[ \begin{bmatrix} h_{1,1} & \cdots & h_{1,n} & d_{1,1} & \cdots & d_{1,n} \\ h_{2,1} & \cdots & h_{2,n} & d_{2,1} & \cdots & d_{2,n} \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ h_{m,1} & \cdots & h_{m,n} & d_{m,1} & \cdots & d_{m,n} \end{bmatrix} \rightarrow \begin{bmatrix} \alpha_{1,1} \\ \alpha_{2,1} \\ \vdots \\ \alpha_{m,1} \end{bmatrix} \odot \vec{c}_1 + \cdots + \begin{bmatrix} \alpha_{1,2n} \\ \alpha_{2,2n} \\ \vdots \\ \alpha_{m,2n} \end{bmatrix} \odot \vec{c}_{2n} \]

where \(h_{i,j}\) and \(d_{i,j}\) are the numbers of hospitalizations and deaths, respectively, for the \(i\)-th parameter set at the \(j\)-th time step, \(\vec{c}_j\in\mathbb{R}^{2n}\) is the \(j\)-th PCA component, and \(\alpha_{i,j}\) is the coefficient that multiplies the component \(\vec{c}_j\) for the \(i\)-th parameter set.

2. **Bayesian likelihood function**:

\[ L(\vec{h}^\circ, \vec{d}^\circ | \vec{\theta})=\frac{1}{(2\pi)^{n/2}\sigma_h^n} \