Orthogonal Bootstrap: Efficient Simulation of Input Uncertainty

Kaizhao Liu, Jose Blanchet, Lexing Ying, Yiping Lu
2024-05-01
Abstract: Bootstrap is a popular methodology for simulating input uncertainty. However, it can be computationally expensive when the number of samples is large. We propose a new approach called **Orthogonal Bootstrap** that reduces the number of required Monte Carlo replications. We decompose the target being simulated into two parts: the *non-orthogonal part*, which has a closed-form result known as the Infinitesimal Jackknife, and the *orthogonal part*, which is easier to simulate. We show theoretically and numerically that Orthogonal Bootstrap significantly reduces the computational cost of Bootstrap while improving empirical accuracy and maintaining the same width of the constructed interval.
Methodology, Machine Learning, Econometrics, Statistics Theory
What problem does this paper attempt to address?
This paper addresses the high computational cost of using the Bootstrap to simulate input uncertainty on large-scale data sets. The Bootstrap estimates statistical uncertainty through resampling, but when the number of samples is large it requires a large number of Monte Carlo replications, which leads to a heavy computational overhead. The paper proposes a new method, Orthogonal Bootstrap, that reduces the required number of Monte Carlo replications and therefore the computational cost, while improving empirical accuracy and keeping the width of the constructed interval unchanged.

### Key points:

1. **Problem background**:
   - **Input uncertainty**: In data-driven analysis, the statistical noise propagated from the data model to the subsequent output analysis affects the accuracy and reliability of the results.
   - **Bootstrap method**: Bootstrap is a non-parametric method that estimates this uncertainty through random resampling with replacement. However, when the number of samples is large, its computational cost is very high.
2. **Proposed method**:
   - **Orthogonal Bootstrap**: This method reduces the computational cost by decomposing the target into two parts:
     - **Non-orthogonal part**: This part has a closed-form solution, known as the Infinitesimal Jackknife.
     - **Orthogonal part**: This part is easier to simulate.
   - By handling these two parts separately, Orthogonal Bootstrap significantly reduces the computational cost while improving empirical accuracy.
3. **Theoretical and empirical results**:
   - **Theoretical results**: The paper proves that, under the assumption that the performance metric has a continuous Fréchet derivative under the Kernel Maximum Mean Discrepancy (MMD) distance, Orthogonal Bootstrap can reduce the required number of Monte Carlo replications from \( \Omega(n) \) to \( O(1) \).
   - **Empirical results**: Numerical experiments on simulated and real-world data sets show significant improvements from Orthogonal Bootstrap, especially when the number of Monte Carlo replications is limited.
4. **Comparison with existing methods**:
   - **Standard Bootstrap**: When the number of Monte Carlo replications is small, the coverage probability of the standard Bootstrap is significantly lower than the nominal level.
   - **Cheap Bootstrap**: Although Cheap Bootstrap can provide a similar coverage probability, its confidence intervals are wider on average.
   - **Orthogonal Bootstrap**: When the number of Monte Carlo replications is limited, Orthogonal Bootstrap not only provides a higher coverage probability but also achieves the same confidence interval width as the standard Bootstrap.

### Summary:

The paper proposes a new Bootstrap method, Orthogonal Bootstrap, to address the high computational cost of the Bootstrap on large-scale data sets. By decomposing the target into non-orthogonal and orthogonal parts, the method reduces the computational cost while improving the accuracy and reliability of the estimates.
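
The decomposition into a closed-form part and a simulated part can be illustrated for variance estimation. The following is a minimal sketch under stated assumptions, not the paper's algorithm: it assumes a smooth statistic whose empirical influence values are available, treats the linear term built from those values as the non-orthogonal part (its bootstrap variance is computed exactly, in the spirit of the Infinitesimal Jackknife), and estimates only the remainder terms by Monte Carlo. All function names here are hypothetical.

```python
import numpy as np


def infinitesimal_jackknife_variance(influence):
    """Exact bootstrap variance of the mean of n resampled influence values."""
    n = influence.shape[0]
    centered = influence - influence.mean()
    return float(np.mean(centered ** 2) / n)


def standard_bootstrap_variance(x, stat, n_boot, rng):
    """Plain Monte Carlo bootstrap variance of stat, for comparison."""
    n = x.shape[0]
    reps = np.array([stat(x[rng.integers(0, n, size=n)]) for _ in range(n_boot)])
    return float(np.var(reps, ddof=1))


def orthogonal_bootstrap_variance(x, stat, influence_fn, n_boot, rng):
    """Closed-form variance of the linear part plus simulated remainder terms.

    Assumption (not from the source): the target variance is split as
    Var(L) + Var(R) + 2 Cov(L, R), where L is the linear term and
    R = stat(resample) - stat(x) - L is the remainder.
    """
    n = x.shape[0]
    theta_hat = stat(x)
    influence = influence_fn(x)
    closed_form = infinitesimal_jackknife_variance(influence)

    linear = np.empty(n_boot)
    remainder = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample with replacement
        linear[b] = influence[idx].mean() - influence.mean()
        remainder[b] = stat(x[idx]) - theta_hat - linear[b]

    # 2x2 sample covariance of (linear, remainder); needs n_boot >= 2.
    cov = np.cov(linear, remainder)
    return closed_form + float(cov[1, 1]) + 2.0 * float(cov[0, 1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.exponential(size=200)

    # Example statistic: exp(sample mean), with its empirical influence values.
    def stat(s):
        return float(np.exp(s.mean()))

    def influence_fn(s):
        return np.exp(s.mean()) * (s - s.mean())

    print("standard,   B=10:", standard_bootstrap_variance(data, stat, 10, rng))
    print("orthogonal, B=10:", orthogonal_bootstrap_variance(data, stat, influence_fn, 10, rng))
```

In this sketch the dominant term is computed without simulation, so Monte Carlo noise enters only through the much smaller remainder terms. That is one way to see why a small number of replications might suffice; the paper's own estimator and guarantees may differ in detail.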