Abstract: Building fast and accurate ways to model the distribution of neutral hydrogen during the Epoch of Reionization (EoR) is essential for interpreting upcoming 21 cm observations. A key component of semi-numerical models of reionization is the collapse fraction field $f_{\text{coll}}(\mathbf{x})$, which represents the fraction of mass within dark matter halos at each location. Using high-dynamic range N-body simulations to obtain this is computationally prohibitive, and semi-analytical approaches, while being fast, end up compromising on accuracy. In this work, we bridge the gap by developing a machine learning model that can generate $f_{\text{coll}}$ maps by sampling from the full distribution of $f_{\text{coll}}$ conditioned on the dark matter density contrast $\delta$. The conditional distribution functions and the input density field to the model are taken from low-dynamic range N-body simulations that are more efficient to run. We evaluate the performance of our ML model by comparing its predictions to a high-dynamic range N-body simulation. Using these $f_{\text{coll}}$ maps, we compute the HI and HII maps through a semi-numerical code for reionization. We are able to recover the large-scale HI density field power spectra $(k \lesssim 1\ h\,{\rm Mpc}^{-1})$ at the $\lesssim10\%$ level, while the HII density field is reproduced with errors well below 10% across all scales. Compared to existing semi-analytical prescriptions, our approach offers significantly improved accuracy in generating the collapse fraction field, providing a robust and efficient alternative for modeling reionization.
What problem does this paper attempt to address?
This paper addresses how to predict the neutral hydrogen (HI) density distribution during the Epoch of Reionization (EoR) both quickly and accurately. Specifically, it aims to generate the collapse fraction field ($f_{\text{coll}}$) from low-dynamic-range N-body simulations using a machine-learning model based on Gaussian process regression (GPR). This avoids the high computational cost of running high-dynamic-range N-body simulations while improving accuracy over semi-analytical prescriptions.
### Background and Problem
In the standard picture of the EoR, galaxies are assumed to be the main source of ionizing photons, and reionization proceeds through the formation and growth of "ionization bubbles" of ionized hydrogen (HII). Modeling the distribution of these bubbles yields the distribution of neutral hydrogen, which in turn determines the fluctuations of the 21 cm signal. The most accurate approach is to run radiative transfer simulations, but these need a large volume to achieve statistical convergence and must also resolve the lowest-mass halos, which makes them extremely expensive computationally.
### Limitations of Existing Methods
Existing semi-numerical models are fast but sacrifice accuracy. These models usually use the conditional Press-Schechter (PS) or Sheth-Tormen (ST) halo mass functions to approximate the collapse fraction field $f_{\text{coll}}$, but these analytical mass functions cannot capture the full complexity of halo formation, so their agreement with the results of N-body simulations is limited.
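For orientation, the conditional PS prescription is commonly written as $f_{\text{coll}} = \mathrm{erfc}\!\left[(\delta_c - \delta)/\sqrt{2(\sigma^2_{\min} - \sigma^2_{\text{cell}})}\right]$. The sketch below evaluates this textbook form for a single cell. It is not taken from the paper: the function name, argument names, and numerical values are illustrative assumptions, and $\delta$ is assumed to be the linearly extrapolated density contrast.

```python
import math

def conditional_ps_fcoll(delta, delta_c, sigma2_min, sigma2_cell):
    """Conditional Press-Schechter collapse fraction for one cell (textbook form).

    delta       : linearly extrapolated density contrast of the cell (assumed input)
    delta_c     : critical collapse threshold at the redshift of interest
    sigma2_min  : density variance at the minimum halo mass scale
    sigma2_cell : density variance at the mass scale of the cell
    """
    if sigma2_cell >= sigma2_min:
        raise ValueError("the cell must be more massive than the minimum halo mass")
    if delta >= delta_c:
        return 1.0  # the whole cell has already crossed the collapse threshold
    nu = (delta_c - delta) / math.sqrt(2.0 * (sigma2_min - sigma2_cell))
    return math.erfc(nu)

# Illustrative numbers only; they are not values used in the paper.
for d in (-0.5, 0.0, 0.5, 1.0):
    print(d, conditional_ps_fcoll(d, delta_c=1.686, sigma2_min=16.0, sigma2_cell=1.0))
```

Note that this prescription returns one deterministic value of $f_{\text{coll}}$ per density, so any scatter at fixed $\delta$ is absent by construction.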
### The Solution of the Paper
To overcome the above limitations, this paper proposes a new method:
1. **Data Preparation**: Take the training data, namely the conditional distribution functions and the input density field, from low-dynamic-range N-body simulations, which are much cheaper to run.
2. **Model Training**: Use Gaussian process regression (GPR) to construct an interpolation function that yields the conditional cumulative distribution function (CDF) of the collapse fraction $f_{\text{coll}}$ as a function of the dark matter density contrast $\delta$.
3. **Prediction and Validation**: Use the trained model to generate $f_{\text{coll}}$ maps by sampling from the conditional distribution, and evaluate its performance by comparing the predictions with the results of a high-dynamic-range N-body simulation.
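The sampling idea behind these steps can be sketched in a few lines. The example below is a simplified stand-in, not the paper's implementation: it replaces the GPR interpolation with a nearest-density-bin lookup of empirical quantiles, and it trains on invented toy data. All function names, the bin edges, and the toy relation between $\delta$ and $f_{\text{coll}}$ are assumptions made for illustration.

```python
import bisect
import random

def build_conditional_quantiles(deltas, fcolls, bin_edges):
    """Group f_coll values by density bin; each sorted list is an empirical quantile function."""
    bins = [[] for _ in range(len(bin_edges) - 1)]
    for d, f in zip(deltas, fcolls):
        i = bisect.bisect_right(bin_edges, d) - 1
        if 0 <= i < len(bins):
            bins[i].append(f)
    return [sorted(b) for b in bins]

def quantile(sorted_vals, u):
    """Inverse empirical CDF, linearly interpolated between order statistics."""
    if not sorted_vals:
        raise ValueError("no training samples in this density bin")
    pos = u * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])

def sample_fcoll(delta, bin_edges, quantiles, rng):
    """Draw one f_coll value from the conditional distribution of the bin containing delta."""
    i = bisect.bisect_right(bin_edges, delta) - 1
    i = max(0, min(i, len(quantiles) - 1))  # clamp out-of-range densities to the edge bins
    return quantile(quantiles[i], rng.random())

# Toy training set (invented): f_coll rises with delta and carries Gaussian scatter.
rng = random.Random(0)
train_delta = [rng.uniform(-1.0, 3.0) for _ in range(5000)]
train_fcoll = [min(1.0, max(0.0, 0.1 * (1.0 + d) + rng.gauss(0.0, 0.02)))
               for d in train_delta]
edges = [-1.0, 0.0, 1.0, 2.0, 3.0]
quantiles = build_conditional_quantiles(train_delta, train_fcoll, edges)

# Draw a few f_coll values for cells of different density.
for d in (-0.5, 0.5, 2.5):
    draws = [sample_fcoll(d, edges, quantiles, rng) for _ in range(3)]
    print(d, [round(x, 3) for x in draws])
```

A smooth interpolator in $\delta$, such as the GPR described above, would remove the discontinuities that a binned lookup introduces at the bin edges.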
### Main Contributions
- **Improved Accuracy**: Compared with existing semi-analytical methods, the model generates the collapse fraction field with significantly higher accuracy.
- **Improved Efficiency**: Because the model relies only on low-dynamic-range N-body simulations, the computational cost is greatly reduced, which makes large-scale exploration of parameter space feasible.
- **Treatment of Stochasticity**: The model accounts not only for the conditional mean $\langle f_{\text{coll}}|\delta\rangle$ but also for the scatter of the collapse fraction around that mean, and therefore recovers small-scale features more accurately.
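One way to see why the scatter matters is the law of total variance. This is a general statistical identity, not a result quoted from the paper; it is offered here only as an illustration:

$$
\mathrm{Var}\left(f_{\text{coll}}\right) = \mathrm{Var}_{\delta}\!\left(\langle f_{\text{coll}}|\delta\rangle\right) + \left\langle \mathrm{Var}\left(f_{\text{coll}}|\delta\right) \right\rangle_{\delta}
$$

A model that assigns every cell its conditional mean retains only the first term. Sampling from the full conditional distribution restores the second term, which is the variance of $f_{\text{coll}}$ at fixed density.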
### Conclusion
The paper establishes a machine-learning framework that models the physical fields relevant to the EoR efficiently and accurately without running high-dynamic-range N-body simulations. According to the abstract, the large-scale HI power spectra ($k \lesssim 1\ h\,{\rm Mpc}^{-1}$) are recovered at the $\lesssim 10\%$ level, and the HII density field is reproduced with errors well below 10% on all scales. The method offers a practical tool for future studies of the physical processes at work during cosmic reionization.