Searching for Lyman Limit Systems in Dark Energy Spectroscopic Instrument Mock Spectra Using Convolutional Neural Network

Qiao Pu, Han Liu, Jiaqi Zou, Zheng Cai
DOI: https://doi.org/10.1360/tb-2024-0177
2024-01-01
Abstract: Studying Lyman limit systems (LLS) is crucial for a deeper understanding of the large-scale structure of the universe, the evolutionary history of galaxies, and the distribution of gas within galaxy clusters. Although LLS absorption features are distinctive, current research remains limited: it relies predominantly on traditional methods and focuses on identifying and analyzing small samples with column densities of $10^{19}\,\mathrm{cm^{-2}} \le N(\mathrm{HI}) < 10^{20.3}\,\mathrm{cm^{-2}}$. This study aims to overcome these limitations by using deep learning to investigate a wider and more inclusive sample, which facilitates the detection and characterization of LLS with lower column densities. We use high-quality simulated Dark Energy Spectroscopic Instrument (DESI) spectra as the experimental foundation. By optimizing a convolutional neural network (CNN) model, we raise its identification accuracy for LLS with $10^{18.5}\,\mathrm{cm^{-2}} \le N(\mathrm{HI}) \le 10^{20.0}\,\mathrm{cm^{-2}}$ in DESI simulated spectra to 95%. We then validate the completeness and purity of the model under different signal-to-noise ratios and column densities, and analyze the differences between the CNN model's estimated and actual values of column density and redshift. The results indicate that, when the signal-to-noise ratio exceeds 6, the model's completeness exceeds 0.5 and its purity exceeds 0.2 for LLS with $10^{18.5}\,\mathrm{cm^{-2}} < N(\mathrm{HI}) < 10^{19.0}\,\mathrm{cm^{-2}}$. For LLS with $10^{19.0}\,\mathrm{cm^{-2}} < N(\mathrm{HI}) < 10^{20.0}\,\mathrm{cm^{-2}}$, the completeness exceeds 0.9 and the purity exceeds 0.7.
Further analysis reveals that both the completeness and the purity of the model increase with column density and signal-to-noise ratio. Comparing the CNN model's estimated and actual values within the range $10^{18.5}\,\mathrm{cm^{-2}} < N(\mathrm{HI}) < 10^{20.0}\,\mathrm{cm^{-2}}$, we find that the average difference in LLS column density is -0.05161 with a standard deviation of 0.239, and the average difference in LLS redshift is -0.0003 with a standard deviation of 0.0009. These findings indicate that the model's completeness generally surpasses its purity, particularly at low column densities, where the absorption features of LLS are relatively weak and prone to confusion with other spectral features, resulting in a higher number of false positive (FP) samples. The CNN model also tends to underestimate the column density and redshift of LLS, yet the distribution of estimation errors is relatively concentrated, demonstrating the model's robustness. This study not only provides a novel analytical approach for future LLS research but also encourages researchers to adopt and adapt CNN models for broader spectral analysis, paving the way for new avenues in cosmological research.
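The abstract quotes completeness, purity, and the mean and standard deviation of the differences between estimated and actual values without defining them. The following is a minimal sketch of how such quantities are commonly computed, not the authors' pipeline. It assumes completeness = TP / (TP + FN), purity = TP / (TP + FP), and differences taken as estimated minus actual (consistent with the negative means being described as underestimation). The function names, the per-spectrum boolean labels, and the toy values are all hypothetical, and treating the column density as log10 N(HI) is likewise an assumption.

```python
import numpy as np


def completeness_and_purity(is_lls_true, is_lls_pred):
    """Return (completeness, purity) from boolean truth/prediction labels.

    Assumed definitions (not stated in the abstract):
      completeness = TP / (TP + FN)  -> fraction of real LLS recovered
      purity       = TP / (TP + FP)  -> fraction of detections that are real
    """
    truth = np.asarray(is_lls_true, dtype=bool)
    pred = np.asarray(is_lls_pred, dtype=bool)
    tp = int(np.sum(truth & pred))    # real LLS that were detected
    fn = int(np.sum(truth & ~pred))   # real LLS that were missed
    fp = int(np.sum(~truth & pred))   # detections with no real LLS
    completeness = tp / (tp + fn) if (tp + fn) else float("nan")
    purity = tp / (tp + fp) if (tp + fp) else float("nan")
    return completeness, purity


def residual_stats(estimated, actual):
    """Return mean and (population) standard deviation of estimated - actual."""
    diff = np.asarray(estimated, dtype=float) - np.asarray(actual, dtype=float)
    return float(diff.mean()), float(diff.std())


# Toy labels (hypothetical): 4 real LLS, 3 recovered, 2 false positives.
truth = [True, True, True, True, False, False, False, False]
pred = [True, True, True, False, True, True, False, False]
completeness, purity = completeness_and_purity(truth, pred)

# Toy log10 N(HI) values (hypothetical) for matched detections.
log_n_true = [18.6, 19.2, 19.8]
log_n_est = [18.5, 19.1, 19.9]
mean_diff, std_diff = residual_stats(log_n_est, log_n_true)

print(f"completeness={completeness:.2f}, purity={purity:.2f}")
print(f"mean diff={mean_diff:.4f}, std={std_diff:.4f}")
```

In practice these statistics would be evaluated separately in bins of signal-to-noise ratio and column density, which is how the abstract reports them.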