Convolutional neural network for Lyman break galaxies classification and redshift regression in DESI (Dark Energy Spectroscopic Instrument)

Julien Taran
2024-06-24
Abstract: DESI is a groundbreaking international project that will observe more than 40 million quasars and galaxies over a 5-year period to create a 3D map of the sky. This map will enable us to probe multiple aspects of cosmology, from dark energy to neutrino mass. We focus here on one type of object observed by DESI, the Lyman Break Galaxies (LBGs). The goal is to use their spectra to determine whether they are indeed LBGs and, if so, to determine their distance from the Earth from their redshift, so that these galaxies can be placed on the DESI 3D map. To this end, we develop a convolutional neural network (CNN) inspired by QuasarNET (see arXiv:1808.09955, https://arxiv.org/abs/1808.09955) that simultaneously performs a classification task (LBG or not) and a regression task (estimating the redshift of the LBGs). In a first phase, data augmentation techniques such as shifting the spectra in wavelength, adding noise to the spectra, or adding synthetic spectra were used to increase the training dataset from 3,019 spectra to over 66,000. In a second phase, modifications to the QuasarNET architecture, notably through transfer learning and hyperparameter tuning with Bayesian optimization, boosted model performance. Gains of up to 26% were achieved on the Purity/Efficiency curve, which is used to evaluate model performance, particularly in the redshift ranges of interest, at low (around 2) and high (around 4) redshift. The best model obtained an average score of 94%, compared with 75% for the initial model.
Cosmology and Nongalactic Astrophysics, Artificial Intelligence
What problem does this paper attempt to address?
Based on the provided text, the paper addresses two main problems:

1. **Classification task**: Determine from spectral data whether an observed object is a Lyman Break Galaxy (LBG). This means separating genuine LBGs from the other types of objects in the observations, such as quasars (QSOs) and emission-line galaxies (ELGs).
2. **Regression task**: For objects confirmed as LBGs, determine their redshift and thereby infer their distance from the Earth. Accurate redshifts are crucial for constructing the three-dimensional sky map of DESI (Dark Energy Spectroscopic Instrument).

Specifically, the goal of the paper is to develop a convolutional neural network (CNN) that performs both tasks simultaneously. To improve model performance, the author combines several techniques: data augmentation (shifting spectra in wavelength, adding noise, adding synthetic spectra), transfer learning, and hyperparameter tuning with Bayesian optimization. These changes yield gains of up to 26% on the purity/efficiency curve, particularly in the low-redshift (about 2) and high-redshift (about 4) regions. The average score of the best model reaches 94%, compared with 75% for the initial model.

In conclusion, the paper aims to improve the identification of Lyman Break Galaxies and the determination of their redshifts through deep learning, in order to support the DESI project in constructing a more accurate three-dimensional map of the universe.
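To make the augmentation and evaluation ideas more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's implementation: the function names (`shift_spectrum`, `add_noise`, `purity_efficiency`), the log-uniform wavelength grid with step `dloglam`, the Gaussian noise model, and the confidence threshold are all hypothetical choices. In particular, purity and efficiency are computed here from the classification alone, whereas the paper's definition may also require a correct redshift.

```python
import numpy as np


def shift_spectrum(flux, z, n_pix, dloglam=1e-3):
    """Shift a spectrum by n_pix pixels on a log10-wavelength grid (assumed).

    On such a grid, a shift of n_pix pixels multiplies every observed
    wavelength by 10**(n_pix * dloglam), so (1 + z) scales by the same factor.
    """
    shifted = np.roll(flux, n_pix)  # np.roll returns a new array
    # np.roll wraps around, so blank the pixels that wrapped past the edge.
    if n_pix > 0:
        shifted[:n_pix] = 0.0
    elif n_pix < 0:
        shifted[n_pix:] = 0.0
    z_new = (1.0 + z) * 10.0 ** (n_pix * dloglam) - 1.0
    return shifted, z_new


def add_noise(flux, sigma, rng):
    """Return a copy of the spectrum with Gaussian noise of width sigma added."""
    return flux + rng.normal(0.0, sigma, size=flux.shape)


def purity_efficiency(is_lbg, score, threshold):
    """Purity and efficiency of the selection `score >= threshold`.

    purity     = true LBGs selected / all objects selected
    efficiency = true LBGs selected / all true LBGs
    """
    is_lbg = np.asarray(is_lbg, dtype=bool)
    selected = np.asarray(score) >= threshold
    true_pos = np.sum(selected & is_lbg)
    n_selected = np.sum(selected)
    n_lbg = np.sum(is_lbg)
    purity = true_pos / n_selected if n_selected else float("nan")
    efficiency = true_pos / n_lbg if n_lbg else float("nan")
    return float(purity), float(efficiency)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    flux = rng.normal(size=400)  # stand-in for a real spectrum
    augmented, z_aug = shift_spectrum(flux, z=3.0, n_pix=5)
    augmented = add_noise(augmented, sigma=0.1, rng=rng)
    print(augmented.shape, z_aug)
```

Scanning `threshold` over a range of values and plotting the resulting (efficiency, purity) pairs gives one simple version of a purity/efficiency curve. Because each shifted copy carries its own updated redshift label, a single observed spectrum can yield several training examples for the regression task.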