Abstract: Upcoming surveys such as Euclid, the Vera C. Rubin Observatory's Legacy Survey of Space and Time (LSST) and the Nancy Grace Roman Telescope (Roman) will detect hundreds of high-redshift (z > 7) quasars, but distinguishing them from the billions of other sources in these catalogues represents a significant data analysis challenge. We address this problem by extending existing selection methods to use both i) Bayesian model comparison on measured fluxes and ii) a likelihood-based goodness-of-fit test on images, which are then combined using an \(F_{\beta}\) statistic. The result is an automated, reproducible and objective high-redshift quasar selection pipeline. We test this on both simulations and real data from the cross-matched Sloan Digital Sky Survey (SDSS) and UKIRT Infrared Deep Sky Survey (UKIDSS) catalogues. On this cross-matched dataset we achieve an AUC score of up to 0.795 and an \(F_{3}\) score of up to 0.79, sufficient to be applied to the Euclid, LSST and Roman data when available.

What problem does this paper attempt to address?

This paper addresses how to automatically identify the most distant quasars (high-redshift quasars, \(z\gtrsim7\)) in upcoming wide-area sky surveys such as Euclid, LSST and Roman. Specifically, it aims to develop an automated method that reliably distinguishes these extremely rare objects from the billions of other sources in the survey catalogues.

### Main problems:

1. **Huge amount of data**: Future sky surveys will produce catalogues containing billions of celestial sources. Efficiently picking out a small number of high-redshift quasars from them is a major challenge.
2. **Rarity of quasars**: High-redshift quasars are very rare. For example, there are only about \(2.5\times 10^{-2}\) quasars with redshift greater than 7 per square degree of sky (\(J = 23\)). Simple traditional methods such as color cuts therefore struggle to identify these rare targets effectively.
3. **Background noise and interference**: Beyond being rare, high-redshift quasars are easily confused with other astrophysical sources (such as M/L/T/Y dwarfs and early-type galaxies) and with non-astronomical artifacts. These contaminants far outnumber the target quasars, which makes identification harder.

### Solutions:

To address these challenges, the paper proposes an automated selection method that combines Bayesian model comparison with an image goodness-of-fit test. The steps are as follows:

1. **Bayesian model comparison**: Analyze the measured fluxes with Bayesian model comparison and calculate the probability that each source is a quasar. This step uses multi-band photometric data and takes into account the color characteristics of the different source populations.
2. **Image goodness-of-fit testing**: Carry out a pixel-level, likelihood-based analysis of the image data for each candidate source and evaluate how well it is fit by the quasar model. This step further excludes sources whose morphology is inconsistent with a quasar.
3. **\(F_{\beta}\) statistic**: Combine the results of the two methods above and use the \(F_{\beta}\) statistic to define the final selection threshold. The \(F_{\beta}\) statistic balances precision against recall, which helps make the best use of limited follow-up observing resources.

### Testing and verification:

To verify the effectiveness of the method, the authors tested it on simulated data and on the real SDSS-UKIDSS cross-matched dataset. On the cross-matched dataset the method achieved an AUC score of up to 0.795 and an \(F_{3}\) score of up to 0.79, which the authors consider sufficient for it to be applied to Euclid, LSST and Roman data when available.

### Formula explanation:

- **\(F_{\beta}\) statistic**: \[F_{\beta}=\frac{(1 + \beta^{2})\cdot\text{precision}\cdot\text{recall}}{\beta^{2}\cdot\text{precision}+\text{recall}}\]
  - When \(\beta = 1\), \(F_{1}\) is the harmonic mean of precision and recall.
  - When \(\beta>1\), more emphasis is placed on recall.

In this way, the paper provides a systematic, automated high-redshift quasar selection pipeline intended for use with future wide-area sky surveys.
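As a rough illustration of the \(F_{\beta}\) formula and of how it can define a selection threshold, here is a minimal Python sketch. It is not the authors' pipeline: the function names `f_beta` and `best_threshold`, the single combined score per source, and the toy scores and labels are all assumptions made for the example.

```python
def f_beta(precision, recall, beta=3.0):
    """F_beta = (1 + beta^2) * P * R / (beta^2 * P + R); returns 0.0 if undefined."""
    denom = beta**2 * precision + recall
    if denom == 0:
        return 0.0
    return (1 + beta**2) * precision * recall / denom


def best_threshold(scores, labels, beta=3.0):
    """Scan every observed score as a threshold and keep the one maximizing F_beta.

    A source is selected when its score is >= the threshold.
    labels: 1 for a true quasar, 0 for a contaminant (hypothetical truth labels).
    """
    best_t, best_f = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f = f_beta(precision, recall, beta)
        if f > best_f:
            best_t, best_f = t, f
    return best_t, best_f


# Toy example (made-up numbers): two quasars among six sources.
scores = [0.95, 0.80, 0.60, 0.40, 0.30, 0.10]
labels = [1, 0, 1, 0, 0, 0]

threshold, f3 = best_threshold(scores, labels, beta=3.0)
print(f"best threshold = {threshold}, F_3 = {f3:.3f}")
```

With \(\beta = 3\) the scan prefers the threshold that recovers both toy quasars at the cost of one contaminant, whereas a very small \(\beta\) (for example 0.1) would prefer the stricter threshold that selects only the highest-scoring source. This mirrors the statement above that \(\beta > 1\) places more emphasis on recall.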

An automated method for finding the most distant quasars
