Spectroscopic Quasar Anomaly Detection (SQuAD) I: Rest-Frame UV Spectra from SDSS DR16

Arihant Tiwari, M. Vivek
2024-11-26
Abstract: We present the results of applying anomaly detection algorithms to a quasar spectroscopic sub-sample from the SDSS DR16 Quasar Catalog, covering the redshift range 1.88 < z < 2.47. Principal Component Analysis (PCA) was employed for dimensionality reduction of the quasar spectra, followed by hierarchical K-Means clustering in a 20-dimensional PCA eigenvector hyperspace. To prevent broad absorption line (BAL) quasars from being identified as the primary anomaly group, we conducted the analysis with and without them, comparing both datasets for a clearer identification of other anomalous quasar types. We identified 1,888 anomalous quasars, categorized into 10 broad groups. The anomalous groups include C IV Peakers, quasars with extremely strong and narrow C IV emission lines; Excess Si IV emitters, quasars where the Si IV line is as strong as the C IV line; and Si IV Deficient anomalies, which exhibit significantly weaker Si IV emission compared to typical quasars. The anomalous nature of these quasars is attributed to lower Eddington ratios for C IV Peakers, super-solar metallicity for Excess Si IV emitters, and sub-solar metallicity for Si IV Deficient anomalies. Additionally, we identified four groups of BAL anomalies: Blue BALs, Flat BALs, Reddened BALs, and FeLoBALs, distinguished primarily by the strength of reddening in these sources. Further, among the non-BAL quasars, we identified three types of reddened anomaly groups classified as heavily reddened, moderately reddened, and plateau-shaped spectrum quasars, each exhibiting varying degrees of reddening. The detected anomalies are presented as a value-added catalog.
Astrophysics of Galaxies
What problem does this paper attempt to address?
The paper addresses the problem of identifying anomalous quasars in large-scale spectroscopic survey data. Specifically, the researchers applied anomaly detection algorithms to a sub-sample of the SDSS DR16 Quasar Catalog covering the redshift range 1.88 < z < 2.47. Using principal component analysis (PCA) for dimensionality reduction, followed by hierarchical K-means clustering in the 20-dimensional PCA eigenvector hyperspace, they aimed to:

1. **Identify anomalous quasars**: Find quasars whose spectra differ from those of the majority.
2. **Classify anomaly types**: Group the identified anomalous quasars to better understand the causes of the anomalies.
3. **Exclude the influence of broad absorption line (BAL) quasars**: To prevent BAL quasars from being identified as the main anomalous group, the researchers ran the analysis on datasets with and without BAL quasars and compared the two sets of results to identify other anomalous quasar types more clearly.

### Main findings

- **1,888 anomalous quasars identified**: These anomalous quasars are divided into 10 broad groups.
- **Types of anomalous quasars**:
  - **C IV Peakers**: Quasars with extremely strong and narrow C IV emission lines.
  - **Excess Si IV Emitters**: Quasars in which the Si IV line is as strong as the C IV line.
  - **Si IV Deficient Anomalies**: Quasars with Si IV emission significantly weaker than in typical quasars.
  - **Four BAL anomalies**: Blue BALs, Flat BALs, Reddened BALs, and FeLoBALs, distinguished primarily by the strength of reddening in these sources.
  - **Three non-BAL reddened anomalies**: Heavily reddened, moderately reddened, and plateau-shaped spectrum quasars, each showing a different degree of reddening.

### Methods

1. **Data pre-processing**:
   - **Normalization**: Normalize the spectra using the maximum flux value so that the flux values of all spectra lie in the range [-1, 1].
   - **Smoothing**: Smooth the normalized spectra with a Savitzky-Golay filter.
   - **Resampling**: Resample the smoothed, normalized spectra onto a common wavelength grid.
   - **Padding**: Pad the flux values outside each spectrum's wavelength range so that all spectra have the same dimension.
2. **Principal component analysis (PCA)**:
   - Use PCA to reduce the high-dimensional spectral data to a 20-dimensional eigenvector hyperspace, retaining 92.1% of the total variance.
3. **K-means clustering**:
   - Apply K-means clustering to the dimensionally reduced data; the optimal number of clusters was determined to be 3.
   - Calculate the Euclidean distance between each point and the center of the cluster to which it belongs, and flag anomalies using a 5σ threshold for the first two clusters and a 4σ threshold for the third cluster.

### Conclusion

Using these methods, the researchers identified and classified a variety of anomalous quasars. These anomalous quasars may reveal unusual physical processes or environmental factors in quasars, which could help build a deeper understanding of quasar diversity and physical properties.
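
The steps listed under Methods can be illustrated with a minimal NumPy sketch. This is not the authors' code, and several details are assumptions rather than facts from the paper: spectra are scaled by their maximum absolute flux, out-of-range wavelengths are padded with zeros, the Savitzky-Golay smoothing step is left out, K-means is a plain single-level Lloyd's iteration rather than the hierarchical variant, and an "nσ threshold" is read as "distance greater than mean + n × standard deviation of the center distances within a cluster". All function names are hypothetical, and the demo data are random numbers, not SDSS spectra.

```python
import numpy as np


def preprocess(wave, flux, grid):
    """Normalize one spectrum and resample it onto a common wavelength grid."""
    # Assumption: scale by the maximum absolute flux so values lie in [-1, 1].
    flux = np.asarray(flux, dtype=float)
    flux = flux / np.max(np.abs(flux))
    # Savitzky-Golay smoothing would be applied here; it is omitted in this sketch.
    # Assumption: grid points outside the observed range are padded with zeros.
    return np.interp(grid, wave, flux, left=0.0, right=0.0)


def pca_project(X, n_components):
    """Project rows of X onto the leading principal components via SVD."""
    Xc = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    return Xc @ Vt[:n_components].T, float(explained[:n_components].sum())


def assign(Z, centers):
    """Label each point with the index of its nearest center."""
    d = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
    return d.argmin(axis=1)


def kmeans(Z, k, n_iter=100, seed=0):
    """Plain Lloyd's K-means (single level, not the hierarchical variant)."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = assign(Z, centers)
        # Recompute each center; keep the old one if a cluster becomes empty.
        new = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new, centers):
            break
        centers = new
    return assign(Z, centers), centers


def flag_anomalies(Z, labels, centers, n_sigma):
    """Flag points far from their own cluster center.

    Assumption: "n sigma" means distance > mean + n * std of the
    center distances within that cluster.
    """
    dist = np.linalg.norm(Z - centers[labels], axis=1)
    flags = np.zeros(len(Z), dtype=bool)
    for j, n in enumerate(n_sigma):
        members = labels == j
        if members.any():
            cut = dist[members].mean() + n * dist[members].std()
            flags[members] = dist[members] > cut
    return flags


# Synthetic stand-in data (random numbers, not SDSS spectra).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 50))
Z, variance_kept = pca_project(X, n_components=20)
labels, centers = kmeans(Z, k=3)
flags = flag_anomalies(Z, labels, centers, n_sigma=[5, 5, 4])
```

In this sketch the cluster order is arbitrary, so the `[5, 5, 4]` thresholds only mirror the pattern of the summary; which cluster receives the 4σ cut in the paper is not specified here. With real spectra, each row of `X` would be the output of `preprocess` on a shared wavelength grid.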