Semi-supervised Spectral Classification of DESI White Dwarfs by Dimensionality Reduction

Xander Byrne, Amy Bonsor, Laura K. Rogers, Christopher J. Manser
2024-10-30
Abstract: As a new generation of large-sky spectroscopic surveys comes online, the enormous data volume poses unprecedented challenges in classifying spectra. Modern unsupervised techniques have the power to group spectra based on their dominant features, circumventing the complete reliance on training data suffered by supervised methods. We outline the use of dimensionality reduction to generate a 2D map of the structure of an intermediate-resolution spectroscopic dataset. This technique efficiently separates white dwarfs of different spectral classes in the Dark Energy Spectroscopic Instrument's Early Data Release (DESI EDR), identifying spectral features that had been missed even by visual classification. By focusing the method on particular spectral regions, we identify white dwarfs with helium features at 90 per cent recall, and cataclysmic variables at 100 per cent recall, illustrating rapid selection of low-contamination samples from spectroscopic surveys. We also demonstrate the use of dimensionality reduction in a supervised manner, outlining a procedure to classify any white dwarf spectrum in comparison with those in the DESI EDR. With upcoming surveys promising tens of millions of spectra, our work highlights the potential for semi-supervised techniques as an efficient means of classification and dataset visualisation.
Instrumentation and Methods for Astrophysics, Solar and Stellar Astrophysics
What problem does this paper attempt to address?
This paper addresses the following problem: as a new generation of large-scale spectroscopic sky surveys comes online, the sheer volume of spectral data poses an unprecedented challenge for classifying white dwarfs (WDs). Traditional supervised learning methods rely on training datasets, which often suffer from class imbalance and require a large amount of expert time to label. The paper therefore proposes a semi-supervised spectral classification method, based on dimensionality reduction, to efficiently classify the white dwarf spectra in the DESI Early Data Release (DESI EDR).

### Specific problems and solutions:

1. **Classification challenges brought by massive data**:
   - **Problem**: Modern large-scale spectroscopic sky surveys (such as DESI and 4MOST) generate so much data that traditional manual classification cannot keep up.
   - **Solution**: Use a dimensionality reduction technique (such as t-SNE) to map the high-dimensional spectral data onto a two-dimensional map, enabling efficient classification and visualization.
2. **Class imbalance problem**:
   - **Problem**: Some types of white dwarf (such as DA) are far more numerous than others (such as DO), so training datasets are class-imbalanced.
   - **Solution**: Adopt unsupervised or semi-supervised methods that do not rely completely on a training dataset, reducing the impact of class imbalance.
3. **Requirement for automated classification**:
   - **Problem**: The data volume of future spectroscopic sky surveys will increase significantly (for example, DESI is expected to observe about 70,000 white dwarf candidates), and manual classification cannot meet the demand.
   - **Solution**: Develop an automated classification method that uses dimensionality reduction to quickly select white dwarf samples of specific types, improving classification efficiency.

### Method overview:

- **Data set**: The spectra of 3,673 white dwarf candidates in the DESI EDR.
- **Dimensionality reduction technique**: Apply t-SNE to reduce the high-dimensional spectral data to two dimensions for easy visualization and analysis.
- **Pre-processing**: Normalize the spectral data and remove noise and outliers to ensure data quality.
- **Classification result**: The two-dimensional map separates different types of white dwarfs (such as DA, DB and DZ) and shows how they are distributed by effective temperature and primary spectral type.

### Highlights of results:

- **Efficient classification**: The method processes a large number of spectra in a short time and identifies different types of white dwarfs. The abstract reports 90 per cent recall for white dwarfs with helium features and 100 per cent recall for cataclysmic variables.
- **Reveal implicit structures**: The map exposes structure in the spectral data, such as V-shaped sequences and secondary sequences, that is closely related to the effective temperature and spectral type of the white dwarfs.
- **Application prospects**: The method is not limited to white dwarf classification and can be extended to the spectral analysis of other objects (such as main-sequence stars and quasars).

In conclusion, by introducing dimensionality reduction techniques, this paper provides an efficient white dwarf spectral classification method that can support future spectroscopic sky surveys.
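The normalize-then-embed workflow summarized above can be illustrated with a minimal sketch. It is not the authors' pipeline: the spectra are synthetic stand-ins (a flat continuum with one Gaussian absorption line per hypothetical class), the median normalization is one simple assumed choice of pre-processing, and the t-SNE settings (`perplexity=30`, PCA initialization) are illustrative defaults from scikit-learn rather than values taken from the paper.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT DESI spectra): three hypothetical classes,
# each a flat continuum with one Gaussian absorption line at a different pixel.
n_per_class, n_pix = 50, 200
pixels = np.arange(n_pix)
line_centres = [50, 100, 150]  # hypothetical line positions, one per class

spectra, labels = [], []
for label, centre in enumerate(line_centres):
    profile = 1.0 - 0.5 * np.exp(-0.5 * ((pixels - centre) / 5.0) ** 2)
    for _ in range(n_per_class):
        scale = rng.uniform(0.5, 2.0)          # arbitrary overall flux level
        noise = rng.normal(0.0, 0.02, n_pix)   # per-pixel Gaussian noise
        spectra.append(scale * (profile + noise))
        labels.append(label)
spectra = np.array(spectra)
labels = np.array(labels)

# Pre-processing (assumed): divide each spectrum by its median flux so that
# the embedding responds to line shapes rather than overall brightness.
normalised = spectra / np.median(spectra, axis=1, keepdims=True)

# Reduce each 200-pixel spectrum to a point on a 2D map.
tsne = TSNE(n_components=2, perplexity=30, init="pca", random_state=0)
embedding = tsne.fit_transform(normalised)

# Sanity check: how often does a point's nearest neighbour on the map
# share its class label? (The labels are never shown to t-SNE itself.)
diff = embedding[:, None, :] - embedding[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))
np.fill_diagonal(dist, np.inf)  # exclude each point from its own neighbours
nn_agreement = np.mean(labels[dist.argmin(axis=1)] == labels)

print(embedding.shape, nn_agreement)
```

Because the labels are used only after the embedding is computed, the map itself is unsupervised. Inspecting which known objects fall near an unlabelled spectrum is the kind of semi-supervised use the summary describes. Real spectra would need more careful continuum normalization and outlier handling than this toy example.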