Classification of compact radio sources in the Galactic plane with supervised machine learning

S. Riggi, G. Umana, C. Trigilio, C. Bordiu, F. Bufano, A. Ingallinera, F. Cavallaro, Y. Gordon, R. P. Norris, G. Gürkan, P. Leto, C. Buemi, S. Loru, A. M. Hopkins, M. D. Filipović, T. Cecconello
2024-02-23
Abstract: Generation of science-ready data from processed data products is one of the major challenges in next-generation radio continuum surveys with the Square Kilometre Array (SKA) and its precursors, due to the expected data volume and the need to achieve a high degree of automated processing. Source extraction, characterization, and classification are the major stages involved in this process. In this work we focus on the classification of compact radio sources in the Galactic plane using both radio and infrared images as inputs. To this aim, we produced a curated dataset of ~20,000 images of compact sources of different astronomical classes, obtained from past radio and infrared surveys, and novel radio data from pilot surveys carried out with the Australian SKA Pathfinder (ASKAP). Radio spectral index information was also obtained for a subset of the data. We then trained two different classifiers on the produced dataset. The first model uses gradient-boosted decision trees and is trained on a set of pre-computed features derived from the data, which include radio-infrared colour indices and the radio spectral index. The second model is trained directly on multi-channel images, employing convolutional neural networks. Using a completely supervised procedure, we obtained a high classification accuracy (F1-score>90%) for separating Galactic objects from the extragalactic background. Individual class discrimination performances, ranging from 60% to 75%, increased by 10% when adding far-infrared and spectral index information, with extragalactic objects, PNe and HII regions identified with higher accuracies. The implemented tools and trained models were publicly released, and made available to the radioastronomical community for future application on new radio data.
Instrumentation and Methods for Astrophysics, Machine Learning
### What problem does this paper attempt to address?

The paper addresses the classification of compact radio sources in the Galactic plane, motivated by the challenge of automatically generating science-ready data in next-generation radio continuum surveys such as those planned with the Square Kilometre Array (SKA). Specifically, the authors apply supervised machine learning to classify compact radio sources in the Galactic plane.

#### Main problems and challenges

1. **Requirement for automated processing**:
   - The data volume expected from the SKA and its precursor telescopes requires a highly automated processing flow to generate science-ready data.
   - Source extraction, characterization, and classification are the key stages of this process.
2. **Utilization of multi-band data**:
   - The study uses radio and infrared images as input, combining data from past radio and infrared surveys with new radio data from ASKAP pilot surveys.
   - Far-infrared and radio spectral index information is added to improve classification performance.
3. **High-precision classification**:
   - The paper trains two different classifiers:
     - The first model is based on gradient-boosted decision trees (GBDT) and is trained on features pre-computed from the data, such as radio-infrared colour indices and the radio spectral index.
     - The second model is a convolutional neural network (CNN) trained directly on multi-channel images.
4. **Classification accuracy**:
   - With a fully supervised procedure, the authors obtain a high classification accuracy (F1-score > 90%) for separating Galactic objects from the extragalactic background.
   - Per-class discrimination performance, ranging from 60% to 75%, increases by about 10% when far-infrared and spectral index information is added, with extragalactic objects, planetary nebulae (PNe), and H II regions identified with higher accuracy.
5. **Public release of tools and models**:
   - The developed tools and trained models have been publicly released so that the radio astronomy community can apply them to new radio data.

### Formula representation

The radio spectral index \(\alpha\) mentioned in the paper is defined through the power-law relation

\[ S_\nu \propto \nu^\alpha \]

where \(S_\nu\) is the radio flux density at frequency \(\nu\), and \(\alpha\) is the spectral index.

### Summary

This paper addresses the classification of compact radio sources in the Galactic plane, motivated by the need to generate science-ready data automatically in large-scale radio surveys. By combining multi-band data with supervised machine-learning techniques, the authors reach an F1-score above 90% for separating Galactic from extragalactic objects, improve per-class performance with far-infrared and spectral index information, and release tools and trained models for future radio astronomy research.
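
As a rough illustration of the power-law relation \(S_\nu \propto \nu^\alpha\) given in the formula section, the sketch below estimates \(\alpha\) from flux densities measured at two frequencies, using \(\alpha = \ln(S_1/S_2) / \ln(\nu_1/\nu_2)\). This two-point estimate is an assumption made for illustration; the summary does not state how the authors actually derived their spectral indices. The function name `two_point_spectral_index` and the example flux densities and frequencies are hypothetical.

```python
import math


def two_point_spectral_index(s1, nu1, s2, nu2):
    """Estimate alpha from two flux densities, assuming S_nu is proportional to nu**alpha.

    s1, s2   : flux densities (any common unit, e.g. mJy) at nu1 and nu2
    nu1, nu2 : observing frequencies (any common unit, e.g. MHz)
    """
    if min(s1, s2, nu1, nu2) <= 0:
        raise ValueError("flux densities and frequencies must be positive")
    if nu1 == nu2:
        raise ValueError("the two frequencies must differ")
    # Taking logs of S1/S2 = (nu1/nu2)**alpha and solving for alpha.
    return math.log(s1 / s2) / math.log(nu1 / nu2)


# Hypothetical measurements (illustrative values, not taken from the paper).
alpha = two_point_spectral_index(s1=10.0, nu1=900.0, s2=7.5, nu2=1400.0)
print(f"alpha = {alpha:.2f}")
```

A negative \(\alpha\) means the flux density falls with increasing frequency. With more than two frequency points, a straight-line fit of \(\ln S_\nu\) against \(\ln \nu\) would be the natural extension of the same idea.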