Periodic Variable Star Classification with Deep Learning: Handling Data Imbalance in an Ensemble Augmentation Way

Zihan Kang, Yanxia Zhang, Jingyi Zhang, Changhua Li, Minzhi Kong, Yongheng Zhao, Xue-Bing Wu
DOI: https://doi.org/10.1088/1538-3873/acf15e
2023-09-24
Abstract: Time-domain astronomy is progressing rapidly with the ongoing and upcoming large-scale photometric sky surveys led by the Vera C. Rubin Observatory project (LSST). Billions of variable sources call for better automatic classification algorithms for light curves. Among them, periodic variable stars are frequently studied. Different categories of periodic variable stars have a high degree of class imbalance and pose a challenge to algorithms including deep learning methods. We design two kinds of architectures of neural networks for the classification of periodic variable stars in the Catalina Survey's Data Release 2: a multi-input recurrent neural network (RNN) and a compound network combining the RNN and the convolutional neural network (CNN). To deal with class imbalance, we apply Gaussian processes to generate synthetic light curves with artificial uncertainties for data augmentation. For better performance, we organize the augmentation and training process in a "bagging-like" ensemble learning scheme. The experimental results show that the better approach is the compound network combining RNN and CNN, which reaches the best result of 86.2% on the overall balanced accuracy and 0.75 on the macro F1 score. We develop the ensemble augmentation method to solve the data imbalance when classifying variable stars and prove the effectiveness of combining different representations of light curves in a single model. The proposed methods would help build better classification algorithms of periodic time series data for future sky surveys (e.g., LSST).
Instrumentation and Methods for Astrophysics, Solar and Stellar Astrophysics
What problem does this paper attempt to address?
The paper addresses the classification of periodic variable stars, focusing on how data imbalance affects deep learning classifiers. The main challenges are:

1. **Data imbalance**: The number of samples differs greatly between classes of periodic variable stars, so models tend to favor the majority classes at the expense of the minority classes.
2. **Deep learning on imbalanced data**: Traditional machine learning offers many techniques for handling imbalanced data, but comparable techniques for deep learning are less mature, especially for astrophysical light curves.

To address these issues, the authors:

- Designed two neural network architectures that use different types of input information for classification: a multi-input recurrent neural network (RNN) and a compound network combining an RNN with a convolutional neural network (CNN).
- Used Gaussian processes to generate synthetic light curves with artificial uncertainties, increasing the number of minority-class samples while keeping uncertainty information in the synthetic data.
- Organized augmentation and training in a "bagging-like" ensemble scheme: they built multiple sub-datasets, trained a separate neural network on each, and averaged the models' outputs to improve overall classification performance and mitigate overfitting.

In their experiments, the compound RNN + CNN network performed best, reaching an overall balanced accuracy of 86.2% and a macro F1 score of 0.75, outperforming the RNN-only model and the other data augmentation approaches the authors compared.
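The ensemble augmentation idea (balance each sub-dataset with GP-synthesized light curves, train one model per sub-dataset, average the outputs) can be sketched in pure Python. This is a minimal illustration under stated assumptions and not the authors' implementation. The squared-exponential kernel with fixed hyperparameters, drawing artificial uncertainties from the observed error bars, the per-class target size, and random undersampling of large classes are all assumptions. The helper names (`gp_synthetic_curve`, `build_sub_datasets`, `ensemble_predict`) are hypothetical. Each light curve is a tuple `(xs, mags, errs)`, where `xs` may be time or phase.

```python
import math
import random


def rbf(x1, x2, amp=1.0, scale=0.1):
    """Squared-exponential kernel; amp and scale are fixed, illustrative values."""
    return amp ** 2 * math.exp(-((x1 - x2) ** 2) / (2.0 * scale ** 2))


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def cho_solve(low, b):
    """Solve (L L^T) x = b with forward and backward substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def gp_synthetic_curve(xs, mags, errs, rng):
    """Hypothetical helper: GP posterior mean at xs plus noise from artificial errors."""
    n = len(xs)
    mean_mag = sum(mags) / n
    # Kernel matrix with the observed variances on the diagonal.
    k = [[rbf(xs[i], xs[j]) + (errs[i] ** 2 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    alpha = cho_solve(cholesky(k), [m - mean_mag for m in mags])
    new_mags, new_errs = [], []
    for x in xs:
        mu = mean_mag + sum(rbf(x, xs[j]) * alpha[j] for j in range(n))
        err = rng.choice(errs)  # assumption: resample an observed error bar
        new_mags.append(mu + rng.gauss(0.0, err))
        new_errs.append(err)
    return list(xs), new_mags, new_errs


def build_sub_datasets(dataset, n_subsets, target_per_class, seed=0):
    """Hypothetical bagging-like scheme: balance every class in each sub-dataset."""
    rng = random.Random(seed)
    by_class = {}
    for curve, label in dataset:
        by_class.setdefault(label, []).append(curve)
    subsets = []
    for _ in range(n_subsets):
        subset = []
        for label, curves in by_class.items():
            if len(curves) >= target_per_class:
                # Large class (assumption): random subsample without replacement.
                chosen = rng.sample(curves, target_per_class)
            else:
                # Small class: keep all real curves, top up with GP-synthesized ones.
                chosen = list(curves)
                while len(chosen) < target_per_class:
                    chosen.append(gp_synthetic_curve(*rng.choice(curves), rng))
            subset.extend((curve, label) for curve in chosen)
        subsets.append(subset)
    return subsets


def ensemble_predict(models, curve):
    """Average the class probabilities of models trained on different sub-datasets."""
    totals = {}
    for model in models:
        for label, p in model(curve).items():
            totals[label] = totals.get(label, 0.0) + p / len(models)
    return totals
```

Training is deliberately left out. In this sketch, `ensemble_predict` expects hypothetical callables, one trained per sub-dataset, each returning a dictionary of class probabilities. Because every sub-dataset draws different real and synthetic curves, averaging the models smooths out the idiosyncrasies of any single augmented set.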
The study also shows that combining different representations of light curves within a single model is effective, which supports the development of classification algorithms for periodic time series data in future large-scale sky surveys such as LSST.
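Balanced accuracy and macro F1, the two headline metrics reported for this work, both weight every class equally, which is why they suit imbalanced problems like this one. The following is a minimal pure-Python sketch of their standard definitions: balanced accuracy as the mean per-class recall, and macro F1 as the unweighted mean of per-class F1. The function names are illustrative, and the paper's own evaluation code is assumed, not known, to follow these definitions. In practice, scikit-learn's `balanced_accuracy_score` and `f1_score(..., average="macro")` compute these standard quantities.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Per-class true positives, false positives and false negatives."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but the object is not p
            fn[t] += 1  # object of class t that was missed
    return tp, fp, fn


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall over the classes present in y_true."""
    tp, _, fn = confusion_counts(y_true, y_pred)
    classes = set(y_true)
    return sum(tp[c] / (tp[c] + fn[c]) for c in classes) / len(classes)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    tp, fp, fn = confusion_counts(y_true, y_pred)
    classes = set(y_true) | set(y_pred)
    return sum(2 * tp[c] / (2 * tp[c] + fp[c] + fn[c]) for c in classes) / len(classes)


# Toy imbalanced example: a classifier that always predicts the majority class.
y_true = ["A"] * 9 + ["B"]
y_pred = ["A"] * 10
print(balanced_accuracy(y_true, y_pred), round(macro_f1(y_true, y_pred), 3))  # 0.5 0.474
```

On this toy example, plain accuracy would be 0.9. Balanced accuracy (0.5) and macro F1 (about 0.47), by contrast, show that the minority class is never recognized.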