Applying machine learning to Galactic Archaeology: how well can we recover the origin of stars in Milky Way-like galaxies?

Andrea Sante, Andreea S. Font, Sandra Ortega-Martorell, Ivan Olier, Ian G. McCarthy
DOI: https://doi.org/10.1093/mnras/stae1398
2024-06-18
Abstract: We present several machine learning (ML) models developed to efficiently separate stars formed in-situ in Milky Way-type galaxies from those that were formed externally and later accreted. These models, which include examples from artificial neural networks, decision trees and dimensionality reduction techniques, are trained on a sample of disc-like, Milky Way-mass galaxies drawn from the ARTEMIS cosmological hydrodynamical zoom-in simulations. We find that the input parameters which provide optimal performance for these models consist of a combination of stellar positions, kinematics, chemical abundances ([Fe/H] and [$\alpha$/Fe]) and photometric properties. Models from all categories perform similarly well, with area under the precision-recall curve (PR-AUC) scores of $\simeq 0.6$. Beyond a galactocentric radius of $5$ kpc, models retrieve $>90\%$ of accreted stars, with a sample purity close to $60\%$; however, the purity can be increased by adjusting the classification threshold. For one model, we also include host galaxy-specific properties in the training, to account for the variability of accretion histories of the hosts; however, this does not lead to an improvement in performance. The ML models can identify accreted stars even in regions heavily dominated by the in-situ component (e.g., in the disc), and perform well on an unseen suite of simulations (the Auriga simulations). This general applicability bodes well for applying such methods to observational data to identify accreted substructures in the Milky Way without the need to resort to selection cuts for minimising the contamination from in-situ stars.
Astrophysics of Galaxies
What problem does this paper attempt to address?
The main goal of this paper is to develop machine learning (ML) models that efficiently distinguish stars formed in situ in Milky Way-type galaxies from stars that formed externally and were later accreted. The authors trained several kinds of ML model (artificial neural networks, decision trees, and dimensionality reduction techniques) on simulated data. The main points are:

1. **Data Source**: The training sample consists of disc-like, Milky Way-mass galaxies drawn from the ARTEMIS cosmological hydrodynamical zoom-in simulations.
2. **Feature Selection**: The best-performing input parameters combine stellar positions, kinematics, chemical abundances ([Fe/H] and [α/Fe]), and photometric properties.
3. **Model Evaluation**: Performance was measured with the area under the precision-recall curve (PR-AUC). Models from all categories performed similarly, with PR-AUC scores of about 0.6.
4. **Practical Application**: Beyond a galactocentric radius of 5 kpc, the models retrieve more than 90% of accreted stars with a sample purity close to 60%. The purity can be increased by adjusting the classification threshold.
5. **Generalization Check**: The models also performed well on an unseen suite of simulations (the Auriga simulations), which suggests the method could be applied to observational data to identify accreted substructures in the Milky Way without selection cuts to minimise contamination from in-situ stars.

In summary, this paper develops an automated method for identifying accreted stars in Milky Way-like galaxies and demonstrates its potential for application to real observations.
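
The trade-off between purity and the fraction of accreted stars retrieved, and the PR-AUC score that summarises it, can be illustrated with a short sketch. This is not the paper's pipeline: the scores and labels below are synthetic toy data, the function names `purity_completeness` and `average_precision` are hypothetical, and PR-AUC is approximated here by the step-wise average precision, which may differ from the estimator the authors used.

```python
import random


def purity_completeness(scores, labels, threshold):
    """Flag stars with score >= threshold as accreted.

    Returns (purity, completeness), i.e. (precision, recall),
    where label 1 = accreted and label 0 = in situ.
    """
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    purity = tp / (tp + fp) if (tp + fp) else 1.0
    completeness = tp / (tp + fn) if (tp + fn) else 0.0
    return purity, completeness


def average_precision(scores, labels):
    """Step-wise PR-AUC estimate: the mean of the precision measured at
    the rank of each true accreted star. Ties keep their input order."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_accreted = sum(labels)
    found = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            found += 1
            total += found / rank
    return total / n_accreted if n_accreted else 0.0


# Toy data (an assumption, not from the paper): about 20% of stars are
# accreted, and a hypothetical classifier tends to score them higher.
rng = random.Random(0)
labels = [1 if rng.random() < 0.2 else 0 for _ in range(2000)]
scores = [min(1.0, max(0.0, rng.gauss(0.65 if y else 0.35, 0.2))) for y in labels]

print(f"PR-AUC (average precision): {average_precision(scores, labels):.2f}")
for threshold in (0.3, 0.5, 0.7):
    p, c = purity_completeness(scores, labels, threshold)
    print(f"threshold={threshold:.1f}  purity={p:.2f}  completeness={c:.2f}")
```

Raising the threshold can never increase completeness, since fewer stars are flagged as accreted. Whether it improves purity depends on how well the scores separate the two populations, which is why the threshold is treated as a tunable choice rather than a fixed value.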