Abstract: We present several machine learning (ML) models developed to efficiently separate stars formed in-situ in Milky Way-type galaxies from those that were formed externally and later accreted. These models, which include examples from artificial neural networks, decision trees and dimensionality reduction techniques, are trained on a sample of disc-like, Milky Way-mass galaxies drawn from the ARTEMIS cosmological hydrodynamical zoom-in simulations. We find that the input parameters which provide optimal performance for these models consist of a combination of stellar positions, kinematics, chemical abundances ([Fe/H] and [$\alpha$/Fe]) and photometric properties. Models from all categories perform similarly well, with area under the precision-recall curve (PR-AUC) scores of $\simeq 0.6$. Beyond a galactocentric radius of $5$~kpc, models retrieve $>90\%$ of accreted stars, with a sample purity close to $60\%$; the purity can, however, be increased by adjusting the classification threshold. For one model, we also include host galaxy-specific properties in the training, to account for the variability of accretion histories of the hosts, but this does not lead to an improvement in performance. The ML models can identify accreted stars even in regions heavily dominated by the in-situ component (e.g., in the disc), and perform well on an unseen suite of simulations (the Auriga simulations). This general applicability bodes well for the application of such methods to observational data to identify accreted substructures in the Milky Way without the need to resort to selection cuts for minimising the contamination from in-situ stars.

### What problem does this paper attempt to solve?

The main goal of this paper is to develop machine learning (ML) models that efficiently distinguish stars formed in situ in Milky Way-type galaxies from stars that formed externally and were later accreted. The authors apply several classes of ML methods (artificial neural networks, decision trees, and dimensionality reduction techniques) and train them on simulated galaxies. The main points are:

1. **Data Source**: The training sample consists of disc-like, Milky Way-mass galaxies drawn from the ARTEMIS cosmological hydrodynamical zoom-in simulations.
2. **Feature Selection**: The best-performing input parameters combine stellar positions, kinematics, chemical abundances ([Fe/H] and [α/Fe]), and photometric properties.
3. **Model Evaluation**: Performance is measured with the area under the precision-recall curve (PR-AUC). Models from all categories perform similarly, with PR-AUC scores of about 0.6.
4. **Practical Application**: Beyond a galactocentric radius of 5 kpc, the models recover more than 90% of accreted stars with a sample purity close to 60%. Purity can be increased by adjusting the classification threshold.
5. **Generalization**: The models also perform well on an unseen suite of simulations (the Auriga simulations), which suggests that the method could be applied to observational data to identify accreted substructures in the Milky Way without resorting to selection cuts to reduce contamination from in situ stars.

In summary, the paper develops an automated method for identifying accreted stars in Milky Way-type galaxies and demonstrates its potential for application to real observations.
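The trade-off between purity and completeness, and the PR-AUC metric quoted in the summary, can be illustrated with a small toy example. This is a minimal sketch and not the authors' code: the scores and labels are invented for illustration, the function names `purity_completeness` and `average_precision` are hypothetical, and average precision is only one common estimator of PR-AUC (the paper may compute the area differently).

```python
# Toy illustration only: scores and labels below are made up, not simulation data.
# Label 1 = accreted star, label 0 = in-situ star.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]


def purity_completeness(scores, labels, threshold):
    """Purity (precision) and completeness (recall) of the accreted class
    when every star with score >= threshold is classified as accreted."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    purity = tp / (tp + fp) if (tp + fp) else 1.0
    completeness = tp / (tp + fn) if (tp + fn) else 0.0
    return purity, completeness


def average_precision(scores, labels):
    """Step-wise estimate of the area under the precision-recall curve:
    the mean of the precision values at each rank where a true accreted
    star appears, with stars sorted by decreasing score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_pos = sum(labels)
    tp = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            tp += 1
            total += tp / rank
    return total / n_pos if n_pos else 0.0


# A stricter threshold selects fewer stars: purity rises, completeness falls.
loose = purity_completeness(scores, labels, 0.5)
strict = purity_completeness(scores, labels, 0.85)
ap = average_precision(scores, labels)
```

With this toy sample, raising the threshold from 0.5 to 0.85 increases purity from 0.6 to 1.0 while completeness drops from 0.75 to 0.5, which is the kind of adjustment the summary refers to when it says purity can be improved by changing the classification threshold.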

Applying machine learning to Galactic Archaeology: how well can we recover the origin of stars in Milky Way-like galaxies?

ERGO-ML: Towards a robust machine learning model for inferring the fraction of accreted stars in galaxies from integral-field spectroscopic maps

In-situ or accreted? Using deep learning to infer the origin of extragalactic globular clusters from observables

Finding accreted stars in the Milky Way: clues from NIHAO simulations

Constraints on the in-situ and ex-situ stellar masses in nearby galaxies with Artificial Intelligence

New Observational Constraints to Milky Way Chemodynamical models

Galaxies in the zone of avoidance: Misclassifications using machine learning tools

Introducing galactic structure finder: the multiple stellar kinematic structures of a simulated Milky Way mass galaxy

StarGO: A New Method to Identify the Galactic Origins of Halo Stars

A Machine-Learning Photometric Classifier for Massive Stars in Nearby Galaxies I. the Method

Towards Galactic Archaeology with Inferred Ages of Giant Stars From Gaia Spectra

Populating Galaxies Into Halos Via Machine Learning on the Simba Simulation

Not Hydro: Using Neural Networks to estimate galaxy properties on a Dark-Matter-Only simulation

Identifying Kinematic Structures in Simulated Galaxies Using Unsupervised Machine Learning

Machine learning applications in studies of the physical properties of active galactic nuclei based on photometric observations

Galactic Archaeology: Tracing the Milky Way's Formation and Evolution through Stellar Populations

Chemo-kinematic analysis of metal-poor stars with unsupervised machine learning

Using machine learning to investigate the populations of dusty evolved stars in various metallicities