SpectraFM: Tuning into Stellar Foundation Models

Nolan Koblischke, Jo Bovy
2024-11-07
Abstract: Machine learning models in astrophysics are often limited in scope and cannot adapt to data from new instruments or tasks. We introduce SpectraFM, a Transformer-based foundation model architecture that can be pre-trained on stellar spectra from any wavelength range and instrument. SpectraFM excels in generalization by combining flexibility with knowledge transfer from pre-training, allowing it to outperform traditional machine learning methods, especially in scenarios with limited training data. Our model is pre-trained on approximately 90k examples of synthetic spectra to predict the chemical abundances (Fe, Mg, O), temperature, and specific gravity of stars. We then fine-tune the model on real spectra to adapt it to observational data before fine-tuning it further on a restricted 100-star training set in a different wavelength range to predict iron abundance. Despite a small iron-rich training set of real spectra, transfer learning from the synthetic spectra pre-training enables the model to perform well on iron-poor stars. In contrast, a neural network trained from scratch fails at this task. We investigate the Transformer attention mechanism and find that the wavelengths receiving attention carry physical information about chemical composition. By leveraging the knowledge from pre-training and its ability to handle non-spectra inputs, SpectraFM reduces the need for large training datasets and enables cross-instrument and cross-domain research. Its adaptability makes it well-suited for tackling emerging challenges in astrophysics, like extracting insights from multi-modal datasets.
Instrumentation and Methods for Astrophysics, Astrophysics of Galaxies, Solar and Stellar Astrophysics
What problem does this paper attempt to address?
This paper addresses a common limitation of machine learning models in astrophysics: they are narrow in scope and adapt poorly to data from new instruments or to new tasks. Specifically:

1. **Data Adaptability Problem**: Existing machine learning models can usually handle data only from the specific instruments they were trained on, and they perform poorly on instruments or wavelength ranges outside the training set.
2. **Small-Sample Problem**: When labeled data is very limited (for example, iron abundance measurements for only about 100 stars), traditional machine learning models are difficult to train effectively.
3. **Challenges in Cross-Instrument and Cross-Domain Research**: Data from different instruments and different modalities need to be analyzed jointly, but existing models struggle to combine such multi-source data.

To address these problems, the paper introduces **SpectraFM**, a Transformer-based foundation model that improves generalization and adaptability through pre-training and fine-tuning. The main features of SpectraFM include:

- **Pre-training and Transfer Learning**: SpectraFM is pre-trained on approximately 90k synthetic spectra, from which it learns stellar spectral features. It is then fine-tuned on real spectra, which adapts the model to observational data.
- **Generalization across Wavelengths and Instruments**: With a wavelength encoding mechanism, SpectraFM can process spectra from any wavelength range and any instrument, which reduces its dependence on large training datasets.
- **Attention Mechanism for Physical Information**: Analysis of the Transformer attention mechanism shows that the wavelengths receiving attention carry physical information about chemical composition, which suggests that the predictions have a physical basis.

These features allow SpectraFM to perform well on small-sample, cross-instrument, and cross-domain data. For example, when predicting iron abundance for iron-poor stars from a small iron-rich training set, it performs well where a neural network trained from scratch fails.

### Formula Summary

- **Loss Function**:
  \[ L(y,\hat{y}) = \frac{(\hat{y} - y)^{2}}{2e^{s}} + \frac{s}{2} \]
  where \(s = \ln(\sigma_{\text{data}}^{2} + \sigma_{\text{pred}}^{2})\), \(\sigma_{\text{data}}\) is the known uncertainty of the data, and \(\sigma_{\text{pred}}\) is the uncertainty of the model prediction.
- **Wavelength Position Encoding**:
  \[ PE(\hat{\lambda},k) = \begin{cases} \sin\left(\dfrac{1000\cdot\hat{\lambda}}{10000^{k/d_{\text{model}}}}\right), & \text{if } k \text{ is even} \\ \cos\left(\dfrac{1000\cdot\hat{\lambda}}{10000^{k/d_{\text{model}}}}\right), & \text{if } k \text{ is odd} \end{cases} \]
  where \(\hat{\lambda} = \frac{\lambda - \lambda_{\text{min}}}{\lambda_{\text{max}} - \lambda_{\text{min}}}\) is the normalized wavelength and \(d_{\text{model}} = 256\) is the embedding dimension.

Together, these methods make SpectraFM well-suited to emerging challenges in astrophysics, such as extracting insights from multi-modal datasets.
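The loss function in the formula summary can be illustrated with a small numeric sketch. This is not the authors' implementation: the function name `uncertainty_aware_loss` and the example values are assumptions made for illustration, and a real training loop would evaluate this over batches in a deep learning framework. The sketch only shows how the two terms trade off: a larger predicted uncertainty shrinks the squared-error term but is penalized by the `s / 2` term.

```python
import math


def uncertainty_aware_loss(y_true, y_pred, sigma_data, sigma_pred):
    """Per-example loss as written in the summary:
    (y_pred - y_true)^2 / (2 * e^s) + s / 2, with s = ln(sigma_data^2 + sigma_pred^2).
    The function name and signature are hypothetical.
    """
    # s is the log of the total variance: known data variance plus predicted variance.
    s = math.log(sigma_data ** 2 + sigma_pred ** 2)
    # First term: squared residual scaled by the total variance.
    # Second term: penalty that discourages inflating the predicted uncertainty.
    return (y_pred - y_true) ** 2 / (2.0 * math.exp(s)) + s / 2.0


# Illustrative values only: the same 0.3 residual with two predicted uncertainties.
confident = uncertainty_aware_loss(0.0, 0.3, sigma_data=0.05, sigma_pred=0.05)
cautious = uncertainty_aware_loss(0.0, 0.3, sigma_data=0.05, sigma_pred=0.30)
print(f"confident: {confident:.3f}, cautious: {cautious:.3f}")
```

In this example the residual is large relative to the confident uncertainty estimate, so the prediction that reports a larger uncertainty receives the lower loss.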
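The wavelength position encoding in the formula summary can likewise be sketched in a few lines. The code follows the formula exactly as the summary states it (including the exponent \(k/d_{\text{model}}\)) and may differ from the authors' code. The function names and the wavelength bounds in the usage lines are hypothetical placeholders, since the summary does not give \(\lambda_{\text{min}}\) or \(\lambda_{\text{max}}\).

```python
import math

D_MODEL = 256  # embedding dimension stated in the summary


def normalize_wavelength(lam, lam_min, lam_max):
    """Map a wavelength onto [0, 1] using the chosen global bounds."""
    return (lam - lam_min) / (lam_max - lam_min)


def wavelength_encoding(lam_hat, d_model=D_MODEL):
    """Return the d_model-dimensional encoding PE(lam_hat, k) for k = 0..d_model-1."""
    encoding = []
    for k in range(d_model):
        # Frequency decreases geometrically with the embedding index k.
        angle = 1000.0 * lam_hat / (10000.0 ** (k / d_model))
        # Even indices use sine, odd indices use cosine, as in the summary.
        encoding.append(math.sin(angle) if k % 2 == 0 else math.cos(angle))
    return encoding


# Hypothetical bounds in angstroms, chosen only to make the example concrete.
lam_hat = normalize_wavelength(16000.0, lam_min=15000.0, lam_max=17000.0)
vector = wavelength_encoding(lam_hat)
print(len(vector), lam_hat)
```

Because the encoding depends only on the normalized wavelength and not on a pixel index, spectra from instruments with different sampling or wavelength coverage can share one embedding space, which is the property the summary attributes to this mechanism.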