Abstract: Machine learning models in astrophysics are often limited in scope and cannot adapt to data from new instruments or tasks. We introduce SpectraFM, a Transformer-based foundation model architecture that can be pre-trained on stellar spectra from any wavelength range and instrument. SpectraFM excels in generalization by combining flexibility with knowledge transfer from pre-training, allowing it to outperform traditional machine learning methods, especially in scenarios with limited training data. Our model is pre-trained on approximately 90k examples of synthetic spectra to predict the chemical abundances (Fe, Mg, O), temperature, and surface gravity of stars. We then fine-tune the model on real spectra to adapt it to observational data before fine-tuning it further on a restricted 100-star training set in a different wavelength range to predict iron abundance. Despite a small iron-rich training set of real spectra, transfer learning from the synthetic spectra pre-training enables the model to perform well on iron-poor stars. In contrast, a neural network trained from scratch fails at this task. We investigate the Transformer attention mechanism and find that the wavelengths receiving attention carry physical information about chemical composition. By leveraging the knowledge from pre-training and its ability to handle non-spectra inputs, SpectraFM reduces the need for large training datasets and enables cross-instrument and cross-domain research. Its adaptability makes it well-suited for tackling emerging challenges in astrophysics, like extracting insights from multi-modal datasets.

What problem does this paper attempt to address?

This paper addresses a limitation of current machine-learning models in astrophysics: they are usually narrow in scope and do not adapt well to data from new instruments or to new tasks. Specifically:

1. **Data Adaptability Problem**: Existing models typically handle data from a single instrument and perform poorly on instruments or wavelength ranges not represented in the training set.
2. **Small-Sample Problem**: When labeled data are scarce (for example, iron-abundance measurements for only about 100 stars), traditional machine-learning models are difficult to train effectively.
3. **Challenges in Cross-Instrument and Cross-Domain Research**: Data from different instruments and different modalities must be analyzed together, but existing models struggle to combine such multi-source data.

To address these problems, the paper introduces **SpectraFM**, a foundation model built on the Transformer architecture that uses pre-training followed by fine-tuning to improve generalization and adaptability. Its main features are:

- **Pre-training and Transfer Learning**: SpectraFM is pre-trained on a large set of synthetic spectra (approximately 90k examples), from which it learns general features of stellar spectra. It is then fine-tuned on a smaller set of real spectra so that it adapts to observational data.
- **Generalization Across Wavelengths and Instruments**: A wavelength encoding lets SpectraFM process spectra from any wavelength range and any instrument, reducing its dependence on large training datasets.
- **Attention Mechanism for Physical Information**: Analysis of the Transformer's attention shows that the model attends to wavelengths that carry physical information about chemical composition, suggesting that its predictions rest on physically meaningful spectral features.

These properties let SpectraFM perform well on small-sample, cross-instrument, and cross-domain data. In the iron-abundance task in particular, the model fine-tuned on a small, iron-rich set of real spectra still performs well on iron-poor stars, whereas a neural network trained from scratch fails.

### Formula Summary

- **Loss Function**:
  \[
  L(y,\hat{y})=\frac{(\hat{y} - y)^{2}}{2e^{s}}+\frac{s}{2}
  \]
  where \(s = \ln(\sigma_{\text{data}}^{2}+\sigma_{\text{pred}}^{2})\), \(\sigma_{\text{data}}\) is the known uncertainty of the data, and \(\sigma_{\text{pred}}\) is the uncertainty predicted by the model.
- **Wavelength Position Encoding**:
  \[
  PE(\hat{\lambda},k)=
  \begin{cases}
  \sin\left(\frac{1000\cdot\hat{\lambda}}{10000^{k/d_{\text{model}}}}\right), & \text{if } k \text{ is even}\\
  \cos\left(\frac{1000\cdot\hat{\lambda}}{10000^{k/d_{\text{model}}}}\right), & \text{if } k \text{ is odd}
  \end{cases}
  \]
  where \(\hat{\lambda}=\frac{\lambda-\lambda_{\text{min}}}{\lambda_{\text{max}}-\lambda_{\text{min}}}\) and \(d_{\text{model}} = 256\) is the embedding dimension.

With these methods, SpectraFM can address a range of challenges in astrophysics and offers new tools for future astronomical research.
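The two formulas in the Formula Summary can be made concrete with a minimal NumPy sketch. The loss is the Gaussian negative log-likelihood with total variance \(e^{s}=\sigma_{\text{data}}^{2}+\sigma_{\text{pred}}^{2}\), up to an additive constant, so a sample with a large combined uncertainty contributes a smaller squared-error term but pays a \(s/2\) penalty. The function names `uncertainty_loss` and `wavelength_encoding` are hypothetical, the loss is returned per sample rather than averaged over a batch, and we assume a single \(\lambda_{\text{min}}\), \(\lambda_{\text{max}}\) pair shared across instruments so that the same physical wavelength always receives the same encoding. This is an illustration under those assumptions, not the authors' implementation.

```python
import numpy as np


def uncertainty_loss(y, y_hat, sigma_data, sigma_pred):
    """Per-sample loss L = (y_hat - y)^2 / (2 e^s) + s / 2.

    s = ln(sigma_data^2 + sigma_pred^2) combines the known label uncertainty
    with the model's predicted uncertainty. The total variance must be
    positive, otherwise s = -inf. (Hypothetical helper, not the paper's code.)
    """
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    variance = np.asarray(sigma_data, dtype=float) ** 2 + np.asarray(sigma_pred, dtype=float) ** 2
    s = np.log(variance)
    return (y_hat - y) ** 2 / (2.0 * np.exp(s)) + s / 2.0


def wavelength_encoding(wavelengths, lam_min, lam_max, d_model=256):
    """Sinusoidal encoding PE(lam_hat, k) for each wavelength.

    Returns an array of shape (n_wavelengths, d_model): even dimensions k use
    sin, odd dimensions use cos. (Hypothetical helper, not the paper's code.)
    """
    lam = np.atleast_1d(np.asarray(wavelengths, dtype=float))
    # Min-max normalisation: lam_min -> 0, lam_max -> 1 (assumed shared across instruments).
    lam_hat = (lam - lam_min) / (lam_max - lam_min)
    k = np.arange(d_model)
    # Broadcast to (n, d_model): angle = 1000 * lam_hat / 10000^(k / d_model).
    angles = 1000.0 * lam_hat[:, None] / (10000.0 ** (k / d_model))
    return np.where(k % 2 == 0, np.sin(angles), np.cos(angles))


# Two hypothetical instruments with different pixel grids inside one global range.
grid_a = np.linspace(15100.0, 15800.0, 4)
grid_b = np.linspace(15500.0, 16900.0, 7)
pe_a = wavelength_encoding(grid_a, 15000.0, 17000.0)
pe_b = wavelength_encoding(grid_b, 15000.0, 17000.0)
print(pe_a.shape, pe_b.shape)  # (4, 256) (7, 256)
```

Because the encoding depends only on wavelength, spectra with different lengths and pixel grids map to vectors of the same width, which is consistent with the summary's claim that one model can accept spectra from different instruments.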

SpectraFM: Tuning into Stellar Foundation Models

Towards an astronomical foundation model for stars with a Transformer-based model

Toward a Spectral Foundation Model: An Attention-Based Approach with Domain-Inspired Fine-Tuning and Wavelength Parameterization

STARS: Sensor-agnostic Transformer Architecture for Remote Sensing

Boost Spectrum Prediction with Temporal-Frequency Fusion Network Via Transfer Learning

Specialized Foundation Models Struggle to Beat Supervised Baselines

SpectroTranslator: Deep-neural network algorithm for homogenising spectroscopic parameters

Radio Galaxy Zoo: Towards building the first multi-purpose foundation model for radio astronomy with self-supervised learning

OReole-FM: successes and challenges toward billion-parameter foundation models for high-resolution satellite imagery

SpectroTranslator: a deep-neural network algorithm to homogenize spectroscopic parameters

Enhancing radioisotope identification in gamma spectra with transfer learning

SpectraTr: A novel deep learning model for qualitative analysis of drug spectroscopy based on transformer structure

Estimation of Physical Stellar Parameters from Spectral Models using Deep Learning Techniques

Supervised Machine Learning for Analysing Spectra of Exoplanetary Atmospheres

A Machine-Learned "Chemical Intuition" to Overcome Spectroscopic Data Scarcity

SpectralGPT: Spectral Remote Sensing Foundation Model

SPT: Spectral transformer for age and mass estimations of red giant stars

Building 6G Radio Foundation Models with Transformer Architectures

AstroLLaMA: Towards Specialized Foundation Models in Astronomy

The Stellar Spectra Factory (SSF) Based On SLAM

A Self-consistent Data-driven Model for Determining Stellar Parameters from Optical and Near-infrared Spectra