Machine Learning Data Augmentation Strategy for Electron Energy Loss Spectroscopy: Generative Adversarial Networks

Daniel del-Pozo-Bueno, Demie Kepaptsoglou, Quentin M Ramasse, Francesca Peiró, Sònia Estradé
DOI: https://doi.org/10.1093/mam/ozae014
IF: 4.0991
2024-04-01
Microscopy and Microanalysis
Abstract: Recent advances in machine learning (ML) have highlighted a novel challenge concerning the quality and quantity of data required to effectively train algorithms in supervised ML procedures. This article introduces a data augmentation (DA) strategy for electron energy loss spectroscopy (EELS) data, employing generative adversarial networks (GANs). We present an innovative approach, called the data augmentation generative adversarial network (DAG), which facilitates data generation from a very limited number of spectra, around 100. Throughout this study, we explore the optimal configuration for GANs to produce realistic spectra. Notably, our DAG generates realistic spectra, and the spectra produced by the generator are successfully used in real-world applications to train classifiers based on artificial neural networks (ANNs) and support vector machines (SVMs) that have been successful in classifying experimental EEL spectra.
materials science, multidisciplinary; microscopy
What problem does this paper attempt to address?
This paper addresses how generative adversarial networks (GANs) can be used for data augmentation (DA) in electron energy loss spectroscopy (EELS), where supervised machine learning (ML) algorithms need large amounts of high-quality training data. Specifically, it proposes a GAN-based strategy, called the data augmentation generative adversarial network (DAG), that generates synthetic spectra from a limited number of experimental spectra (about 100). These synthetic spectra can then be used to train classifiers, such as artificial neural networks (ANNs) and support vector machines (SVMs), that successfully classify experimental EEL spectra.

### Main Problems

1. **Data Quality and Quantity**: A main obstacle to applying ML algorithms in EELS is the need for a large amount of high-quality data to train models. Acquiring such data is expensive and time-consuming, especially for samples that are sensitive to the electron beam.
2. **Data Augmentation**: Generating synthetic spectra increases the diversity and size of the data set, which can significantly improve the performance of supervised learning models, especially when the data set is imbalanced.

### Solutions

The paper proposes a GAN-based data augmentation method with the following steps:

1. **Constructing the Experimental Data Set**: Extract a data set of specific features (such as the L2,3 white lines of transition metals and the oxygen K-edge) from existing EEL spectra.
2. **Designing the DAG Model**: Four DAG architectures are developed (Single-MonoTrans GAN, Multi-MonoTrans GAN, Single-BiTrans GAN, and Multi-BiTrans GAN). Each aims to improve the quality of the generated spectra through different transformations and, in the Multi variants, multiple discriminators.
3. **Training and Evaluation**: Monitor the quality of the generated spectra with several metrics, namely Fréchet Inception Distance (FID), Pearson Correlation Coefficient (PCC), and Cosine Distance (CosD), and apply an early-stopping strategy to avoid over-training.
4. **Application and Verification**: Use the generated spectra to train classifiers that identify the oxidation states in iron (Fe) and manganese (Mn) oxides from their respective white-line features.

### Key Contributions

- **Data Augmentation**: Generating synthetic spectra significantly increases the diversity and size of the data set, thereby improving the performance of supervised learning models.
- **Model Architecture**: Multiple DAG architectures are proposed to improve the quality of the generated spectra through different transformations and multiple discriminators.
- **Evaluation Method**: Multiple evaluation metrics and an early-stopping strategy are introduced so that the generated spectra are of high quality and usable in practice.

Through these methods, the paper shows how GANs can be used effectively for data augmentation, improving the performance of machine learning models in EELS data analysis.
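The similarity metrics and early-stopping idea named in the Training and Evaluation step can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the toy two-peak spectra, the function names (`pearson_cc`, `cosine_distance`, `should_stop`) and the `patience` and `min_delta` parameters are all hypothetical. FID is left out because it requires a feature-extracting network.

```python
import math


def pearson_cc(a, b):
    """Pearson correlation coefficient between two equal-length spectra."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity) between two spectra."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def should_stop(history, patience=5, min_delta=1e-4):
    """Early-stopping check on a monitored distance (lower is better).

    Returns True when the best value over the last `patience` checks is not
    at least `min_delta` better than the best value seen before them.
    """
    if len(history) <= patience:
        return False
    best_before = min(history[:-patience])
    best_recent = min(history[-patience:])
    return best_recent > best_before - min_delta


def gaussian(energy, center, width):
    """Unnormalised Gaussian peak, used only to build toy spectra."""
    return math.exp(-((energy - center) ** 2) / (2.0 * width ** 2))


# Toy data (assumption): two peaks loosely mimicking a pair of white lines.
energies = [700.0 + 0.5 * i for i in range(80)]
real = [gaussian(e, 708.0, 1.5) + 0.5 * gaussian(e, 721.0, 1.5) for e in energies]
# A "generated" spectrum whose peaks are shifted by 0.5 eV.
generated = [gaussian(e, 708.5, 1.5) + 0.5 * gaussian(e, 721.5, 1.5) for e in energies]

pcc = pearson_cc(real, generated)
cosd = cosine_distance(real, generated)
print(f"PCC = {pcc:.3f}, CosD = {cosd:.3f}")
```

In a training loop, one would append the monitored distance (for example the mean CosD between generated and experimental spectra) to `history` after each evaluation and halt once `should_stop(history)` returns True.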