Synthetic Data Generation of Complex Machines with Time Series Generative Adversarial Network
Zhongyi Zhang, Chun Song, Zhi Zhai, Meng Ma, Anqi He
DOI: https://doi.org/10.1109/phm-hangzhou58797.2023.10482479
2023-01-01
Abstract: With the development of commercial spaceflight, hot-fire testing of reusable liquid rocket engines (LREs) has grown, significantly increasing the amount of test data. However, fault data make up only a very small proportion of this data, and they cover a limited range of fault modes, which poses a challenge for the development of fault diagnosis systems. In addition, because test data are highly confidential, they are usually not shared with system developers, a longstanding obstacle to data-driven research in the space industry. To address this shortage of data, much research has focused on simulation-based methods. However, data generated by simulation models differ from data collected in the real environment, which limits their value for data-driven anomaly detection research. We therefore explore how Generative Adversarial Networks (GANs) can be used to complement real datasets while preserving their time-dynamic properties. Moreover, since synthetic data share the characteristics of real data without containing their complete information, they open up the possibility of providing confidential data to system developers. Specifically, this paper focuses on synthesizing time series data. The greatest challenge is to ensure that the synthetic data have engineering value; for example, the relationships among the synthetic sequences should be consistent with those among the original variables. For LREs in particular, the data monitored by sensors on certain components are strongly correlated, and this correlation cannot be ignored in data analysis. To capture this temporal correlation, we develop a Time Series Generative Adversarial Network (TimeGAN) that optimizes three loss functions. In addition to the generator and discriminator modules of a traditional GAN, it contains embedding and recovery modules and introduces a latent space.
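The module structure and the three loss terms described above can be illustrated with a minimal NumPy sketch. It follows the general TimeGAN formulation rather than the authors' implementation, and several details are assumptions: single-layer Elman RNNs for every module, a separate supervisor network to realise the supervised (next-step) loss, the layer sizes, and random placeholder data standing in for min-max-scaled windows of the eight LRE sensor sequences. Only the forward pass and the loss values are computed; gradient-based training is omitted.

```python
import numpy as np


def init_rnn(rng, in_dim, hid_dim, out_dim, scale=0.1):
    """Random weights for a one-layer Elman RNN with a sigmoid output head."""
    return {
        "Wx": rng.normal(0.0, scale, (in_dim, hid_dim)),
        "Wh": rng.normal(0.0, scale, (hid_dim, hid_dim)),
        "b": np.zeros(hid_dim),
        "Wo": rng.normal(0.0, scale, (hid_dim, out_dim)),
        "bo": np.zeros(out_dim),
    }


def run_rnn(p, x):
    """Map x of shape (batch, time, in_dim) to (batch, time, out_dim) in (0, 1)."""
    h = np.zeros((x.shape[0], p["Wh"].shape[0]))
    outputs = []
    for t in range(x.shape[1]):
        h = np.tanh(x[:, t] @ p["Wx"] + h @ p["Wh"] + p["b"])
        outputs.append(1.0 / (1.0 + np.exp(-(h @ p["Wo"] + p["bo"]))))
    return np.stack(outputs, axis=1)


def timegan_losses(x, nets, rng, eps=1e-8):
    """Forward pass through the five modules and the three TimeGAN loss terms."""
    h = run_rnn(nets["embedder"], x)          # real data -> latent codes
    x_tilde = run_rnn(nets["recovery"], h)    # latent codes -> reconstructed data
    z = rng.uniform(0.0, 1.0, size=x.shape)   # random noise sequences
    h_hat = run_rnn(nets["generator"], z)     # noise -> synthetic latent codes
    h_pred = run_rnn(nets["supervisor"], h)   # predicts latent step t+1 from steps <= t

    # Reconstruction loss: embedding followed by recovery should return x.
    loss_rec = np.mean((x - x_tilde) ** 2)
    # Supervised loss: compare predicted and actual next-step latent codes.
    loss_sup = np.mean((h[:, 1:] - h_pred[:, :-1]) ** 2)
    # Unsupervised (adversarial) loss, written as the discriminator's binary
    # cross-entropy: real latent codes labelled 1, synthetic ones labelled 0.
    d_real = run_rnn(nets["discriminator"], h)
    d_fake = run_rnn(nets["discriminator"], h_hat)
    loss_adv = -np.mean(np.log(d_real + eps) + np.log(1.0 - d_fake + eps))
    return loss_rec, loss_sup, loss_adv, h, x_tilde, h_hat


# Hypothetical sizes: eight sensor channels, 24-step windows, batch of 32.
rng = np.random.default_rng(0)
n_features, latent_dim, hidden = 8, 4, 16
nets = {
    "embedder": init_rnn(rng, n_features, hidden, latent_dim),
    "recovery": init_rnn(rng, latent_dim, hidden, n_features),
    "generator": init_rnn(rng, n_features, hidden, latent_dim),
    "supervisor": init_rnn(rng, latent_dim, hidden, latent_dim),
    "discriminator": init_rnn(rng, latent_dim, hidden, 1),
}
x = rng.uniform(0.0, 1.0, size=(32, 24, n_features))  # placeholder scaled data
loss_rec, loss_sup, loss_adv, h, x_tilde, h_hat = timegan_losses(x, nets, rng)
print(f"L_R={loss_rec:.4f}  L_S={loss_sup:.4f}  L_U={loss_adv:.4f}")
```

In TimeGAN-style training these terms are typically optimized in stages: the embedding and recovery modules are first trained on the reconstruction loss, the supervised loss is then used to teach stepwise temporal dynamics in the latent space, and finally the modules are trained jointly, with the discriminator minimizing the cross-entropy above while the generator is trained to fool it.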
We use TimeGAN to learn from test data containing eight sequences and to generate realistic “fake data”. We then use autoregressive moving average (ARMA) models to test the residuals of the individual synthetic sequences, and the results demonstrate that this method is well suited to synthesizing sensor data from LREs.
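The residual test can be sketched as follows, under the assumption (the abstract does not specify the procedure) that it checks whether an ARMA model fitted to each synthetic sequence leaves white-noise residuals. The sketch fits ARMA(p, q) with the Hannan-Rissanen two-stage least-squares method, computes the Ljung-Box Q statistic on the residuals, and compares it with an approximate 95% chi-square critical value obtained from the Wilson-Hilferty approximation. The model orders, the number of lags, and the simulated ARMA(1, 1) series used in place of a real synthetic LRE sequence are all hypothetical.

```python
import numpy as np


def lagged(x, lags):
    """Design matrix whose column j holds x delayed by j + 1 steps."""
    n = len(x)
    return np.column_stack([x[lags - j - 1 : n - j - 1] for j in range(lags)])


def fit_arma_hr(x, p=1, q=1, long_ar=10):
    """Hannan-Rissanen two-stage least-squares estimate of ARMA(p, q)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    # Stage 1: a long AR fit approximates the unobserved innovations.
    X1 = lagged(x, long_ar)
    phi_long, *_ = np.linalg.lstsq(X1, x[long_ar:], rcond=None)
    e = np.zeros_like(x)
    e[long_ar:] = x[long_ar:] - X1 @ phi_long
    # Stage 2: regress x_t on its own lags and on lagged innovation estimates.
    m = long_ar + max(p, q)
    X2 = np.column_stack([lagged(x, m)[:, :p], lagged(e, m)[:, :q]])
    coef, *_ = np.linalg.lstsq(X2, x[m:], rcond=None)
    return coef[:p], coef[p:]


def arma_residuals(x, phi, theta):
    """One-step-ahead residuals of the fitted ARMA model, computed recursively."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    p, q = len(phi), len(theta)
    e = np.zeros_like(x)
    for t in range(max(p, q), len(x)):
        ar = sum(phi[i] * x[t - i - 1] for i in range(p))
        ma = sum(theta[j] * e[t - j - 1] for j in range(q))
        e[t] = x[t] - ar - ma
    return e[max(p, q):]


def ljung_box(resid, lags=10):
    """Ljung-Box Q statistic for residual autocorrelation up to `lags`."""
    r = resid - resid.mean()
    n = len(r)
    denom = np.sum(r ** 2)
    acf = np.array([np.sum(r[k:] * r[:-k]) / denom for k in range(1, lags + 1)])
    return n * (n + 2) * np.sum(acf ** 2 / (n - np.arange(1, lags + 1)))


def chi2_crit_95(df):
    """Wilson-Hilferty approximation of the 95th percentile of chi-square(df)."""
    z = 1.6449  # 95th percentile of the standard normal distribution
    return df * (1 - 2 / (9 * df) + z * np.sqrt(2 / (9 * df))) ** 3


# Hypothetical stand-in for one synthetic sequence: a simulated ARMA(1, 1) path.
rng = np.random.default_rng(1)
n = 2000
noise = rng.normal(size=n)
series = np.zeros(n)
for t in range(1, n):
    series[t] = 0.6 * series[t - 1] + noise[t] + 0.3 * noise[t - 1]

phi, theta = fit_arma_hr(series, p=1, q=1)
resid = arma_residuals(series, phi, theta)
q_stat = ljung_box(resid, lags=10)
crit = chi2_crit_95(10 - 1 - 1)  # degrees of freedom: lags - p - q
print(f"phi={phi[0]:.3f} theta={theta[0]:.3f} Q={q_stat:.2f} crit={crit:.2f}")
print("residuals look white" if q_stat < crit else "residual autocorrelation remains")
```

A sequence whose Q statistic stays below the critical value shows no significant residual autocorrelation under the fitted model. A statistics library such as statsmodels provides an ARIMA model class and a Ljung-Box test that could replace the hand-written estimators here.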