Attention Sequence to Sequence Model for Machine Remaining Useful Life Prediction

Mohamed Ragab, Zhenghua Chen, Min Wu, Chee-Keong Kwoh, Ruqiang Yan, Xiaoli Li
DOI: https://doi.org/10.48550/arXiv.2007.09868
2020-07-20
Abstract: Accurate estimation of remaining useful life (RUL) of industrial equipment can enable advanced maintenance schedules, increase equipment availability and reduce operational costs. However, existing deep learning methods for RUL prediction are not completely successful due to the following two reasons. First, relying on a single objective function to estimate the RUL will limit the learned representations and thus affect the prediction accuracy. Second, while longer sequences are more informative for modelling the sensor dynamics of equipment, existing methods are less effective at dealing with very long sequences, as they mainly focus on the latest information. To address these two problems, we develop a novel attention-based sequence to sequence with auxiliary task (ATS2S) model. In particular, our model jointly optimizes both reconstruction loss to empower our model with predictive capabilities (by predicting next input sequence given current input sequence) and RUL prediction loss to minimize the difference between the predicted RUL and actual RUL. Furthermore, to better handle longer sequences, we employ the attention mechanism to focus on all the important input information during the training process. Finally, we propose a new dual-latent feature representation to integrate the encoder features and decoder hidden states, to capture rich semantic information in data. We conduct extensive experiments on four real datasets to evaluate the efficacy of the proposed method. Experimental results show that our proposed method can achieve superior performance over 13 state-of-the-art methods consistently.
Machine Learning
What problem does this paper attempt to address?
This paper attempts to solve two main problems in remaining useful life (RUL) prediction for industrial equipment:

1. **Limitations of a single objective function**: Existing deep learning methods usually rely on a single objective function to estimate RUL, which limits the quality of the learned representation and thus the prediction accuracy. Specifically, using only one regression target (such as minimizing the difference between the predicted RUL and the actual RUL) may not be sufficient to capture the complex patterns in the data.
2. **Ineffectiveness in handling long sequences**: Although longer sequences are more informative for modeling the sensor dynamics of equipment, existing methods do not perform well on very long sequences. They tend to focus on the most recent information and neglect historical information, which degrades prediction performance.

To solve these problems, the authors propose an attention-based sequence-to-sequence model with an auxiliary task (ATS2S, Attention Sequence to Sequence with Auxiliary Task). The main improvements of this model include:

- **Joint optimization of reconstruction loss and RUL prediction loss**: The model optimizes two objectives simultaneously: predicting the next input sequence from the current one (which gives the model predictive capability) and predicting RUL (minimizing the difference between the predicted RUL and the actual RUL). This improves both the expressiveness and the prediction accuracy of the model.
  \[ L(\theta) = \alpha L_{\text{rec}}(\theta) + L_{\text{rul}}(\theta) \]

  where:

  - \(\alpha\) is the coefficient that weights the reconstruction loss against the RUL prediction loss.
  - \(L_{\text{rec}}(\theta)\) is the reconstruction loss, defined as the mean squared error between the predicted output and the target output:
    \[ L_{\text{rec}}(\theta) = \frac{1}{N}\sum_{i=1}^{N}\|\hat{Y}_{i} - Y_{i}\|_{2}^{2} \]
  - \(L_{\text{rul}}(\theta)\) is the RUL prediction loss, defined as the mean squared error between the predicted RUL label and the true RUL label:
    \[ L_{\text{rul}}(\theta) = \frac{1}{N}\sum_{i=1}^{N}\left(\widehat{\text{RUL}}_{i} - \text{RUL}_{i}\right)^{2} \]

- **Introduction of an attention mechanism**: To better handle long sequences, the authors introduce an attention mechanism that lets the model attend to all important input information rather than only the most recent time steps.
- **Dual-latent feature representation**: The authors propose a new dual-latent feature representation that combines the encoder features with the decoder hidden states to capture rich semantic information in the data.

With these improvements, the reported experiments on four real-world datasets show that ATS2S consistently outperforms 13 state-of-the-art methods in RUL prediction.
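To make the joint objective concrete, the following is a minimal NumPy sketch of the three formulas above. It only illustrates the arithmetic and is not the authors' implementation. The function names, the array shapes, and the default `alpha=1.0` are assumptions introduced here, since the summary does not state the value of \(\alpha\).

```python
import numpy as np


def reconstruction_loss(y_hat, y):
    """L_rec: mean over the N samples of the squared L2 norm of (y_hat - y).

    Assumed shape: (N, ...) with any trailing dimensions (time steps, sensors),
    which are flattened per sample before taking the norm.
    """
    y_hat = np.asarray(y_hat, dtype=float)
    y = np.asarray(y, dtype=float)
    diff = (y_hat - y).reshape(y.shape[0], -1)
    return float(np.mean(np.sum(diff ** 2, axis=1)))


def rul_loss(rul_hat, rul):
    """L_rul: mean squared error between predicted and true RUL labels."""
    rul_hat = np.asarray(rul_hat, dtype=float)
    rul = np.asarray(rul, dtype=float)
    return float(np.mean((rul_hat - rul) ** 2))


def joint_loss(y_hat, y, rul_hat, rul, alpha=1.0):
    """L = alpha * L_rec + L_rul. The default alpha is a placeholder, not a paper value."""
    return alpha * reconstruction_loss(y_hat, y) + rul_loss(rul_hat, rul)


# Toy data: N = 2 samples, each "next sequence" flattened to 2 values.
y_hat = [[1.0, 2.0], [3.0, 4.0]]
y = [[1.0, 0.0], [0.0, 4.0]]
rul_hat = [10.0, 20.0]
rul = [12.0, 17.0]

print(reconstruction_loss(y_hat, y))                 # (4 + 9) / 2 = 6.5
print(rul_loss(rul_hat, rul))                        # (4 + 9) / 2 = 6.5
print(joint_loss(y_hat, y, rul_hat, rul, alpha=0.5)) # 0.5 * 6.5 + 6.5 = 9.75
```

In a training loop, both terms would be computed from the same forward pass, so that gradients from the auxiliary sequence-prediction task and from the RUL regression both shape the shared representation.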