Reinforcement Learning-based Anomaly Detection for PHM applications

Samir Khan, S. Tsutsumi, T. Yairi, Shinichi Nakasuka
DOI: https://doi.org/10.1109/AERO53065.2022.9843543
2022-03-05
Abstract: Prognostics and Health Management (PHM) is an essential requirement for engineering assets. Its processing strategies include modules for the detection, diagnostics and prognostics of known fault conditions. However, during operation, there are always fault conditions that were not anticipated. These events manifest as anomalies and could potentially be catastrophic, leading to the loss of the asset. Anomalies can indicate an impending fault condition; therefore, the automatic identification of anomalies can help solve reliability problems that arise from the complexities of the operating environment and from component degradation. Data-driven approaches have gained increasing popularity as a comprehensive anomaly detection method whenever data on nominal and fault conditions is available. However, many supervised learning techniques face problems when models are trained on a limited set of partially labelled anomalies whilst the rest of the dataset is left unlabelled. An alternative is to use unsupervised learning techniques, which are supposed to obviate the need to stipulate the performance of the anomaly detector. But these still often produce many false positives because of the lack of prior knowledge of true anomalies. Considering this, this article investigates the use of a Reinforcement Learning (RL)-based approach to address the problem of unknown classes of anomalies that might lie beyond the scope of the initially trained model. A Q-learning method is used to exploit the existing data model whilst exploring new classes to improve classification accuracy and optimise decision making. This is of significant practical benefit, as anomalies can be unpredictable in form and usually evolve over time. In particular, a deep network-based anomaly detector agent is used to initially learn the action-value function (i.e., the Q-value function) from the limited labelled data.
An environment is created in which the agent not only interacts actively with the labelled anomalies but also explores rare and novel unlabelled anomalies that might lie beyond the scope of the initially trained model. A reward function is defined based on the sparse normative content, which stipulates when the agent detects the anomaly state. However, the robustness of this method is still an open question, as it simply shifts the anomaly detection responsibility onto the reward function being used. This highlights how strongly the performance of these methods depends on how the state-action space of the problem is defined.
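The Q-learning idea described in the abstract can be illustrated with a minimal tabular sketch. This is not the authors' deep network-based agent: the scalar readings, the discretisation into bins, the two-action space, the reward values and all function names (`discretise`, `reward`, `train`, `greedy_action`) are assumptions made purely for illustration. The sketch only shows how a sparse reward on labelled samples and epsilon-greedy exploration fit into the standard Q-learning update.

```python
import random

ACTIONS = (0, 1)  # assumed action space: 0 = report "normal", 1 = report "anomaly"


def discretise(x, n_bins=10, lo=0.0, hi=1.0):
    """Map a scalar reading in [lo, hi] to an integer bin (the toy state)."""
    idx = int((x - lo) / (hi - lo) * n_bins)
    return min(max(idx, 0), n_bins - 1)


def reward(action, label):
    """Hypothetical sparse reward.

    label is 1 (labelled anomaly), 0 (labelled normal) or None (unlabelled).
    Unlabelled samples give no feedback, so the reward is 0 for either action.
    """
    if label is None:
        return 0.0
    return 1.0 if action == label else -1.0


def train(stream, n_bins=10, alpha=0.1, gamma=0.5, epsilon=0.1, epochs=50, seed=0):
    """Tabular Q-learning over a list of (reading, label) pairs."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_bins)]  # q[state][action]
    for _ in range(epochs):
        for t in range(len(stream) - 1):
            x, label = stream[t]
            s = discretise(x, n_bins)
            s_next = discretise(stream[t + 1][0], n_bins)
            # Epsilon-greedy: explore with probability epsilon, otherwise exploit.
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda k: q[s][k])
            r = reward(a, label)
            # Standard Q-learning update:
            # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
            q[s][a] += alpha * (r + gamma * max(q[s_next]) - q[s][a])
    return q


def greedy_action(q, x):
    """Return the action with the highest learned Q-value for reading x."""
    s = discretise(x, len(q))
    return max(ACTIONS, key=lambda k: q[s][k])
```

A possible usage, again on made-up data: `q = train([(0.1, 0), (0.2, 0), (0.95, 1), (0.5, None), (0.3, 0)])` followed by `greedy_action(q, 0.95)`. Note that the behaviour of the learned policy is determined entirely by the assumed `reward` function and the choice of state bins, which mirrors the caveat raised in the abstract about the dependence on the reward and the state-action definition.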
Engineering, Computer Science