Smart predictive maintenance for high-performance computing systems: a literature review

André Luis da Cunha Dantas Lima, Vitor Moraes Aranha, Caio Jordão de Lima Carvalho, Erick Giovani Sperandio Nascimento
DOI: https://doi.org/10.1007/s11227-021-03811-7
IF: 3.3
2021-04-27
The Journal of Supercomputing
Abstract: Predictive maintenance is an invaluable tool for preserving the health of mission-critical assets while minimizing the operational costs of scheduled intervention. Artificial intelligence techniques have been shown to be effective at handling large volumes of data, such as those collected by the sensors typically present in equipment. In this work, we aim to identify and summarize existing publications in the field of predictive maintenance that explore machine learning and deep learning algorithms to improve the performance of failure classification and detection. We show a significant upward trend in the use of deep learning methods applied to sensor data collected from mission-critical assets for early failure detection to assist predictive maintenance schedules. We also identify aspects that require further investigation in future work, namely the exploration of life support systems for supercomputing assets and the standardization of performance metrics.
computer science, theory & methods; engineering, electrical & electronic, hardware & architecture