Abstract: This study introduces a predictive maintenance strategy for high-pressure industrial compressors using sensor data and features derived from unsupervised clustering integrated into classification models. The goal is to enhance model accuracy and efficiency in detecting compressor failures. After data preprocessing, sensitive clustering parameters were tuned to identify algorithms that best capture the dataset's temporal and operational characteristics. Clustering algorithms were evaluated using quality metrics such as Normalized Mutual Information (NMI) and Adjusted Rand Index (ARI), and those most effective at distinguishing between normal and non-normal conditions were selected. These features enriched regression models, improving failure detection accuracy by 4.87 percent on average. Although training time was reduced by 22.96 percent, the decrease was not statistically significant and varied across algorithms. Cross-validation and key performance metrics confirmed the benefits of clustering-based features in predictive maintenance models.

What problem does this paper attempt to address?

The problem this paper attempts to solve is: **how to use sensor data and clustering-based features to improve the accuracy and efficiency of predictive maintenance models for high-pressure industrial compressors, so as to better detect compressor failures**. Specifically, the authors propose a hybrid clustering model approach that proceeds through the following steps:

1. **Data preprocessing**:
   - Clean the data and remove rows with missing or invalid values.
   - Calculate the autocorrelation matrix to identify highly correlated features and remove redundant ones.
   - Use analysis of variance (ANOVA) to further evaluate the discriminatory power of the remaining features and retain the statistically significant ones (p-value < 0.05).
   - Standardize the data and create the target variable "NORMAL", defining the time period of each event based on specific timestamps.
2. **Determine the optimal clustering parameters**:
   - For density-based clustering algorithms (such as HDBSCAN), determine the best epsilon value: compute the distances between neighboring points, plot them, and take the point of maximum curvature as epsilon.
   - For algorithms such as K-Means, determine the optimal number of clusters (k), using metrics such as the silhouette coefficient (silhouette score) to select the best k.
3. **Apply and evaluate clustering algorithms**:
   - Run multiple clustering algorithms (such as K-Means, HDBSCAN, OPTICS, BIRCH, GMM, and MS-AMS) with the optimized parameters.
   - Use quality measures such as the adjusted Rand index (ARI) and normalized mutual information (NMI) to evaluate the clustering results and select the algorithm that best distinguishes between normal and abnormal states.
4. **Combine with classification models**:
   - Add the clustering results as additional features to the classification model to improve the accuracy of fault detection.
   - Use cross-validation to evaluate model performance and compare training time and accuracy with and without clustering features.

With this approach, the authors improved fault detection accuracy by 4.87% on average. Training time fell by 22.96%, although the reduction was not statistically significant and varied across algorithms. This suggests that clustering features can help manage and predict the operating status of high-pressure industrial compressors and reduce the risk of failure.

### Key formulas

- **Autocorrelation matrix**:
  \[ \text{Correlation Matrix} = \text{corr}(X) \]
  where \(X\) is the feature matrix.
- **F-value and p-value**:
  \[ F = \frac{\text{Between-group variance}}{\text{Within-group variance}} \]
  \[ p\text{-value} = P(F > F_{\text{observed}}) \]
- **Silhouette coefficient (silhouette score)**:
  \[ s(i) = \frac{b(i) - a(i)}{\max(a(i), b(i))} \]
  where \(a(i)\) is the average distance from sample \(i\) to the other samples in the same cluster, and \(b(i)\) is the average distance from sample \(i\) to the samples in the nearest different cluster.
- **Adjusted Rand index (ARI)**:
  \[ \text{ARI} = \frac{\sum_{ij}\binom{n_{ij}}{2} - \left[\sum_i\binom{a_i}{2}\sum_j\binom{b_j}{2}\right] / \binom{n}{2}}{\frac{1}{2}\left[\sum_i\binom{a_i}{2} + \sum_j\binom{b_j}{2}\right] - \left[\sum_i\binom{a_i}{2}\sum_j\binom{b_j}{2}\right] / \binom{n}{2}} \]
  where \(n_{ij}\) is the number of samples simultaneously belonging to the \(i\)-th class
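As an illustration of the silhouette and ARI formulas listed under "Key formulas", here is a minimal sketch in pure Python that evaluates both on a tiny toy dataset. The function names (`adjusted_rand_index`, `silhouette`), the four 2-D points, and the label vectors are hypothetical and chosen only for illustration; this is not the authors' implementation, and a real pipeline would more likely rely on a library such as scikit-learn.

```python
from collections import Counter
from math import comb, dist


def adjusted_rand_index(labels_true, labels_pred):
    """ARI from contingency counts n_ij and marginals a_i, b_j (needs n >= 2)."""
    n = len(labels_true)
    n_ij = Counter(zip(labels_true, labels_pred))
    a = Counter(labels_true)
    b = Counter(labels_pred)
    sum_ij = sum(comb(c, 2) for c in n_ij.values())
    sum_a = sum(comb(c, 2) for c in a.values())
    sum_b = sum(comb(c, 2) for c in b.values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:
        # Degenerate partitions (assumed convention: treat as perfect agreement).
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def silhouette(points, labels, i):
    """s(i) = (b(i) - a(i)) / max(a(i), b(i)); assumes at least two clusters."""
    own = labels[i]
    distances = {}
    for j, (p, lab) in enumerate(zip(points, labels)):
        if j != i:
            distances.setdefault(lab, []).append(dist(points[i], p))
    if own not in distances:
        # Sample i is alone in its cluster (assumed convention: s(i) = 0).
        return 0.0
    a_i = sum(distances[own]) / len(distances[own])
    b_i = min(sum(d) / len(d) for lab, d in distances.items() if lab != own)
    return (b_i - a_i) / max(a_i, b_i)


# Hypothetical toy data: two well-separated groups of sensor readings.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
normal = [1, 1, 0, 0]        # stand-in for the "NORMAL" target variable
good_clusters = [0, 0, 1, 1]  # matches the target up to relabeling
bad_clusters = [0, 1, 0, 1]   # splits every true group in half

ari_good = adjusted_rand_index(normal, good_clusters)
ari_bad = adjusted_rand_index(normal, bad_clusters)
s0 = silhouette(points, good_clusters, 0)
print(ari_good, ari_bad, s0)
```

In this sketch the clustering that matches the target up to relabeling scores an ARI of 1, while the one that cuts across both groups scores below zero, which is the kind of contrast the selection step relies on when choosing among algorithms.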

Predictive Maintenance Study for High-Pressure Industrial Compressors: Hybrid Clustering Models

Predictive Maintenance Model Based on Anomaly Detection in Induction Motors: A Machine Learning Approach Using Real-Time IoT Data

A Scalable Predictive Maintenance Model for Detecting Wind Turbine Component Failures Based on SCADA Data

Intelligent fault classification of air compressors using Harris hawks optimization and machine learning algorithms

Towards Predictive Maintenance of a Heavy-Duty Gas Turbine A New Hybrid Intelligent Methodology for Performance Simulation

A Novel Energy Performance-Based Diagnostic Model for Centrifugal Compressor using Hybrid ML Model

On failure prediction and failure identification modeling in a gas turbine system: a survey of classification approaches in a three-class problem

Artificial Intelligence Based Gas Turbine Compressor Wash: A Predictive Approach

Hidden-Markov Model Based Sequential Clustering for Autonomous Diagnostics.

Comprehensive Study Of Predictive Maintenance In Industries Using Classification Models And LSTM Model

A hybrid clustering approach integrating first-principles knowledge with data for fault detection in HVAC systems

Machine learning algorithm functional on environmental sustainability assessment in turbomachinery sector: Application on centrifugal compressors

Machine Fault Detection Using a Hybrid CNN-LSTM Attention-Based Model

Wind Turbine Fault Diagnosis and Predictive Maintenance Through Statistical Process Control and Machine Learning

Predicting Machine Failures from Multivariate Time Series: An Industrial Case Study

Wind turbine gearbox fault prognosis using high-frequency SCADA data

A dissimilarity-based approach to predictive maintenance with application to HVAC systems

Predictive Maintenance Planning for Industry 4.0 Using Machine Learning for Sustainable Manufacturing

A data-driven pipeline pressure procedure for remote monitoring of centrifugal pumps

State monitoring and fault prediction of centrifugal compressors based on long short–term memory and principal component analysis (LSTM-PCA)