Abstract: When developing Prognostics and Health Management (PHM) applications for manufacturing systems, acquired data frequently comes with issues that hinder further analysis. However, there is neither a clear definition of data quality nor an evaluation method to quantify whether acquired data is suitable for prognostic modeling tasks such as failure detection, diagnosis, and prediction. In particular, in data-driven health diagnosis modeling of engineering systems, acquired data is expected to contain clusters that can be used to differentiate multiple system health conditions. In most cases, once data is acquired, people intuitively assume that it can be clustered into subgroups. This bias can lead to the acceptance of false information in the data. Furthermore, most existing metrics, such as clustering tendency in statistics and clusterability in data mining, evaluate data characteristics in isolation, without considering prognostic modeling. This paper proposes a new method to evaluate and improve data quality for system health diagnosis modeling. The clusters, as critical data characteristics for modeling multiple system conditions, are first estimated by “visualization” of the dissimilarity spectrum obtained from spectral analysis and then evaluated in terms of their fitness and their separation from each other. A visual-assessment-based outlier detection method is also proposed to recognize outliers in the data; it reuses the graphical intermediate results of the preceding evaluation. Finally, a bearing test dataset acquired from real industrial applications is used to demonstrate how the proposed methods evaluate and improve data quality.
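
As an illustration of what a visual assessment of cluster structure can look like, the sketch below reorders a dissimilarity matrix in the style of the well-known VAT (Visual Assessment of cluster Tendency) procedure, so that well-separated groups show up as dark blocks along the diagonal when the reordered matrix is displayed as an image. This is an assumption made for illustration only: the abstract does not say that VAT is the exact procedure used, and the function name `vat_order` is hypothetical.

```python
import numpy as np

def vat_order(D):
    """Return a VAT-style ordering of the rows/columns of a square,
    symmetric dissimilarity matrix D (hypothetical helper, not from the paper)."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    # Start from one endpoint of the largest pairwise dissimilarity.
    start = int(np.unravel_index(np.argmax(D), D.shape)[0])
    order = [start]
    remaining = [k for k in range(n) if k != start]
    # min_dist[k]: smallest dissimilarity from k to any already-ordered point.
    min_dist = D[start].copy()
    while remaining:
        # Append the unordered point closest to the ordered set.
        nxt = min(remaining, key=lambda k: min_dist[k])
        order.append(nxt)
        remaining.remove(nxt)
        min_dist = np.minimum(min_dist, D[nxt])
    return order
```

Given an ordering `p = vat_order(D)`, the reordered matrix is `D[np.ix_(p, p)]`. If the data really contains separable clusters, that matrix should look block-diagonal; the absence of such blocks is a hint that the data may not support distinct health conditions.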

Software Metrics Data Clustering for Quality Prediction

Software Quality Prediction Using Affinity Propagation Algorithm

Software Metrics Analysis with Genetic Algorithm and Affinity Propagation Clustering

QoS Prediction of Web Services Based on Two-Phase K-Means Clustering

Assessing Software Quality by Program Clustering and Defect Prediction

Big Data Quality Prediction in the Process Industry: A Distributed Parallel Modeling Framework

Software-defect prediction within and across projects based on improved self-organizing data mining

An Empirical Study of Dynamic Incomplete-Case Nearest Neighbor Imputation in Software Quality Data

A Study of Applying Unsupervised Learning Methods for Document Clustering and Automatic Categorization of Software

Software Code Quality Measurement: Implications from Metric Distributions

Analysis and design of financial data mining system based on fuzzy clustering

Applying Machine Learning Analysis for Software Quality Test

An Empirical Study on the Procedure to Derive Software Quality Estimation Models

Prediction of Human Performance Capability during Software Development using Classification

Construction of Student Information Management System Based on Data Mining and Clustering Algorithm

Data quality evaluation and improvement for prognostic modeling using visual assessment based data partitioning method

A novel Move-Split-Merge based Fuzzy C-Means algorithm for clustering time series

Data Mining Algorithm for Cloud Network Information Based on Artificial Intelligence Decision Mechanism

Data Mining for Quality Prediction in Software-as-A-Service Concept: A Case Study in Offset Printing Company

Metric-based software reliability prediction approach and its application

Using Multi‐pattern Clustering Methods to Improve Software Maintenance Quality