A Novel Imbalanced Data Classification Method Based on Weakly Supervised Learning for Fault Diagnosis
Hui Liu, Zhenyu Liu, Weiqiang Jia, Donghao Zhang, Jianrong Tan
DOI: https://doi.org/10.1109/tii.2021.3084132
IF: 12.3
2021-01-01
IEEE Transactions on Industrial Informatics
Abstract: The class imbalance problem severely degrades the performance of diagnostic models. When it occurs, classification models tend to ignore the minority samples. Moreover, the distribution of class-imbalanced data differs from the actual data distribution, which makes it difficult for classifiers to learn an accurate decision boundary. To tackle these issues, this article proposes a novel imbalanced data classification method based on weakly supervised learning. First, the Bagging algorithm is employed to randomly sample the majority data and generate several relatively balanced subsets, which are then used to train several support vector machine (SVM) classifiers. Next, the trained SVM classifiers predict the labels of unlabeled data, and samples predicted as the minority class are added to the original dataset to reduce the imbalance ratio. The key idea of this article is to introduce real-world samples into the imbalanced dataset by means of weakly supervised learning. In addition, bidirectional gated recurrent units are used to construct the fault diagnosis model, and a new weighted cross-entropy function is proposed as the loss function; it reduces the impact of noise and increases the model's attention to the original minority samples. Finally, the proposed method is evaluated on two datasets, the Prognostics and Health Management challenge 2008 and 2010 datasets, and the experimental results demonstrate its effectiveness and superiority.
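The data-augmentation steps described in the abstract can be illustrated with a minimal sketch, assuming NumPy and scikit-learn are available. It is not the authors' implementation: the function names, the RBF kernel, the number of classifiers, the majority-vote rule, and the sample weights are all assumptions made for illustration, and the abstract does not give the form of the proposed weighted cross-entropy, so a generic per-sample weighted version stands in for it.

```python
import numpy as np
from sklearn.svm import SVC


def train_bagged_svms(X, y, minority_label=1, n_estimators=5, seed=0):
    """Train one SVM per roughly balanced subset.

    Each subset holds all minority samples plus a bootstrap draw of the
    same size from the majority class (subset size is an assumption).
    """
    rng = np.random.default_rng(seed)
    minority_idx = np.flatnonzero(y == minority_label)
    majority_idx = np.flatnonzero(y != minority_label)
    models = []
    for _ in range(n_estimators):
        drawn = rng.choice(majority_idx, size=len(minority_idx), replace=True)
        subset = np.concatenate([minority_idx, drawn])
        # Kernel and gamma are illustrative choices, not taken from the paper.
        models.append(SVC(kernel="rbf", gamma="scale").fit(X[subset], y[subset]))
    return models


def add_pseudo_minority(X, y, X_unlabeled, models, minority_label=1):
    """Append unlabeled samples that most classifiers predict as minority."""
    min_votes = len(models) // 2 + 1  # assumption: simple majority vote
    votes = sum((m.predict(X_unlabeled) == minority_label).astype(int)
                for m in models)
    picked = votes >= min_votes
    n_new = int(picked.sum())
    X_aug = np.vstack([X, X_unlabeled[picked]])
    y_aug = np.concatenate([y, np.full(n_new, minority_label, dtype=y.dtype)])
    # Flag pseudo-labeled rows so a loss function can treat them differently.
    is_pseudo = np.concatenate([np.zeros(len(y), dtype=bool),
                                np.ones(n_new, dtype=bool)])
    return X_aug, y_aug, is_pseudo


def weighted_cross_entropy(probs, y, weights):
    """Generic per-sample weighted cross-entropy (stand-in, not the paper's).

    probs: (n, n_classes) predicted probabilities; y: integer class labels.
    """
    eps = 1e-12  # avoids log(0)
    p_true = probs[np.arange(len(y)), y]
    return float(np.sum(weights * -np.log(p_true + eps)) / np.sum(weights))
```

One hypothetical way to use the `is_pseudo` flag is to give pseudo-labeled samples a weight below 1 (they may be mislabeled) and original minority samples a weight above 1, which mirrors the two goals the abstract states for the loss; the actual weighting scheme would need to be taken from the paper itself.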