Abstract: This paper proposes a novel approach for modeling the problem of fault diagnosis using the Case Western Reserve University (CWRU) bearing fault dataset. Although the dataset is considered a standard reference for testing new algorithms, the typical dataset division suffers from data leakage, as shown by Hendriks et al. (2022) and Abburi et al. (2023), leading to papers reporting over-optimistic results. While their proposed division significantly mitigates this issue, it does not eliminate it entirely. Moreover, their proposed multi-class classification task can still lead to an unrealistic scenario by excluding the possibility of more than one fault type occurring at the same or different locations. As advocated in this paper, a multi-label formulation (detecting the presence of each type of fault for each location) can solve both issues, leading to a scenario closer to reality. Additionally, this approach mitigates the heavy class imbalance of the CWRU dataset, where faulty cases appear much more frequently than healthy cases, even though the opposite is more likely to occur in practice. A multi-label formulation also enables a more precise evaluation using prevalence-independent evaluation metrics for binary classification, such as the ROC curve. Finally, this paper proposes a more realistic dataset division that allows for more diversity in the training dataset while keeping the division free from data leakage. The results show that this new division can significantly improve performance while enabling a fine-grained error analysis. As an application of our approach, a comparative benchmark is performed using several state-of-the-art deep learning models applied to 1D and 2D signal representations in time and/or frequency domains.
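The point about prevalence-independent metrics can be illustrated with a small sketch. It is not taken from the paper: the helper names `auroc` and `accuracy` and the toy labels and scores are assumptions made for illustration. It computes the area under the ROC curve as the probability that a randomly chosen positive example outscores a randomly chosen negative one, and compares it with accuracy on a set where faulty cases dominate.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def accuracy(labels: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """Fraction of examples whose thresholded score matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Hypothetical toy set: 9 faulty signals (label 1) and 1 healthy signal (label 0).
labels = [1] * 9 + [0]
# A useless detector that gives every signal the same "faulty" score.
constant_scores = [0.9] * 10

print(accuracy(labels, constant_scores))  # 0.9, flattered by the imbalance
print(auroc(labels, constant_scores))     # 0.5, no better than chance

# Changing the prevalence by repeating the positives leaves AUROC unchanged.
base = auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
repeated = auroc([1, 1] * 5 + [0, 0], [0.9, 0.4] * 5 + [0.5, 0.1])
print(base, repeated)  # 0.75 0.75
```

The pairwise form is quadratic in the number of examples, so it suits small illustrations only; a rank-based implementation from an established library would be the usual choice for real evaluations.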

What problem does this paper attempt to address?

This paper targets several recurring problems in bearing fault diagnosis that arise when using the Case Western Reserve University (CWRU) bearing fault dataset:

1. **Data leakage**: The typical CWRU dataset division places signals from the same bearing in both the training set and the test set, so reported results are over-optimistic and say little about how well a model generalizes in practice. Hendriks et al. (2022) and Abburi et al. (2023) demonstrated this leakage; the division they proposed significantly mitigates it but does not eliminate it entirely.
2. **Limitations of multi-class classification**: Most existing research frames bearing fault diagnosis as a multi-class classification task. This assumes that only one fault type occurs at a time, which is unrealistic: several faults may occur simultaneously, at different positions or on different components at the same position.
3. **Class imbalance**: In the CWRU dataset, faulty samples far outnumber normal samples. This imbalance distorts performance evaluation, especially when accuracy is the evaluation metric.
4. **Applicability in real-world scenarios**: Traditional methods do not fully account for the complexity and diversity of practical applications. For example, the assumption of synchronous signals is difficult to satisfy in an actual industrial environment.

To address these problems, the paper proposes a multi-label classification formulation together with a more reasonable partitioning of the dataset:

- **Multi-label classification**: For each position (drive-end and fan-end), the model detects whether each type of fault (inner ring, outer ring, and ball) is present. Multiple fault types can therefore be reported at the same time, which is closer to the actual situation.
- **Dataset partitioning**: All signals of healthy bearings appear only in the test set, which avoids data leakage. In addition, signals with different loads, fault sizes, types, and positions are randomly selected for training and testing, which increases the diversity of the dataset.
- **Evaluation metrics**: Metrics that are not affected by prior probabilities, such as the ROC curve and AUROC, are used to obtain a more accurate evaluation of model performance.

With these changes, the paper provides a bearing fault diagnosis setup that is closer to real-world conditions and better suited to evaluating and applying deep learning models.
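As a rough illustration of the multi-label formulation and of a leakage-free split, the sketch below encodes each signal as six binary targets (two positions times three fault types) and assigns whole bearings to either the training or the test set. It is a minimal sketch under stated assumptions, not the paper's actual division: the names `LABELS`, `encode`, `split_by_bearing`, the `bearing_id` field, and the toy records are all hypothetical.

```python
import random
from collections import defaultdict

# Assumed label layout: one binary output per (position, fault type) pair.
LOCATIONS = ("drive_end", "fan_end")
FAULT_TYPES = ("inner", "outer", "ball")
LABELS = [(loc, ft) for loc in LOCATIONS for ft in FAULT_TYPES]


def encode(faults):
    """Turn a set of (position, fault type) pairs into a 6-element 0/1 target.

    A healthy signal is the empty set and maps to all zeros.
    """
    unknown = set(faults) - set(LABELS)
    if unknown:
        raise ValueError(f"unknown fault labels: {sorted(unknown)}")
    return [1 if label in faults else 0 for label in LABELS]


def split_by_bearing(records, test_fraction=0.3, seed=0):
    """Assign whole bearings to train or test so no bearing appears in both."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["bearing_id"]].append(rec)
    bearing_ids = sorted(groups)
    random.Random(seed).shuffle(bearing_ids)
    n_test = max(1, round(len(bearing_ids) * test_fraction))
    test_ids = set(bearing_ids[:n_test])
    train = [r for b in bearing_ids if b not in test_ids for r in groups[b]]
    test = [r for b in bearing_ids if b in test_ids for r in groups[b]]
    return train, test


# Hypothetical records: several signals (e.g. different loads) per bearing.
records = [
    {"bearing_id": b, "load": load, "target": encode(faults)}
    for b, faults in [
        ("B1", {("drive_end", "inner")}),
        ("B2", {("drive_end", "outer")}),
        ("B3", {("fan_end", "ball")}),
        ("B4", {("drive_end", "inner"), ("fan_end", "ball")}),
    ]
    for load in (0, 1, 2)
]

train, test = split_by_bearing(records)
healthy = encode(set())
combined = encode({("drive_end", "inner"), ("fan_end", "ball")})
print(healthy)   # [0, 0, 0, 0, 0, 0]
print(combined)  # [1, 0, 0, 0, 0, 1]
```

Because each output is an independent binary decision, a signal with a single fault still serves as a negative example for the other five labels, and each label can be evaluated separately with a binary metric such as AUROC.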

Benchmarking deep learning models for bearing fault diagnosis using the CWRU dataset: A multi-label approach
